Building An IP Studio: Connecting Cameras - Part 1

Connecting a camera in an SDI infrastructure is easy. Just connect the camera output to the monitor input, and all being well, a picture will appear. The story is very different in the IP domain.

In this short series of articles, we look at what it means to connect a camera to a studio infrastructure in the IP domain. We discuss the challenges, solutions, and the practical aspects of routing moving pictures in the IP studio.

Although I say connecting SDI is easy, that wasn’t always the case. In the very early days of SDI, it was very difficult to transport images reliably, as the chipsets that provided automatic equalization were yet to be developed and cable design that could work at 270 Mb/s was in its infancy. Although we probably don’t remember, those early trailblazers working with SDI found themselves ascending a steep learning curve, much as we are doing today with IP.

Fundamentally, communication systems have two methods of routing signals: circuit switched and packet switched.

With circuit switched infrastructures, a point-to-point connection exists between two devices, but to deliver greater flexibility, some sort of routing matrix is employed. In the old days of television this consisted of relays and patch bays, but as electronics developed, FET and MOSFET routing matrices were used that allowed any matrix input to be connected to one or many outputs.

Television has always used some form of synchronous distribution. Although no longer needed, the line, field and frame syncs provided a method of synchronizing the scanning coils of the camera with the scanning coils in the studio monitors, and the scanning coils of all the television sets at home. As we moved to SDI and AES, we kept this timing information to maintain backwards compatibility, which in effect gave us the synchronous transport streams that SDI and AES provide. This in turn led to the adoption of circuit switched networks, as they easily maintain the synchronous nature of the SDI and AES transport streams.

Figure 1 – Left: Early ethernet (IEEE 802.3) used a single coaxial cable to connect multiple computers and devices. CSMA/CD (Carrier Sense Multiple Access with Collision Detection) was a method of allowing multiple devices to send packets and detect collisions, giving the sender the option of resending a packet should a collision be detected. Right: How ethernet is used in modern broadcast facilities. The point-to-point connection between the cameras and the ethernet switch removes collisions, but each port uses a memory buffer to reduce the chance of congestion when the video packets are sent to the next device, in this case, the monitor. Consequently, variable latency occurs.

IP was originally developed to allow computer systems to communicate with each other and exchange data. Critically, the data often consisted of documents and database queries and responses, which in turn led to exchanging just short bursts of data. At the time, packet delivery was the optimal communication method as computer-to-computer data exchange was often short. Furthermore, computers within a network would often share the same physical cable, so methods of arbitration had to be devised to reduce the probability of two computers sending a packet simultaneously and causing a collision, resulting in corruption of the data and lost packets.
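
To make the idea of arbitration a little more concrete, here is a minimal Python sketch of the truncated binary exponential backoff used by classic CSMA/CD ethernet: after each successive collision, a sender waits a random number of slot times drawn from a range that doubles until it reaches a cap. The function and constant names are my own, chosen for illustration, and the sketch only models how the wait is chosen, not carrier sensing or collision detection.

```python
import random

SLOT_TIME_BITS = 512   # slot time of classic 10 Mb/s ethernet, in bit times
ATTEMPT_LIMIT = 16     # the frame is abandoned once this many attempts collide
BACKOFF_CAP = 10       # the random range stops doubling after 10 collisions


def max_backoff_slots(collision_count):
    """Largest wait (in slot times) allowed after the n-th successive collision."""
    return 2 ** min(collision_count, BACKOFF_CAP) - 1


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the n-th successive collision."""
    if collision_count < 1:
        raise ValueError("collision_count must be at least 1")
    if collision_count >= ATTEMPT_LIMIT:
        raise RuntimeError("too many collisions, the frame is dropped")
    # Uniform choice between 0 and the current maximum, inclusive.
    return rng.randint(0, max_backoff_slots(collision_count))


def backoff_bit_times(collision_count, rng=random):
    """The same wait expressed in bit times."""
    return backoff_slots(collision_count, rng) * SLOT_TIME_BITS


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15):
        print(f"collision {n}: wait between 0 and {max_backoff_slots(n)} slots")
```

The growing random range is what reduces the probability of the same two senders colliding again, but it also means the time taken to deliver a packet cannot be known in advance, which is acceptable for short bursts of data and much less so for continuous media.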

The bottom line is that IP was never designed to send continuous streams of video or audio. But to take advantage of the advances in IT technology, broadcast engineers and innovators have had to devise methods of distributing continuous streams of video and audio over IP networks.

It’s worth remembering that IP datagrams exist independently of their underlying transport streams. This allows us to easily transport IP datagrams over ethernet, WiFi, or fiber, and switch between them. The IP datagram doesn’t know or care about its transport stream, but transferring between different transport streams can have a significant effect on the streamed media. This independence is both IP’s greatest strength and its most difficult challenge. As we’ve abstracted the streamed video and audio media away from the timing plane, we can no longer rely on clocks within the transport domain to provide reliable timing information.

Although we don’t need line, field and frame syncs anymore, television is still a sampled system for both video and audio. We must therefore synchronize the video display of the monitor with the video sampling of the camera, and the audio playback of the loudspeaker with the audio sampling of the microphone.

Therefore, connecting a camera to a monitor in an IP network presents us with the following challenges:

The monitor and other devices need to know how to receive the camera feed
We must restore the timing to synchronize the camera and downstream devices
A reliable and flexible method of labelling the video streams must be found
Routing the camera to its destinations is required
Monitoring and understanding where the packets are going, and interacting with them, is required

Also, we have to contend with the thorny issue of latency. As SDI and AES are synchronous transport streams, the transmit and receive buffers can be very small, potentially in the order of a few samples. There may be some clock jitter on the circuit, but the phase locked loop in the receiver will be able to remove this, assuming it is within tolerable limits. Furthermore, the sync pulse generator supplies a reliable clock source, so the timing of most SDI senders and receivers will be relatively close, further removing the need for large input buffers.

As IP is asynchronous, we cannot rely on the clocks within the transport stream. For example, a camera may be sending IP datagrams over fiber, and a graphics generator may be sending images over CAT8 ethernet. The two use wildly different transport methods, each with its own clock, and it would be almost impossible to choose which one to use as the timing reference.
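
A small piece of arithmetic shows why free-running clocks are a problem for a sampled system. The Python sketch below assumes an audio sender and receiver whose sample clocks differ by a fixed number of parts per million (ppm); the 48 kHz rate, the 50 ppm offset and the 480-sample buffer are illustrative figures of my own, not values taken from any particular equipment.

```python
def drift_samples_per_second(sample_rate_hz, ppm_offset):
    """Samples gained or lost each second when two clocks differ by ppm_offset."""
    return sample_rate_hz * ppm_offset / 1_000_000


def seconds_until_buffer_exhausted(buffer_samples, sample_rate_hz, ppm_offset):
    """Time until a buffer of the given size overflows or runs dry."""
    drift = abs(drift_samples_per_second(sample_rate_hz, ppm_offset))
    if drift == 0:
        return float("inf")   # perfectly matched clocks never exhaust the buffer
    return buffer_samples / drift


if __name__ == "__main__":
    rate_hz = 48_000      # assumed audio sample rate
    offset_ppm = 50       # assumed difference between sender and receiver clocks
    buffer_size = 480     # assumed receive buffer, 10 ms of audio at 48 kHz

    drift = drift_samples_per_second(rate_hz, offset_ppm)
    lifetime = seconds_until_buffer_exhausted(buffer_size, rate_hz, offset_ppm)
    print(f"drift: {drift:.1f} samples per second")
    print(f"buffer exhausted after about {lifetime:.0f} seconds")
```

Under these assumptions the receiver gains or loses a couple of samples every second, so even a modest buffer is exhausted within minutes unless both devices are locked to a common timing reference.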

Buffers are used extensively within IT networks to synchronize asynchronous events, and to reduce the potential for network congestion. As television is still a synchronous system, and we currently need to provide backwards compatibility, we have no option but to use buffers in both send and receive devices, as well as within the network. Although buffers solve the problems of congestion, collision, and synchronization of asynchronous events, they do so at the expense of latency. More worryingly, this latency is no longer static and predictable, but is highly variable.
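
The trade-off between buffering and latency can be sketched in a few lines of Python. The example assumes a sender emitting one packet every millisecond and a network that delays each packet by a different amount; the delay figures are invented for illustration, and the function names are hypothetical. Times are in microseconds.

```python
def min_playout_delay(send_times, arrival_times):
    """Smallest fixed delay after sending by which every packet has arrived."""
    return max(a - s for s, a in zip(send_times, arrival_times))


def late_packets(send_times, arrival_times, playout_delay):
    """Indexes of packets that miss their playout time for a given fixed delay."""
    return [
        i
        for i, (s, a) in enumerate(zip(send_times, arrival_times))
        if a > s + playout_delay
    ]


if __name__ == "__main__":
    send = [0, 1000, 2000, 3000, 4000]           # one packet per millisecond
    network_delay = [200, 250, 900, 220, 400]    # assumed variable latency
    arrive = [s + d for s, d in zip(send, network_delay)]

    print(min_playout_delay(send, arrive))       # buffer depth that loses nothing
    print(late_packets(send, arrive, 300))       # packets lost with a 300 us buffer
```

A receive buffer sized for the worst case delays every packet by that worst case, including those that crossed the network quickly, while a smaller buffer lowers latency but leaves some packets arriving too late to be used.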

In the next article in this series, we will look at how a monitor displays a picture being sent from a camera.

Other related articles posted on The Broadcast Bridge.

Building An IP Studio: Connecting Cameras - Part 2 - Session Description Protocols
