Building An IP Studio: Connecting Cameras - Part 2 - Session Description Protocols

IP is incredibly versatile. It’s payload agnostic, and many different transport technologies can carry it over many different types of network. However, this versatility presents many challenges, especially when sending video and audio over networks.

SDI networks are fixed, and the video formats they can transport fall within relatively tight constraints, such as HD or SD at 50 fields or 60/1.001 fields per second. SMPTE have done a fantastic job of tightly defining the standards so that there are no surprises when connecting equipment. This virtually guarantees that if I connect an HD-SDI camera to an HD-SDI monitor, I will see the pictures.

The historic backwards compatibility requirement for 50 or 60/1.001 fields per second is more or less over, and that’s before we even start talking about interlace. We no longer have to maintain the tightly specified timing constraints imposed by the electromagnetic scanning coils of tube cameras and CRT monitors. Consequently, we can explore a whole new generation of higher frame rates and different horizontal and vertical image sizes. Although this may still be somewhat futuristic, IP provides the infrastructure that will allow it to happen.

Other than defining the type of higher-level protocol employed, such as UDP or TCP, IP packets have no knowledge of the type of data their payloads are carrying. And as the media essence has been abstracted away from the constraints of the transport stream, the IP stream no longer implies any type of media format. It could equally be carrying audio, video, metadata, control information, or just about anything else. This implies great flexibility, but how do we know what type of media or data an IP packet is carrying?

For each stream we must define a set of attributes that specify the media essence. For example, the image is 1920 x 1080 pixels, the color subsampling is 4:2:2, and the field rate is 60/1.001 fields per second. These are just a few of the attributes, and we could easily need ten or more to define a meaningful video signal. This information must be maintained for every stream.
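As a sketch of what "ten or more" attributes look like in practice, the Python dataclass below groups the example values from the paragraph above into one record per video flow. The class and field names are hypothetical and chosen for readability; they are not taken from any standard, although they loosely mirror the parameters that ST 2110-20 SDP files carry. The destination address is illustrative.

```python
from dataclasses import dataclass, fields
from fractions import Fraction


@dataclass(frozen=True)
class VideoFlowAttributes:
    """Hypothetical record of the attributes needed to describe one video flow."""

    width: int              # active pixels per line, e.g. 1920
    height: int             # active lines per frame, e.g. 1080
    sampling: str           # color subsampling, e.g. "YCbCr-4:2:2"
    bit_depth: int          # bits per sample, e.g. 10
    frame_rate: Fraction    # frames per second, kept as an exact ratio
    interlaced: bool        # True if each frame is sent as two fields
    colorimetry: str        # e.g. "BT709"
    transfer: str           # transfer characteristic, e.g. "SDR"
    destination_ip: str     # multicast group carrying the flow
    destination_port: int   # UDP port of the flow

    @property
    def field_rate(self) -> Fraction:
        """Fields per second: twice the frame rate for interlaced video."""
        return self.frame_rate * 2 if self.interlaced else self.frame_rate


camera_1 = VideoFlowAttributes(
    width=1920,
    height=1080,
    sampling="YCbCr-4:2:2",
    bit_depth=10,
    frame_rate=Fraction(30000, 1001),
    interlaced=True,
    colorimetry="BT709",
    transfer="SDR",
    destination_ip="239.100.1.1",  # illustrative address only
    destination_port=20000,
)

print(len(fields(camera_1)))  # 10
print(camera_1.field_rate)    # 60000/1001, i.e. 60/1.001 fields per second
```

Holding the rate as a Fraction rather than a float keeps 60/1.001 exact, which avoids rounding mismatches when comparing the rates reported by different devices.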

One method is to provide a spreadsheet of every video, audio, and metadata source, defining each of the attributes that specify the signal. For a few media flows this is possible, but with hundreds or even thousands of video and audio flows, not to mention their associated metadata streams, a spreadsheet quickly becomes an administrative nightmare, bordering on the impossible, especially if we want to take advantage of the versatility, scalability, and flexibility that IP can provide.

Another more practical solution, and one that has been used in AoIP for many years, is SDP (Session Description Protocol). An SDP file is a small text file that contains all the parameters needed to specify a media stream. SDP was originally specified by the IETF in 1998 as RFC 2327, and a revised version was published in 2006 as RFC 4566. SMPTE later adopted it for ST 2110, which requires senders to describe their streams using SDP.
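SDP is a plain-text format made of single-letter type=value lines. The sketch below shows what an SDP for a 1080i ST 2110-20 video flow might look like, using hypothetical addresses (192.0.2.0/24 is reserved for documentation), together with a minimal Python function that extracts the video format parameters from its a=fmtp line. For interlaced video, ST 2110-20 signals the frame rate (30000/1001) plus an interlace flag rather than the field rate. This is an illustration, not a complete RFC 4566 parser.

```python
# A hypothetical SDP describing one ST 2110-20 video flow.
# Addresses are illustrative; 192.0.2.0/24 is a documentation range.
EXAMPLE_SDP = """\
v=0
o=- 1443716955 1443716955 IN IP4 192.0.2.10
s=Camera 1 Video
t=0 0
m=video 20000 RTP/AVP 96
c=IN IP4 239.100.1.1/64
a=source-filter: incl IN IP4 239.100.1.1 192.0.2.10
a=rtpmap:96 raw/90000
a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; exactframerate=30000/1001; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017; TP=2110TPN; interlace
a=mediaclk:direct=0
a=ts-refclk:ptp=IEEE1588-2008:traceable
"""


def parse_fmtp(sdp_text: str) -> dict[str, str]:
    """Return the format parameters from the first a=fmtp line.

    Valueless flags such as 'interlace' are stored with an empty string.
    """
    for line in sdp_text.splitlines():
        if line.startswith("a=fmtp:"):
            # Drop "a=fmtp:<payload type> " and keep the parameter list.
            _, _, param_list = line.partition(" ")
            result: dict[str, str] = {}
            for item in param_list.split(";"):
                item = item.strip()
                if not item:
                    continue
                key, _, value = item.partition("=")
                result[key] = value
            return result
    return {}


params = parse_fmtp(EXAMPLE_SDP)
print(params["width"], params["height"], params["exactframerate"])  # 1920 1080 30000/1001
print("interlace" in params)  # True
```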

Without SDP, streams based on formats such as SMPTE’s ST 2110 are almost impossible to manage. In theory, a spreadsheet could record the source IP addresses, frame rates, color space and so on, but in practice maintaining it is virtually unmanageable. Instead, each source generator, such as a camera, microphone, or frame synchronizer, creates an SDP file. The device can then issue this file periodically, or a management control system can retrieve it.
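On the source side, a device can assemble its SDP file from its current settings. The Python function below is a minimal sketch for a single ST 2110-20 video flow. The function name and parameters are hypothetical, several fmtp values (4:2:2 sampling, 10-bit, BT.709, SDR) are hard-coded for brevity, and the PTP reference clock is given as "traceable" rather than a specific grandmaster identity.

```python
def build_st2110_20_sdp(
    session_name: str,
    source_ip: str,
    multicast_ip: str,
    port: int,
    width: int,
    height: int,
    exactframerate: str,
    interlace: bool = False,
    session_id: int = 1,
) -> str:
    """Build a minimal SDP for one ST 2110-20 video flow (illustrative only)."""
    fmtp = [
        "sampling=YCbCr-4:2:2",
        f"width={width}",
        f"height={height}",
        f"exactframerate={exactframerate}",  # frame rate, even for interlaced video
        "depth=10",
        "TCS=SDR",
        "colorimetry=BT709",
        "PM=2110GPM",
        "SSN=ST2110-20:2017",
        "TP=2110TPN",
    ]
    if interlace:
        fmtp.append("interlace")  # valueless flag
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {source_ip}",
        f"s={session_name}",
        "t=0 0",
        f"m=video {port} RTP/AVP 96",
        f"c=IN IP4 {multicast_ip}/64",
        f"a=source-filter: incl IN IP4 {multicast_ip} {source_ip}",
        "a=rtpmap:96 raw/90000",
        "a=fmtp:96 " + "; ".join(fmtp),
        "a=mediaclk:direct=0",
        "a=ts-refclk:ptp=IEEE1588-2008:traceable",
    ]
    # RFC 4566 terminates SDP lines with CRLF.
    return "\r\n".join(lines) + "\r\n"


sdp = build_st2110_20_sdp(
    "Camera 1 Video", "192.0.2.10", "239.100.1.1", 20000,
    1920, 1080, "30000/1001", interlace=True,
)
print(sdp)
```

Regenerating the file whenever a setting changes keeps the SDP consistent with what the device is actually sending, which is the device's responsibility.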

SDP files are not restricted to ST 2110 streams; they are used to describe the audio and video streams of many other streaming formats, such as RTP-based MPEG streams and RTSP sessions. This makes it possible to distribute and identify many different streams in a broadcast network, giving massive flexibility and scalability. That said, it is the responsibility of the generating source or destination device to make sure the information defined in its SDP is accurate.

Fig 1 – each device on the network can generate SDP files and send them to the broadcaster’s control system. This gives the control system an overview of all the connected devices and their formats.

Both source and destination devices on the network can generate SDP files. A monitor or loudspeaker can issue SDP files to advertise its connectivity, allowing other devices to connect to it.

In its simplest form, SDP provides a plug-and-play type service, assuming the device is configured to send its SDP files periodically, typically once per second. If SDP files are sent too often, the system risks creating network congestion. If they are sent too rarely, system management and other devices may not detect a newly connected piece of equipment quickly enough.
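The paragraph above does not name the mechanism used to send SDP files periodically. One real option, long used in AoIP, is the Session Announcement Protocol (SAP, RFC 2974), which multicasts each SDP with a small header. The Python sketch below builds a minimal SAP version 1 announcement and sends it at a fixed interval through a caller-supplied function, so it can be dry-run without a network. The multicast group, origin address, SDP content and interval are assumptions to adjust for a real facility, and the message ID hash scheme (a truncated CRC-32) is just one convenient choice.

```python
import socket
import time
import zlib
from typing import Callable

# SAP uses UDP port 9875; 239.255.255.255 is the SAP group for the IPv4
# administratively scoped range (assumed here; check your facility's scoping).
SAP_GROUP = "239.255.255.255"
SAP_PORT = 9875


def build_sap_announcement(sdp: bytes, origin_ipv4: str) -> bytes:
    """Wrap an SDP payload in a minimal SAP version 1 announcement header."""
    flags = 0x20  # V=1, IPv4 origin, announcement (not deletion), no encryption or compression
    auth_len = 0  # no authentication data follows the header
    msg_id_hash = zlib.crc32(sdp) & 0xFFFF  # 16-bit ID that changes when the SDP changes
    header = bytes([flags, auth_len]) + msg_id_hash.to_bytes(2, "big")
    header += socket.inet_aton(origin_ipv4)  # originating source address
    return header + b"application/sdp\x00" + sdp


def announce(send: Callable[[bytes], None], packet: bytes, interval_s: float, count: int) -> None:
    """Send the packet count times, waiting interval_s seconds between sends."""
    for i in range(count):
        send(packet)
        if i < count - 1:
            time.sleep(interval_s)


sdp = b"v=0\r\no=- 1 1 IN IP4 192.0.2.10\r\ns=Camera 1 Video\r\nt=0 0\r\n"
packet = build_sap_announcement(sdp, "192.0.2.10")

# Dry run: collect packets in a list instead of touching the network.
sent: list = []
announce(sent.append, packet, interval_s=0.0, count=3)
print(len(sent))  # 3

# On a real network one might instead write:
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 16)
#   announce(lambda p: sock.sendto(p, (SAP_GROUP, SAP_PORT)), packet, 1.0, 10)
```

Passing the send function in keeps the timing logic separate from the socket, which makes the interval trade-off easy to experiment with.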

SDP files are not particularly long, typically up to about 4 kbytes. But care must be taken when calculating their network bandwidth allocation, as a single device may carry multiple streams, with one SDP file for each essence. In a studio camera this soon becomes an issue, as there could easily be ten separate video and audio input and output streams. Sent once per second, the ten 4 kbyte files consume 320 kbit/s of network bandwidth (4 kbytes x 8 bits x 10 files per second = 320 kbit/s). With 500 such sources this rises to 160 Mbit/s, which calls for careful network management.
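The arithmetic above generalizes to a one-line formula: bandwidth equals file size x 8 x streams per device x devices, divided by the announcement interval. The Python helper below is a small sketch of that calculation; the function name is hypothetical, and, like the text, it treats 4 kbytes as 4,000 bytes and assumes one announcement per second by default.

```python
def sdp_announcement_bps(
    sdp_bytes: int, streams_per_device: int, devices: int, interval_s: float = 1.0
) -> float:
    """Bits per second consumed by periodically re-sending every SDP file."""
    bits_per_round = sdp_bytes * 8 * streams_per_device * devices
    return bits_per_round / interval_s


# Figures from the text: 4 kbytes (as 4,000 bytes), ten streams, once per second.
print(sdp_announcement_bps(4000, 10, 1))    # 320000.0 -> 320 kbit/s per camera
print(sdp_announcement_bps(4000, 10, 500))  # 160000000.0 -> 160 Mbit/s for 500 sources

# Announcing every 10 s instead cuts the load tenfold, but a newly
# connected device may then take up to 10 s to be noticed.
print(sdp_announcement_bps(4000, 10, 500, interval_s=10.0))  # 16000000.0
```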

If there are too many devices creating SDP files, there are two further options: send the files over a different network, or use the system control software to actively pull the SDP files from the devices instead of the devices constantly sending them. Using a separate network has its merits: SDP delivery times are not as critical as those of the media streams, and moving the files off the media network ensures they cannot interfere with the media streams.
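In the pull model, the control system polls each device for its SDP files rather than waiting for them to arrive. In NMOS-based facilities, for example, a sender's IS-04 resource advertises the URL of its SDP file in a manifest_href property; here the device URLs are simply assumed to be known already. The Python sketch below polls a list of hypothetical URLs using the standard library and records failures separately, so one unreachable device does not stop the sweep. The fetch function is injectable so the logic can be exercised without a network.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def http_get_text(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL over HTTP and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def pull_sdp_files(
    urls: Iterable[str], fetch: Callable[[str], str] = http_get_text
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Poll each device once; return (SDP text by URL, error message by URL)."""
    sdps: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            sdps[url] = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            errors[url] = str(exc)
    return sdps, errors


def fake_fetch(url: str) -> str:
    """Stand-in for HTTP so the example runs offline."""
    if "camera-2" in url:
        raise OSError("connection refused")
    return "v=0\r\ns=Camera 1 Video\r\n"


sdps, errors = pull_sdp_files(
    ["http://camera-1.example/sdp", "http://camera-2.example/sdp"], fetch=fake_fetch
)
print(list(sdps))  # ['http://camera-1.example/sdp']
print(errors)      # {'http://camera-2.example/sdp': 'connection refused'}
```

Because the control system decides when to poll, it can spread requests over time or poll less often, instead of accepting whatever rate the devices choose.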

The broadcast facility’s control software is largely responsible for collecting the SDP files and creating an up-to-date database of the connected devices, their audio and video parameters, and their network addresses. The management service can do this in the background so that an overview of all the connected devices is easily viewable. Web-based views allow users to monitor the devices and their associated streams at varying levels of hierarchy and granularity to determine how the system is configured.
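A minimal Python sketch of such a database follows. It summarizes each SDP into a small record (session name, media type, multicast destination, port and RTP encoding) and supports a simple query by media type. The class, function and flow names are hypothetical, the store is in memory rather than a real database, and the parser assumes one media section per SDP file, which would not hold for an SDP describing redundant ST 2022-7 paths.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlowRecord:
    """Summary of one media flow, as a control system might store it."""

    session_name: str  # from the s= line
    media: str         # "video", "audio", ... from the m= line
    destination: str   # multicast group from the c= line, TTL suffix removed
    port: int          # UDP port from the m= line
    encoding: str      # from a=rtpmap, e.g. "raw/90000" or "L24/48000/2"


def summarise_sdp(sdp_text: str) -> FlowRecord:
    """Extract a FlowRecord, assuming a single media section per SDP."""
    name = media = destination = encoding = ""
    port = 0
    for line in sdp_text.splitlines():
        key, _, value = line.partition("=")
        if key == "s":
            name = value
        elif key == "m":
            media_type, port_text = value.split()[:2]
            media, port = media_type, int(port_text)
        elif key == "c":
            destination = value.split()[2].split("/")[0]
        elif key == "a" and value.startswith("rtpmap:"):
            encoding = value.split(" ", 1)[1]
    return FlowRecord(name, media, destination, port, encoding)


class FlowRegistry:
    """In-memory stand-in for the control system's device database."""

    def __init__(self) -> None:
        self._flows: Dict[str, FlowRecord] = {}

    def update(self, flow_id: str, sdp_text: str) -> None:
        # Re-summarizing on every update keeps the record current if a device changes format.
        self._flows[flow_id] = summarise_sdp(sdp_text)

    def by_media(self, media: str) -> Dict[str, FlowRecord]:
        return {fid: rec for fid, rec in self._flows.items() if rec.media == media}


registry = FlowRegistry()
registry.update("cam1-video", "v=0\r\ns=Camera 1 Video\r\nm=video 20000 RTP/AVP 96\r\n"
                "c=IN IP4 239.100.1.1/64\r\na=rtpmap:96 raw/90000\r\n")
registry.update("cam1-audio", "v=0\r\ns=Camera 1 Audio\r\nm=audio 5004 RTP/AVP 97\r\n"
                "c=IN IP4 239.100.2.1/64\r\na=rtpmap:97 L24/48000/2\r\na=ptime:1\r\n")

for flow_id, record in registry.by_media("audio").items():
    print(flow_id, record.destination, record.port, record.encoding)
    # cam1-audio 239.100.2.1 5004 L24/48000/2
```

A web front end could render views such as "all audio flows" or "all flows from one camera" directly from queries like by_media.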

Identifying media streams in a complex broadcast network is not a trivial task. Streaming specifications that adopt SDP help keep track of the media essence parameters so that downstream equipment can easily receive and connect to the streams.

Other related articles posted on The Broadcast Bridge.

Building An IP Studio: Connecting Cameras - Part 3 - Network Switching And Routing
