Media streaming over the internet is unique. Packet switched networks were never designed to deliver continuous and long streams of media but instead were built to efficiently process transactional and short bursts of data. The long streams of video and audio data are relentless in their network demands and to distribute them effectively requires the adoption of specialist CDNs.
Networks assume the data will be bursty and to a certain extent they rely on this transactional demand to make systems more efficient. The synchronous nature of broadcasting may keep latency very low, but it does so at the expense of flexibility. Packet switched networks have incredible flexibility, but we always need to keep an eye on the latency.
Web server technology is built on layers of underlying protocols. HTTP provides a data structure for web pages, TCP guarantees delivery of packets, and IP is the lowest level protocol until we reach the data link layer at the physical network interconnect, such as ethernet, WiFi, or HDLC. The combination of HTTP/TCP/IP describes the protocols needed along with their hierarchy to transport data across the internet.
Broadcasters are used to maintaining backwards compatibility and we’ve spent the past seventy years making sure the latest technology is put to good use while guaranteeing the viewers using the previous generation of technology can still watch their programs.
However, media streaming over the internet is different: broadcasters have been forced to adopt a technology that wasn’t designed for them, which has led to a lot of shoehorning and compromise.
Latency is an inherent characteristic of the internet and any TCP/IP network. In web browsing, users are generally interested in the response to mouse clicks and other isolated events. Variable latency usually goes unnoticed and has little effect on the user experience. If response times are too long, this has an obvious detrimental consequence for the user, but the recovery time is often very quick. However, the continuous data streams created by video and audio streaming are another matter and do not lend themselves well to variable and long latency, in part due to the widespread use of large video buffers.
CDNs have been available for many years and seem promising for media streaming, but they are often built to optimize web server type applications. Due to the complex nature of media streams, they often demand special consideration to maintain high data throughput while keeping latency low.
A new breed of CDN is now emerging that optimizes media streaming by taking into consideration the specific needs of video and audio data. By focusing on the QoE and QoS associated with video and audio, the specialist CDNs help keep data throughput high, and latency low.
Machine learning (ML) is playing an increasingly important role in these specialist CDNs as a wealth of metadata is available from distributed monitoring probes throughout the internet. Combined with the monitoring information and datasets available for video and audio streaming, ML is improving reliability and resilience while keeping flexibility high.
Although IP networks and COTS infrastructures are providing unprecedented opportunities for broadcasters, we should always remember that video and audio are different from web server traffic, and this demands more focused and specialist solutions, especially for CDNs.
CDN is a term that is used regularly and many hope it will solve the challenges of distributing IP media over the internet. However, the CDNs needed for broadcast applications are unique and specific technological challenges must be overcome.
The internet is built from just over 100,000 privately owned networks known as Autonomous Systems (AS) that collaborate to connect servers and ISPs to give the impression of one single network.
When a user browses a web page, the IP datagrams forming the messages and server responses traverse the connected networks. Each AS has a commercial relationship with the ASs it connects to, so that seamless routing can be achieved.
One of the challenges we have with the individual networks that comprise the internet is that they were originally designed to transport HTML pages between web servers and browsers, not long media streams. HTML provides a simple text-based markup language that is easily edited by users, a system that is the core of web browsing today.
HTTP provides a specialist transport mechanism that encapsulates the HTML files so that compliant browsers and webpage servers can easily exchange data. Control messages and data responses must have their own protocol otherwise the server would just receive meaningless data and not know what to do with it. Adding the HTTP layer brings structure to the message and data exchanges.
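To make the idea of HTTP bringing structure to the exchange more concrete, the following is a minimal sketch of what a request and the first line of a response look like on the wire. The helper names `build_get_request` and `parse_status_line`, and the host and path used in the usage note, are hypothetical and chosen only for illustration; real browsers and servers send many more headers.

```python
from typing import Tuple


def build_get_request(host: str, path: str) -> bytes:
    """Build a bare-bones HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # Host header is mandatory in HTTP/1.1
        "Connection: close",      # ask the server to close after replying
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of a response into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

For example, `build_get_request("example.com", "/index.html")` produces a short text message, and `parse_status_line` applied to a reply starting with `HTTP/1.1 200 OK` tells the client that the request succeeded. Without this agreed structure, the bytes arriving over TCP would carry no meaning for either side.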
TCP is needed to guarantee delivery of the IP packets as the underlying IP layer only offers best-effort packet delivery. Although this may sound like an oversight, it is a great strength as it keeps latency incredibly low and provides much greater flexibility for system designers when building protocols to work with IP.
Short Duration Messages
The combination of HTML, HTTP, TCP, and IP leads to a message structure that encourages short bursty data exchanges. Intuitively this is correct as anybody surfing a non-media type webpage will be clicking on menu selections and hyperlinks, and then receiving data from the server in response to these requests.
From a user’s point of view, short messages are preferred as latency will be low, and hence response time will be fast. In webpage surfing, one of the key quality checks is how quickly the webpage responds to the user’s input.
The need to keep user-response times as fast as possible has led to the internet being developed and streamlined to exchange short messages. Generalized CDNs used throughout the internet embrace this philosophy to maintain high levels of user experience. However, this type of operation is the complete opposite of the type of delivery required by media.
Streaming video and audio, whether embedded within a webpage or delivered by a full-blown OTT system, uses very long continuous files, and for live broadcasts the stream doesn’t even have a file length as it carries on for the duration of the transmission. But to transfer a media file over the internet, we must comply with the fundamental transport mechanism and the webpage server and browser architecture, that is HTTP/TCP, and these favor short transactional bursts of data. Consequently, media streams are subdivided into smaller blocks of data through the process of chunking, but this leads to quality compromises.
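As a rough illustration of chunking, the sketch below regroups an open-ended stream of bytes into fixed-size blocks that could each be delivered as a separate short HTTP response. The function name `chunk_stream` and the byte-count rule are assumptions made for simplicity; practical packagers usually cut on time or keyframe boundaries rather than on a fixed number of bytes.

```python
from typing import Iterable, Iterator


def chunk_stream(stream: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Regroup an incoming byte stream into chunks of chunk_size bytes.

    Works on open-ended input (such as a live feed) because it never
    needs to know the total length in advance.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer = bytearray()
    for block in stream:
        buffer.extend(block)
        # Emit as many complete chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        # The final chunk may be shorter than chunk_size.
        yield bytes(buffer)
```

Because the function is a generator, chunks become available as soon as enough data has arrived, which mirrors how a live stream with no defined length can still be served as a sequence of short transactions.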
A browser requesting a webpage update must pass its request down from HTTP to TCP to IP and then across the physical networks; the response from the web server travels the same path in reverse. This mechanism provides the fundamental operation of the internet and has been fine-tuned over many years for short messages. Streaming media uses long files, resulting in many compromises when delivering over the internet.
Streaming media also suffers from round-trip time (RTT) delays, which are a consequence of TCP and are particularly problematic over long distances. When a server sends a block of data to the web browser, it waits for the browser to acknowledge the data by sending a message back to the server. Only when the server receives this acknowledgment can it send the next sequence of the media stream. The objective of this exchange is to guarantee packet delivery and to resend any packets that are lost in the network, but a consequence is increased and unpredictable latency.
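The effect of RTT on a long stream can be estimated with a simple calculation. The sketch below assumes a simplified model in which the server may have at most one window of unacknowledged data in flight, so throughput cannot exceed the window size divided by the RTT. The function name and the 65,535-byte window in the usage note are illustrative assumptions, not figures from any particular deployment.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    if window_bytes <= 0:
        raise ValueError("window_bytes must be positive")
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    # One full window of data can be delivered per round trip at best.
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    window = 65_535  # hypothetical window size in bytes
    for rtt_ms in (20, 100, 200):
        mbps = max_throughput_bps(window, rtt_ms / 1000) / 1_000_000
        print(f"RTT {rtt_ms} ms: at most about {mbps:.1f} Mbit/s")
```

Under these assumptions, a tenfold increase in RTT reduces the achievable throughput tenfold, which is why the distance between server and viewer matters so much for continuous streams.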
Media Special Requirements
Media streaming relies on video and audio compression. If an unrecoverable packet loss occurs within a group of pictures (GOP), the video disturbance can be evident for many seconds. One method of reducing this risk is to increase the media player’s buffer size; however, this often results in long latencies.
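The trade-off between buffer size and latency can be shown with simple arithmetic. The sketch below assumes the player holds a whole number of equal-length segments before playing them, so the delay added by the buffer is the number of buffered segments multiplied by the segment duration. The function name and the six-second segment length are illustrative assumptions.

```python
def buffer_latency_seconds(buffered_segments: int, segment_duration_s: float) -> float:
    """Delay added by a player that holds this many segments before playout."""
    if buffered_segments < 0:
        raise ValueError("buffered_segments must not be negative")
    if segment_duration_s <= 0:
        raise ValueError("segment_duration_s must be positive")
    return buffered_segments * segment_duration_s


if __name__ == "__main__":
    segment_duration = 6.0  # hypothetical segment length in seconds
    for segments in (1, 3, 5):
        delay = buffer_latency_seconds(segments, segment_duration)
        print(f"{segments} buffered segment(s): {delay:.0f} s behind live")
```

A deeper buffer gives the player more time to recover lost packets before they are needed, but every extra segment held pushes the viewer further behind the live event.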
Many conditions within a network, such as lost packets and buffer management issues in switches and routers, cause the TCP packets sent from the server to time out, so the server must resend the data. This in turn can lead to duplicated packets wasting valuable bandwidth, thus exacerbating a poor user experience through increased latency.
CDNs succeed by moving the media files closer to the user by caching the data. In principle, this sounds like the perfect solution as RTTs are significantly reduced, leading to a decrease in latency, and congestion can be better managed. However, many CDNs are designed for general internet traffic consisting of short messages from servers delivering static content, not long media streaming files.