Re-Evaluating OTT/Streaming Security: Part 3 - Forensic Watermarking

Forensic watermarking is extending beyond premium movie and live sports content to protect lower-value on-demand assets as well. It is becoming an essential pillar of revenue protection, alongside DRM and encryption.

Forensic watermarking has taken two decades to progress from a niche system for protecting premium movie content against camcording to an essential pillar of revenue protection for streaming services, permeating the whole video ecosystem. At first it was applied in the streaming world mostly to premium movie content, encouraged by the MovieLabs mandate that watermarking must be used to protect Ultra HD content from April 2014.

Then as streaming services proliferated and increasingly distributed content previously confined to traditional channels, watermarking came into play for protection of valuable live content, especially sports. This added a new dimension, because the value of live sports content decays rapidly once the event begins, so any action to combat illicit redistribution of the streams must be taken quickly.

Watermarking in general has served various purposes, originally to identify counterfeit banknotes. For photographs it can impede copying by being perceptible in published online images, or help identify the creator. In the case of video, the watermark should be imperceptible to the viewer, its function being primarily to identify individual streams of content that have been co-opted for illicit distribution over the internet.

The technologies are therefore very different, although there is some overlap with the photography case for single stream watermarking. For the video case there are various trade-offs, between robustness of the mark against deliberate or accidental tampering, computational complexity, and time taken to extract the marks. The number of options has proliferated as popularity has increased, and despite the arguments of individual vendors and technology proponents, it really is a case of horses for courses.

The most appropriate watermarking technology will vary with the service and factors such as content profile, geographical location of recipients, and scale of the operation in terms of audience size and content diversity. There are some common factors, and one is that watermarking, while it has taken center stage in the battle against streaming piracy, is insufficient on its own to protect against revenue theft.

In most cases it requires preexisting DRM and Conditional Access (CA) based on encryption to ensure in the first instance that content is not misappropriated even without being redistributed illicitly. The DRM enforces business rules, such that subscribers can only access content they have paid for, while CA is designed to protect against unauthorized access generally.

Watermarking then kicks in by protecting against illicit redistribution of streams that may well have been acquired legally in the first instance. This has become the major front against piracy as streaming has become predominant, while traditional forms of circumvention, such as sharing of smart card control words, have withered away.

As a result, watermarking itself has come under sustained attack from pirates and become embroiled in a technological arms race even more than encryption, which has more often been compromised by theft of credentials or lapses by users than by sophisticated countermeasures.

Irrespective of the technique, application of forensic watermarking involves four fundamental steps. The first is injection of the marks, which in the absence of infringement is as far as it gets, with users being unaware of the process. The second step is then the piracy, acquiring a stream and then redistributing it to users who may or may not be paying for the stolen content.

The third step then lies in detecting that piracy has occurred and identifying which content has been misappropriated, under the banner of “network forensics”. This third step may involve video fingerprinting, recording a snapshot of an asset for storing in a database, which is then run against the illicitly redistributed content to identify which items have been stolen.

This process is usually automated, since manual inspection would be neither practical nor affordable. It is also lightweight, because one hour of video yields a fingerprint of just 115 KB of data. Unlike watermarking, fingerprinting does not require insertion of any data into the content payload, just taking a snapshot.
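A toy sketch illustrates the principle of compact fingerprints. The `frame_fingerprint` and `hamming` functions below are invented for illustration, not any vendor's actual algorithm: they implement a crude difference hash that reduces each sampled frame to 64 bits, so even frequent sampling yields only tens of kilobytes per hour of video.

```python
import numpy as np

def frame_fingerprint(frame: np.ndarray, size: int = 8) -> int:
    """Toy dHash-style fingerprint: sample the frame onto a small
    grid and record whether brightness rises or falls between
    horizontally adjacent samples, packed into 64 bits."""
    h, w = frame.shape
    rows = np.linspace(0, h - 1, size).astype(int)
    cols = np.linspace(0, w - 1, size + 1).astype(int)
    small = frame[np.ix_(rows, cols)]
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int(sum(1 << i for i, b in enumerate(bits) if b))

def hamming(a: int, b: int) -> int:
    # Similar frames give similar fingerprints, so a small Hamming
    # distance between hashes suggests the same underlying content.
    return bin(a ^ b).count("1")
```

Matching a suspect stream against a reference database then becomes a nearest-neighbour search over such hashes rather than a frame-by-frame video comparison, which is what makes automation affordable at scale.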

Then the fourth step is extraction of the watermarks from a stream for identification of its source. This information can inform immediate measures, such as blocking distribution of that stream at the CDN level, and also be used as evidence for subsequent legal action.

Forensic watermarking can be categorized both by where the marks are inserted, and the technology used to achieve this. Development of the latter has been driven in turn by evolution of methods employed by pirates to disable them or circumvent their protection.

There are now three places marks can be inserted in the delivery chain: at the headend where content originates for distribution, at the client where viewing takes place, or, increasingly, at the edge of the network or CDN.

Watermarking is also increasingly employed in the contribution chain to identify leaks there, which mostly concern valuable on-demand content. In that case it tends to operate on a larger scale, to identify leaks from within studios for example, rather than redistribution of individual stream instances.

There is also a third dimension: marks can be inserted either in the compressed or the uncompressed domain, each with its own pros and cons. Video is often stored compressed, so watermarking at that level can be convenient, and as the files are smaller, computational cost is reduced, providing scope for more sophisticated real-time techniques.

The downside is that since compressed video undergoes subsequent decompression and often transcoding between different formats, marks are vulnerable to distortion and often do not survive intact for later extraction. There is also the fact that small changes in the encoded information are amplified by decoding, which makes the watermarks more likely to cause visual artefacts upon viewing.

Most early watermarking was applied at the headend, exploiting the greater computational power available there to apply marks considered robust against attack at the time. The disadvantage is scale, given that unique marks have to be applied to every stream instance, meaning a million insertions for a million users. At the server side, marks would typically be inserted in individual pixels, often exploiting temporal redundancy across frames to increase complexity without introducing artefacts visible to the viewer.

The technique commonly applied in the headend, or session-based, case is known as A/B watermarking. It operates at the level of individual chunks or segments and scales quite well to large numbers of users, given access to reasonable computational resources. Two versions of the content are created, with different marks inserted in each, which we can call A and B.

Then during playout from the headend, each stream is delivered as a unique combination of chunks selected from one or other of the two versions. One user might start with a chunk from the A version followed by one from the B version. Another stream might start with B. The number of permutations doubles with each additional chunk: with just one chunk there are two possibilities, A or B; with two chunks there are four, AA, BB, AB and BA, and so on. Therefore it only requires 24 chunks (2^24, about 16.7 million combinations) to provide all of 10 million users with a uniquely marked stream.
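The chunk-selection arithmetic can be sketched as follows. The function names `ab_chunk_plan` and `identify_user` are illustrative only; real deployments layer error correction and anti-collusion codes on top of this naive mapping of user ID bits to chunk variants.

```python
def ab_chunk_plan(user_id: int, num_chunks: int) -> str:
    """Map each bit of the user ID to the A or B variant of one
    chunk, giving 2**num_chunks distinct chunk sequences."""
    if user_id >= 2 ** num_chunks:
        raise ValueError("not enough chunks for a unique sequence")
    return "".join("B" if (user_id >> i) & 1 else "A"
                   for i in range(num_chunks))

def identify_user(sequence: str) -> int:
    # Extraction reverses the mapping: read the A/B pattern
    # recovered from a pirated stream back into the user ID.
    return sum(1 << i for i, c in enumerate(sequence) if c == "B")
```

For example, `ab_chunk_plan(5, 3)` yields "BAB", and since 2^24 exceeds 16 million, 24 watermarked chunks suffice to distinguish 10 million subscribers.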

The main downside of A/B streaming is that it is vulnerable to attacks that exploit knowledge of the implementation, which is quite well understood. For example, by accessing two streams, pirates can potentially scramble the order of the chunks such that it is impossible to identify the infringing source upon extraction of the marks.

For this reason alternatives have been developed such as bitstream modification, where just one version is recorded within which selected chunks are modified individually for each stream instance. This process, as with A/B streaming in principle, can be “content aware”, exploiting structure of individual frames to minimize risk of the marks being perceptible to the user and spoiling the viewing experience. It can also be made more resistant to attacks in various ways.

Client-side watermarking also gained in popularity as streaming proliferated, partly because it appears more scalable. Each client inserts its own unique marks, so its computational load stays the same regardless of how large the total audience is.

But the client is also computationally constrained, which makes robustness against attacks harder to achieve. It also makes content awareness impractical, because of the computational cost that would be incurred analyzing frames. It is therefore more challenging to balance robustness of the marks against their invisibility to viewers, given that the content itself cannot be taken into account.

Unlike in session-based watermarking, client-side marks are usually inserted via a software development kit (SDK) integrated with the video player. The SDK also extracts the watermark for forensic analysis, so the whole process is quite efficient, with the caveats over robustness and perceptibility.

More recently the alternative of edge-based watermarking has evolved, which to some extent combines the best of the client and session-based approaches. It has descended especially from headend watermarking of valuable on-demand content such as movies, riding the wave of distributing functions towards the user, at the edge of CDNs where fixed or mobile broadband networks take over. Indeed, edge watermarking is a good fit for distribution over 5G mobile networks at the point where individual streams break out.

Edge watermarking is more scalable in the sense that the insertion of marks is distributed from a single headend to multiple edge points closer to the user. There may still be a large number of users served from the egress points of a given CDN, but then CDNs themselves have been built for scale with capabilities for handling large numbers of unicast streams.

As a result, edge watermarking tends to be done in tandem with a given CDN, which is both a strength and a weakness. The strength is that it builds on the robustness and scale of CDNs, with their support for large numbers of streams. The weakness is the dependency this introduces: vendors of watermarking technology promote their partnerships with specific CDNs as a strength, but the marking will not work with any CDN that is not supported.

All forms of watermarking are subject to attacks, many of which target the process of extraction with the aim of causing recognition failure. This extraction process relies on synchronization and recognition of pixel patterns.

Rotation attacks attempt to confuse the extractor by rendering the pattern unrecognizable, since the extractor does not know the image has been geometrically altered. Such geometric attacks can be countered by exploiting so-called Zernike moments, which describe objects in frames or images as a total structure independent of orientation. Their magnitudes are invariant to rotation, in the way a circle is, while in this case the angle of rotation can be calculated so that the original orientation can be recovered.
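The rotation-invariance property can be checked numerically. This sketch, under simplifying assumptions, computes a single Zernike moment of order (n, m) = (2, 2), whose radial polynomial is simply ρ²; rotating the image by an angle α multiplies the moment by a unit-magnitude phase factor, so its magnitude survives a rotation attack. Production extractors use full sets of moments and estimate arbitrary rotation angles.

```python
import numpy as np

def zernike_22(img: np.ndarray) -> complex:
    """Zernike moment of order n=2, m=2 over the unit disk.
    R_22(rho) = rho**2, so the basis is V_22 = rho**2 * exp(2j*theta).
    Assumes a square grayscale image mapped onto [-1, 1] x [-1, 1]."""
    n = img.shape[0]
    c = np.linspace(-1.0, 1.0, n)
    x, y = np.meshgrid(c, c)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    disk = rho <= 1.0            # only pixels inside the unit disk count
    V = rho ** 2 * np.exp(2j * theta)
    # Moment = projection of the image onto the conjugate basis function
    return (3 / np.pi) * np.sum(img[disk] * np.conj(V[disk]))
```

An exact 90-degree rotation (`np.rot90`) permutes the grid points within the disk and shifts every θ by the same angle, so the moment only changes phase while its magnitude stays put, which is what the extractor relies on.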

Collusion attacks are also commonly employed, where the aim is to delete some of the embedded marks by comparing two or more streams of the same content. Some marks can then be eliminated by subtraction, since they stand out as the only differences between two given frames taken from the streams; all the rest of the content payload is identical across the two streams.
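A toy model shows why naive per-pixel marks fall to this attack. The `embed` scheme below is invented purely for illustration: it nudges a user-specific set of pixels, so simply differencing two users' copies of the same frame exposes exactly where the marks sit.

```python
import numpy as np

def embed(frame: np.ndarray, user_seed: int,
          n_marks: int = 16, strength: int = 4) -> np.ndarray:
    """Toy embedder: nudge a user-specific set of pixels upward.
    Real schemes spread mark energy across transform coefficients."""
    marked = frame.copy()
    idx = np.random.default_rng(user_seed).choice(
        frame.size, n_marks, replace=False)
    marked.flat[idx] += strength
    return marked

rng = np.random.default_rng(0)
base = rng.integers(0, 200, size=(64, 64)).astype(np.int16)
copy_a = embed(base, user_seed=101)
copy_b = embed(base, user_seed=202)

# Collusion: the shared content cancels out, so the only non-zero
# pixels in the difference are the marks of the two colluding users,
# which the pirates can then average away or overwrite.
exposed = np.flatnonzero(copy_a.astype(int) - copy_b)
```

With at most 32 marked pixels between the two copies, the attackers learn every mark location from a single subtraction, which is precisely why countermeasures aim to deny them clean frame-for-frame comparisons.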

Various methods have been developed to counter collusion attacks, one involving random distribution of marks across successive frames, exploiting the temporal dimension of video. This has to be reflected in the mark extraction process, which has to scan multiple frames, some of which may not be watermarked at all. But this makes it difficult for attackers to identify where the marks begin and end without prior knowledge.

The whole field is still developing, perhaps faster than ever given the growing employment of watermarking. Service providers face a bewildering range of options in a sector where eventual commitment to a vendor is likely. It may well be worth courting multiple opinions, or seeking an independent advisor not affiliated to a given technology or vendor.
