Software Infrastructure Global Viewpoint – June 2020

Beyond Video Frames

With our constant attempts to improve picture quality, dare we now even contemplate breaking away from the shackles of decisions made in the 1930s and significantly increasing our frame rates, or even removing frames completely?

Understanding television is a lesson in history. Most, if not all, broadcast formats are derived from either the PAL and SECAM 625-line systems or the NTSC 525-line system. In turn, these 1960s standards were derived from the serial scanning and display technology of the 1930s, which firmly established the idea of video frames in our workflows.

There are no moving pictures in television, just lots of still images played very quickly to give the perception of motion. In fact, television is an illusion.

I remember seeing my first high-frame-rate progressive scan images some ten years ago at an IBC demonstration and being completely overwhelmed by the fluidity of motion. This was partly due to my interlaced conditioning, but the images were so smooth they looked almost real, delivering what we now refer to as an “immersive experience”. What we have learned in recent years, however, is that as our viewing screens get larger, our frame rates must increase too; otherwise the pictures take on a staccato feel and can even look jittery.

I realize that 4K, UHD and 8K are pushing frame rates ever higher, with SMPTE ST 2082 now delivering 120fps, but I fear that, as we move to 8K and ever-larger screens, this merely takes us back to where we already were.

There is another option, one that is completely future-proof. Instead of sampling images in the traditional broadcast manner of scanning sequential frames, “event cameras” produce a data event whenever the luminance at a photosite changes. Even the smallest change in brightness generates a data sample for that photosite, reflecting its luminance level.

Such a system can create an enormous amount of data, but the data rate is proportional to the luminance changes in the scene or, put another way, to the amount of motion: a static scene generates almost no data, while a busy, fast-moving scene generates a great deal.
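To make the idea concrete, here is a minimal Python sketch that simulates an event sensor from a sequence of sampled luminance frames. It is a toy under stated assumptions: real event sensors respond asynchronously in per-photosite circuitry, whereas this compares sampled frames, and the `Event` record, the `threshold` parameter and the `frames_to_events` and `moving_spot` helpers are hypothetical names for illustration only. It also shows the data-rate point: a static scene produces no events at all, while a moving bright spot produces events only at the photosites it enters and leaves.

```python
from typing import List, NamedTuple

Frame = List[List[float]]  # rows of photosite luminance values


class Event(NamedTuple):
    """Hypothetical event record: photosite position, time (s), new luminance."""
    x: int
    y: int
    t: float
    luma: float


def frames_to_events(frames: List[Frame], times: List[float],
                     threshold: float = 0.1) -> List[Event]:
    """Emit an event whenever a photosite's luminance moves more than
    `threshold` away from the value it reported in its last event."""
    # Each photosite remembers the luminance of its last reported event.
    reference = [row[:] for row in frames[0]]
    events: List[Event] = []
    for frame, t in zip(frames[1:], times[1:]):
        for y, row in enumerate(frame):
            for x, luma in enumerate(row):
                if abs(luma - reference[y][x]) > threshold:
                    events.append(Event(x, y, t, luma))
                    reference[y][x] = luma  # update only photosites that fired
    return events


def moving_spot(width: int, height: int, n_frames: int) -> List[Frame]:
    """A bright spot moving one photosite to the right per sample."""
    frames = []
    for i in range(n_frames):
        frame = [[0.0] * width for _ in range(height)]
        frame[height // 2][i % width] = 1.0
        frames.append(frame)
    return frames


times = [i / 1000 for i in range(10)]  # samples 1 ms apart (assumption)
static = [[[0.5] * 8 for _ in range(8)] for _ in range(10)]
print(len(frames_to_events(static, times)))                 # 0
print(len(frames_to_events(moving_spot(8, 8, 10), times)))  # 18
```

The 64 photosites of the static scene stay silent, while the moving spot generates just two events per sample (one where it leaves, one where it arrives), so the data rate tracks motion rather than resolution or frame rate.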

Event cameras lend themselves well to machine learning (ML), a branch of artificial intelligence. Neural networks, used extensively in ML, can ingest vast amounts of data and process it in parallel, making them well suited to image detection, facial recognition and object perception. IP networks complement event cameras further, as their data can easily be distributed to many different processing and storage systems.

Suppose we take an image sensor that only generates data when a change in luminance has occurred and feed all of that data into a neural network; there is no reason why each photosite in the sensor cannot be connected directly to an input node of the network. The “taught” neural network can then recognize objects, people, fast cars, or anything else we like. And what have we done? Created a beautiful working simulation of the human visual system! The very system we’re trying to deceive in the first place.
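As a minimal sketch of wiring each photosite to its own input node, the following Python code counts the events at every photosite over a short time window and feeds that vector into a single fully connected layer. The weights are hand-set placeholders rather than a “taught” network, and the `events_to_input` and `dense_layer` helpers, the `(x, y, t)` event format and the two-output layout are all hypothetical; a real system would learn its weights from training data and would typically use many layers.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, float]  # (x, y, t): hypothetical event format


def events_to_input(events: Sequence[Event], width: int, height: int,
                    t_start: float, t_end: float) -> List[float]:
    """One input node per photosite: count its events in [t_start, t_end)."""
    counts = [0.0] * (width * height)
    for x, y, t in events:
        if t_start <= t < t_end:
            counts[y * width + x] += 1.0  # row-major photosite index
    return counts


def dense_layer(inputs: List[float], weights: List[List[float]],
                biases: List[float]) -> List[float]:
    """Fully connected layer with ReLU: one row of weights per output node."""
    return [max(0.0, sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# 4x4 sensor = 16 input nodes. Hand-set weights (not trained): output 0
# responds to activity in the left half, output 1 to the right half.
W, H = 4, 4
left = [1.0 if (n % W) < W // 2 else 0.0 for n in range(W * H)]
right = [1.0 - v for v in left]

events = [(0, 1, 0.001), (1, 1, 0.002), (1, 2, 0.003), (3, 0, 0.020)]
x_in = events_to_input(events, W, H, t_start=0.0, t_end=0.010)
print(dense_layer(x_in, [left, right], biases=[0.0, 0.0]))  # [3.0, 0.0]
```

The last event falls outside the 10 ms window, so only the three left-half events reach the network; the point of the sketch is the one-to-one mapping from photosite to input node, not the toy classifier itself.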

Why are event cameras future-proof? Well, it’s relatively easy to take the asynchronous data generated by a changing scene and sample it at a predefined rate to form a sequence of images, making the system backwards compatible with existing television systems. Furthermore, if we have enough data from the event camera’s sensor, we can create image sequences at incredibly high frame rates to guarantee fluidity of motion, especially in high-speed sports productions. I just hope the slo-mo vendors are watching!
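The backwards-compatibility argument can be sketched in a few lines of Python: each photosite holds its most recent luminance value, and the whole array is read out at whatever frame rate the output format requires, whether 50fps for a legacy format or far higher for slow motion. This is a minimal sketch under stated assumptions; the `Event` record with an absolute luminance value, the `render_frames` helper and the sampling at the end of each frame period are hypothetical choices, and practical event-to-video reconstruction is considerably more involved.

```python
from typing import List, NamedTuple

Frame = List[List[float]]


class Event(NamedTuple):
    """Hypothetical event: photosite position, timestamp (s), new luminance."""
    x: int
    y: int
    t: float
    luma: float


def render_frames(events: List[Event], width: int, height: int,
                  fps: float, duration: float,
                  initial: float = 0.0) -> List[Frame]:
    """Sample an asynchronous event stream at a fixed frame rate.

    Each photosite holds its latest reported luminance; a frame is a
    snapshot of every photosite at the frame's sampling instant.
    """
    state = [[initial] * width for _ in range(height)]
    ordered = sorted(events, key=lambda e: e.t)
    frames: List[Frame] = []
    i = 0
    n_frames = int(round(duration * fps))
    for k in range(n_frames):
        t_sample = (k + 1) / fps  # sample at the end of each frame period
        # Apply every event that happened up to this sampling instant.
        while i < len(ordered) and ordered[i].t <= t_sample:
            e = ordered[i]
            state[e.y][e.x] = e.luma
            i += 1
        frames.append([row[:] for row in state])  # snapshot copy
    return frames


stream = [Event(0, 0, 0.004, 1.0), Event(1, 0, 0.012, 1.0),
          Event(0, 0, 0.015, 0.0)]
legacy = render_frames(stream, width=2, height=1, fps=50, duration=0.02)
slomo = render_frames(stream, width=2, height=1, fps=500, duration=0.02)
print(len(legacy), len(slomo))  # 1 10
```

The same stream serves both outputs: the single 50fps frame misses the 11 ms flash at the first photosite entirely, whereas the 500fps rendering shows it switching on and off.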
