Remember Tom Cruise in the movie "Minority Report"? Cruise stands before a large, wraparound screen and, with his hands in mid-air, manipulates and directs the dynamic elements and information being displayed. The movie calls the process "scrubbing the image." Could such virtual control of video servers, cameras and production switchers be just around the corner for broadcasters?
As media makers in a broadcast world we tend to think of VR as media. However, using mixed reality (which includes VR and AR) to create a user interface for highly complicated operations is much easier than creating content, and it will cause disruption across many industrial segments.
No, I am not talking about using VR as a design tool. What I mean is that the user interface to any complicated task, such as managing a nuclear power plant or, for us in TV, a live sports production, can be created in VR.
Imagine sitting in a specially designed chair with a pair of data gloves. Your available sources are arranged around the feed you are creating. Machine intelligence is automatically switching those sources to be the ones most relevant to the narrative you are telling. How and when to arrange those sources into the feed is still up to you but the toolset is now faster and more powerful. The hardware cost for this kind of interface is going to be a lot less than that for an OB van!
This seems to me to be a natural result of the trend toward remote production. Once your cameras are all being fed to “the cloud” and virtualized production can occur in some data warehouse with oodles of processing power, what is actually required on-site?
VR requires sufficiently detailed image displays. While early versions were a disappointment, new technologies will make VR imagery lifelike.
Can current VR headsets present a convincing display?
What is a convincing display? I recently met with Josef Schinwald, Chair of the MMA school at the University of Applied Science in Salzburg. His answer to the question: "We are not there yet." A convincing display is only part of the solution, which must also include motion tracking, eye tracking and directional audio. Professor Schinwald points to developments in foveated rendering* and light field displays as the keys to achieving the required image quality. *(In foveated rendering, the image resolution, or amount of detail, varies across the image according to one or more "fixation points." A fixation point indicates the highest resolution region of the image and corresponds to the center of the eye's retina, the fovea.)
Foveated rendering takes advantage of the eye's built-in compression scheme. We can only see sharply with the central portion of the eye, the fovea. We move our eyes around so that the fovea scans the scene, and the brain then puts the image together. By displaying at high resolution only where the fovea is pointing, we can significantly reduce both the processing power and the total display resolution required. This is what the folks at VARJO are doing, and the results are impressive.
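To get a feel for the scale of the savings, here is a back-of-envelope sketch. All the numbers are assumptions chosen for illustration (a square 100-degree field of view, roughly foveal 60 pixels per degree, a 20-degree high-resolution inset, and a reduced peripheral density), not the specification of any real headset:

```python
def pixels(width, height):
    """Total pixel count for a rectangular region."""
    return width * height

# Hypothetical headset parameters (assumptions, for illustration only).
FOV_DEG = 100          # field of view, degrees (treated as square)
FULL_PPD = 60          # pixels per degree, roughly foveal acuity
PERIPHERY_PPD = 15     # assumed lower density outside the fovea
FOVEA_DEG = 20         # assumed high-resolution inset around the gaze point

# Rendering everything at foveal density vs. a high-res inset
# composited over a low-res periphery.
full = pixels(FOV_DEG * FULL_PPD, FOV_DEG * FULL_PPD)
foveated = (pixels(FOVEA_DEG * FULL_PPD, FOVEA_DEG * FULL_PPD) +
            pixels(FOV_DEG * PERIPHERY_PPD, FOV_DEG * PERIPHERY_PPD))

print(f"Uniform rendering:  {full / 1e6:.1f} Mpixels per eye")
print(f"Foveated rendering: {foveated / 1e6:.1f} Mpixels per eye")
print(f"Reduction: {full / foveated:.1f}x")
```

Even with these rough numbers the pixel budget drops by almost an order of magnitude, which is why eye tracking and foveation go hand in hand.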
On the left, the original Oculus display. On the right, Oculus+ shows a much improved displayed image.
Light field displays allow for natural depth vision. Our eyes naturally adjust focus depending on the distance to an object. If the visual cues within a scene tell our brain that it has to adjust focus and nothing happens we will get a headache! You have all seen the demos from LYTRO which make it possible to adjust focus and depth of field in post. Avegant have incorporated this technology into a head mounted display.
The biggest advantage of this is in mixed reality applications, as it solves the problem of near-field focus that is currently bedeviling Sony, Microsoft, et al.
It is interesting that Professor Schinwald is quite confident that the technical challenges will be solved in the next couple of years; he is more concerned that the headsets become more comfortable, as he is currently wearing the clunky prototypes for hours each day!
Is the added delay acceptable?
Will the audience accept the necessary delay between the live action and the viewing experience? We could emulate software developers and say, "That's not a bug, it's a feature"; obviously, we want the live in-stadium experience to be better (despite the fact that you will see and hear more on your 100” living room screen). Seriously, there is already a delay and no one is complaining. Is that delay going to increase with the added hop to the cloud and back?
The pixels captured by a camera go through several stages of processing and transmission before they are visible on the user's TV set. The delays contributed by each of these processing steps, plus the time required to transmit the compressed video stream, produce the overall delay, sometimes referred to as end-to-end latency. The biggest contributors to video latency are processing stages, such as encoding and decoding, that require temporary storage of data, i.e., short-term buffering in some form of memory. Video systems engineers therefore tend to measure latency in terms of video data, for example a latency of two frames or eight horizontal lines, but this is really not very helpful in today's world of multiple resolutions and frame rates.
This table illustrates that the major contributor to overall latency is the buffering required to compensate for the indeterminate nature of the transmission pipeline. In order to avoid additional delays as a result of using a virtual control room, we will need to ensure a strict QoS from the cloud to the computer running the VR software.
Visually lossless 4K bandwidth requirements (10-30 cameras)
This has always been a problem for me ever since I got involved in television 50 years ago: "What exactly is broadcast quality?" I used to run IVC 9000 analog VTRs. These were way better than the AMPEX machines everyone else was using, but this was never mentioned in a trade journal or textbook. Even SMPTE agreed that that kind of S/N was not necessary. So why is everybody pushing uncompressed or mathematically lossless signal quality these days?
First, I believe that as the resolution of the cameras goes up, the number of cameras required to present an engaging presentation will go down. In any case, I find it unrealistic to transmit 30 uncompressed video streams simultaneously and to store all of that information indefinitely. So we are going to assume that visually lossless compression will be used.
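The arithmetic behind that assumption is worth seeing. The sketch below uses assumed parameters: UHD 3840x2160 at 60 fps, 10-bit 4:2:2 sampling (20 bits per pixel), and a hypothetical 10:1 "visually lossless" compression ratio, which is in the range claimed for mezzanine codecs but will vary by codec and content:

```python
# Back-of-envelope bandwidth for multi-camera 4K contribution.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
BITS_PER_PIXEL = 20          # 10-bit 4:2:2 sampling
COMPRESSION_RATIO = 10       # assumed "visually lossless" ratio
CAMERAS = 30

uncompressed_gbps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL / 1e9
compressed_gbps = uncompressed_gbps / COMPRESSION_RATIO

print(f"Per camera, uncompressed: {uncompressed_gbps:.2f} Gbit/s")
print(f"Per camera, compressed:   {compressed_gbps:.2f} Gbit/s")
print(f"{CAMERAS} cameras, uncompressed: {CAMERAS * uncompressed_gbps:.0f} Gbit/s")
print(f"{CAMERAS} cameras, compressed:   {CAMERAS * compressed_gbps:.0f} Gbit/s")
```

Thirty uncompressed cameras need roughly 300 Gbit/s of sustained contribution bandwidth; compressed, the same event fits in about 30 Gbit/s, which is at least plausible on today's links.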
Here is the funny thing about "visually lossless": it is subjective!
In addition, inter-frame compression is going to have a variable bit rate. This brings us back to the latency problem: in order to reduce the total required bandwidth, you have to increase the possible latency.
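A toy model shows why. When frame sizes vary (big I-frames, small P-frames) but the channel delivers at a constant rate, the decoder must pre-buffer enough data that the largest bursts never drain its buffer, and that pre-buffer time is pure added latency. The frame sizes and rates below are invented for illustration:

```python
# Toy model of the VBR/latency trade-off. Frame sizes in megabits:
# a hypothetical GOP with a big I-frame followed by small P-frames.
frame_sizes = [8.0, 1.0, 1.0, 1.0, 8.0, 1.0, 1.0, 1.0]  # Mbit per frame
fps = 50
channel_rate = sum(frame_sizes) / len(frame_sizes) * fps  # Mbit/s, average

def min_prebuffer(frames, rate, fps):
    """Smallest initial buffer (Mbit) that never underflows."""
    buffered, worst_deficit = 0.0, 0.0
    for size in frames:
        buffered += rate / fps    # data arriving during one frame time
        buffered -= size          # data consumed to decode the frame
        worst_deficit = min(worst_deficit, buffered)
    return -worst_deficit

need = min_prebuffer(frame_sizes, channel_rate, fps)
latency_ms = need / channel_rate * 1000
print(f"Pre-buffer needed: {need:.2f} Mbit -> {latency_ms:.0f} ms extra latency")
```

Flatten the frame sizes (less efficient compression, higher bandwidth) and the required pre-buffer shrinks toward zero; let them vary more and the buffer, and the latency, grows. That is the trade in one line of arithmetic.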
The bottom line? It is going to happen because it is cheaper. It will be sold as allowing the consumer access to all the footage of any sporting event. I can even imagine an app (one that most people will never use, but really cool) that allows the end user to cut their own live show!