With the recent advances in artificial intelligence, post-production rotoscope tools are now so good that some DOPs are asking if we still need to use green screen at all, or at least, in quite the same way. Suddenly, it seems any background, no matter how complex, can be replaced in seconds on the editor’s timeline, allowing DOPs to shoot complex compositing assignments faster and more economically. For broadcasters, the benefits of an AI-powered rotoscoping tool are substantial as producers can repurpose a treasure trove of existing footage packed away in libraries and potentially reduce the need for new production or costly reshoots.
Truth is, artificial intelligence has been a part of some sports photographers’ kits for at least a year, since the introduction of the Canon EOS-1D X Mark III camera, featuring a sophisticated auto-focusing capability powered by a form of AI.
a) Canon EOS-1D X Mark III; b) Sony Alpha A9. While auto-focus is anathema to many DOPs, recent still cameras from Canon and Sony feature AI-powered auto-focusing systems that may eventually change that. Looking ahead, broadcast video cameras may soon offer a ‘compositional assist’ and reliable auto-focus drawing on the rapid advances in machine-learning.
Perhaps I can explain the notion of artificial intelligence. Long before the arrival of microprocessors, as a numbingly nerdy junior high school student in 1968, I set out to construct a ‘learning computer’ utilizing thousands of playing cards to navigate a 16-square game of chess. The retired cards from a Las Vegas casino were placed in a raft of large manila envelopes affixed to the classroom walls, each envelope containing all the possibilities for a single move, beginning with Move Number 1. It was a serious undertaking, to say the least, managing and labeling a veritable sea of playing cards.
When it came time to play, defeating my playing-card ‘computer’ was at first a no-brainer. For each of the computer’s moves, I would draw a card at random from the respective move envelope. Then, every time the ‘computer’ made a mistake that resulted in a loss, that card would be moved to the Trash, which in this case was an actual, honest-to-goodness trash can. Over days of constant play, arriving at school two hours early, I discarded dozens, then hundreds, of losing cards, leaving enough winning cards in the envelopes that suddenly my exceedingly modest computer became adept at serving up only intelligent moves. As a display of artificial intelligence, my attempt at a learning machine wasn’t half bad, especially for a 13-year-old with little or no chance for a social life.
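The envelope-and-trash-can scheme above amounts to a crude form of reinforcement learning, and it is simple enough to sketch in a few lines of code. Everything here is invented for illustration: each game state is an ‘envelope’ of candidate moves, and a hypothetical oracle stands in for actually losing a game.

```python
import random

# Toy version of the playing-card 'learning computer': each state keeps an
# 'envelope' (list) of candidate moves, and any move that leads to a loss
# is discarded, exactly like tossing a losing card into the trash can.

def train(envelopes, is_losing_move, games=1000):
    """Play many games, removing losing cards from each envelope."""
    for _ in range(games):
        for state, cards in envelopes.items():
            if not cards:
                continue                      # envelope exhausted
            card = random.choice(cards)       # draw a card at random
            if is_losing_move(state, card):   # the 'computer' lost
                cards.remove(card)            # into the trash it goes
    return envelopes

# Hypothetical game: in every state, even-numbered moves lose.
envelopes = {state: list(range(8)) for state in ("move1", "move2")}
trained = train(envelopes, lambda state, move: move % 2 == 0)
print(trained["move1"])   # after many games, only winning moves remain
```

After enough play, only the odd (winning) cards survive, and the ‘computer’ can no longer blunder: the intelligence lives entirely in what has been thrown away.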
Today, of course, thanks to ever more powerful microprocessors, engineers are able to sort through billions of playing cards per second and jettison the losing ones with infinitely greater efficiency. For camera manufacturers and DOPs, particularly, the major advances in AI are especially applicable to developing a smarter, more efficient auto-focusing system.
But alas, like my 16-square chessboard with the many thousands of possible moves and permutations, the challenges to developing a truly reliable auto-focus algorithm can be overwhelming.
In early 2020, Canon introduced the EOS-1D X Mark III camera, which specifically targeted the demands of professional sports photographers. Taking advantage of artificial intelligence, Canon’s engineers developed a smart auto-focus system by exposing the camera’s deep-learning algorithm to tens of thousands of athletes’ images from libraries, agency archives, and pro photographers’ collections. Just as with my playing-card computer from junior high, whenever the auto-focus algorithm failed to distinguish the athlete from other objects, it would be ‘punished’ by removing the parameter decision that led to the error and the loss of focus.
While the camera’s deep machine-learning algorithm did improve as expected over time, its reliability and ultimate usability lagged, as athletes can appear quite different depending on the sport, often with their faces covered by a mask or helmet. This meant that standard facial-recognition formulas couldn’t be used. More than merely tracking an athlete’s eye, the AI-powered algorithm had to learn to recognize other human clues, like the pinstripes in a uniform or the number on the back of a jersey.
a) Nature Fights Back; b) Red-Tailed Hawk. Shooting large-format cameras with long telephoto lenses, wildlife DOPs face a range of perils, including the daunting challenge of finding and holding critical focus in very tight close-ups. While today’s auto-focus cameras take ample advantage of machine-learning, cameras fitted with such systems effectively stop learning once the camera and firmware leave the factory. As true artificial intelligence develops, our cameras may continue to learn from in-field use. But then we will need to be careful: a camera’s AI system that is constantly learning may also learn our bad habits, so DOPs hoping to exploit artificial intelligence might actually become worse at the craft, not better!
Some Sony mirrorless still cameras also feature a form of AI-powered auto-focus called Eye AF, an AI-powered technology optimized for shooting animals and wildlife. Compared to humans, developing an auto-focus algorithm for animals poses an even greater challenge since animals can range so widely in appearance – some are large with eyes at the front; others are small, with eyes at the side. There are snakes and rabbits, moose and crocodiles, and who knows how many birds, so it is hardly surprising that no algorithm, even after extensive deep learning, can reliably recognize and track the faces of every animal. Still, with the rapid advances in artificial intelligence and machine-learning, broadcast DOPs can reasonably expect that video cameras, too, will eventually acquire an auto-focus system capable of differentiating a broad range of complex creatures and objects.
Ironically, the technology provider having the greatest impact on DOPs today may not be a camera manufacturer at all. At NAB 2016, to little or no fanfare, Adobe introduced its Sensei machine-learning algorithm. The implications of AI machine-learning entering post-production were enormous. While post-production is not normally in most DOPs’ job descriptions, the fact is that many DOPs today already exercise some degree of post-camera image control, to remove flicker from discontinuous light sources, for example, or to stabilize images.
In 2019, taking advantage of machine-learning and the Sensei algorithm, Adobe introduced the Content Aware Fill (CAF) feature for After Effects. For broadcasters, the feature extended the power of AI for the first time to video: editors could now quickly and easily remove an unwanted object like a mic or light stand from a shot. To DOPs, the reasoning behind CAF was reminiscent of long-GOP recording in some ENG cameras, where pixel information from one frame is used to fill in missing or discarded compressed information in another. Adobe’s Sensei algorithm proved particularly adept at tracking pixels from one frame to the next, blending and fusing the pixel data to appear seamless.
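The core idea of borrowing pixels from a neighbouring frame can be shown in a few lines. This is a deliberately naive sketch, assuming a perfectly static background and an already-known mask; Sensei’s actual pipeline adds motion tracking and blending that this toy omits entirely.

```python
import numpy as np

# Toy illustration of temporal fill: pixels removed from one frame are
# borrowed from the previous frame. Assumes a static background, which
# real footage rarely offers; Content Aware Fill handles motion as well.

def fill_from_previous(prev_frame, frame, mask):
    """Replace masked pixels in `frame` with pixels from `prev_frame`.

    mask: boolean array, True where the unwanted object was removed.
    """
    filled = frame.copy()
    filled[mask] = prev_frame[mask]
    return filled

# Hypothetical 4x4 grayscale frames: an unwanted 'mic' occupies two pixels.
prev_frame = np.full((4, 4), 100, dtype=np.uint8)   # clean background plate
frame = prev_frame.copy()
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1:3] = True
frame[mask] = 255                                   # the object to remove

clean = fill_from_previous(prev_frame, frame, mask)
```

The masked pixels are restored to the background value, leaving a seamless frame; the hard part in practice is building that mask and compensating for camera and subject motion, which is precisely where the machine-learning earns its keep.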
After Effects Content Aware Fill. Applying artificial intelligence to moving video poses many challenges due to the consistency required for frame-to-frame tracking of gamma and color. The Content Aware Fill feature in Adobe After Effects takes advantage of the Sensei machine-learning algorithm to capture lonely scenes like this one without having to stop traffic on a busy highway.
Roto Brush 2 further extends the power of machine learning to the laborious, time-consuming task of rotoscoping. While Adobe’s Roto Brush 1 mainly used edge detection to identify color differences, Roto Brush 2 uses Sensei to look for common or uncommon patterns, sharp versus blurry pixels, and a panoply of three-dimensional depth cues to separate people from objects.
But even with the considerable advantage of AI learning, Roto Brush 2 can still only accomplish about 80% of the rotoscoping task; a human being is still required to input the additional intelligence to craft and tweak the final matte.
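That division of labor, an AI first pass followed by a human touch-up, can be pictured as simple mask arithmetic. The arrays and function below are invented for illustration and are not Roto Brush’s actual pipeline: the AI proposes a matte, and the artist paints corrective ‘add’ and ‘remove’ strokes on top.

```python
import numpy as np

# Sketch of the artist's 'last 20%': combine an AI-proposed matte with
# manual corrections. True = foreground (keep), False = background.

def refine_matte(ai_matte, add_strokes, remove_strokes):
    """Union the matte with 'add' strokes, then carve out 'remove' strokes."""
    return (ai_matte | add_strokes) & ~remove_strokes

ai_matte = np.array([[0, 1, 1],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=bool)   # AI missed a limb, grabbed spill

add = np.zeros_like(ai_matte)
add[2, 1] = True                               # paint the missed limb back in

rem = np.zeros_like(ai_matte)
rem[0, 2] = True                               # erase the background spill

final = refine_matte(ai_matte, add, rem)
```

Per-frame, the corrections are trivial boolean operations; what the AI contributes is propagating a good matte across hundreds of frames so the artist only touches the failures.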
Green screen set. Can AI-powered rotoscoping tools like Adobe’s Roto Brush 2 really replace green screen? Some DOPs are beginning to think so, especially in more complex scenes and setups such as many cityscapes.
And there’s another thing to consider. The AI machine-learning from Canon, Sony, and Adobe, is static, meaning any learning ends once the algorithm is loaded into a camera or downloaded to a user’s computer. Unlike true artificial intelligence, there is no ongoing learning capability to improve from actual DOP use in the field.
a) iPhone vertical movie; b) Adobe Premiere Pro SS. The iPhone and software applications like Adobe Premiere Pro 2020 use a form of AI to crop images, nudging the composition one way or the other to transform a horizontal frame into a vertical one. A machine-learning algorithm can accomplish the task more tastefully than a one-size-fits-all crop applied without regard to the content, people, or objects inside the frame.
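Once the AI has located the subject, the crop itself is plain geometry. The sketch below is an invented, minimal version of the idea: choose a 9:16 vertical window from a 16:9 frame so a detected subject stays centered, clamping at the frame edges. In the real tools, the subject position would come from a machine-learning detector and be smoothed over time.

```python
# Toy 'auto reframe': pick the left edge of a full-height vertical crop
# so the subject stays centered, without running off the frame.

def vertical_crop(frame_w, frame_h, subject_cx, aspect=9 / 16):
    """Return (left_edge, crop_width) for a vertical crop of a wide frame."""
    crop_w = round(frame_h * aspect)          # full-height 9:16 window
    left = subject_cx - crop_w // 2           # center the crop on the subject
    left = max(0, min(left, frame_w - crop_w))  # clamp to the frame
    return left, crop_w

# Hypothetical 1920x1080 frame, subject near the right edge at x = 1700.
left, width = vertical_crop(1920, 1080, 1700)
# The crop is clamped so the window stays fully inside the frame.
```

A subject mid-frame gets a centered window; a subject near an edge gets the nearest legal window, which is exactly the ‘nudging’ behavior described above, minus the temporal smoothing a real tool would add to avoid jitter.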
So can artificial intelligence really obviate the need for green screen? In some cases, no doubt. On THE AVIATOR (2004), DOP Robert Richardson was said not to have bothered keeping an entire side of an aircraft hangar out of frame, because he knew it could be removed more quickly and easily in the digital intermediate. Today, using inexpensive AI-powered tools like Roto Brush 2, broadcasters have approximately the same capability on the desktop, able to rotoscope out unwieldy objects like skyscrapers with ease, convenience, and economy.
a) rotoscoped baseball player; b) Roto Brush 2 SS. Roto Brush 2 extends the power of machine-learning to the laborious, time-consuming task of rotoscoping. Roto Brush 2 has learned to recognize the human form, and is thus able to isolate it, frame by frame, from a background. But even with the power of AI and machine-learning (b), Roto Brush 2 still requires some human input to produce the final matte. [Screenshots from CM de la Vega ‘The Art of Motion Graphics’ https://www.youtube.com/watch?v=uu3_sTom_kQ].
For most routine applications, it makes sense to continue to use green screen, as the process for most of us is familiar and straightforward. But the option is there for DOPs, as never before, to easily remove or replace a background element in a landscape or cityscape where green screen isn’t practical or possible.