The recently released Adobe Firefly beta created these fake images from the simple prompt ‘business suit, primary colors, serious look.’
The technology used to create deepfake videos is evolving very rapidly. Is the technology used to detect them keeping pace and are there other approaches broadcast newsrooms can use to protect themselves?
Access to the latest deepfake technology is as near as the internet. It is simple to learn and use, and its output is becoming more difficult to detect. Only three things are needed to create a deepfake video: a video with a target face, a faceswap image, and software. Stand by to be amazed.
Artificial intelligence (AI) is about creating something new before the competition does. In the AI world it is becoming increasingly difficult to tell what is real from what is not. Audio and video AI technology has accelerated rapidly in 2023 and will continue ramping up faster in the foreseeable future.
There are many potential TV AI upsides. For example, algorithms are being developed and tested to colorize and upconvert virtually all video and film content to 4K HDR. The goal is to automatically convert analog SD VHS and U-matic videotapes, as well as old 2” quad and 1” Type C tapes and 8 and 16 mm film, to 4K without artifacts. AI-based upconversion will refresh and revitalize a huge library of outstanding but technically outdated program content. Colorized, 4K ‘Three Stooges’ shorts or ‘I Love Lucy’ episodes could attract new TV audiences. In a home market filled with 4K and better monitors and TV sets, younger generations have lost interest in classic 525-line B&W TV dinosaurs.
One downside to TV AI is deepfake video and audio capabilities that are moving as fast as AI technology itself, and technology improvements are making it harder to recognize fakes. More than a few experts are concerned that deepfake video and audio can significantly sway opinions in 2024 politics in the US and elsewhere.
A YouTube search for ‘deepfake video’ and ‘detect deepfake video’ reveals hundreds of deepfake creation and detection tools, most a few months old or less, immediately available on the internet. Many are beta or free or restricted by a daily token allocation. Deepfake videos are as cheap to produce as they are easy to make.
There are literally hundreds of internet websites claiming to provide everything necessary to make a TV station-quality studio production, from on-camera talent, writing, and teleprompter, to directing, switching, editing, and content distribution, using deep-learning algorithms. Many of the same systems can also be used to create and produce professional-looking industrial and training videos with virtually no video equipment or on-camera talent.
Most AI deepfake or synthetic video systems are primarily drag-and-drop, and work on modern PCs, tablets, iPhones, and Androids. AI-generated deepfake videos don’t need much other local computing power. Many people use ChatGPT for ideas, scripts, conversations, and scene descriptions, and Midjourney for faceswaps and to generate images from ‘prompt’ descriptions to create characters and scene designs. A beta version of Adobe Firefly, a similarly capable AI image generator, was also recently released.
invideo.ai AI software created a video clip about ‘ATSC 3.0 Nextgen TV’ that looked as professional as a smart intern would produce, given my lack of more specific instructions. The software identified every video source, which was all public domain.
There’s AI software available to create TV content directly from Wikipedia and from the Internet, like ChatGPT but with video and graphics. One is nicknamed VideoGPT. I logged on to inVideo.io and asked it to create a three-minute video report on “ATSC 3.0 NextGen TV.” The video was ready to view in 5 minutes and every visual source was identified. It wasn’t fake but it wasn’t human either.
How to Identify Deepfake Videos
TV broadcasters aren’t the only target of deepfake videos. In 2023, social media and YouTube attract more viewers than licensed TV stations. Nearly all the deepfake video systems are designed to pass YouTube’s Content ID, which interacts with the Digital Millennium Copyright Act (DMCA). Content ID does not question or verify honesty.
Content ID restricts creators beyond what the law requires, to protect YouTube from disputes with large copyright holders. It particularly affects audio, making music a copyright risk that can get a YouTube clip or feed pulled immediately. TV stations and groups usually pay for the rights to use copyrighted music. My personal experience has taught me that YouTube doesn’t care if you have broadcast rights or not, especially on weekends.
The concern about broadcasting deepfake videos is rooted more in the distribution of fake content that appears to be real video. Broadcasters employ professionals who edit news videos all day, every day, who are trained to recognize, evaluate, and correct all anomalies, artifacts, and other technical issues as they edit. Deepfake video is also easy to reverse engineer because it can be examined frame-by-frame and in forward and reverse slo-mo by many trained eyes.
Currently, most stations rely on trained eyes and trusted sources. Typical station policies require news producers to identify and credit sources for all outside video and audio content used in news stories and newscasts. That source policy is a deepfake safety net for stations.
Common video red flags to look out for include too-perfect symmetry, visual distortion, unnatural shadows, and unnatural textures. AI artifacts often occur in landscape and closeup details. Count everyone’s fingers and watch for telltale watermark clues. If it looks or smells fake in any way, it deserves further investigation or rejection.
Stations are suspicious of deepfakes, but it will likely take a major meltdown crisis to fortify deepfake defenses at the group and station level. Some groups and stations are using outside sources to train their crews to recognize deepfake videos. Nearly all stations are relying on their most experienced local ‘golden eyes’ and ‘golden ears’ to recognize questionable content.
Identifying Deepfake Audio
Engineers I spoke with said recognizing deepfake audio is much more challenging than finding deepfake video. If the lip sync matches, an alternate method to identify and recognize an audio anomaly is to view the audio track’s analog signal on a general-purpose oscilloscope. On a properly adjusted oscilloscope screen you can literally see subtle changes in background sound levels or ambient noise, if any, when a word or a short series of words is changed by editing.
A mid-sentence background sound disturbance is a significant red flag. If the words being said are questionable, discuss the issue with the source before it airs, and err on the side of caution. YouTube is filled with amazing examples of audio deepfakes buried in deepfake videos.
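The oscilloscope check for shifts in background level can be approximated in software. The following is a minimal sketch, not a production detector: it assumes the audio has already been decoded to a list of float samples, and the function names, the 6 dB threshold, and the synthetic demo track are all illustrative assumptions. Real program audio would also need the speech separated from the background before frame levels mean much.

```python
import math
import random


def frame_rms_db(samples, frame_len):
    """Return the RMS level in dB for each non-overlapping frame of samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Clamp to avoid log10(0) on pure digital silence.
        levels.append(20 * math.log10(max(rms, 1e-9)))
    return levels


def find_level_jumps(levels_db, threshold_db=6.0):
    """Return frame indices whose level differs from the previous frame
    by more than threshold_db (an assumed, tunable value)."""
    return [
        i for i in range(1, len(levels_db))
        if abs(levels_db[i] - levels_db[i - 1]) > threshold_db
    ]


def make_demo_track(frame_len=1000):
    """Synthetic stand-in for room noise: 16 frames of noise, with frames
    10 to 12 replaced by a much quieter background, as a splice might leave."""
    rng = random.Random(0)
    track = []
    for frame in range(16):
        amp = 0.001 if 10 <= frame < 13 else 0.01
        track.extend(rng.uniform(-amp, amp) for _ in range(frame_len))
    return track


demo_levels = frame_rms_db(make_demo_track(), 1000)
# Frames where the background level changes abruptly between neighbors.
suspect_frames = find_level_jumps(demo_levels)
print(suspect_frames)
```

Flagged frames are only places for a trained ear to listen more closely. A legitimate edit, a mic change, or a door closing will trip the same threshold.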
Automated Detection Progress
Automated deepfake creation and detection are starting to run neck-and-neck for technical dominance, and it’s all software. There are many kinds of AI detectors on the internet. Some detect AI-written text, others detect AI-created artwork. There are also hundreds of websites touted to detect deepfake video and audio. Many are free but none claim 100% accuracy. Most give a percentage of probability. More accurate deepfake detection work is underway at many levels and facilities worldwide.
For example, the Content Authenticity Initiative (CAI) was co-founded by Adobe in 2019 to counter the rise of misinformation, and bring trust and transparency to digital content such as suspected deepfake video and audio. CAI is developing a new technology called Content Credentials, a so-called “nutrition label” for content that could be embedded into digital content. CAI is a community of media and tech companies, NGOs, academics, and others promoting adoption of an open industry standard for content authenticity and source origin.
The Coalition for Content Provenance and Authenticity (C2PA) is another such effort, tied to Adobe, BBC, Intel, Microsoft, Sony, Truepic and Publicis Groupe. The C2PA specification uses cryptographic hashing and digital signatures to bind provenance (origin and edit history) information to media assets, and then adds data record layers throughout an asset’s entire journey, from the moment of creation through every edit made. If the asset is edited without adding a new data record, it shows a red flag.
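The idea of adding a signed record at every edit, and raising a red flag when an edit has no record, can be illustrated with a toy record chain. This is a sketch of the general principle only. It is not the C2PA data format or any C2PA library API, and the key, field names, and functions below are hypothetical. An HMAC with a shared demo key stands in for the certificate-based signatures a real system would use.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real system would sign with a private key
# backed by a certificate, not a shared secret.
DEMO_KEY = b"demo-signing-key"


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()


def add_record(chain, asset: bytes, action: str):
    """Return a new chain with a signed record for the asset after `action`.
    Each record also commits to the previous record's signature."""
    body = {
        "action": action,
        "asset_hash": _digest(asset),
        "prev": chain[-1]["signature"] if chain else "",
    }
    return chain + [dict(body, signature=_sign(body))]


def verify(chain, asset: bytes) -> bool:
    """True only if every record is intact and linked to its predecessor,
    and the newest record matches the asset as it exists now."""
    prev = ""
    for record in chain:
        body = {k: record[k] for k in ("action", "asset_hash", "prev")}
        if record["prev"] != prev:
            return False
        if not hmac.compare_digest(_sign(body), record["signature"]):
            return False
        prev = record["signature"]
    return bool(chain) and chain[-1]["asset_hash"] == _digest(asset)


history = add_record([], b"raw camera clip", "captured")
history = add_record(history, b"trimmed clip", "trimmed")
print(verify(history, b"trimmed clip"))        # edit has a matching record
print(verify(history, b"face-swapped clip"))   # edit made without a record
```

Because each record includes the previous record's signature, quietly rewriting an earlier entry breaks every link after it.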
Another system under development takes a more active approach to deepfake detection. It embeds digital watermarks in the audio track and uses them to detect fake news clips using voice impersonation.
Many other sophisticated AI defense systems being developed use proven lie-detector techniques to monitor for voice stress and visual clues. A platform called FakeCatcher studies color changes in faces to infer blood flow. It monitors the natural color fluctuations that occur over time as the heart pumps blood, and checks that those fluctuations are coherent across facial regions. In one test, the detector achieved 91 percent accuracy.
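The coherence idea can be shown with a toy calculation. This is not FakeCatcher's algorithm, which is not described here. It is a sketch that assumes someone has already measured the average color of a few facial regions in every frame, and it simply asks whether those signals rise and fall together. The function names, the region signals, and the frame rate are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def region_coherence(region_signals):
    """Mean pairwise correlation across per-region color signals."""
    pairs = [
        (i, j)
        for i in range(len(region_signals))
        for j in range(i + 1, len(region_signals))
    ]
    return sum(pearson(region_signals[i], region_signals[j]) for i, j in pairs) / len(pairs)


FPS = 30          # assumed frame rate
FRAMES = 150      # five seconds of video


def pulse(freq_hz, base, depth, phase=0.0):
    """Synthetic per-frame mean color value carrying a pulse-like ripple."""
    return [
        base + depth * math.sin(2 * math.pi * freq_hz * k / FPS + phase)
        for k in range(FRAMES)
    ]


# Same 1.2 Hz "heartbeat" in forehead and cheek, slightly out of phase.
live_face = [pulse(1.2, 100.0, 0.5), pulse(1.2, 98.0, 0.4, phase=0.2)]
# Regions that fluctuate at unrelated rates, as a synthetic face might.
odd_face = [pulse(1.2, 100.0, 0.5), pulse(2.0, 98.0, 0.4)]

print(round(region_coherence(live_face), 2))
print(round(region_coherence(odd_face), 2))
```

A high score suggests the regions share one underlying pulse. A low score is a prompt for closer inspection rather than proof of a fake, since lighting changes, compression, and head motion all disturb these tiny color signals.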
Video watermarks are also being tested to trace the origin of deepfakes. A couple of camera manufacturers have added watermarking, but it doesn’t seem to have reached the broadcast camera level yet because it doesn’t take an expensive broadcast camera to make a deepfake video.
Some manufacturers such as Adobe are using specialized security for their own systems, such as Frame.io for Premiere, but that’s not where most mischievous deepfakes are coming from. Deepfakes come from easily accessed technologies that can become a threat to the legitimacy of TV news broadcasting.
There are hundreds of web- or cloud-based systems designed to detect deepfakes, and a larger number of systems to create them. Most are imperfect and worth about what you pay for them, but the technology is moving so fast that many solutions are available for free in beta or as trial software. Now is a terrific time for hands-on deepfake learning at your own pace, on the internet, for free. Become the local expert and help prevent a meltdown.