Machine Generated Content - Part 2
In part 1 of our ‘Essential Guide: Machine Generated Content’, we focused on the technical challenges of real-time rendering and the different approaches for automating content creation. Part 2 outlines how Generative AI can play its part, the implications and risks of using it, and why it might not always be the best approach.
Generative AI - How It Works
Using AI to create video clips begins by writing a textual prompt to describe what your goal is. The outcome depends on how well you describe what you want and the source material available to the AI engine for training.
Here is a very simplified outline of how Generative AI creates video:
- Create a textual prompt or document to describe your goal to the AI system. Automation can be applied here.
- The prompt is parsed to understand the semantics and theme of the request.
- The parsed results create the search queries required to locate suitable source material.
- This may involve an Internet search if the content is not already ingested. Only use content that you have cleared the legal rights and licences for.
- Candidate stills and moving images are selected to create a dataset based on the detected theme.
- The AI model is trained on this sub-set of content using computer-vision and neural network tools.
- The rendering process uses a technique called Latent Diffusion. The starting point is a frame of random noise. Latent Diffusion is an iterative process that gradually removes the random noise from the image until it resembles the desired output.
- The output is fed back through a neural network or computer vision system, which recognizes the image and determines whether it conforms to the description in the initial prompt based on the found assets.
- The resulting video may also be passed through a transformer to ensure the sequence of frames is coherent.
- The feedback loop is iterated until the result is satisfactory. This is why it is so compute intensive.
The Latent Diffusion algorithm is based on earlier noise reduction techniques that improved video quality by removing random noise from the image. The starting point is 100% noise that is iteratively ‘cleaned’ to produce the final image.
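The iterative clean-up can be illustrated with a toy loop. This is only a sketch of the idea of starting from noise and refining it step by step, not Latent Diffusion itself: a real system works in a compressed latent space and asks a trained network, guided by the prompt, what the clean image should look like. Here a known target image stands in for that prediction, and the names `denoise_step`, `generate` and the 8-pixel `target` are all invented for illustration.

```python
import random


def denoise_step(frame, predicted_clean, strength):
    """Move each pixel a fraction of the way toward the predicted clean value."""
    return [p + strength * (c - p) for p, c in zip(frame, predicted_clean)]


def mean_abs_error(frame, reference):
    """Average per-pixel distance between the frame and the reference."""
    return sum(abs(p - r) for p, r in zip(frame, reference)) / len(frame)


def generate(target, steps=20, strength=0.3, seed=0):
    rng = random.Random(seed)
    # Start from a frame of pure random noise.
    frame = [rng.random() for _ in target]
    errors = [mean_abs_error(frame, target)]
    for _ in range(steps):
        # ASSUMPTION: a real system obtains this estimate from a trained
        # network; here the known target stands in for that prediction.
        frame = denoise_step(frame, target, strength)
        errors.append(mean_abs_error(frame, target))
    return frame, errors


# Hypothetical 8-pixel greyscale "image" with values between 0 and 1.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
frame, errors = generate(target)
print(f"error before: {errors[0]:.4f}, after: {errors[-1]:.6f}")
```

Each pass removes a fixed fraction of the remaining noise, so the error shrinks steadily but many passes are needed. That repetition, multiplied across every frame of a video, is one reason the process is so compute intensive.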
Stable Diffusion is an open-source implementation that applies Latent Diffusion algorithms in a practical way.
Neural Networks are used as pattern recognizers in a feedback loop to compare the images with the original prompt. This steers the process until it converges on the desired result. Transformers, the architecture used in Large Language Models, maintain coherence over a sequence of frames.
Transformers are typically used to steer text generators so their output makes sense. They are now being used with video to enhance the frame-to-frame coherence.
Client Side And Edge Rendering
Video compression continues to improve, with new algorithms reducing the bandwidth required for delivery. At the same time, screen sizes and resolutions are increasing, which increases the bandwidth needed.
Further bandwidth reduction is possible if the player or edge-server can locally composite multiple layers of video. Only the active area needs to be compressed and delivered. Static areas are replaced with a stationary background image, and a moving alpha channel mask is delivered with the presenter overlay to assist the compositing process.
This would work for weather forecasts and British Sign Language (BSL) signer overlays, which can then be switched on or off at the receiver like a subtitle overlay. Signed presentations could then be delivered at primetime rather than at unsociable hours for viewers to record and watch offline. The weather forecaster overlay could be omitted entirely on a very low bandwidth local connection to a client device.
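The compositing step at the receiver can be sketched in a few lines. This is a minimal illustration that assumes single-channel pixel values and a mask running from 0.0 (background only) to 1.0 (overlay only). The function names and the `overlay_enabled` switch are hypothetical, and a real player would do this work on the GPU for every color channel.

```python
def composite_pixel(background, overlay, alpha):
    """Blend one pixel: alpha 1.0 shows the overlay, alpha 0.0 the background."""
    return alpha * overlay + (1.0 - alpha) * background


def composite_frame(background, overlay, mask, overlay_enabled=True):
    """Combine a static background with a masked presenter overlay."""
    if not overlay_enabled:
        # The viewer has switched the overlay off, so only the
        # stationary background image is shown.
        return list(background)
    return [
        composite_pixel(b, o, a)
        for b, o, a in zip(background, overlay, mask)
    ]


# Hypothetical four-pixel scan line.
background = [10, 10, 10, 10]    # stationary background image
overlay = [200, 200, 200, 200]   # delivered presenter layer
mask = [0.0, 0.5, 1.0, 0.0]      # delivered alpha channel mask

composited = composite_frame(background, overlay, mask)
print(composited)  # [10.0, 105.0, 200.0, 10.0]
```

Calling `composite_frame` with `overlay_enabled=False` returns the background untouched, which corresponds to switching the signer or presenter off at the receiver.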
Using Web Technologies To Create A Smart Frame Store
In sports coverage, statistics and scoreboards are generated from incoming data feeds. The data can be generated automatically by AI or entered manually by an operator at the event as things happen.
A simple device for generating viewable content can be built around a low-cost PC with video output. If this is running an application that hosts a full-screen web view, the content can be described in an HTML container. Using a combination of the W3C web standards, most infographic displays would be very easy to create. The application can respond to incoming messages arriving via a service listener port to trigger an update and redraw. Maintaining several separate web views as buffers allows the content to be drawn offscreen and then moved into view when it is complete. The relevant technologies are:
| Technology | Description |
|---|---|
| <html> container | Manages the display viewport and automated refreshing. |
| <body> object | Frames the content area. |
| CSS style sheet or <style> tags | Adds styling, matte masking and coloring to any of the objects in the view. |
| <img> tags | Supports the display of rendered pictures and photographs. |
| <video> tag | Supports the playback of multiple video segments. |
| <svg> container | Scalable Vector Graphic objects support vector drawing operations. |
| JavaScript | Dynamically alters the appearance, position and content of any object. |
| WebVTT timed text tracks | Provides synchronization between moving video and objects in the frame using JavaScript triggers. |
| JavaScript libraries | Various open-source libraries are available for plotting graphs or other more sophisticated drawing tasks. |
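The update-and-swap behaviour of such a frame store can be sketched as follows. This is an assumption-laden outline rather than a finished design: the JSON message fields (`home`, `away`, `home_score`, `away_score`), the `FrameStore` class and the `scoreboard` CSS class are all invented for illustration, and the HTML strings stand in for the real offscreen web views. A production version would receive messages from the listener port and hand the markup to the hosted web view.

```python
import html
import json


def render_scoreboard(data):
    """Build the HTML fragment for one scoreboard update."""
    home = html.escape(str(data["home"]))
    away = html.escape(str(data["away"]))
    home_score = int(data["home_score"])
    away_score = int(data["away_score"])
    return (
        f'<div class="scoreboard">'
        f"{home} {home_score} - {away_score} {away}</div>"
    )


class FrameStore:
    """Two buffers: one visible, one drawn offscreen and then swapped in."""

    def __init__(self):
        self.buffers = ["", ""]
        self.visible_index = 0

    @property
    def visible(self):
        return self.buffers[self.visible_index]

    def handle_message(self, raw_message):
        # Parse and render first. If either step fails, an exception is
        # raised before the swap, so the visible buffer is left untouched.
        data = json.loads(raw_message)
        back_index = 1 - self.visible_index
        self.buffers[back_index] = render_scoreboard(data)
        # Swap only once the offscreen buffer is complete.
        self.visible_index = back_index


store = FrameStore()
store.handle_message(
    '{"home": "Reds", "away": "Blues", "home_score": 2, "away_score": 1}'
)
print(store.visible)
```

Because the swap is the last step, a malformed message never leaves a half-drawn graphic on air. The previous frame simply stays in view until a valid update arrives.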
Ecological Issues
AI-driven rendering using Latent Diffusion techniques is popular, but in computational terms it is very expensive and uses a great deal of energy. It uses much more than a simple search engine query and considerably more than a 3D render.
Consider whether Generative AI is the best solution or whether something more ecologically sound is better. A 3D render pipeline would be a more efficient solution for many applications. It can still be steered with AI input.
Some Consequences And Risks
Search engines such as Google are getting smarter at detecting AI-generated content. This is an evolving landscape. Generative AI systems will improve to make content that more closely approximates human-created content, but search engines will also become more adept at detecting it. It is an arms race.
There are implications for your Search Engine Optimization (SEO) rankings if you upload low-quality machine generated content. There is no substitute for human intervention at the Quality Assurance stage. Review the output for accuracy and visual quality before making it widely available.
There are also serious issues relating to copyright and moral rights to use content. Since AI generated material is based on crawling the Internet for its source material, the provenance of what it has found is uncertain. It is highly likely that some proprietary and copyrighted material will find its way into your AI training dataset. You must be certain that you are not inadvertently infringing a copyright as the penalties may be severe. This is especially important if you are using AI generated material commercially. There may be recognisable aspects of someone else’s work included in your output, so you must constrain the data foraging to only ingest licensed material.
Running your own closed system with properly licensed source material or content you have created yourself may be the optimum way forward. This protects you from copyright problems and ensures your content will have a unique look that is unlike everyone else’s.
Conclusion
AI systems depend on huge development efforts to build and maintain. There are some free-to-use trials but, in the longer term, these systems are designed to generate substantial revenue for the companies building them. If your product depends on AI, build some costings into your business plan to avoid nasty financial surprises later on. Current fees range from 30 to 80 dollars a month for frequent users.
Some of the selling points for AI-generated video are based on eliminating employment costs from the production process. There is a moral dilemma here about making content without hiring actors, equipment or video editing experts. If you deprive people of their livelihood, public opinion may turn against you and cause reputational damage. That could cost more than you save.
Whilst we might worry about a dystopian future, AI is not likely to entirely replace human creativity with machine generated content. It is useful as a tool for generating ideas at the start of a project. It provides some leverage just like any other creative tool. Continue to manually refine and adjust that generated content to arrive at a finished product.