Virtual Production For Broadcast: Motion Capture

One of the creative advantages of virtual production for performers is seeing the virtual environment in which they are performing. Using motion capture techniques extends this into capturing the motion of performers to drive CGI characters. New technologies are rapidly transforming the creative freedom this brings.

As we saw in a previous piece on camera tracking, virtual production need not involve tracking technology at all – it’s quite possible to use the LED video wall in the same way as back projection was used for decades. Motion capture is normally used as a post-production process for visual effects work, although some particularly advanced setups have used it to animate a virtual character in real time, so that an actor – and thereby the character – can react in real time to the live action scene.

Figures Of Merit

Some of the same technology which is used to capture camera position can also be used to track people, although those tasks can have sufficiently different requirements that separate systems are used to track cameras and performers. For a three-dimensional scene to be displayed on an LED wall with proper perspective, the camera position relative to the wall must be known with good accuracy. Tracking a person, meanwhile, can sometimes accept small errors so long as the overall effect is convincing.

Evaluating motion tracking technologies for any particular application requires some knowledge of the underlying principles and the limits of various technologies.


The most familiar camera-based optical capture system is an outside-in configuration, with cameras surrounding the action and observing passive, reflective markers on the performer. This configuration can offer a large working volume, with the option to trade off accuracy and volume by altering the location of the witness cameras. Placing cameras to cover a larger space allows more room for the performance, but may reduce accuracy when the performer is far from the cameras.

Inside-out systems place a witness camera on the taking camera which observes markers in the environment. These systems are often recognisable by the scattering of reflective dots or circular barcodes in the ceiling of the studio. This arrangement allows them to cover large areas, but they are usually made to locate one single point per witness camera. Systems of this type are often used to track several cameras in a broadcast studio, but they are generally not capable of tracking the multiple locations on a human figure that would be required to recreate a performance.

Inertial systems measure position by sensing acceleration and deceleration over time. Like the inertial navigation system on an aircraft, they may be subject to some degree of drift over time. Similar inertial reference systems are sometimes built into modern lenses to report approximate camera position for later visual effects work. The compensating advantage is that these systems can work over a large area, often limited only by the range of a radio data link between the performer and a base station. An optical system can only operate in the area covered by a sufficient number of witness cameras.
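The drift described above can be illustrated with a short sketch: dead reckoning integrates acceleration twice to get position, so even a tiny, constant sensor bias accumulates into a large position error. The bias value and sample rate below are purely illustrative, not figures for any real system.

```python
def integrate_position(accel_samples, dt):
    """Naive dead reckoning: double-integrate acceleration to position."""
    velocity, position = 0.0, 0.0
    positions = []
    for a in accel_samples:
        velocity += a * dt       # first integration: velocity
        position += velocity * dt  # second integration: position
        positions.append(position)
    return positions

# The sensor is actually stationary, but reports a 0.01 m/s^2 bias.
dt = 0.01                 # hypothetical 100 Hz sampling
samples = [0.01] * 6000   # 60 seconds of biased readings

track = integrate_position(samples, dt)
print(f"apparent movement after 10 s: {track[999]:.2f} m")
print(f"apparent movement after 60 s: {track[-1]:.2f} m")
```

Because the error grows with the square of elapsed time, inertial systems are periodically re-referenced against another source – which is one reason they are so often combined with optical techniques.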

Similar benefits attend mechanical motion capture devices, which detect the position of the performer’s joints using potentiometers or optical encoders. The approach is often combined with other techniques, particularly optical or inertial, which allow the device to establish its overall position in space. Still other technologies, such as those based on magnetic field sensing, may have a capture volume strictly limited by the physical structure of the device. Because magnetic fields pass through many objects, such systems can locate all of their tracking markers at all times, regardless of the position of the performer. Some active-marker systems, in which the performer wears powered markers, may likewise depend on a fixed frame of detectors, which limits the working space.

Finally, markerless motion capture systems are often based on machine learning (which is not necessarily the same thing as AI). Markerless systems can derive motion capture data from something as simple as a video image of the performance, ideally with reasonable lighting creating a clear view of the performer. At the time of writing (late Spring 2023), the results of these systems were generally not as precise as those using more conventional approaches, although machine learning is a rapidly-developing field and improvements are widely anticipated.


Motion capture as a technique for post production visual effects can produce highly realistic results which contribute significantly to the believability of an effect. It can also work quickly, potentially avoiding the hours of exacting work involved in animating something by hand. Actors appreciate the process because the captured motion reflects all the subtlety of a real performance, although sometimes, motion capture may be performed by a stand-in or stunt specialist. 

Recording the finest details of motion is also one of the downsides. Where motion capture data must be recorded and potentially modified, it quickly becomes clear that it is difficult to edit the unprocessed data. In conventional animation, the motion of an object between two positions is usually described using only those two positions – waypoints – which are separated in time. Changing the speed of the object’s motion simply means altering the time it takes to move between the two points.
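The two-waypoint approach can be sketched in a few lines: the move is defined entirely by its endpoints and a duration, so changing the speed is just a matter of changing one number. The frame rate and values here are illustrative.

```python
def lerp(a, b, t):
    """Linear interpolation between two values."""
    return a + (b - a) * t

def sample_move(start, end, duration, fps=24):
    """Evaluate a two-waypoint move at every frame of its duration."""
    frames = int(duration * fps)
    return [lerp(start, end, i / frames) for i in range(frames + 1)]

fast = sample_move(0.0, 1.0, duration=1.0)  # the move over one second
slow = sample_move(0.0, 1.0, duration=2.0)  # same move at half speed
print(len(fast), len(slow))  # same endpoints, different frame counts
```

Both versions pass through exactly the same start and end positions; only the timing differs. Edited motion capture data, by contrast, has no such compact description to adjust.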

Motion capture data records a large number of waypoints representing the exact position of an object at discrete intervals. It’s often recommended that motion data should be captured at least twice as frequently as the frame rate of the final project, so that a 24fps cinema project should capture at least 48 times per second. That’s well within the capabilities of most systems, but it does complicate the process of editing motion data. It’s impractical to manually alter dozens of recorded positions per second and achieve a result that looks realistic.

Tools have been developed to facilitate motion capture data editing. Some of them rely on modifying groups of recorded positions using various proportional editing tools; a sort of warping. Others try to reduce the number of recorded positions, often by finding sequences of them which can be closely approximated with a mathematical curve. This can make motion capture data more editable, but too aggressive a reduction of points can also rob it of the realism of a live performance, risking a more mechanical, artificial look which is exactly what motion capture is intended to avoid.
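One widely-known way to reduce recorded positions to a manageable set of keyframes is the Ramer-Douglas-Peucker algorithm, which discards samples that lie within a tolerance of a straight-line approximation. The sketch below applies it to a hypothetical joint angle sampled 48 times over a second; the tolerance value is illustrative, and – as the article notes – setting it too high would strip out the nuance of the performance.

```python
def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only points needed to stay within epsilon."""
    if len(points) < 3:
        return points
    (x0, y0), (x1, y1) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance from point p to the chord between endpoints.
        x, y = p
        num = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
        den = ((y1 - y0) ** 2 + (x1 - x0) ** 2) ** 0.5
        return num / den

    d_max, i_max = max((dist(p), i) for i, p in enumerate(points[1:-1], 1))
    if d_max <= epsilon:
        return [points[0], points[-1]]  # the chord is close enough
    # Otherwise split at the worst point and simplify each half.
    left = rdp(points[: i_max + 1], epsilon)
    right = rdp(points[i_max:], epsilon)
    return left[:-1] + right

# 48 samples of a joint angle easing from 0 to 90 degrees over a second.
samples = [(t, (t / 47) ** 2 * 90.0) for t in range(48)]
keys = rdp(samples, epsilon=0.5)
print(f"{len(samples)} samples reduced to {len(keys)} keyframes")
```

A larger epsilon yields fewer keyframes and easier editing, at the cost of the small irregularities that make captured motion look alive.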

Often, motion capture used where a performer is working live alongside a virtual production stage won’t be recorded, so there won’t be any need or opportunity to edit it. Problems such as intermittent failures to recognise tracking markers can cause glitches in positioning that would usually be edited out; working live, a retake might be necessary instead, although well-configured systems are surprisingly resistant to – for instance – markers being obscured by parts of the performer’s body.

Rigging And Scale

Connecting motion capture data to a virtual character requires that character model to be designed and rigged for animation. Where the character is substantially humanoid, this may not present too many conceptual problems, although the varying proportions of different people can still sometimes cause awkwardness when there’s a mismatch between the physique of the performer and the virtual character concerned.

Very often, the character will look like something other than a human. It may be of a substantially different shape, scale or even configuration of limbs to the human performer whose movements will drive it. Various software packages offer solutions to these considerations, allowing the performer’s motions to be scaled, remapped and generally altered to suit the animated character, although this has limits. Although motion capture technicians will typically strive to avoid imposing requirements on the performer, the performer might need to spend time working out how to perform in a manner which suits the virtual character. This approach can make a wide variety of virtual characters possible.
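At its simplest, this kind of remapping can amount to rescaling captured translations by the ratio between the performer’s and the character’s proportions, so that a half-size creature takes half-size steps. The sketch below is a deliberately minimal illustration of that idea; the names and measurements are hypothetical, and real retargeting tools do far more, remapping rotations across differently-configured skeletons.

```python
# Hypothetical limb lengths: a human performer driving a half-scale creature.
PERFORMER_LEG = 0.90  # metres
CHARACTER_LEG = 0.45  # metres

def retarget_translation(positions, src_leg, dst_leg):
    """Scale captured root translations (x, y, z) to suit the target skeleton."""
    scale = dst_leg / src_leg
    return [(x * scale, y * scale, z * scale) for x, y, z in positions]

# Three captured hip positions as the performer steps forward.
captured = [(0.0, 0.95, 0.0), (0.30, 0.97, 0.0), (0.60, 0.95, 0.0)]
retargeted = retarget_translation(captured, PERFORMER_LEG, CHARACTER_LEG)
print(retargeted[1])  # the character covers half the distance
```

Even this trivial scaling preserves the timing and rhythm of the performance, which is exactly the quality motion capture exists to keep.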

On Set With Motion Capture

Most motion capture systems require at least some calibration, which might be as simple as moving around the capture volume with a specially-designed test target. Some of the most common systems, using spherical reflective markers, may require some calibration for each performer, especially if the performer removes or disturbs the markers. Many virtual production setups rely on motion tracking to locate the camera, even when motion capture is not being used to animate a virtual character. As such, almost any virtual production stage might rely on at least some calibration work, though there is often some variability in how often this is done; performance capture spaces might do so twice daily, requiring a few minutes each time.

As with many of the technologies associated with virtual production, motion capture, where it’s used, is likely to be the responsibility of a team provided by the studio itself. Most of the work required of the production will be associated with the design of the virtual character which will be controlled with motion capture. The technical work of connecting that character’s motion to the capture system is an item of preparation to be carefully planned and tested before the day. With those requirements fulfilled, using an actor’s performance to control a virtual character can provide an unprecedented degree of immediacy. While it certainly adds another layer of technology to the already very technology-dependent environment of virtual production, it creates a level of interactivity which was never possible with post production VFX.
