One of the creative advantages of virtual production for performers is seeing the virtual environment in which they are performing. Using motion capture techniques extends this into capturing the motion of performers to drive CGI characters. New technologies are rapidly transforming the creative freedom this brings.
As we saw in a previous piece on camera tracking, it’s not inevitable that virtual production need involve tracking technology at all – it’s quite possible to use the LED video wall in the same way as back projection was used for decades. Motion capture is normally used as a post-production process for visual effects work, although some particularly advanced setups have used it to animate a virtual character in real time, so that an actor – and thereby the character – can react in real time to the live action scene.
Figures Of Merit
Some of the same technology which is used to capture camera position can also be used to track people, although those tasks can have sufficiently different requirements that separate systems are used to track cameras and performers. For a three-dimensional scene to be displayed on an LED wall with proper perspective, the camera position relative to the wall must be known with good accuracy. Tracking a person, meanwhile, can sometimes accept small errors so long as the overall effect is convincing.
Evaluating motion tracking technologies for any particular application requires some knowledge of the underlying principles and the limits of various technologies.
The most familiar camera-based optical capture system is an outside-in configuration, with cameras surrounding the action and observing passive, reflective markers on the performer. This configuration can offer a large working volume, with the option to trade off accuracy and volume by altering the location of the witness cameras. Placing cameras to cover a larger space allows more room for the performance, but may reduce accuracy when the performer is far from the cameras.
Inside-out systems place a witness camera on the taking camera which observes markers in the environment. These systems are often recognisable by the scattering of reflective dots or circular barcodes in the ceiling of the studio. This arrangement allows them to cover large areas, but they are usually made to locate a single point per witness camera. Systems of this type are often used to track several cameras in a broadcast studio, but they are generally not capable of tracking the multiple locations on a human figure that would be required to recreate a performance.
Inertial systems measure position by sensing acceleration and deceleration over time. Like the inertial navigation system on an aircraft, they may be subject to some degree of drift over time. Similar inertial reference systems are sometimes built into modern lenses to report approximate camera position for later visual effects work. The compensating advantage is that these systems can work over a large area, often limited only by the range of a radio data link between the performer and a base station. An optical system can only operate in the area covered by a sufficient number of witness cameras.
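The drift can be illustrated with a minimal sketch (hypothetical numbers, not any particular system): position is recovered by integrating acceleration twice, so even a tiny constant sensor bias grows quadratically into a significant position error.

```python
def integrate_position(accels, dt):
    """Dead-reckon 1D position from acceleration samples, starting at rest."""
    velocity = 0.0
    position = 0.0
    for a in accels:
        velocity += a * dt      # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position

dt = 0.01                                 # 100 Hz sample rate
still = [0.0] * 1000                      # performer standing still for 10 s
biased = [a + 0.01 for a in still]        # tiny 0.01 m/s^2 sensor bias

print(integrate_position(still, dt))      # 0.0 - no motion, no drift
print(integrate_position(biased, dt))     # ~0.5 m of apparent drift in 10 s
```

The example is why inertial systems are often periodically re-anchored by an optical or other absolute reference.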
Similar benefits attend mechanical motion capture devices. Mechanical systems detect the position of the performer’s joints using potentiometers or optical encoders. The approach is often combined with other techniques, particularly optical or inertial, which allow the device to establish its overall position in space. Still other technologies, particularly those based on magnetic field sensing, may have a capture volume strictly limited by the physical structure of the device. Because magnetic fields pass through many objects, they can locate all of their tracking markers at all times, regardless of the position of the performer. Some active-marker systems, which rely on the performer wearing powered markers, may also depend on a fixed frame to detect the position of those markers, limiting the available space.
Finally, markerless motion capture systems are often based on machine learning (which is not necessarily the same thing as AI). Markerless systems can derive motion capture data from something as simple as a video image of the performance, ideally with reasonable lighting creating a clear view of the performer. At the time of writing (late Spring 2023), the results of these systems were generally not as precise as those using more conventional approaches, although machine learning is a rapidly-developing field and improvements are widely anticipated.
Motion capture as a technique for post-production visual effects can produce highly realistic results which contribute significantly to the believability of an effect. It can also work quickly, potentially avoiding the hours of exacting work involved in animating something by hand. Actors appreciate the process because the captured motion reflects all the subtlety of a real performance, although sometimes, motion capture may be performed by a stand-in or stunt specialist.
Recording the finest details of motion is also one of the downsides. Where motion capture data must be recorded and potentially modified, it quickly becomes clear that it is difficult to edit the unprocessed data. In conventional animation, the motion of an object between two positions is usually described using only those two positions – waypoints – which are separated in time. Changing the speed of the object’s motion simply means changing the time it takes to move between the two points.
Motion capture data records a large number of waypoints representing the exact position of an object at discrete intervals. It’s often recommended that motion data should be captured at least twice as frequently as the frame rate of the final project, so a 24fps cinema project should capture at least 48 samples per second. That’s well within the capabilities of most systems, but it does complicate the process of editing motion data. It’s impractical to manually alter dozens of recorded positions per second and achieve a result that looks realistic.
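The contrast can be sketched in a few lines (hypothetical numbers, for illustration only): two keyframes fully describe an animated move, while motion capture stores an explicit sample for every interval.

```python
def interpolate(start, end, t):
    """Linear interpolation between two keyframed positions, t in [0, 1]."""
    return start + (end - start) * t

# Conventional animation: a 2-second move is just two waypoints.
# Retiming it only changes the duration, not the stored data.
print(interpolate(0.0, 1.0, 0.5))  # halfway through the move: 0.5

# Motion capture at 48 samples per second: the same 2-second move
# becomes 96 explicitly recorded positions, each a candidate for editing.
samples = [interpolate(0.0, 1.0, i / 95) for i in range(96)]
print(len(samples))  # 96
```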
Tools have been developed to facilitate motion capture data editing. Some of them rely on modifying groups of recorded positions using various proportional editing tools; a sort of warping. Others try to reduce the number of recorded positions, often by finding sequences of them which can be closely approximated with a mathematical curve. This can make motion capture data more editable, but too aggressive a reduction of points can also rob it of the realism of a live performance, risking a more mechanical, artificial look which is exactly what motion capture is intended to avoid.
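One common reduction strategy works along the lines of the Ramer–Douglas–Peucker algorithm: keep only the samples that deviate from a straight-line approximation by more than a tolerance. A minimal one-dimensional sketch (an illustration, not a production tool):

```python
def simplify(points, tolerance):
    """Recursively drop samples lying within `tolerance` of the chord
    between the first and last points. points: list of (time, value)."""
    if len(points) < 3:
        return points
    (t0, v0), (t1, v1) = points[0], points[-1]
    worst_i, worst_d = 0, 0.0
    for i in range(1, len(points) - 1):
        t, v = points[i]
        expected = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        d = abs(v - expected)
        if d > worst_d:
            worst_i, worst_d = i, d
    if worst_d <= tolerance:
        return [points[0], points[-1]]   # whole run fits one straight segment
    left = simplify(points[:worst_i + 1], tolerance)
    right = simplify(points[worst_i:], tolerance)
    return left[:-1] + right             # merge, sharing the split point

# 49 samples of a steady ramp collapse to just the two endpoints...
dense = [(i, i * 0.1) for i in range(49)]
print(len(simplify(dense, 0.01)))  # 2
```

Raising the tolerance removes more samples, which is exactly the trade-off described above: more editable data, but at the cost of performance detail.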
Often, motion capture used where a performer is working live alongside a virtual production stage won’t be recorded, so there will be neither a need nor an opportunity to edit it. Other problems, such as intermittent failures to recognise tracking markers, can cause glitches in positioning that would usually be edited out. Working live, a retake might be necessary, although well-configured systems are surprisingly resistant to – for instance – markers being obscured by parts of the performer’s body.
Rigging And Scale
Connecting motion capture data to a virtual character requires that character model to be designed and rigged for animation. Where the character is substantially humanoid, this may not present too many conceptual problems, although the varying proportions of different people can still sometimes cause awkwardness when there’s a mismatch between the physique of the performer and the virtual character concerned.
Very often, the character will look like something other than a human. It may be of a substantially different shape, scale or even configuration of limbs to the human performer whose movements will drive it. Various software packages offer different solutions to these considerations, allowing the performer’s motions to be scaled, remapped and generally altered to suit the animated character, although this has limits. Although motion capture technicians will typically strive to avoid imposing requirements on the performer, the performer might need to spend time working out how to perform in a manner which suits the virtual character. This approach can make a wide variety of virtual characters possible.
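At its simplest, that remapping is just proportional scaling. A hypothetical sketch (the function name and the ranges are invented for illustration):

```python
def retarget(value, performer_range, character_range):
    """Map a captured value from the performer's range of motion onto the
    character's, preserving the relative position within that range."""
    p_lo, p_hi = performer_range
    c_lo, c_hi = character_range
    fraction = (value - p_lo) / (p_hi - p_lo)
    return c_lo + fraction * (c_hi - c_lo)

# A 1.8 m performer raises a hand to 1.5 m; on a 3 m giant the same
# gesture should land proportionally higher up the body.
print(retarget(1.5, (0.0, 1.8), (0.0, 3.0)))  # approximately 2.5
```

Real retargeting tools work per joint and in three dimensions, but the principle of mapping one range of motion onto another is the same.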
On Set With Motion Capture
Most motion capture systems require at least some calibration, which might be as simple as moving around the capture volume with a specially-designed test target. Some of the most common systems, using spherical reflective markers, may require some calibration for each performer, especially if the performer removes or disturbs the markers. Many virtual production setups rely on motion tracking to locate the camera, even when motion capture is not being used to animate a virtual character. As such, almost any virtual production stage might rely on at least some calibration work, though there is often some variability in how often this is done; performance capture spaces might do so twice daily, requiring a few minutes each time.
As with many of the technologies associated with virtual production, motion capture, where it’s used, is likely to be the responsibility of a team provided by the studio itself. Most of the work required of the production will be associated with the design of the virtual character which will be controlled with motion capture. The technical work of connecting that character’s motion to the capture system is an item of preparation to be carefully planned and tested before the day. With those requirements fulfilled, using an actor’s performance to control a virtual character can provide an unprecedented degree of immediacy. While it certainly adds another layer of technology to the already very technology-dependent environment of virtual production, it creates a level of interactivity which was never possible with post-production VFX.