Predicting Editing Performance From Games And Benchmarks: Part 1

In the good old days, when you were thinking about upgrading your computer, you began by reading printed reviews such as those published by Byte magazine. These reviews usually included industry standard benchmarks. Now, of course, you are far more likely to watch internet video reviews.

When I view these online reviews, it’s clear the majority have been produced by “gamers.” These reviews have titles such as “DeathAxe Laptop Reviewed Using 23 Games.”

A gaming computer reads compressed artificial-world descriptions from a disk file. This artificial world is regenerated by the CPU and loaded into the GPU, which displays it to the gamer. The gamer’s actions are fed back to the game engine running on the CPU, which dynamically modifies the artificial world the GPU displays.

This is a different process from video playback on a content creator’s system. Here a continuous sequence of frames is rapidly pulled from a disk, decompressed by the CPU, and sent to GPU buffers where mathematical operations, such as those for color correction, are applied to the buffer; the result is then displayed for a precise interval. This process is repeated for the next frame.

When a content creator’s computer is transcoding media, the workflow is also different. Transcoding involves reading compressed video frames from a disk; each frame is decompressed and then recompressed, and the recompressed frame is written back to a disk.
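To make this workflow concrete, here is a minimal Python sketch that drives the decompress-recompress loop with FFmpeg. It assumes FFmpeg is installed on the system, the file names are placeholders, and it illustrates the general process only, not the specific workflow measured later in this exploration:

import subprocess

# Transcode one file: FFmpeg reads compressed frames from disk, decodes them,
# re-encodes the video with libx264, and writes the recompressed frames back to disk.
# "input.mov" and "output.mp4" are placeholder file names.
cmd = [
    "ffmpeg",
    "-i", "input.mov",   # compressed source read from a disk
    "-c:v", "libx264",   # decode each frame, then recompress it as H.264
    "-crf", "18",        # quality target for the re-encode
    "-c:a", "copy",      # pass the audio stream through untouched
    "output.mp4",        # recompressed frames written back to a disk
]
subprocess.run(cmd, check=True)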

There is one thing that gaming and content creation do have in common - both involve long task execution times during which heat is generated by the CPU and GPU. Even when a laptop has a good cooling system (see Figure 1), playing games for hours or working with high-resolution compressed video - often with complex effects - can push a computer toward, or into, thermal throttling. So, while gaming and content creation tasks do not employ the same compute-intensive processes, both can generate high thermal loads.

Figure 1: Laptop Cooling System.

Understanding this, we want to know whether game play performance or industry standard benchmark performance has a higher correlation with content creation performance.

Let’s, for the time being, ignore measuring content creation performance. At this point, we need only a measure of game play performance and a benchmark performance measure. (When data were being collected, each laptop was powered by its charger from mains power.)

Both the gaming and Geekbench tests ran for well over a minute, thus allowing heat to build as it would during editing and color correction. Laptop display resolution, at 1920x1080, remained constant during testing.

We need to measure performance on multiple computers because both the game play measure and the benchmark measure require multiple data points.

Before describing the measures I took, you need descriptions of the computers. In the first phase of this exploration I had access to four systems. Thankfully these systems offered a wide performance range.

Number 1.  Samsung Galaxy TabPro S Windows 10 Tablet: Intel m3-6Y30 CPU/GPU (similar to an i3); 4GB RAM; 2GB Video memory; 256GB SSD. (See Figure 2.)

Figure 2: Tiny Samsung Galaxy Windows Tablet Editing 1080p24 using DaVinci Resolve.

Number 2.  Lenovo IdeaCentre Y910-27:  Intel i7-6700 (3.4-4.0GHz); 16GB DDR4-2133; NVIDIA GTX 1080, 8GB GDDR5 memory; 128GB M.2 NVMe PCIe SSD. (See Figure 3.)

Number 3.  Gigabyte AERO 15 OLED Laptop:  Intel i7-9750H (2.6-4.5GHz); 16GB DDR4-2666 dual-channel RAM; NVIDIA GTX 1660 Ti 6GB GDDR6 memory; 512GB M.2 NVMe PCIe SSD. (See Figure 4.)

Number 4.  HP Omen 15: Intel i7-9750H (2.6-4.5GHz); 16GB 2666MHz DDR4 dual-channel RAM; NVIDIA GTX 1660 Ti 6GB GDDR6 memory; 512GB M.2 NVMe PCIe SSD; 32GB Intel Optane memory. The Intel Optane module acts as a cache for the storage drive, which increases system performance. (See Figure 5.)

My Experiment

I call the first phase of my two-part exploration an “experiment” because this phase involves a null hypothesis. My null hypothesis is “there is no difference between a measure of game performance and a measure of benchmark performance.”

Why might this be my finding? Perhaps my free game, War Thunder, did not stress my computers as much as would an expensive game owned by a gamer.

To maximize the load put on a computer, I set the game’s visual quality to Movie, the maximum possible setting. See Figure 6.

Figure 6: War Thunder Test Settings.

Had the data from my experiment not allowed me to reject the null hypothesis, I would have been unable to say anything about the experiment. Since you are reading this, you know I was able to reject the null hypothesis. Figure 7 presents Geekbench 4 performance. (The AERO’s Geekbench 5 Multi-core score is 5486.)

Figure 7: Geekbench 4 Single-core and Multi-core Score.

How did I determine there was a difference in performance? As shown by the following example, I merely “looked” at data plots of the two measures.

To better understand how this would work, I created the three sets of data shown by Figure 8. The orange plot shows a linear correlation between two variables: a Y-axis variable (e.g., heating BTU estimate) and an X-axis variable (e.g., number of office windows). The blue plot shows a linear correlation between a Y-axis variable (cooling BTU estimate) and the same X-axis variable (windows).

Figure 8: Three Sets of Created Data.

We can see that although the orange and blue data values are different (2.4 to 7.8 versus 3.4 to 8.3), their plots have essentially the same slope - both represent the same kind of linear correlation.

The green plot, however, shows a non-linear correlation between a Y-axis variable (heating/cooling power costs) and an X-axis variable—number of windows. (When bundling heating and cooling together, the power company provides a “consumption discount.”)

An Excel logarithmic trendline (dashed red) has been overlaid on the green curve. Looking at this logarithmic trendline and the two linear data plots, we can see the nature of the correlations is different.
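For readers who want to experiment with this idea, the following Python sketch builds three series shaped like those in Figure 8 and fits trendlines to them with numpy. The numbers are invented for illustration; they are not the plotted data.

import numpy as np

# Invented stand-ins for Figure 8: two linear series (heating and cooling BTU
# estimates) and one non-linear series (power cost), all against window count.
windows = np.arange(1, 9, dtype=float)
heating = 2.4 + 0.77 * (windows - 1)    # orange: linear, roughly 2.4 to 7.8
cooling = 3.4 + 0.70 * (windows - 1)    # blue: linear, roughly 3.4 to 8.3
cost = 3.0 * np.log(windows) + 2.0      # green: non-linear (consumption discount)

# Linear trendlines (slope, intercept) for the two linear series.
heating_slope, heating_intercept = np.polyfit(windows, heating, 1)
cooling_slope, cooling_intercept = np.polyfit(windows, cooling, 1)

# An Excel-style logarithmic trendline, y = a*ln(x) + b, for the green series.
a, b = np.polyfit(np.log(windows), cost, 1)

print("heating trendline:", heating_slope, heating_intercept)
print("cooling trendline:", cooling_slope, cooling_intercept)
print("logarithmic trendline:", a, b)

The two linear fits come back with nearly identical slopes, while the third series is described well only by the logarithmic form - the same distinction Figure 8 makes visually.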

Experiment Results

Figure 9 presents a histogram of Geekbench 4 multi-core performance generated by the four computers. (A histogram is the correct way to plot discrete data.) A linear trendline has been overlaid on these data. The coefficient of determination, R², provides an estimate (0.0 to 1.0) of how well the trendline fits the test data.
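If you want to reproduce the statistic, R² can be computed directly as 1 - SS_res/SS_tot. The short Python sketch below uses placeholder multi-core scores rather than the measured values plotted in Figure 9.

import numpy as np

def r_squared(y, y_pred):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Placeholder Geekbench 4 multi-core scores for four systems (not the measured values).
system_index = np.array([1.0, 2.0, 3.0, 4.0])
scores = np.array([3200.0, 15800.0, 21500.0, 21900.0])

# Fit a linear trendline, then compute R^2 of the scores against that trendline.
slope, intercept = np.polyfit(system_index, scores, 1)
trendline = slope * system_index + intercept
print("R^2 =", r_squared(scores, trendline))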

Figure 9: Geekbench 4 Multi-core Performance.

Figure 10 presents game performance, in frames-per-second, from these four computers. Each computer’s data point is the average of the automatic playback of three benchmark battles: Pacific War (morning), Battle of Berlin, and Tank Battle. A logarithmic trendline has been overlaid on these data. Again, R² provides an estimate of how well the trendline fits the test data.

Figure 10: War Thunder Game Performance.

Figures 11 and 12 present these same data using connected data points so the shape of their plots is rendered more clearly.

Figure 11: Geekbench 4 Multi-core Performance.

Figure 12: War Thunder Performance.

To make the plot shapes more comparable, Figure 13 shows the two measures superimposed.

Figure 13: Multi-core Geekbench Performance Versus War Thunder Performance.

In the second part of this two-part exploration, we will compare these two performance measures with performance data collected when editing with DaVinci Resolve. We should then see whether the game play curve or the multi-core benchmark curve is a better match to content creation performance.
