Predicting Editing Performance From Games And Benchmarks: Part 1

In the good old days, when you were thinking about upgrading your computer, you began by reading printed reviews such as those published by Byte magazine. These reviews usually included industry-standard benchmarks. Now, of course, you are far more likely to watch internet video reviews.

When I view these online reviews, it’s clear the majority have been produced by “gamers.” These reviews have titles such as “DeathAxe Laptop Reviewed Using 23 Games.”

A gaming computer reads compressed artificial-world descriptions from a disk file. This artificial world is regenerated by the CPU and loaded into the GPU, where it is displayed to the gamer. The gamer’s actions are fed back to the CPU, which dynamically modifies the artificial world the GPU displays.
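To make that cycle concrete, here is a minimal sketch in Python. Every function below is a hypothetical stand-in, not a real engine API; it only illustrates the division of labor just described.

```python
# A minimal, hypothetical sketch of the game loop described above.
# Every function body is a stub, not a real engine API.

def load_world(path: str) -> dict:
    # CPU: read and decompress the artificial-world description from disk.
    return {"objects": [], "frame": 0}

def poll_input() -> dict:
    # The gamer's actions for this frame (stubbed as "no input").
    return {}

def update_world(world: dict, actions: dict) -> None:
    # CPU: apply the gamer's actions to the simulation state.
    world["frame"] += 1

def render(world: dict) -> None:
    # GPU: draw the current state of the artificial world.
    pass

world = load_world("world.dat")
for _ in range(3):  # a real engine loops until the gamer quits
    update_world(world, poll_input())
    render(world)
```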

This is a different process from video playback on a content creator’s system. Here a continuous sequence of frames is rapidly pulled from a disk and decompressed by the CPU, then sent to GPU buffers where mathematical operations, such as those for color correction, are applied. The corrected frame is then displayed for a precise interval, and the process is repeated for the next frame.

When a content creator’s computer is transcoding media, the workflow is different again. Transcoding involves reading compressed video frames from a disk; each frame is decompressed and then recompressed, with the recompressed frame written back to a disk.
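The contrast between the two content-creation pipelines can be sketched the same way. Again, every function below is a hypothetical stub standing in for a real codec or GPU API.

```python
# Hypothetical sketches of the playback and transcode pipelines described above.

def read_frame(n: int) -> bytes:          # pull one compressed frame from disk
    return b"compressed"

def decompress(data: bytes) -> list:      # CPU: decode to raw pixels
    return [0, 0, 0, 0]

def color_correct(pixels: list) -> list:  # GPU buffer math, e.g. color correction
    return [p + 1 for p in pixels]

def display(pixels: list) -> None:        # show the frame for a precise interval
    pass

def recompress(pixels: list) -> bytes:    # CPU: re-encode to the target codec
    return b"recompressed"

def write_frame(data: bytes) -> None:     # write the re-encoded frame to disk
    pass

# Playback: disk -> decompress -> GPU math -> display, frame after frame.
for n in range(3):
    display(color_correct(decompress(read_frame(n))))

# Transcode: disk -> decompress -> recompress -> disk, no display involved.
for n in range(3):
    write_frame(recompress(decompress(read_frame(n))))
```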

There is one thing gaming and content creation do have in common: both involve long task execution times during which heat is generated by the CPU and GPU. Even when a laptop has a good cooling system (see Figure 1), playing games for hours or working with high-resolution compressed video, often with complex effects, can push a computer toward, or into, thermal throttling. So, while gaming and content creation do not employ the same compute-intensive processes, both can generate high thermal loads.

Figure 1: Laptop Cooling System.

Understanding this, we want to know whether game play performance or industry standard benchmark performance has a higher correlation with content creation performance.

Let’s, for the time being, ignore measuring content creation performance. At this point, we need only a measure of game play performance and a benchmark performance measure. (When data were being collected, each laptop was powered by its charger from mains power.)

Both the gaming and Geekbench tests ran for well over a minute, allowing heat to build as it would during editing and color correction. Laptop display resolution, at 1920x1080, remained constant during testing.

We need to measure performance on multiple computers because each measure, game play and benchmark, requires multiple data points.

Before I describe the measures I took, you need descriptions of the computers. In the first phase of this exploration I had access to four systems. Thankfully, these systems offered a wide performance range.

Number 1.  Samsung Galaxy TabPro S Windows 10 Tablet: Intel m3-6Y30 CPU/GPU (similar to an i3); 4GB RAM; 2GB Video memory; 256GB SSD. (See Figure 2.)

Figure 2: Tiny Samsung Galaxy Windows Tablet Editing 1080p24 using DaVinci Resolve.

Number 2.  Lenovo IdeaCentre Y910-27:  Intel i7-6700 (3.4-4.0GHz); 16GB DDR4-2133; NVIDIA GTX 1080, 8GB GDDR5 memory; 128GB M.2 NVMe PCIe SSD. (See Figure 3.)

Number 3.  Gigabyte AERO 15 OLED Laptop:  Intel i7-9750H (2.6-4.5GHz); 16GB DDR4-2666 dual-channel RAM; NVIDIA GTX 1660 Ti, 6GB GDDR6 memory; 512GB M.2 NVMe PCIe SSD. (See Figure 4.)

Number 4.  HP Omen 15: Intel i7-9750H (2.6-4.5GHz); 16GB 2666MHz DDR4 dual-channel RAM; NVIDIA GTX 1660 Ti, 6GB GDDR6 memory; 512GB M.2 NVMe PCIe SSD; 32GB Intel Optane memory. The Intel Optane module acts as a storage cache that increases system performance. (See Figure 5.)

My Experiment

I call the first phase of my two-part exploration an “experiment” because it involves a null hypothesis. My null hypothesis is that there is no difference between a measure of game performance and a measure of benchmark performance.

Why might this be my finding? Perhaps my free game, War Thunder, did not stress my computers as much as would an expensive game owned by a gamer.

To maximize the load placed on a computer, I set the game’s visual quality to Movie, the maximum possible setting. See Figure 6.

Figure 6: War Thunder Test Settings.

Had the data from my experiment not allowed me to reject the null hypothesis, I would have been unable to say anything about the experiment. Since you are reading this, you know I was able to reject the null hypothesis. Figure 7 presents Geekbench 4 performance. (The AERO’s Geekbench 5 Multi-core score is 5486.)

Figure 7: Geekbench 4 Single-core and Multi-core Score.

How did I determine there was a difference in performance? As shown by the following example, I merely “looked” at data plots of the two measures.

To better understand how this would work, I created the three sets of data shown in Figure 8. The orange plot shows a linear correlation between two variables: a Y-axis variable (e.g., heating BTU estimate) and an X-axis variable (e.g., number of office windows). The blue plot shows a linear correlation between a Y-axis variable (cooling BTU estimate) and the same X-axis variable (windows).

Figure 8: Three Sets of Created Data.

We can see that although the orange and blue data values are different (2.4 to 7.8 versus 3.4 to 8.3), their plots have the same slope, and hence the same kind of linear correlation.

The green plot, however, shows a non-linear correlation between a Y-axis variable (heating/cooling power costs) and an X-axis variable—number of windows. (When bundling heating and cooling together, the power company provides a “consumption discount.”)

An Excel logarithmic trendline (dashed red) has been overlaid on the green curve. Comparing this logarithmic trendline with the two linear data plots, we can see the nature of the correlations is different.
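For readers who want to recreate Figure 8’s comparison, here is a sketch in Python using NumPy. The coefficients are chosen only to match the value ranges quoted above; they are illustrative, not the article’s underlying data.

```python
# Recreating Figure 8's three data sets (illustrative coefficients only).
import numpy as np

x = np.arange(1, 9)               # e.g., number of office windows

heating = 0.771 * x + 1.629       # orange: linear, spans roughly 2.4 to 7.8
cooling = 0.700 * x + 2.700       # blue: linear, spans roughly 3.4 to 8.3
power = 3.0 * np.log(x) + 2.0     # green: non-linear (consumption discount)

# Excel's logarithmic trendline fits y = a*ln(x) + b; a least-squares fit
# against ln(x) recovers the same curve.
a, b = np.polyfit(np.log(x), power, 1)
print(f"logarithmic trendline: y = {a:.2f}*ln(x) + {b:.2f}")
```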

Experiment Results

Figure 9 presents a bar chart of Geekbench 4 multi-core performance generated by the four computers. (A bar chart is an appropriate way to plot discrete data.) A linear trendline has been overlaid on these data. The coefficient of determination, r², provides an estimate (0.0 to 1.0) of how well the trendline fits the test data.

Figure 9: Geekbench 4 Multi-core Performance.
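Excel reports r² automatically, but the same number can be computed directly. A sketch, using placeholder scores rather than my measured results:

```python
# Fitting a linear trendline to four multi-core scores and computing r^2.
# The scores are placeholders, not the measured Geekbench 4 results.
import numpy as np

systems = np.array([1, 2, 3, 4])                 # the four computers, in order
scores = np.array([4000.0, 14000.0, 21000.0, 22000.0])

slope, intercept = np.polyfit(systems, scores, 1)
predicted = slope * systems + intercept

ss_res = np.sum((scores - predicted) ** 2)       # residual sum of squares
ss_tot = np.sum((scores - scores.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
print(f"r^2 = {r2:.3f}")
```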

Figure 10 presents game performance, in frames per second, from these four computers. Each computer’s data point is the average of the automatic playback of three benchmark battles: Pacific War (morning), Battle of Berlin, and Tank Battle. A logarithmic trendline has been overlaid on these data. Again, r² estimates how well the trendline fits the test data.

Figure 10: War Thunder Game Performance.

Figures 11 and 12 present these same data using connected data points so the shapes of the plots are rendered more clearly.

Figure 11: Geekbench 4 Multi-core Performance.

Figure 12: War Thunder Performance.

To make the plot shapes more comparable, Figure 13 shows the two measures superimposed.

Figure 13: Multi-core Geekbench Performance Versus War Thunder Performance.
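Because Geekbench scores and frames per second use very different scales, superimposing them requires putting both on a common scale first. One common approach is min-max normalization; a sketch with placeholder values:

```python
# Min-max normalizing two measures so their plot shapes can be compared.
# The values below are placeholders, not the measured results.
import numpy as np

def normalize(a: np.ndarray) -> np.ndarray:
    # Rescale a series to the range [0, 1].
    return (a - a.min()) / (a.max() - a.min())

geekbench = np.array([4000.0, 14000.0, 21000.0, 22000.0])  # hypothetical scores
fps = np.array([12.0, 95.0, 110.0, 115.0])                 # hypothetical FPS

for g, f in zip(normalize(geekbench), normalize(fps)):
    print(f"benchmark: {g:.2f}   game: {f:.2f}")
```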

In the second part of this two-part exploration, we will compare these two performance measures with performance data collected while editing with DaVinci Resolve. We should then see whether the game play curve or the multi-core benchmark curve better matches content creation performance.


