Apple’s M1 ARM For Broadcast Infrastructure Applications: Part 1

Apple’s M1-based MacBook Air, MacBook Pro, and Mac Mini have been the focus of computer news for the last half-year because of their surprisingly high performance.

Review benchmarks comparing an M1 MacBook Pro with a comparable Dell XPS 13 laptop, equipped with Intel’s latest 11th-generation Tiger Lake i7 CPU and an Intel Iris Xe GPU, show the MacBook Pro blowing away the XPS.

Geekbench 5 single- and multi-thread CPU scores were 1405 and 3729 for the XPS, versus 1727 and 7558 for the MacBook Pro. Geekbench 5 GPU scores were 17,554 (Dell) and 21,789 (Apple).

The usual explanation for the new Apple products’ superior performance is Apple’s replacement of Intel x86 CPUs with an Apple-developed ARM chip: performance is bumped up because ARM chips are simply faster than x86 chips.

While this is part of an explanation for Apple’s 2020 Holiday surprise, it misses much of the real M1 story.

The Real Story

We are familiar with “transaction computing” because that’s how we work with our desktop and laptop computers. When we need information, we request it with a click, keypress, or touch. Our computer responds by obtaining what we requested from local storage or from a remote system. When our computer obtains the data and displays it, the transaction is complete.

From our computer’s viewpoint, a transaction requires a burst of computing power, typically lasting only one or two seconds. When the transaction begins, the CPU boosts its clock speed to maximum. During the transaction, the high clock speed causes the chip to draw high current, which in turn causes the chip to get progressively hotter.

Figure 1: Thermal throttling—clock-speed control based on chip temperature.

From this point onward, it is a contest. If the transaction completes before the CPU chip becomes “too hot,” the clock speed is ultimately reduced to idle. See Figure 1.

Were the transaction to continue, fan speed would first increase to maximum in an attempt to dissipate system heat. If this is not adequate, the CPU chip “thermal throttles” to protect itself: its clock speed is decreased, which lets it run cooler. (See “What You Need To Know About Thermal Throttling.”)
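The contest between a short burst and a long job can be sketched as a toy simulation. Every constant below (clock speeds, temperature thresholds, heating and cooling rates) is a hypothetical value chosen only to make the behavior visible; real CPUs use far more sophisticated, vendor-specific power management. The model boosts the clock when the chip is cool, runs the fan at maximum above one threshold, and steps the clock down whenever the chip reaches the throttle temperature.

```python
# Toy model of thermal throttling: episodic (short burst) vs. continuous load.
# All constants are hypothetical, chosen only for illustration.

AMBIENT_C = 40.0          # starting / ambient temperature
IDLE_GHZ = 0.5            # idle clock
BOOST_GHZ = 3.2           # maximum (boost) clock
HEAT_PER_GHZ = 2.0        # degrees C added per second for each GHz of clock
FAN_MAX_C = 80.0          # fan runs at maximum at or above this temperature
THROTTLE_C = 95.0         # chip throttles at or above this temperature
RESUME_C = 90.0           # below this, the chip may boost again (hysteresis)
COOL_NORMAL = 0.05        # fraction of (temp - ambient) shed per second, normal fan
COOL_FAN_MAX = 0.08       # same, with the fan at maximum
THROTTLE_STEP_GHZ = 0.2   # clock reduction per second while too hot


def simulate(busy_seconds: int, total_seconds: int) -> list:
    """Return (second, clock_GHz, temperature_C) samples for a job that keeps
    the CPU busy for `busy_seconds` and then leaves it idle."""
    temp, clock = AMBIENT_C, IDLE_GHZ
    samples = []
    for t in range(total_seconds):
        if t >= busy_seconds:
            clock = IDLE_GHZ                                   # transaction complete
        elif temp >= THROTTLE_C:
            clock = max(IDLE_GHZ, clock - THROTTLE_STEP_GHZ)   # thermal throttle
        elif temp < RESUME_C:
            clock = BOOST_GHZ                                  # cool enough: boost
        # between RESUME_C and THROTTLE_C the current clock is held

        cooling = COOL_FAN_MAX if temp >= FAN_MAX_C else COOL_NORMAL
        temp += HEAT_PER_GHZ * clock - cooling * (temp - AMBIENT_C)
        samples.append((t, clock, temp))
    return samples


if __name__ == "__main__":
    for label, busy in (("episodic, 2 s burst", 2), ("continuous, 60 s", 60)):
        run = simulate(busy, 60)
        lowest = min(c for t, c, _ in run if t < busy)
        peak = max(temp for _, _, temp in run)
        print(f"{label}: peak {peak:.1f} C, lowest clock while busy {lowest:.1f} GHz")
```

With these made-up numbers, the two-second burst finishes at full boost well below the throttle point, while the sixty-second job reaches the throttle temperature after a dozen or so seconds and has its clock stepped down.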

This type of operation is an example of “episodic computing,” where high performance is required in bursts. It is very different from “continuous computing,” where high performance is required for long periods. Video gaming is an obvious example of continuous computing.

In the commercial world, applications such as those within the broadcast infrastructure often have the same continuous requirements. Equipment monitoring and control is one example. Another is time-critical video editing, where there simply isn’t time to transcode source media to a proxy codec. The system has to be able to play source footage, edit it, and rapidly export to a broadcast codec.

To obtain more performance than a CPU alone can deliver, a high-performance graphics board is usually installed in a system. These discrete GPUs generate so much heat that they are equipped with multiple cooling fans. See Figure 2.

Figure 2: Gigabyte discrete GPU.

When PCs are used for gaming and other continuous-process applications, problematic heat generation may be kept under control by exotic, and unfortunately not risk-free, water-cooling systems. Heat is the enemy of high-performance computing. See Figure 3.

What Is ARM?

Most of our experience is likely with computers based on Intel chips, although in the last several years some folks have moved to systems employing AMD chips. This year, it is ARM-based chips that are challenging Intel and AMD processors. But what is ARM?

ARM processors were developed by the U.K.’s Acorn Computers starting in the mid-1980s. In 1990, Acorn spun off its CPU development group as Advanced RISC Machines Ltd., with Apple and VLSI Technology as partners in the new company. In 1998, the company shortened its name to ARM (now stylized “arm”). (Apple has a perpetual ARM license, and NVIDIA is currently in talks to purchase arm.)

Figure 3: Complex water-cooling system that can be extended to GPU board.

Arm licenses its processor designs to many companies, and ARM chips power products such as Chromebooks, Apple’s M1 computers, and Microsoft’s Surface Pro X. Figure 4 shows the Surface Pro X, which employs an 8-core, 3.1GHz Qualcomm SQ 2 ARM processor.

ARM chips employ a RISC (Reduced Instruction Set Computer) architecture, while x86 chips employ a CISC (Complex Instruction Set Computer) architecture. The key difference is that a single variable-length CISC instruction can perform multiple operations, so executing one CISC instruction can take many clock cycles.

Therefore, to obtain very high performance, a CPU’s clock rate must be pushed to its maximum. The high clock rate draws maximum current, which causes the chip to become very hot.
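Why pushing the clock is so costly can be seen from the standard first-order model of CMOS switching power, P = α · C · V² · f (activity factor, switched capacitance, supply voltage, clock frequency). Because higher clock speeds generally also need a higher supply voltage, power rises faster than the clock. The operating points below are hypothetical, picked only to show the shape of the relationship.

```python
def dynamic_power_watts(activity: float, capacitance_farads: float,
                        volts: float, clock_hz: float) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * volts ** 2 * clock_hz


# Hypothetical operating points for the same chip (illustrative values only).
base = dynamic_power_watts(0.2, 50e-9, 1.0, 3.0e9)    # 3 GHz at 1.0 V
boost = dynamic_power_watts(0.2, 50e-9, 1.2, 4.0e9)   # 4 GHz, assumed to need 1.2 V

print(f"{base:.1f} W -> {boost:.1f} W: "
      f"{boost / base:.2f}x the power for {4 / 3:.2f}x the clock")
```

With these numbers, a one-third increase in clock nearly doubles the power (30 W to 57.6 W), and all of that power ends up as heat.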

CISC is efficient only when compilers are able to generate instructions that combine multiple operations. If this is not the case, many clock cycles are wasted.

RISC chips use a relatively small set of instructions, each performing a single operation. Ideally, each instruction executes in only one clock cycle. Therefore, to obtain high performance, the clock rate need not be extremely high.
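The trade-off can be put into numbers with the basic relation time = total cycles ÷ clock rate. The sketch below compares one read-modify-write add (a CISC-style "add a register into memory") with the equivalent RISC load/add/store sequence. The cycle counts and clock speeds are hypothetical simplifications; modern cores pipeline and overlap instructions, and modern x86 chips internally break complex instructions into simpler micro-operations.

```python
# Hypothetical cycle counts, for illustration only.
# Work: read a value from memory, add a register to it, write it back.
CISC_PROGRAM = [("ADD [mem], reg", 5)]                  # one complex instruction
RISC_PROGRAM = [("LDR", 1), ("ADD", 1), ("STR", 1)]     # three simple instructions


def execution_time_ns(program: list, clock_ghz: float) -> float:
    """Total cycles divided by cycles per nanosecond (i.e., the clock in GHz)."""
    total_cycles = sum(cycles for _, cycles in program)
    return total_cycles / clock_ghz


print(execution_time_ns(CISC_PROGRAM, 4.0))  # 5 cycles at 4 GHz -> 1.25 ns
print(execution_time_ns(RISC_PROGRAM, 3.0))  # 3 cycles at 3 GHz -> 1.0 ns
```

Under these assumptions the RISC sequence finishes first even at the lower, cooler clock.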

Figure 4: Microsoft Surface Pro X

You might be surprised to learn the M1 chip is not Apple’s first use of RISC. Back in the day, Apple Power Macs employed IBM/Apple/Motorola PowerPC chips, which were RISC chips. (Apple invited Motorola to join it and IBM because Motorola had long experience building chips; it was the USA’s 1980s version of Taiwan’s TSMC, which fabricates the M1. And, of course, Motorola built the CISC 32-bit 68000 chip used by the Apple Macintosh.)

The key takeaway is that RISC chips draw much less current and so generate far less heat, which in turn makes any cooling system far less elaborate. Phones, tablets, ARM-based Chromebooks, the Microsoft Surface Book, and Apple’s M1 MacBook Air have no fans.

What’s important to know is that ARM is a design architecture that each licensee can implement to meet its own product requirements. Apple designed its M1 chip for very high performance. The M1 is manufactured by TSMC using a 5nm process, and TSMC’s ability to build 5nm chips is a huge advantage for Apple: in general, the finer a chip’s manufacturing process, the less power its transistors consume and the less heat it generates for a given workload. Figure 5 pictures an Apple MacBook Pro M1.

Figure 5: Apple MacBook Pro M1.

Monster ARM

If you have the feeling that ARM chips are only for use in low-power laptops, take a look at what is now the world’s fastest supercomputer, FUGAKU, built by Fujitsu (Figure 6). FUGAKU has 7,630,848 cores and delivers 442 petaflops. It employs almost 160,000 Fujitsu A64FX ARM chips. Each A64FX chip has 48 compute cores (plus a few assistant cores for the operating system) and no separate GPU; instead, it supports ARM’s Scalable Vector Extension (SVE), a floating-point vector processing instruction set.
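The headline figures can be cross-checked with simple arithmetic. Assuming each A64FX contributes 48 compute cores to the total core count (an assumption about how the cores are counted), the chip count and the average performance per chip follow directly from the two numbers quoted above.

```python
TOTAL_CORES = 7_630_848     # from the text
HPL_PETAFLOPS = 442         # from the text
CORES_PER_CHIP = 48         # assumption: compute cores counted per A64FX

chips = TOTAL_CORES // CORES_PER_CHIP
# The total divides evenly by 48, consistent with the assumption above.
assert chips * CORES_PER_CHIP == TOTAL_CORES

tflops_per_chip = HPL_PETAFLOPS * 1000 / chips   # petaflops -> teraflops

print(f"{chips:,} chips, about {tflops_per_chip:.2f} TFLOPS each")
```

This gives 158,976 chips, which matches “almost 160,000,” at roughly 2.78 TFLOPS per chip.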

A computer like this, where programs often need to run for days to find a solution, is an extreme example of continuous computing. Of course, hardware is only part of the story. Both a multi-core OS and a compiler must be written for a new system, and users’ programs must be recompiled to run on ARM chips.

Figure 6: Fujitsu FUGAKU Supercomputer.

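Apple has avoided the immediate need for application recompiles by developing Rosetta 2. Rosetta 2 translates x86 code to ARM instructions, mostly ahead of time when an application is installed or first launched; code that an application generates while running is translated on the fly.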
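Apple has not published Rosetta 2’s internals in detail, so the following is only a toy illustration of the ahead-of-time idea: translate a whole program once, cache the result, then run the translated code. Both instruction sets here are made-up miniatures (a two-operand “x86” and a three-operand “ARM”), and the register mapping is hypothetical.

```python
from functools import lru_cache

# Hypothetical toy instruction sets (not real x86 or ARM encodings).
# Toy "x86": two-operand, e.g. ("add", "eax", "ebx") means eax = eax + ebx.
# Toy "ARM": three-operand, e.g. ("add", "w0", "w0", "w1") means w0 = w0 + w1.
REG_MAP = {"eax": "w0", "ebx": "w1", "ecx": "w2"}


def _operand(op):
    """Map a toy x86 register to a toy ARM register; pass immediates through."""
    return REG_MAP[op] if isinstance(op, str) else op


@lru_cache(maxsize=None)  # translate each program once, then reuse the result
def translate(program: tuple) -> tuple:
    """Translate a whole toy x86 program to toy ARM code before it runs."""
    arm_code = []
    for opcode, dst, src in program:
        rd, rs = REG_MAP[dst], _operand(src)
        if opcode == "mov":
            arm_code.append(("mov", rd, rs))
        elif opcode in ("add", "sub"):
            arm_code.append((opcode, rd, rd, rs))  # two-operand -> three-operand
        else:
            raise ValueError(f"no translation for opcode {opcode!r}")
    return tuple(arm_code)


def run_arm(code: tuple) -> dict:
    """Stand-in for the CPU executing the translated toy ARM code."""
    regs = {r: 0 for r in REG_MAP.values()}

    def val(x):
        return regs[x] if isinstance(x, str) else x

    for opcode, rd, *srcs in code:
        if opcode == "mov":
            regs[rd] = val(srcs[0])
        elif opcode == "add":
            regs[rd] = val(srcs[0]) + val(srcs[1])
        elif opcode == "sub":
            regs[rd] = val(srcs[0]) - val(srcs[1])
    return regs


if __name__ == "__main__":
    x86_program = (("mov", "eax", 5), ("mov", "ebx", 7),
                   ("add", "eax", "ebx"), ("sub", "eax", 2))
    arm_code = translate(x86_program)   # translated once, up front
    print(arm_code)
    print(run_arm(arm_code))            # w0 ends up as 10
```

The cache is the point of the design: the translation cost is paid once, and every later launch runs the already-translated code.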

Once an application such as Photoshop has been recompiled and shipped as a universal binary, macOS automatically chooses at launch whether to load its x86 or ARM code, depending on the system’s CPU.
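The launch-time choice can be sketched as a lookup keyed by CPU architecture. The dictionary below is a hypothetical stand-in for the per-architecture builds inside a universal binary (it does not parse a real Mach-O file), and the fallback branch shows where translation would come in for an Intel-only app. Python’s `platform.machine()` reports `arm64` on Apple silicon Macs and `x86_64` on Intel Macs; other operating systems use spellings such as `aarch64` or `AMD64`, which the sketch normalizes.

```python
import platform
from typing import Optional

# Hypothetical stand-in for a universal binary: one compiled build per CPU type.
UNIVERSAL_APP = {
    "x86_64": "app built for Intel Macs",
    "arm64": "app built for Apple silicon",
}

# Other platform.machine() spellings, normalized to the macOS names.
_ALIASES = {"aarch64": "arm64", "AMD64": "x86_64"}


def select_slice(app: dict, machine: Optional[str] = None) -> tuple:
    """Return (build to load, needs_translation) for the given or current CPU."""
    arch = machine or platform.machine()
    arch = _ALIASES.get(arch, arch)
    if arch in app:
        return app[arch], False                # native code for this CPU
    if arch == "arm64" and "x86_64" in app:
        return app["x86_64"], True             # Intel-only app: translation needed
    raise RuntimeError(f"no build available for {arch!r}")


if __name__ == "__main__":
    print(select_slice(UNIVERSAL_APP, "arm64"))
    print(select_slice({"x86_64": "Intel-only app"}, "arm64"))
```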

PC users may well dismiss Apple’s ARM solutions because most applications used within broadcast infrastructure run on Windows.

However, because Microsoft markets its own ARM-based laptop, the Surface Pro X, it has released an ARM-based version of Windows 10.

Figure 7: Video editing on an emulated computer.

Using Parallels Desktop 16 for M1, the ARM64 version of Windows 10 can run on Apple’s M1 computers. Because Windows 10 on ARM can itself emulate x86 applications, Windows x86 applications can therefore run on Apple’s new ARM computers.

It took only a few minutes to get Windows 10 running on my M1 MacBook Pro. You would expect an emulated x86 to run very slowly. Amazingly, Geekbench 5 single- and multi-core scores from the emulated x86 were 1540 and 5107, compared with the MacBook Pro M1’s scores of 1748 and 7678: only about 12 percent slower single-core and about 33 percent slower multi-core.
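The slowdown percentages follow directly from the four Geekbench 5 scores quoted above, expressed as how far the emulated score falls below the native M1 score.

```python
def percent_slower(emulated: float, native: float) -> float:
    """How far the emulated score falls below the native score, in percent."""
    return 100.0 * (1.0 - emulated / native)


single = percent_slower(1540, 1748)   # Geekbench 5 single-core
multi = percent_slower(5107, 7678)    # Geekbench 5 multi-core

print(f"single-core: {single:.1f}% slower, multi-core: {multi:.1f}% slower")
```

This prints roughly 11.9 percent for single-core and 33.5 percent for multi-core.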

No sane person would try to edit video on an emulated computer. So, of course, I used Lightworks (https://www.lwks.com/) to edit Sony FS5 4K H.264 footage. After I generated SD proxy files, editing was very smooth, and the HD export to YouTube went as expected. See Figure 7.

In Part 2, I look at how the M1 achieves the performance necessary for broadcast applications, including its innovative AI solution.
