Apple’s M1 ARM For Broadcast Infrastructure Applications: Part 2

In part 2 of this investigation, we look at why Apple’s new M1 processor benefits broadcasters.

It’s easy to understand why a PC user, on learning that Apple will stop using Intel CPU chips over the next two years, might imagine that Apple has simply designed its own x86 replacement chip. Thankfully, that’s not what Apple has done. Figure 1 is a picture of an open Mac Mini.

Figure 1: An open Apple M1 Mac Mini.
Figure 2: RAM within an M1 Chip.

The parts surrounding the silver frame implement the Mini’s non-CPU/GPU functions such as PCIe4, a 256GB/512GB/1TB/2TB NVMe SSD, as well as interface logic supporting two USB4/Thunderbolt 3 buses. (HDMI 2.0 and DisplayPort 1.4 DSC are also available through the computer’s pair of USB-C ports.)

Figure 2 reveals three M1 components: two 4GB or two 8GB RAM chips—which in future could be two 16GB or two 32GB RAM chips. The “computer” itself is the third component.

System on a Chip

The M1 is Apple’s implementation of an ARM computer. Everything, CPU and GPU, resides within the chip. It is a System on a Chip—SoC. See Figures 3, 4, and 5.

Figure 3: M1 SoC Die, 16 billion transistors.
Figure 4: CT X-ray of M1 SoC.

The M1’s Unified Memory Architecture uses a high-speed dual-channel (4,266 MT/s) bus that connects the two LPDDR4X SDRAM chips to the “fabric” that links all chip elements.
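To put the 4,266 MT/s figure in context, the sketch below converts a transfer rate into a theoretical peak bandwidth. The transfer rate comes from the text above; the 128-bit total bus width is an assumption based on widely reported M1 specifications, not something stated in this article, and the function name is purely illustrative.

```python
def peak_bandwidth_gb_s(transfers_per_second: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# From the article: 4,266 mega-transfers per second.
M1_TRANSFER_RATE = 4266e6

# Assumption (not in the article): 128-bit total memory bus width.
ASSUMED_BUS_WIDTH_BITS = 128

m1_bandwidth = peak_bandwidth_gb_s(M1_TRANSFER_RATE, ASSUMED_BUS_WIDTH_BITS)
print(f"Theoretical peak: {m1_bandwidth:.1f} GB/s")
```

This is an upper bound that the CPU, GPU and other SoC blocks share. Sustained throughput in real workloads will be lower.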

Every computational element has direct access to data held in RAM. For example, there is no need for the CPU to transfer data to and from RAM held on the GPU. Both the GPU and CPU share data in the M1’s RAM. (All RAM is VRAM.) In fact, all SoC components use the closely coupled RAM as a common resource.

Figure 5: M1 architecture.
Figure 6: M1 secure enclave.

These components include hardware codecs that support JPEG, VP8, VP9, H.264, and both 8-bit and 10-bit H.265 (HEVC). Media decode is supported for these codecs: VC-1, AVC, and AV1. These codecs offload video decode and encode from both the CPU and the GPU.

Figure 6 shows the M1’s Secure Enclave that supports secure-boot, authentication, and file-encryption.

The M1 chip has 8 computational cores (Figure 7). 

Figure 7: M1 has 8 computation cores.
Figure 8: 4 M1 high-efficiency CPU cores.

Of these eight, 4 are high-efficiency cores that handle, for example, background threads from macOS, where low-power operation is more important than high performance. [Icestorm specifications: shared 4MB L2 cache; 128KB instruction cache (L1I); 64KB data cache (L1D); 0.6–2.064GHz clock speed.] See Figure 8.

The M1 chip also has 4 high-performance cores that function without hyper-threading, which eliminates Microarchitectural Data Sampling side-channel vulnerabilities. [Firestorm specifications: shared 12MB L2 cache; 192KB instruction cache (L1I); 128KB data cache (L1D); 0.6–3.204GHz clock speed.] See Figure 9.

Figure 9: M1 4 high-performance CPU cores.
Figure 10: M1 8 GPU cores.

An M1 chip provides either 7 GPU cores (in the fan-less MacBook Air) or 8 GPU cores. The 8-core GPU can execute almost 25,000 threads simultaneously using its 128 EUs and 1,024 ALUs to provide 2.6 teraflops of performance. See Figure 10.
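The GPU figures above hang together arithmetically, as the rough sketch below suggests. The core, EU and ALU counts are from the text. The GPU clock of 1.278GHz and the convention of counting one fused multiply-add as two floating-point operations are assumptions introduced here for illustration.

```python
# Figures quoted in the article for the 8-core GPU.
GPU_CORES = 8
EXECUTION_UNITS = 128
ALUS = 1024

# Assumptions (not in the article): a commonly reported peak GPU clock,
# and one fused multiply-add per ALU per cycle counted as 2 operations.
ASSUMED_GPU_CLOCK_HZ = 1.278e9
ASSUMED_FLOPS_PER_ALU_PER_CYCLE = 2

eus_per_core = EXECUTION_UNITS // GPU_CORES
alus_per_eu = ALUS // EXECUTION_UNITS

# Theoretical peak single-precision throughput in teraflops.
tflops = ALUS * ASSUMED_FLOPS_PER_ALU_PER_CYCLE * ASSUMED_GPU_CLOCK_HZ / 1e12

print(f"{eus_per_core} EUs per core, {alus_per_eu} ALUs per EU")
print(f"Theoretical peak: {tflops:.1f} teraflops")
```

Under those assumptions the result rounds to the 2.6 teraflops quoted above. A 7-core part would scale down proportionally.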

Smart Chip

Figure 11 highlights what I think is the most intriguing aspect of the M1: the 16-element Neural Engine (NPU), which offers 11 trillion operations per second.

The M1’s NPU opens new possibilities to support the broadcast industry’s interest in AI (How AI Is Your AI?).

Although TensorFlow Machine Learning (ML) training likely cannot be done using an M1’s NPU, ML training can be performed using an M1’s CPU and GPU. As an example, the MobileNet image classification task has been trained on an M1.

The NPU can be used, however, when a trained AI model, such as one for natural language analysis, is executed. Figure 12 shows the MobileNet task run under Apple’s Core ML framework. Reported NPU performance was 4X faster than GPU performance.

Figure 11: M1 16-element Neural Engine (NPU).
Figure 12: MobileNet task run on M1 NPU.

Only The Beginning Of ARM

If you doubt the future of the ARM architecture and are unconvinced that PC makers will feel threatened by Apple, consider the M1’s successor, the M1X.

The M1X will most likely double the number of Firestorm cores from 4 to 8. The number of GPU cores should also double, from 8 to 16. Based on the 90-percent speed increase seen when moving from an A12 to an A12X (in the iPad), we can expect a similar speed increase from the M1X.

Experiments I performed (Predicting Editing Performance From Games And Benchmarks: Part 2) found video gaming an excellent predictor of editing performance. GFXBench is a way to measure gaming performance.

My 2019 MacBook Pro 16, with an AMD 5500M GPU, has a GFXBench gaming result of 44fps. My M1 with 16GB RAM tests at 60fps. An M1X MacBook Pro 16 should offer from 110fps to 120fps performance, just in time for 8K H.265 source media to become standard.
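The 110fps to 120fps projection follows from applying the A12-to-A12X speed increase to the measured M1 result. The short sketch below reproduces that arithmetic. It assumes GFXBench results scale linearly with the quoted speed increase, which is a simplification, and the function name is illustrative.

```python
def project_fps(baseline_fps: float, speedup_fraction: float) -> float:
    """Scale a benchmark result by a fractional speed increase (0.90 = 90%)."""
    return baseline_fps * (1 + speedup_fraction)


# GFXBench results quoted in the article.
MACBOOK_PRO_2019_FPS = 44
M1_FPS = 60

# Speed increase quoted for the A12 to A12X transition.
A12X_SPEEDUP = 0.90

m1_vs_2019 = M1_FPS / MACBOOK_PRO_2019_FPS
projected_m1x_fps = project_fps(M1_FPS, A12X_SPEEDUP)

print(f"M1 vs 2019 MacBook Pro 16: {m1_vs_2019:.2f}x")
print(f"Projected M1X: {projected_m1x_fps:.0f} fps")
```

The projected value lands inside the 110fps to 120fps range given above. It is an estimate, not a measurement.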
