What Does PCI 4.0 Offer?

When, in May 2019, AMD announced their Ryzen Zen 2 architecture, beyond the amazing performance offered by the new Series 3000 microprocessors, they announced the new chips would support PCI 4.0. Although I was pretty confident the step from 3.0 to 4.0 meant 2X greater bandwidth, I decided it was time to learn more about the PCIe bus.

The problem with most descriptions of PCIe: they present a top-down explanation of the PCIe version being described. Thus, an explanation of PCI 3.0 describes it as a “thing” rather than something built on more fundamental “things.” The more fundamental thing is a lane and every instance of a PCIe bus is a set of lanes.

Another problem with most PCIe bus descriptions is their focus on the length of desktop motherboard PCIe connectors. Figure 1 shows two motherboard PCIe connectors: EX4 (89mm) and EX1 (25mm). (A shorter PCIe card can be inserted into a longer PCIe connector.) Of course, slot lengths have no meaning for those of us who use laptops, since they don’t have PCIe connectors.

Figure 1: EX4 PCIe Slot (green) and EX1 PCIe Slot (orange). (Courtesy MSI)

PCI 1.0 and 2.0 Lanes

While we correctly understand PCIe bus performance has increased over generations, fundamentally it’s lane performance that increases over time. So, the first technology to understand is how a lane functions.

A lane consists of four signal lines. Each pair of lines forms a differential line-pair that sends signals in one direction, so a lane has one line-pair for each direction. A line-pair functions as does a twisted pair of wires in a cable carrying a balanced microphone signal.

Were a lane simply a differential signal path, the only way to increase lane speed would be to increase the speed at which data passes over it. When the PCI Express (PCIe) bus was released in 2004, it was designed to transmit data not simply as the ones and zeros of the data itself. Rather, the data to be moved were coded using a scheme called “8b/10b.”

If you are curious, this scheme is described as a line-code that maps 8-bit words to 10-bit symbols to… provide enough state changes to allow reasonable clock recovery.

If you are really curious, 8b/10b coding goes back to IBM in 1983, and line-codes in general are used, for example, to burn pits on optical discs. HDMI, DisplayPort, SATA, SD UHS-II, and USB 3.0 all employ 8b/10b coding.

At each end of a lane’s line-pair you will find an I/O processor that inputs or outputs 8-bit data to be moved from one point to another. (Lanes are point-to-point connections.) An Input processor rapidly translates 256 possible codes from 8-bit data to one of 1024 possible 10-bit symbols. At the other end of a lane’s line-pair, an output processor translates 10-bit symbols back to 8-bit data.

The speed of an I/O processor determines the symbol transfer rate. A PCIe 1.0 lane supports a symbol transfer rate of 2.5 Gigatransfers per second (GT/s).

While 8b/10b coding is a clever way to obtain greater noise immunity, it isn’t free. Encoding imposes a 20-percent overhead because every byte becomes a 10-bit symbol. Thus, a symbol transfer rate of 2.5GT/s provides only a 2Gbps data-rate.

A 2Gbps bandwidth is equal to a unidirectional data-rate of 250MBps. A PCI 2.0 lane supports a 2X greater symbol transfer rate and thus provides a unidirectional 500MBps data-rate. (Figure 2.)

Figure 2: PCI 1.0 and PCI 2.0—aggregate 500MBps and 1,000MBps per Lane.
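The arithmetic behind the PCIe 1.0 and 2.0 figures can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function name `lane_data_rate_mbps` is made up for this example, and MBps is taken to mean one million bytes per second.

```python
def lane_data_rate_mbps(transfer_rate_gt_s, payload_bits, symbol_bits):
    """Unidirectional data-rate of one lane, in MBps (10^6 bytes per second).

    transfer_rate_gt_s: symbol transfer rate in GT/s
    payload_bits / symbol_bits: line-code ratio, e.g. 8 and 10 for 8b/10b
    """
    data_gbps = transfer_rate_gt_s * payload_bits / symbol_bits
    return data_gbps * 1000 / 8  # Gbps -> Mbps -> MBps


gen1 = lane_data_rate_mbps(2.5, 8, 10)  # PCIe 1.0: 2.5GT/s with 8b/10b
gen2 = lane_data_rate_mbps(5.0, 8, 10)  # PCIe 2.0: 5GT/s with 8b/10b

# "Aggregate" counts both directions of the lane, as in Figure 2.
print(f"PCIe 1.0 lane: {gen1:.0f} MBps each way, {2 * gen1:.0f} MBps aggregate")
print(f"PCIe 2.0 lane: {gen2:.0f} MBps each way, {2 * gen2:.0f} MBps aggregate")
```

The aggregate values in Figure 2 are simply the unidirectional rate counted once for each of the lane's two line-pairs.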

PCI 3.0 Lanes

To increase lane speed, lane processor I/O performance must be increased. (Lanes are backward compatible because a faster processor can translate lower symbol transfer rates.)

Were the move from PCI 2.0 to 3.0 to be accomplished the same way as the move from PCI 1.0 to PCI 2.0, the symbol rate would need to double to 10GT/s.

To avoid such a high symbol transfer rate, PCI 3.0 utilizes the more efficient “128b/130b” coding scheme. This scheme’s greater efficiency means the aggregate symbol transfer rate needs only to be increased by 60-percent from 5GT/s to 8GT/s.

Encoding overhead now falls from 20-percent to only 1.54-percent, so an 8GT/s symbol transfer rate delivers about 7.88Gbps of data. That is a unidirectional data-rate of 985MBps. Not quite a doubling of the data-rate, but close enough for marketing purposes. (Figure 3.)

Figure 3: PCI 2.0 and PCI 3.0—aggregate 1,000MBps and 1,970MBps per Lane.
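To see why the change of line-code matters, here is a rough Python sketch that compares the two coding schemes and works out what transfer rate an exact doubling of PCIe 2.0 would have required. The variable names are illustrative only; the rates and code ratios are the ones given in the text.

```python
def efficiency(payload_bits, symbol_bits):
    """Fraction of transferred bits that carry data."""
    return payload_bits / symbol_bits


eff_8b10b = efficiency(8, 10)
eff_128b130b = efficiency(128, 130)

# Data rate of a PCIe 2.0 lane, and the target for an exact doubling.
gen2_data_gbps = 5.0 * eff_8b10b
target_gbps = 2 * gen2_data_gbps

# Transfer rate needed to hit that target under each line-code.
needed_8b10b = target_gbps / eff_8b10b
needed_128b130b = target_gbps / eff_128b130b

# What PCIe 3.0 actually delivers at 8GT/s.
gen3_data_gbps = 8.0 * eff_128b130b
gen3_mbps = gen3_data_gbps * 1000 / 8

print(f"8b/10b overhead:    {(1 - eff_8b10b) * 100:.2f}%")
print(f"128b/130b overhead: {(1 - eff_128b130b) * 100:.2f}%")
print(f"Exact doubling needs {needed_8b10b:.3f} GT/s (8b/10b) "
      f"or {needed_128b130b:.3f} GT/s (128b/130b)")
print(f"PCIe 3.0 at 8GT/s: {gen3_data_gbps:.2f} Gbps = {gen3_mbps:.0f} MBps per direction")
```

The small gap between the 8.125GT/s needed for an exact doubling and the 8GT/s actually used is where the "not quite a doubling" comes from.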

PCI 4.0

There are four things to know about PCIe 4.0. First, as expected, it is twice as fast as PCIe 3.0. To achieve this performance, the symbol transfer rate is doubled from 8GT/s to 16GT/s, which raises per-lane data bandwidth from about 7.88Gbps to about 15.75Gbps. This translates to a unidirectional data-rate of 1,970MBps. (Figure 4.)

Figure 4: PCI 3.0 and PCI 4.0—aggregate 1,970MBps and 3,940MBps per Lane.

Second, AMD has announced it will only support PCI 4.0 on motherboards based on their top-of-the-line X570 chipset. Thankfully, the well-regarded Gigabyte Aorus Elite WIFI motherboard employs the X570 chipset and yet sells for only $200. See Figure 5.

Figure 5: Aorus Elite WIFI X570 Chipset Motherboard. (Courtesy Gigabyte).

Third, the current highest-performing graphics card, NVIDIA’s RTX 2080, can only slightly exceed the bandwidth provided by PCIe 3.0. PCIe 4.0 performance will not be needed until, for example, a card with a pair of 2080-class GPUs becomes available. Figure 6 shows a dual-GPU board Apple has announced for its new Mac Pro.

Figure 6: Apple Mac Pro MPX Module with pair Radeon Pro Vega II GPU Chips. (Courtesy Apple).

Fourth, the role of PCIe 4.0 in supporting the newly announced, very high-performance M.2 4.0 SSDs is unclear. See Figure 7.

Figure 7: Corsair MP600 Gen4 SSD. (Courtesy Corsair).

Figure 8 presents Sequential-Access performance data from a Corsair Gen4 SSD plugged into an M.2 4.0 slot. Both Read and Write performance, as expected, are very high. The MP600’s Read data-rate is almost 5,000MBps. Figure 8 also shows the performance of a Gen3 Samsung 970 EVO Plus in a PCIe 3.0 slot. As expected, it provides lower Sequential-Access Read/Write performance.

However, when both the MP600 and 970 EVO Plus drives are tested in a multi-tasking Random-Access Read/Write situation, the MP600 Gen4 performs about the same as the cheaper, but far less sexy looking, Gen3 Samsung 970 EVO Plus.

Figure 8: Corsair MP600 Gen4 NVMe SSD Performance.
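One way to read the sequential result is to ask whether the drive's data-rate would even fit through an older link. The sketch below is a simplified check, assuming an x4 M.2 connection and using the per-lane figures quoted earlier; it ignores protocol overhead above the line-code, and the names `link_mbps` and `fits` are invented for this example.

```python
# Per-lane unidirectional data-rates in MBps, as quoted in the article.
LANE_MBPS = {"3.0": 985, "4.0": 1970}


def link_mbps(generation, lanes=4):
    """Unidirectional bandwidth of a link with the given lane count."""
    return LANE_MBPS[generation] * lanes


def fits(drive_mbps, generation, lanes=4):
    """True if the drive's data-rate does not exceed the link bandwidth."""
    return drive_mbps <= link_mbps(generation, lanes)


mp600_read_mbps = 5000  # "almost 5,000MBps" in the text, rounded up

for gen in ("3.0", "4.0"):
    verdict = "fits" if fits(mp600_read_mbps, gen) else "is link-limited"
    print(f"PCIe {gen} x4 = {link_mbps(gen):,} MBps: MP600 sequential read {verdict}")
```

Under these assumptions, only sequential transfers come close to needing the Gen4 link, which is consistent with the random-access results showing little difference between the two drives.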

PCIe Bus

Figure 9 presents a matrix of five PCIe generations by the number of PCIe bus lanes. (Intel is expected to jump directly to PCIe 5.0.) Specifically, an x1 PCIe bus carries a single lane. An x4 bus carries four lanes; an x8 bus carries eight lanes; while an x16 bus carries 16 lanes. Each matrix cell provides unidirectional bandwidth values. Thus, for example, a PCIe 4.0 x4 bus provides an almost 8,000MBps Read or Write connection.

Figure 9: Unidirectional PCIe Bandwidth Matrix.
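A matrix like the one in Figure 9 can be regenerated from the transfer rate and line-code of each generation. The following Python sketch is an approximation, not a reproduction of the figure: the PCIe 5.0 row (32GT/s with 128b/130b) comes from the published specification rather than from the text above, and the values are not rounded the way the figure may round them.

```python
# (generation, transfer rate in GT/s, payload bits, symbol bits)
GENERATIONS = [
    ("1.0", 2.5, 8, 10),
    ("2.0", 5.0, 8, 10),
    ("3.0", 8.0, 128, 130),
    ("4.0", 16.0, 128, 130),
    ("5.0", 32.0, 128, 130),  # from the PCIe 5.0 spec, not from the article text
]
LANE_COUNTS = (1, 4, 8, 16)


def bus_mbps(gt_s, payload_bits, symbol_bits, lanes):
    """Unidirectional bandwidth of a bus in MBps (10^6 bytes per second)."""
    return gt_s * payload_bits / symbol_bits * 1000 / 8 * lanes


# matrix[generation][lanes] -> unidirectional MBps
matrix = {
    gen: {n: bus_mbps(gt_s, payload, symbol, n) for n in LANE_COUNTS}
    for gen, gt_s, payload, symbol in GENERATIONS
}

print("Gen  " + "".join(f"{'x' + str(n):>10}" for n in LANE_COUNTS))
for gen, row in matrix.items():
    print(f"{gen:<5}" + "".join(f"{row[n]:>10,.0f}" for n in LANE_COUNTS))
```

Reading across a row shows the effect of adding lanes; reading down a column shows the effect of moving to a newer generation.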

More About PCIe

Where do lanes originate? Intel microprocessors typically host 16 or 24 lanes. Very high-performance CPU chips, such as the AMD Threadripper, can host up to 60 lanes.

Via a Direct Media Interface (DMI), an Intel microprocessor connects to an Intel Platform Controller Hub (PCH) chip that typically can host an additional 24 lanes. (AMD systems operate in a similar manner.)

Unfortunately, DMI bandwidth is equivalent to four PCIe 3.0 lanes. An Intel hub (chipset) thus acts only as a high-speed switch for the additional lanes. Not only is a PCIe bus from a hub bandwidth limited, additional latencies are introduced.
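The size of the hub bottleneck can be estimated with a little arithmetic. The sketch below assumes all 24 hub lanes run at PCIe 3.0 speed and that the hub divides the DMI link proportionally among busy devices; both are assumptions made for illustration, and the two SSD workloads are hypothetical.

```python
LANE_MBPS_GEN3 = 985   # per-lane unidirectional data-rate quoted earlier
DMI_EQUIV_LANES = 4    # DMI is equivalent to four PCIe 3.0 lanes
HUB_LANES = 24         # additional lanes hosted by the hub (assumed PCIe 3.0)

uplink_mbps = DMI_EQUIV_LANES * LANE_MBPS_GEN3
downstream_mbps = HUB_LANES * LANE_MBPS_GEN3
oversubscription = downstream_mbps / uplink_mbps


def share_uplink(requested_mbps, uplink=uplink_mbps):
    """Scale requests down proportionally when they exceed the uplink.

    Proportional sharing is an assumed policy, not documented hub behavior.
    """
    total = sum(requested_mbps)
    if total <= uplink:
        return list(requested_mbps)
    scale = uplink / total
    return [r * scale for r in requested_mbps]


# Two hypothetical hub-attached SSDs, each trying to read at 3,500MBps.
shares = share_uplink([3500, 3500])

print(f"DMI uplink: {uplink_mbps:,} MBps; hub lanes total: {downstream_mbps:,} MBps")
print(f"Oversubscription: {oversubscription:.0f}:1")
print("Per-SSD throughput when both are busy:", [round(s) for s in shares])
```

Under these assumptions, a device attached to the hub only sees its full link bandwidth while the other hub devices are idle.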

Assuming a high-performance laptop, the discrete GPU chip is connected by an x16 PCIe bus to the microprocessor. This connection will be PCIe 3.0, or 4.0 on AMD Series 3000 chips.

Modern laptop and desktop computers offer an M.2 slot to support an SSD. To communicate with an SSD, an M.2 slot employs the NVMe (Non-Volatile Memory Express) protocol. An M.2 slot provides an x4 PCIe connection. (Figure 10.) Many systems now provide a second M.2 slot. Again, the connection will be either PCIe 3.0 or 4.0.

Figure 10: M.2 Slot with Four Mounting-posts for Different Length cards. (Courtesy MSI).

A microprocessor’s hub chip also supports 5Gbps USB 3.0 (aka USB 3.1 Gen 1) and 10Gbps USB 3.1 (aka USB 3.1 Gen 2). Both speeds can be supported by a USB-C connector.

Also using a USB-C connector: the 40Gbps Thunderbolt-3 bus. Its PCIe traffic is limited to what a PCIe 3.0 x4 bus can carry: a raw rate of 32GT/s, or roughly 31.5Gbps in each direction after 128b/130b coding.

Now that we have a good understanding of the PCIe bus, and we know how PCIe 4.0 performs, it’s not unreasonable to see the PCIe 4.0 and PCIe 5.0 (and, yes, PCIe 6.0) specifications as offering mostly marketing performance. Kind of a “…move on folks, nothing to see here, YET” technology.
