These Are the Numbers Every Computer Engineer Must Know

For a full-stack developer or computer engineer, a deep understanding of the performance characteristics of the underlying hardware is essential. The speed of CPUs, memory, disks, and networks fundamentally constrains the kinds of systems it's practical to design and build. While the raw capabilities are constantly evolving, an intuition for their relative scales remains invaluable.

In this article, we'll take a deep dive into the key metrics that every computer engineer should internalize. We'll examine the latencies of core computing operations, analyze historical trends, and explore how hardware constraints impact real-world system design. Whether you're a software engineer optimizing application performance or an architect designing data center infrastructure, this knowledge is foundational.

Latencies Every Programmer Should Know

In 2010, Google's Jeff Dean gave a now-famous talk titled "Building Software Systems at Google and Lessons Learned". In it, he highlighted some key latency numbers that every programmer should know:

Operation                              Latency
L1 cache reference                     0.5 ns
Branch mispredict                      5 ns
L2 cache reference                     7 ns
Mutex lock/unlock                      25 ns
Main memory reference                  100 ns
Compress 1K bytes with Zippy           3,000 ns
Send 2K bytes over 1 Gbps network      20,000 ns
Read 1 MB sequentially from memory     250,000 ns
Round trip within same datacenter      500,000 ns
Disk seek                              10,000,000 ns
Read 1 MB sequentially from disk       20,000,000 ns
Send packet CA->Netherlands->CA        150,000,000 ns

These numbers, though well over a decade old, capture some fundamental truths about the relative performance of computing operations. Memory access is dramatically faster than disk I/O, even for sequential reads. Wide-area network round trips dwarf those within a datacenter. And a single disk seek costs five orders of magnitude more than a main memory reference.
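One way to internalize these scales is to translate them into human terms. Here is a quick Python sketch, using the numbers from the table above, that rescales everything so an L1 cache hit takes one second:

```python
# Rescale the 2010 latency numbers to human time: if an L1 cache hit
# took one second, how long would the other operations take?
latencies_ns = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Datacenter round trip": 500_000,
    "Disk seek": 10_000_000,
    "Packet CA->Netherlands->CA": 150_000_000,
}

scale = 1 / latencies_ns["L1 cache reference"]  # "seconds" per nanosecond

for op, ns in latencies_ns.items():
    secs = ns * scale
    print(f"{op:28s} {secs:>13,.0f} s  (~{secs / 86_400:,.1f} days)")
```

On this scale, a main memory reference takes over three minutes, a disk seek takes the better part of a year, and a transatlantic round trip takes nearly a decade.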

But hardware has advanced significantly since 2010. Let's take a look at some more up-to-date numbers.

CPU and Memory Latencies in 2023

Modern CPUs have evolved complex cache hierarchies to bridge the gap between processor speeds and main memory. Typical latencies for a modern Intel Xeon or AMD Epyc CPU look like this:

Operation             Latency
L1 cache access       ~1 ns
L2 cache access       ~3 ns
L3 cache access       ~12 ns
Main memory access    ~80 ns

While L1 caches have stayed roughly constant in size (32-64 KB) and latency (a few cycles), last-level caches have exploded in capacity. A high-end Intel Xeon now sports up to 60 MB of L3 cache, albeit with higher latency than the smaller L2.

Main memory latency has also gradually improved over the past decade. While DDR3 DRAM had latencies around 15 ns, modern DDR5 DRAM achieves closer to 12.5 ns at the chip level. However, memory controllers and interconnects add further delay, leading to observed end-to-end latencies of around 80 ns.

But raw latency is only half the story. Memory bandwidth has increased dramatically, from around 10 GB/s with DDR2 to over 200 GB/s with octa-channel DDR5. So while random access latency is only moderately better than it was a decade ago, sequential memory throughput has improved by roughly 20x.
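You can observe the latency/bandwidth gap directly. The following is a rough sketch using NumPy (assumed available); exact numbers vary widely by machine, and the random case also pays for allocating the gathered copy, so treat it as illustrative rather than a rigorous benchmark:

```python
import time

import numpy as np

N = 50_000_000                  # ~400 MB of float64, far larger than any cache
data = np.ones(N)
idx = np.random.permutation(N)  # a random visiting order over the same array

t0 = time.perf_counter()
data.sum()                      # sequential scan: streams at memory bandwidth
t1 = time.perf_counter()
data[idx].sum()                 # random gather: pays access latency per element
t2 = time.perf_counter()

print(f"sequential: {8 * N / (t1 - t0) / 1e9:5.1f} GB/s")
print(f"random:     {8 * N / (t2 - t1) / 1e9:5.1f} GB/s")
```

On typical hardware the sequential scan runs an order of magnitude faster than the random gather, which is exactly the gap the tables above predict.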

Storage and Network Latencies

Storage and networking technologies have also advanced significantly. Here are some typical latency and bandwidth numbers for modern hardware:

Operation                        Latency       Bandwidth
NVMe SSD random read             10 us         1-3 GB/s
SATA SSD random read             50 us         500-550 MB/s
Hard disk seek                   2-5 ms        -
1 Gbps Ethernet round trip       ~200 us       125 MB/s
10 Gbps Ethernet round trip      ~20 us        1.25 GB/s
100 Gbps Ethernet round trip     ~2 us         12.5 GB/s
Datacenter round trip            100-500 us    -
Cross-country round trip         30-50 ms      -
Transoceanic round trip          100-250 ms    -

The move from hard disk drives (HDDs) to NAND-based solid-state drives (SSDs) cut random I/O latency by roughly two orders of magnitude, from milliseconds of seek time to tens of microseconds, alongside several-fold gains in sequential throughput. More recent NVMe SSDs, which use high-speed PCIe interfaces instead of SATA, offer a further ~5x reduction in latency and a 2-6x increase in throughput over SATA SSDs.

On the networking front, bandwidths have increased by orders of magnitude, from 1 Gbps to 400 Gbps for high-speed Ethernet. But the speed of light remains a fundamental limitation, with cross-country and transoceanic round trips floored in the tens to hundreds of milliseconds. Even within a datacenter, server-to-server round trips typically exceed 100 microseconds due to networking stack overheads.
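The physics here is easy to check. Light in optical fiber propagates at roughly 200,000 km/s (about two thirds of c, assuming a refractive index near 1.5), which puts a hard floor under any round trip, and real routes are longer than the great-circle distance:

```python
# Hard lower bound on round-trip time imposed by light speed in fiber.
# Assumes ~200,000 km/s propagation; real routes add extra distance,
# queuing, and switching delays on top of this floor.
C_FIBER_KM_PER_S = 200_000

def min_rtt_ms(one_way_km: float) -> float:
    return 2 * one_way_km / C_FIBER_KM_PER_S * 1_000

print(f"Transatlantic (~6,000 km): {min_rtt_ms(6_000):.0f} ms floor")
print(f"Transpacific  (~9,000 km): {min_rtt_ms(9_000):.0f} ms floor")
```

No protocol optimization can push a transoceanic round trip below these floors; the only remedies are moving data closer to users or making fewer round trips.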

Impact on Software Design

These hardware characteristics have profound implications for how we design and architect software systems. Let's look at a few examples.

Consider a typical web application serving requests from a database. With an HDD-based database, each random I/O can incur a 2-5 ms seek penalty, limiting throughput to a few hundred requests per second. Naively moving that database to an SSD might improve throughput to a few thousand requests per second. But truly exploiting the hardware requires restructuring queries to take advantage of the SSD's superior sequential throughput, potentially using techniques like columnar storage.
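A back-of-the-envelope calculation makes those ceilings concrete. Assuming one random I/O per request and no request overlap (a deliberately pessimistic, queue-depth-1 model), the latencies in the table above translate directly into throughput limits:

```python
# Queue-depth-1 throughput ceilings implied by the storage latency table.
# Real databases issue many I/Os concurrently, so treat these as what a
# single serialized request stream would see, not a tuned maximum.
devices = {
    "HDD (5 ms seek)":       5e-3,
    "SATA SSD (50 us read)": 50e-6,
    "NVMe SSD (10 us read)": 10e-6,
}

for name, latency_s in devices.items():
    print(f"{name:24s} ~{1 / latency_s:>9,.0f} requests/s")
```

That single division explains why swapping the storage medium alone yields a large throughput jump before any query restructuring happens.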

Likewise, a data-intensive computation running on a single machine is ultimately limited by memory bandwidth and capacity. While caching helps, algorithmic approaches like streaming and out-of-core processing are key to handling datasets that exceed RAM. And because compute throughput has grown even faster than memory bandwidth, many workloads are now memory-bound rather than compute-bound, demanding new algorithmic approaches.
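As a minimal illustration of the streaming idea, here is a sketch that reduces a file far larger than RAM using large sequential reads (the chunk size is an assumption; anything in the tens of megabytes keeps the disk streaming):

```python
# Out-of-core reduction: process a file much larger than RAM by streaming
# it in big sequential chunks, so memory use stays constant at ~one chunk.
CHUNK_BYTES = 64 * 1024 * 1024  # 64 MB keeps an SSD or HDD in sequential mode

def byte_sum(path: str) -> int:
    """Sum every byte in the file while holding only one chunk in memory."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_BYTES):
            total += sum(chunk)
    return total
```

The same pattern of sequential passes over chunked data underlies external sorting, columnar scans, and most MapReduce-style processing.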

In a distributed system, network latencies force critical design choices. A high-throughput stream processing system may need to buffer seconds of data to smooth over millisecond-scale network hiccups. And while it's tempting to shard datasets across a cluster for scalability, the latency penalty of remote data access often dominates the gains from parallelism. Techniques like bounded staleness, partial replication, and intelligent request routing become essential.
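The buffer sizing behind that first point is simple arithmetic: capacity must cover throughput times the longest stall you want to absorb. A sketch, with illustrative numbers:

```python
# Buffer sizing for a stream that must ride out network stalls:
# required capacity = stream throughput x longest stall to absorb.
def buffer_mb(throughput_mb_per_s: float, stall_s: float) -> float:
    return throughput_mb_per_s * stall_s

# A 500 MB/s stream that must survive a 2-second hiccup needs ~1 GB of
# buffer; at 10 Gbps (1,250 MB/s) the same stall costs ~2.5 GB.
print(f"{buffer_mb(500, 2.0):,.0f} MB")
print(f"{buffer_mb(1_250, 2.0):,.0f} MB")
```

This is why high-throughput pipelines dedicate substantial RAM purely to absorbing jitter between stages.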

Looking Ahead

So what lies ahead for computer hardware? While the details are uncertain, the broad strokes are clear: processors will continue to get wider and more parallel, memory and storage hierarchies will get deeper and more complex, and networks will get faster but remain constrained by the speed of light.

Technologies on the horizon like die stacking, non-volatile memory, and silicon photonics will reshape latencies and bandwidths in the coming decade. But rather than abstract these away, the most performant software will need to embrace and optimize for the characteristics of the underlying hardware.

As a concrete example, the emerging Compute Express Link (CXL) standard will allow processors to access memory and storage across PCIe, blurring the lines between local and remote resources. This will enable new system architectures that pool and dynamically provision hardware based on workload requirements. But achieving optimal performance in this new world will require rethinking everything from data layouts to network protocols to scheduling algorithms.

Conclusion

As a computer engineer in the 2020s, it's more important than ever to have a deep understanding of hardware fundamentals. While the specific latencies involved are constantly evolving, the relative differences between CPU, memory, storage, and network operations remain as relevant as ever.

By internalizing these numbers and developing an intuition for their implications, you'll be able to design and optimize systems that are well-matched to the strengths and constraints of the underlying hardware. Whether you're building cloud infrastructure, embedded devices, or high-performance computing applications, this knowledge will be a key enabler.

So take the time to learn these numbers. Experiment with microbenchmarks to understand the performance characteristics of your hardware. Reason about the theoretical limits of your systems, and optimize accordingly. The hardware landscape may be complex and ever-changing, but with a solid grasp of the fundamentals, you'll be well-equipped to navigate it.
