https://blog.royalsloth.eu/posts/the-compiler-will-optimize-that-away/

A lot of programmers believe that compilers are magic black boxes into which you put your messy code and get a nice optimized binary out. Hallway philosophers will often debate which language features or compiler flags to use in order to capture the full power of the compiler’s magic. And if you have ever seen the GCC codebase, you could easily believe that it performs magical optimizations from another planet.

Nevertheless, if you analyze the compiler’s output, you will find out that compilers are not really that great at optimizing your code. Not because the people writing them don’t know how to generate efficient instructions, but simply because compilers can only reason about a very small part of the problem space 1.

In order to understand why magic compiler optimizations are not going to speed up your software, we have to go back to a time when dinosaurs roamed the earth and processors were still extremely slow. The following graph shows the relative processor and memory performance improvements over the years (1980-2010), taken from 2:

The problem this picture represents is that CPU performance improved tremendously over the years (the y axis is on a logarithmic scale), while memory performance increased at a much slower pace:

So what? We spend a few more cycles when loading something from memory, but computers are still way faster than they used to be. Who cares how many cycles we spend?

Well, for one, it’s sad to know that even if we get better compilers or faster hardware, the speed of our software is not going to drastically improve, since neither is the reason our software is slow. The main problem we have today lies in utilizing the CPUs to their full potential.

The following table displays the latency numbers of common operations, taken from 3. The scaled latency column represents the same latencies in numbers that are easier for humans to grasp.

[Table: latency numbers of common operations, with a scaled latency column]

Looking at the scaled latency column, we can quickly figure out that accessing memory is not free, and that for the vast majority of applications the CPU is twiddling its thumbs while waiting for data to arrive from memory 4. The reasons for that are twofold:

  1. The programming languages we are still using today were designed at a time when processors were slow and memory access latency wasn’t much of a problem.
  2. The industry best practices still revolve around object-oriented programming, which does not perform well on modern hardware.

Programming languages

Popular programming languages such as Python and Java are all over 20 years old, and their initial design decisions, like Python’s global interpreter lock or Java’s everything-is-an-object mindset, no longer make sense today 5. The hardware has changed tremendously with the addition of CPU caches and multiple cores, while the programming languages are still based on ideas that are no longer true.

Most modern programming languages try to make your life easier by taking away the sorrows of manual memory management. While not having to think about memory will make you more productive, that productivity comes at a heavy price.

Allocating chunks of memory without oversight is going to make your program slow, mostly due to random memory accesses (cache misses) that cost a few hundred CPU cycles each. Nevertheless, most popular programming languages still act as if random memory allocation were not a big deal and you shouldn’t worry about it, because computers are so fast.