300 – How Processors Got So Fast
Guest: Lex Augusteijn Host: Markus Voelter Shownoter: Stefaan Rillaert
Have you ever wondered how the processor in your phone or computer got so much faster than the increase in megahertz suggests? In this episode we talk with Lex Augusteijn about superscalar processors, pipelining, speculative execution, register renaming and the like. We also discuss concerns other than speed, in particular energy efficiency.
Introduction
00:06:00Lex Augusteijn | KIM-1 computer | 6502 processor | Functional programming | Philips Research Eindhoven | Compiler | Media processor | DSP | Neuromorphic computing
Basics
00:10:53Moore's law | Richard Feynman's 'There's Plenty of Room at the Bottom' | Richard Feynman | "Surely You're Joking, Mr. Feynman!" | Gordon Moore | Clock speed | Transistor | Episode of 'The Freak Show' podcast about the 'monster 6502' | NAND gate | Fab | ASML | Episode about ASML | Wafer stepper | Processor | Computer memory | Hard drive | Graphics processing unit | Computer bus | Processor register | Branching | Instruction pipelining | Floating-point arithmetic | ALU | Instruction set | CISC | RISC | Compiler | Superscalar processor | Microcode | Von Neumann architecture | Harvard architecture
Speed optimizations in modern processors
00:42:11Data width | SIMD | Pipeline stall | Intel 8080 | ARM architecture | Branch prediction | Two-bit prediction scheme | Memory hierarchy | Clock cycle | CPU cache | Cache line | Write-Through cache line | Operating system | Cache coherence | Volatile keyword in C | Simultaneous multithreading (SMT) | Hyper-threading | Context switch | Single instruction, multiple data (SIMD) | Speculative execution | Register renaming | Very long instruction word (VLIW) | Intel Atom | Abstract interpretation | Static program analysis | Domain-specific language (DSL) | (German) episode about DSLs | C pragma | TriMedia mediaprocessor | Out-of-order execution | Instruction scheduling | Multi-core processor | The Free Lunch Is Over | MMX instruction set
Additional concerns
01:44:44Application-Specific Integrated Circuit (ASIC) | Image processor | Dynamic voltage scaling | ARM architecture | Memory management unit (MMU) | Spectre bug | Side-channel attack | CUDA | Convolutional neural network | NXP Semiconductors | Processor design | Intel Tick-tock strategy
Thank you very much for the episode. Exciting topic. I am looking forward to listening to it at leisure in the bad.
Great topic for the anniversary. Unfortunately I am still a few episodes behind, but I can hardly wait!
In the bad :-) ?
Great show (as always)! What about an episode on GPUs (history, current and future technologies, fixed vs. programmable pipelines, deep learning, graphics)?
A very nice treatment of this complex topic. At the end you say that processor design would deserve an episode of its own. I could possibly put you in touch with some interesting people to talk to.
The best episode for a long time (if not ever). Fascinating and superbly engaging. Thanks so much.
Would be great to hear something more on how storage/memory tech has managed to keep pace with Moore’s law, e.g. how storage arrays and SSDs work?
Thanks David for the high praise :-)
Markus, I agree with David. One of your best episodes. Congratulations and many thanks. Lex was a super guest; you should consider another episode with him. I’m probably the same generation as Lex and have followed the evolution of chip design as an amateur. The complexities of superscalar processors were vague to me before, but you both managed to bring them alive in a clear way. I’ve already recommended this episode to my team. Best, Adam
Thanks Adam :-)
Sorry for the typo in my comment above. I meant "in the bad weather". The episode is really great and I learned a lot.
I finally found the time to listen to this. Another very good episode. I was surprised about how much of my old (1994) knowledge (mostly from Hennessy/Patterson) still applies. I missed a section on forwarding of results in the pipeline (which reduces the effect of data dependencies, because it can make a result available to the next instruction before the store). On the other hand, I now understand much better why a two-bit counter is great for branch prediction (especially in loops: if you leave the loop, you will not lose the correct (backward) prediction, so if you re-enter it, you will still correctly predict the frequent case).
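The loop argument in the comment above can be checked with a small simulation. This is a minimal sketch, not a model of any real processor: the class names, the starting states and the loop shape (a backward branch taken on every iteration except the last) are assumptions made for illustration. It compares a one-bit predictor, which simply remembers the last outcome, with a two-bit saturating counter.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time (illustrative baseline)."""

    def __init__(self, taken=True):
        self.last = taken

    def predict(self):
        return self.last

    def update(self, taken):
        self.last = taken


class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def loop_branch_outcomes(iterations, entries):
    """Outcomes of the backward branch closing a loop that is entered several times.

    Per entry, the branch is taken (iterations - 1) times and then falls through once.
    """
    outcomes = []
    for _ in range(entries):
        outcomes.extend([True] * (iterations - 1) + [False])
    return outcomes


def count_mispredictions(predictor, outcomes):
    """Run the predictor over the outcome stream and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    outcomes = loop_branch_outcomes(iterations=10, entries=5)
    print("one-bit misses:", count_mispredictions(OneBitPredictor(), outcomes))
    print("two-bit misses:", count_mispredictions(TwoBitPredictor(), outcomes))
```

With these assumed parameters, the one-bit predictor misses twice per loop entry after the first (once on exit, once on re-entry), while the two-bit counter only drops from state 3 to state 2 on exit and therefore still predicts "taken" on re-entry, missing only the exit.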
A side note on the cache eviction/invalidation topic.
There are basically 2 concepts which intermix.
Concept 1 is that the CPU (or the chipset) does bus snooping. Every memory write transaction from an external device (e.g. PCI/PCIe) to memory passes through the chipset, which tells the CPU to invalidate the cache line the transaction touches. This was pretty common in all PC-era machines. The problem is that the more CPUs/cores you have, the more buses and memory interfaces there are. So Intel came up with a transaction protocol between the CPU cores/chipsets that only tells the others which cache lines to invalidate.
The other concept is to let the operating system do it itself. The early SGI/MIPS machines, for example, worked this way. Before starting a DMA transaction from an external storage device, the OS had to invalidate the data cache lines currently in the CPU cache (and avoid loading new cache lines while the DMA was running).
This made things more complicated on the OS side and caused problems with speculative execution, because the CPU sometimes loaded data from memory while speculating on addresses. It occasionally mispredicted addresses and loaded memory from the DMA region. This was the case from the MIPS R10000 onwards. It was only fixable with some kind of bus snooping, which later machines employed.
So on x86/PC-style machines, manually invalidating the cache is a pretty rare thing (though it is necessary). On MIPS-style machines you have to do it in most OS drivers whenever an external peripheral touches memory directly.
Flo
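The software-managed scheme Flo describes can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCache`, `dma_write`, `read_after_dma` and the 64-byte line size are hypothetical names and values invented for the example, and nothing here corresponds to real SGI/MIPS or x86 interfaces. It shows a stale read when the cache is not invalidated, a fresh read when it is, and how a load that sneaks in between invalidation and the end of the DMA (as a speculative load could) brings the stale data back.

```python
LINE_SIZE = 64  # assumed cache line size in bytes, for illustration only


class ToyCache:
    """A tiny read-only cache in front of a bytearray standing in for main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # line index -> private copy of that line's bytes

    def read(self, addr):
        line = addr // LINE_SIZE
        if line not in self.lines:
            # Cache miss: fetch the whole line from memory.
            start = line * LINE_SIZE
            self.lines[line] = bytes(self.memory[start:start + LINE_SIZE])
        return self.lines[line][addr % LINE_SIZE]

    def invalidate(self, addr, length):
        """Drop every cached line overlapping [addr, addr + length)."""
        first = addr // LINE_SIZE
        last = (addr + length - 1) // LINE_SIZE
        for line in range(first, last + 1):
            self.lines.pop(line, None)


def dma_write(memory, addr, data):
    """A device writes straight to memory, bypassing the CPU cache."""
    memory[addr:addr + len(data)] = data


def read_after_dma(invalidate_first, load_during_dma=False):
    """Return what the CPU reads at address 70 after a DMA transfer into line 1."""
    memory = bytearray(256)
    cache = ToyCache(memory)
    cache.read(70)                      # warm the cache with the old contents
    if invalidate_first:
        cache.invalidate(64, LINE_SIZE)  # what the OS driver does before the DMA
    if load_during_dma:
        cache.read(70)                  # e.g. a speculative load re-caches old data
    dma_write(memory, 64, bytes([0xAB]) * LINE_SIZE)
    return cache.read(70)


if __name__ == "__main__":
    print(hex(read_after_dma(invalidate_first=False)))
    print(hex(read_after_dma(invalidate_first=True)))
    print(hex(read_after_dma(invalidate_first=True, load_during_dma=True)))
```

In this model, bus snooping would amount to `dma_write` calling `invalidate` itself, which is why the driver-side call becomes unnecessary on machines that snoop.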
AWESOME episode. First of all, the topic is of course super interesting, but the way the dialog was held was also very pleasant. Very nice level of detail; it could have been even a little more at some points. With every answer it became evident that Lex could go on about every subtopic for hours. For me it is an extremely satisfying experience to listen to such experts digesting the most complex things into something that a non-expert can understand and value.
And Markus is of course a very intelligent and quick thinking interviewer.
Hi, I have just started following you and this was the first podcast I listened to till the end. This has been an awesome experience. I came across your podcast on Spotify, so I am really thankful for that.
Wow, you are the first Spotify listener I know of (there are a few more, but nobody had contacted us yet). Cool :-)