OCamlot - (Untitled post)

For anyone looking for a concrete example of how dropping support for legacy features in an architecture can improve things:

The 8086 and 8088 were very simple designs (even add was a microcoded instruction). On these systems, you could store over an instruction that you were just about to execute and everything was fine. The fetch would see the result. And, because you could, some people did. Some DOS programs depended on this.

Modern x86 chips are (modulo bugs) 100% backwards compatible. These same DOS programs ran on the 80286, 386, and so on up to modern Intel and AMD CPUs.

Preserving this behaviour became more complex. A modern CPU has hundreds of instructions in flight at a time. The notion of a 'current' instruction is an abstraction that doesn't really exist. If a store instruction overwrites an instruction just after it in the instruction stream, the instruction that it has overwritten may already have been (speculatively) executed. The hardware needs to determine that the store overwrote this instruction and then cancel everything in the pipeline after the modified instruction.

The core needs to track all of the state that's required for this. Actually cancelling the instructions can be simple: treat it as if the store trapped and then re-execute any instructions between the store and the modified instruction. The book-keeping consumes power and area all of the time, irrespective of whether it is ever used.

If you run x86 code on, say, an Arm core, the emulator will typically just mark the pages containing code read only.
When it runs software that does this kind of trick, the CPU will trap, the kernel will deliver a signal to userspace, and the emulator will then invalidate cached translations and run the instruction again.

This is slow, but most of the code that does this was written for a 4.77 MHz or 8 MHz single-issue CPU that executed one instruction every few cycles, so running a 2 GHz CPU that executes multiple instructions per cycle at a fraction of its peak speed is fine.

Running this legacy DOS code on a modern Arm core under emulation will be much slower than running it on a modern AMD or Intel chip. On the other hand, the Arm core doesn't have to spend transistors and power budget to handle this painful corner case and so can devote them to things that make more modern software fast instead.
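
To make the emulator side of this concrete, here is a rough sketch of the write-protect, trap, invalidate, retry loop described above. It is a toy model in Python, not how any particular emulator is implemented: ToyEmulator, WriteFault, translate and store are all names made up for illustration, the "translation" is just a snapshot of one byte, and a Python exception stands in for the hardware trap and the signal that the kernel would deliver.

```python
PAGE_SIZE = 4096  # assumed page size for the toy model


class WriteFault(Exception):
    """Stands in for the trap/signal raised by a store to a read-only page."""

    def __init__(self, page):
        super().__init__(f"write to protected page {page}")
        self.page = page


class ToyEmulator:
    """Hypothetical emulator that write-protects pages it has translated."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.read_only = set()   # page numbers that hold translated code
        self.translations = {}   # guest address -> cached "translation"
        self.faults = 0          # how many times the slow path was taken

    def translate(self, addr):
        """Return the cached translation for addr, creating it on a miss."""
        if addr not in self.translations:
            # A real emulator would emit host code; we just snapshot the byte.
            self.translations[addr] = self.memory[addr]
            # Protect the page so later stores to it are noticed.
            self.read_only.add(addr // PAGE_SIZE)
        return self.translations[addr]

    def _raw_store(self, addr, value):
        page = addr // PAGE_SIZE
        if page in self.read_only:
            raise WriteFault(page)
        self.memory[addr] = value

    def store(self, addr, value):
        """Guest store: fast path for data, slow path for code pages."""
        try:
            self._raw_store(addr, value)
        except WriteFault as fault:
            self.faults += 1
            # Drop every cached translation that lives on the faulting page.
            self.translations = {
                a: t for a, t in self.translations.items()
                if a // PAGE_SIZE != fault.page
            }
            # Unprotect the page and run the store again.
            self.read_only.discard(fault.page)
            self._raw_store(addr, value)


emu = ToyEmulator(2 * PAGE_SIZE)
emu.memory[0x10] = 0x90          # set up guest code directly, bypassing store()
first = emu.translate(0x10)      # page 0 is now read only
emu.store(0x10, 0xC3)            # self-modifying store: fault, invalidate, retry
second = emu.translate(0x10)     # retranslated from the new byte
emu.store(PAGE_SIZE + 4, 1)      # ordinary data store on page 1: no fault
```

The point of the design is that ordinary data stores pay nothing extra, and only a store to a page that holds translated code takes the slow path. That is the trade-off described above: the rare self-modifying case gets expensive so that the common case needs no book-keeping.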