Gonna say this one time.
I don’t post this stuff because I’m looking for someone to tell me there’s not a scholarly article or a deeper dive. I post it because of trends I have seen in the reporting of day-to-day events and emerging threats.
I didn’t get here by reading scholarly articles on emerging threats.
As this one gets closer to being truly weaponized, you need to know that SPECTRE and Meltdown cannot be patched.
I mean, I hate it too… all my life has been focused on one version of x86 or another.
But SPECTRE is not a ghost. It is real. It can do damage.
The issue isn't static logic. The issue is divorcing instruction decoding from instruction set design to attain performance goals not originally built into the ISA.
It takes, for example, several clock cycles just to decode x86 instructions into a form that can then be readily executed. Several clocks to load the code cache. Several clocks to translate what's in the code cache into a pre-decoded form in the pre-decode cache. Several clocks to load a pre-decode line into the instruction registers (yes, plural) of the instruction fetch unit. A clock to pass that on to the first of (I think?) three instruction decode stages in the core. Three more clocks after that, you finally have a fully decoded instruction that the remainder of the pipelines (yes, plural) can potentially execute.
Of course, I say potentially because there's register renaming happening, there are delays while waiting for execution units to become available in the first place, there's waiting for result buses to become uncontested, ...
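Register renaming, one of the delays listed above, is easy to sketch as a toy. This assumes a made-up three-operand instruction format and an unbounded pool of physical registers; real renamers also free physical registers at retirement, which is left out here. The idea is that every write gets a fresh physical register, so false (write-after-write, write-after-read) dependencies vanish while true (read-after-write) ones survive.

```python
from itertools import count


def rename(program):
    """Rename architectural registers to fresh physical registers.

    `program` is a list of (dest, src1, src2) tuples using architectural
    names like "r1". Toy model: physical registers are never freed.
    """
    fresh = count()
    table = {}  # architectural name -> physical register currently holding it

    def lookup(reg):
        # A source reads whichever physical register last received its value.
        if reg not in table:
            table[reg] = f"p{next(fresh)}"
        return table[reg]

    renamed = []
    for dest, src1, src2 in program:
        p1, p2 = lookup(src1), lookup(src2)
        # Every write gets a brand-new physical register.
        pd = f"p{next(fresh)}"
        table[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


prog = [
    ("r1", "r2", "r3"),  # r1 = r2 + r3
    ("r4", "r1", "r1"),  # r4 = r1 + r1  (true dependency on r1)
    ("r1", "r5", "r6"),  # r1 = r5 + r6  (reuses r1: a false dependency)
]
print(rename(prog))
```

After renaming, the third instruction writes `p6` instead of the `p2` the second instruction reads, so the hardware is free to run it early.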
The only reason all this abhorrent latency is obscured is because the CPU literally has hundreds of instructions in flight at any given time. Gone are the days when it was a technical achievement that the Pentium had 2 concurrently running instructions. Today, our CPUs have literally hundreds.
(Consider: a 7-pipe superscalar processor with 23 pipeline stages, assuming no other micro-architectural features to enhance performance, still offers 23*7=161 in-flight instructions, assuming you have some other means of keeping those pipes filled.)
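The parenthetical's arithmetic, plus the reason the latency stops being visible, fits in two small functions. This is an idealized model (no stalls, hazards, or mispredictions), and the function names are made up for illustration: each instruction still takes the full pipeline depth to complete, but once the pipes are full a whole group finishes every cycle.

```python
import math


def in_flight(stages: int, width: int) -> int:
    """Instructions in flight in an ideal pipeline: one per pipe per stage."""
    return stages * width


def cycles_to_finish(n: int, stages: int, width: int) -> int:
    """Cycles for an ideal `width`-wide, `stages`-deep pipeline to complete
    n independent instructions: the first group needs `stages` cycles to
    drain through, then one more group completes every cycle after that."""
    return stages + math.ceil(n / width) - 1


print(in_flight(23, 7))               # 161
print(cycles_to_finish(7000, 23, 7))  # 1022
```

So 7000 instructions, each with a 23-cycle latency, finish in 1022 cycles, about 0.15 cycles per instruction on average.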
This is why CPU vendors no longer put cycle counts next to their instructions. Instructions are pre-decoded into short programs, and it's those programs (strings of "micro-ops", hence micro-op caches, et al.) which are executed by the core on a more primitive level.
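To make the "short programs" idea concrete, here is a toy decoder that cracks a memory-operand instruction into load, ALU, and store micro-ops. The instruction format, the `tmp0` scratch register, and the micro-op names are all invented for illustration; real micro-op formats are proprietary and vary by vendor and generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str      # micro-operation name (hypothetical)
    dst: str     # destination register, or "[reg]" for a memory location
    srcs: tuple  # source operands


def crack(mnemonic: str, dst: str, src: str) -> list:
    """Split a toy two-operand CISC-style instruction into RISC-like uops.

    A bracketed operand such as "[rbx]" denotes a memory reference.
    """
    if dst.startswith("["):
        addr = dst.strip("[]")
        # A read-modify-write on memory becomes load, ALU op, store.
        return [
            Uop("load", "tmp0", (addr,)),
            Uop(mnemonic, "tmp0", ("tmp0", src)),
            Uop("store", dst, ("tmp0",)),
        ]
    # A register-to-register form maps onto a single uop.
    return [Uop(mnemonic, dst, (dst, src))]


for uop in crack("add", "[rbx]", "rax"):
    print(uop)
```

Each resulting micro-op is simple enough to schedule independently, which is what lets the core overlap it with work from neighboring instructions.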
Make no mistake: the x86 instruction set architecture we all love to hate today has been a shambling undead zombie for decades now. RISC definitely won, which is why every x86-compatible processor has been built on top of RISC cores since the early 00s, if not earlier. Intel just doesn't want everyone to know it because the ISA is such a cash cow these days. It's kind of like how the USA is really a nation whose official measurement system is SI, but we continue to use imperial units because we have official definitions that map one to the other.
Oh, but don't think that RISC is immune from this either. It makes my blood boil when people say, "RISC-V|ARM|MIPS|POWER is immune."
No, it's not. Neither is MIPS, neither is ARM, neither is POWER. If your processor has any form of speculative execution and depends on caches for maintaining instruction throughput, which is to say literally all architectures on the planet since the Pentium Pro demonstrated its performance advantages over the PowerPC 601, you will be susceptible to SPECTRE. Full stop. That's the laws of physics talking, not Intel or IBM.
Whether it's implemented as a sea-of-gates in some off-brand ASIC or on an FPGA, or you're using the latest nanometer-scale process node from the most expensive fab house on the planet, it won't matter -- SPECTRE is an artifact of the micro-architecture used by the processor. It has nothing whatsoever to do with the ISA. It has everything to do with the performance-at-all-costs, gotta-keep-them-pipes-full mentality that drives all of today's design requirements.
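The claim that this is micro-architectural rather than ISA-level can be illustrated with a toy model of the well-known bounds-check-bypass pattern (Spectre variant 1). It's a simulation, not an exploit: the one-bit branch predictor, the array contents, and the "secret" value are all hypothetical, and the cache is just a Python set. Note that nothing in it refers to an instruction set; all it needs is a predictor and a cache that isn't rolled back on a misprediction.

```python
class ToyCore:
    """Toy core: a one-bit branch predictor plus a cache modeled as a set.

    Each probe entry is assumed to sit on its own cache line, so the cache
    simply records which probe indices have been touched.
    """

    def __init__(self):
        self.cache = set()          # probe lines currently cached
        self.predict_taken = False  # one-bit branch predictor state

    def flush(self):
        self.cache.clear()

    def victim(self, x, array1, array1_size):
        """Architecturally: if x < array1_size, touch probe[array1[x]]."""
        in_bounds = x < array1_size
        if self.predict_taken or in_bounds:
            # The body runs, speculatively if the prediction is wrong,
            # and its probe load fills a cache line.
            self.cache.add(array1[x])
        # On a misprediction the architectural result is squashed,
        # but the cache line filled above stays filled.
        self.predict_taken = in_bounds
        return array1[x] if in_bounds else None


memory = [1, 2, 3, 4, 42]  # first 4 entries are array1; index 4 is the "secret"
core = ToyCore()
for _ in range(5):
    core.victim(0, memory, 4)  # train the predictor: branch taken
core.flush()
print(core.victim(4, memory, 4))  # None: nothing leaks architecturally
print(core.cache)                 # {42}: but the secret left a footprint
```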
I will put the soapbox back in the closet now. Sorry.
@djsundog @requiem @thegibson I distinctly remember when the first round of SPECTRE and Meltdown attacks came out and everyone and their grandmother was heralding the technical superiority of ARM cores because nobody had yet demonstrated these attacks on them successfully.
It only took several months of effort to demo the first attack on ARM.
Then POWER became the patron saint of processing. And, as I recall, its fortified walls fell not long after as well.
You can absolutely get to the moon from here if you have enough bandaids. But, I'll argue that there are easier ways to do it than creating a big, gooey stack of padded rubber strips carefully balanced on each other.
@sjb @requiem @thegibson That works for some workloads; consider GPUs, for example. For other workloads, not so much. A more general approach would be a fleet of small processors, each with its own private memory, interacting with each other over communication links. This cellular approach to computing was envisioned back in the days of Smalltalk but never fully realized. The GreenArrays GA144 chip is probably the next incarnation of the idea, but its application domain appears to be limited to deep embedded applications.
I don't claim to know a general-purpose solution to this extremely general-purpose problem. However, knowing the true reasons why it exists in the first place is critical to knowing how to mitigate it, at least for specific domains.
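A sequential toy model of that cellular style: each cell owns private memory and talks to its neighbor only through a message queue, so cells share no memory or cache. The `Cell` class and `run_chain` driver are hypothetical; hardware like the GA144 runs its cores in parallel, which this sketch doesn't attempt.

```python
from collections import deque


class Cell:
    """A tiny processor with private memory and one inbound link."""

    def __init__(self, name, step):
        self.name = name
        self.memory = {}      # private: no other cell can read or write it
        self.inbox = deque()  # communication link from the previous cell
        self.step = step      # step(memory, message) -> message to forward


def run_chain(cells, inputs):
    """Feed inputs into the first cell and pass messages down the chain."""
    outputs = []
    for value in inputs:
        cells[0].inbox.append(value)
        for i, cell in enumerate(cells):
            msg = cell.inbox.popleft()
            out = cell.step(cell.memory, msg)
            if i + 1 < len(cells):
                cells[i + 1].inbox.append(out)  # send over the link
            else:
                outputs.append(out)
    return outputs


def double(mem, x):
    return 2 * x


def running_sum(mem, x):
    # State lives only in this cell's private memory.
    mem["total"] = mem.get("total", 0) + x
    return mem["total"]


chain = [Cell("a", double), Cell("b", running_sum)]
print(run_chain(chain, [1, 2, 3]))  # [2, 6, 12]
```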
Yes _and no_ ... RISC as a microarchitecture thoroughly and definitively won. There's a real argument to be made that a CISC ISA with a RISC microarchitecture is the true performance winner. (Cf. x86, GPUs.)
And of course, as you say, our RISCs aren't as RISCy as they could be these days, either.
SPECTRE depends on changing between user and kernel modes of operation. The idea is to exploit failed speculation into kernel space. Under these conditions, you're still running in user-space, but the caches now have privileged information in them. How much depends on which paths were speculated in the kernel, and flushing those cache lines in favor of new user-mode content takes time. Hence, the timing side-channel.
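The timing side-channel ultimately rests on a cached load being measurably faster than an uncached one. Below is a simulated version of the probe step used in many Spectre proofs of concept (Flush+Reload style): time a load from each of 256 candidate lines and treat the fast ones as touched. The cycle counts and threshold are assumed round numbers, not measurements from any real CPU, and the "cache" is a plain set.

```python
HIT_CYCLES = 40    # assumed latency of a load that hits in cache
MISS_CYCLES = 200  # assumed latency of a load that goes to memory


def timed_load(cache, line):
    """Simulated access time: fast if the line is cached, slow otherwise."""
    return HIT_CYCLES if line in cache else MISS_CYCLES


def recover_byte(cache, threshold=100):
    """Probe all 256 candidate lines; the fast ones reveal what was touched."""
    return [v for v in range(256) if timed_load(cache, v) < threshold]


# Suppose a speculative access left the line for byte value ord("K") cached.
print(recover_byte({ord("K")}))  # [75]
```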
With a compiler for a VLIW architecture, this can't occur, because speculation never happens across a privilege boundary. The cache is always hot with the working set of the process currently running.
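With VLIW, the compiler finds the instruction-level parallelism ahead of time and packs it into fixed-width bundles, which the hardware then issues in order. Here is a minimal greedy list-scheduling sketch of that idea, assuming unit latencies and a hypothetical dependency map; real VLIW compilers also juggle variable latencies, register pressure, and predication.

```python
def schedule(ops, deps, width):
    """Greedily pack ops into bundles of at most `width` ops at compile time.

    ops: op names in program order.
    deps: op -> set of ops that must complete in an earlier bundle.
    Assumes every op takes one cycle.
    """
    done = set()
    bundles = []
    remaining = list(ops)
    while remaining:
        # An op is ready once all of its inputs were produced earlier.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


ops = ["a", "b", "c", "d"]
deps = {"c": {"a", "b"}, "d": {"c"}}
print(schedule(ops, deps, width=2))  # [['a', 'b'], ['c'], ['d']]
```

Because the schedule is fixed before the program runs, the hardware has no branch predictor steering it down paths the program may not take.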
I read the paper about that latest Spectre variant, and it looks like their whole lfence-bypassing attack relies on a secret-dependent indirect branch after the bounds/permission check (and the lfence). To the best of my understanding, if that indirect branch were a retpoline, the attack would no longer work.
Am I missing something? I can't believe they haven't thought of such a simple mitigation...
A bunch of technomancers in the fediverse. Keep it fairly clean, please. This arcology is for all who wash up upon its digital shore.