L1 Cache Timer

Before we run the Spectre proof of concept, we need a way to observe the side-effects of the transient execution. The most popular one is a cache side channel. By timing a memory access, we can infer if a chosen address is in the cache (if the access is fast) or needs to be loaded from memory (if the access is slow). Later, we will use a Spectre gadget to access a JavaScript array using a secret value as the index. Testing which array index got cached will allow us to recover that secret value.

To be able to measure these small timing differences we need a high-precision timer. There is a great paper on this topic by Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard: "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". One example they show is the use of a SharedArrayBuffer as a monotonic clock: increment a counter in a worker thread and read it from a second thread as a high precision timestamp. SharedArrayBuffers in particular are nowadays only available if the site is cross-origin isolated but there are many more timers described in that paper.

An alternative to using a high-precision timer is to amplify the timing difference. By triggering the code repeatedly and measuring the cumulative time, we can increase the time difference and use timers with lower precision. However, this introduces noise and can be difficult if the code to be measured is probabilistic.

In this proof of concept, we implemented the latter method with a trick that avoids triggering the code to be measured repeatedly. We're measuring the timing difference between the L1 and L2 cache and are amplifying it by abusing an L1 cache eviction strategy found commonly on modern CPUs. This works as follows:

  1. Fill a chosen cache set in the L1 cache with known entries.
  2. Trigger the code that might access the cache set.
  3. Repeatedly access the known entries, while keeping the secret entry in the cache.
If the code in step 2 accessed the cache set, one of our known values got evicted and we will repeatedly see L1 misses in step 3. If it didn't access it, all access in step 3 will be fast L1 hits. We then measure this difference using performance.now(), which at the time of writing has a resolution of 5μs in Chrome on desktop. If you're interested in the technical details of the technique please check out this writeup.

For the demo, you can play with the "Cache timer repetitions" option in the menu on the right. It controls the number of iterations of step 3 above. If you increase the value, it will work with even lower precision timers but it will also take longer to run. Some rough example values and resulting timing differences:

4000:20us
40000:200us
400000:2ms

The demo also serves as the timer calibration for the next examples. It will try to infer a good threshold automatically, but you can adjust it manually if it didn't work. Note that if you change the timer repetitions, you will have to re-run this calibration.

Click run to plot the results in the graph on the right. The X axis shows you the timer results. If everything works, there should be two clearly distinct peaks visible in the graph: a purple one on the left and a green one on the right. The measured timings if the cache line was not used by the code that is being tested are shown in purple. They should be faster on average. The green graph shows the timings if the cache line was used, so it leads to more L1 cache misses and becomes slower.

If the measurements fail, try modifying the parameters for the cache timer repetitions and the stable timer. Another reason for failure can be high CPU load from other processes on the system.

Options