Tree-PLRU based L1 Eviction Timer
When the CPU loads a value into the L1 cache, it usually needs to choose an existing entry to evict first. There are different eviction strategies to do this, but most modern desktop CPUs use a tree based "Pseudo Least Recently Used" algorithm, Tree-PLRU in short. A given cache set is organized in a tree structure and every node stores which side of the tree has been accessed farther in the past (least recently used). Choosing a good candidate for eviction is then as simple as following the arrows in the tree.
This also means that in the worst case, an element that has been accessed once can stay in the cache forever. This happens if the neighbor in the tree gets accessed everytime the element is about to be evicted. We can use this to test for an L1 access using low precision timers as follows:
- Fill the cache set with known values
- Trigger the code to test, which might or might not access the cache set
- Repeatedly access the known values from step 1, but keep the unknown element in the cache by accessing its neighbor when needed
Check out the animation on the right and compare what happens when there's a cache hit from the instrumented code, i.e. the unknown element in red marked with an X. The element to keep it alive is marked with a K. You will see that they always end up as neighbors even though the initial state of the cache is randomized. Note how everytime the arrows point to X, we access K instead to keep it in the cache.
Repeating this access pattern allows us to increase the time difference until we can measure it with our low precision timer.