A simple example to demonstrate store-load memory reordering on x86/x64 processors using Rust.
This is memory reordering caused by the CPU rather than by the compiler: it highlights how the store buffer lets a later load complete before an earlier store has become visible to other cores.
Inspired by Jeff Preshing's blog post "Memory Reordering Caught in the Act", let's recreate the well-known x86 store/load reordering in Rust and explore what's happening.
The scenario is described well in Paul Cavallaro's blog.
There are two threads, Thread 1 and Thread 2, and the shared variables `x` and `y` both start at 0.
| Thread 1 | Thread 2 |
|---|---|
| MOV [x] ← 1 | MOV [y] ← 1 |
| MOV EAX ← [y] | MOV EBX ← [x] |
| Thread 1 Final State | Thread 2 Final State |
|---|---|
| EAX == 0 | EBX == 0 |
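The program runs these two threads concurrently in a loop and counts how often this final state occurs.
R1 is the value Thread 1 loads from `y` (EAX) and R2 is the value Thread 2 loads from `x` (EBX). When R1 and R2 are both 0, store-load reordering has occurred: each thread's load took effect before the other thread's store became visible.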
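The loop described above can be sketched as follows. This is a minimal sketch, not the repository's harness: it assumes `std::sync::Barrier` to line up each iteration, and `count_reorderings` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Barrier;
use std::thread;

/// Hypothetical harness: runs the store/load litmus test `iterations` times
/// and returns how often both threads read 0 (R1 == 0 && R2 == 0).
pub fn count_reorderings(iterations: usize, store_ord: Ordering, load_ord: Ordering) -> usize {
    // `store` panics on Acquire/AcqRel and `load` panics on Release/AcqRel,
    // which would leave the other threads stuck at a barrier, so check first.
    assert!(matches!(store_ord, Ordering::Relaxed | Ordering::Release | Ordering::SeqCst));
    assert!(matches!(load_ord, Ordering::Relaxed | Ordering::Acquire | Ordering::SeqCst));

    let (x, y) = (AtomicU32::new(0), AtomicU32::new(0));
    let (r1, r2) = (AtomicU32::new(0), AtomicU32::new(0));
    // Three parties: the two worker threads plus the coordinating main thread.
    let start = Barrier::new(3);
    let done = Barrier::new(3);
    let mut reorders = 0;

    thread::scope(|s| {
        // Thread 1: MOV [x] <- 1 ; MOV EAX <- [y]
        s.spawn(|| {
            for _ in 0..iterations {
                start.wait();
                x.store(1, store_ord);
                r1.store(y.load(load_ord), Ordering::Relaxed);
                done.wait();
            }
        });
        // Thread 2: MOV [y] <- 1 ; MOV EBX <- [x]
        s.spawn(|| {
            for _ in 0..iterations {
                start.wait();
                y.store(1, store_ord);
                r2.store(x.load(load_ord), Ordering::Relaxed);
                done.wait();
            }
        });
        for _ in 0..iterations {
            // Reset the shared variables; the barrier publishes the reset.
            x.store(0, Ordering::Relaxed);
            y.store(0, Ordering::Relaxed);
            start.wait();
            done.wait();
            // Both workers have finished this iteration, so R1/R2 are final.
            if r1.load(Ordering::Relaxed) == 0 && r2.load(Ordering::Relaxed) == 0 {
                reorders += 1;
            }
        }
    });
    reorders
}
```

With `SeqCst` for both the stores and the loads the both-zero outcome is forbidden, so the count is always 0. With `Relaxed` (or Release/Acquire) it may be non-zero on x86, although a barrier handshake makes the two threads overlap less tightly than a spin-wait would, so counts can be low.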
You can build the code and generate assembly with the following command. The assembly output will be available at `target/release/deps/store_load_reordering-<hash>.s`.
```sh
cargo rustc --release -- --emit asm
```
Use the `-h` option to display the command-line options. The available values for `--ordering` are:
- Relaxed (default)
- AcquireRelease (use Acquire ordering on Load and Release ordering on Store)
- SeqCst
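Each `--ordering` value corresponds to a pair of Rust `Ordering`s, one for the store and one for the load. A minimal sketch of that mapping, assuming a hand-written `FromStr` parser rather than whatever argument parser the tool actually uses; the `MemoryOrdering` enum and its `store_load` method are hypothetical names.

```rust
use std::str::FromStr;
use std::sync::atomic::Ordering;

/// Hypothetical mirror of the `--ordering` option values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryOrdering {
    Relaxed,
    AcquireRelease,
    SeqCst,
}

impl MemoryOrdering {
    /// Returns the (store, load) orderings used for the litmus test.
    pub fn store_load(self) -> (Ordering, Ordering) {
        match self {
            MemoryOrdering::Relaxed => (Ordering::Relaxed, Ordering::Relaxed),
            // Release on the store, Acquire on the load.
            MemoryOrdering::AcquireRelease => (Ordering::Release, Ordering::Acquire),
            MemoryOrdering::SeqCst => (Ordering::SeqCst, Ordering::SeqCst),
        }
    }
}

impl Default for MemoryOrdering {
    /// Matches `[default: Relaxed]` in the help text.
    fn default() -> Self {
        MemoryOrdering::Relaxed
    }
}

impl FromStr for MemoryOrdering {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Relaxed" => Ok(MemoryOrdering::Relaxed),
            "AcquireRelease" => Ok(MemoryOrdering::AcquireRelease),
            "SeqCst" => Ok(MemoryOrdering::SeqCst),
            other => Err(format!("unknown ordering: {other}")),
        }
    }
}
```

A single `AcquireRelease` value is enough because `Acquire` is only valid on loads and `Release` only on stores; passing either to the wrong operation panics.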
```text
store-load-reordering -h
A tool to demonstrate memory ordering effects.

Usage: store-load-reordering [OPTIONS]

Options:
  -o, --ordering <ORDERING>  Memory Ordering to use [default: Relaxed]
  -b, --barrier
  -h, --help                 Print help
  -V, --version              Print version
```
Run the code with the default relaxed ordering:
```sh
store-load-reordering
```
Run the code with sequentially consistent ordering:
```sh
store-load-reordering -o SeqCst
```
Run the code with acquire-release ordering:
```sh
store-load-reordering -o AcquireRelease
```
Run the code with relaxed ordering and a memory barrier:
```sh
store-load-reordering -b
```
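The barrier option places a memory barrier between each thread's store and its load. A minimal sketch of what that might look like, assuming the barrier is `std::sync::atomic::fence(Ordering::SeqCst)`; `thread1_step`, `thread2_step` and the `barrier` parameter are hypothetical names, not the tool's code.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};

/// Hypothetical Thread 1 step: store to `x`, optionally fence, then load `y` into `r1`.
pub fn thread1_step(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32, barrier: bool) {
    x.store(1, Ordering::Relaxed);
    if barrier {
        // Full fence between the store and the load. When both threads use it,
        // R1 == 0 && R2 == 0 is forbidden. On x86-64 this is expected to be MFENCE.
        fence(Ordering::SeqCst);
    }
    r1.store(y.load(Ordering::Relaxed), Ordering::Relaxed);
}

/// Hypothetical Thread 2 step: the mirror image, storing to `y` and loading `x` into `r2`.
pub fn thread2_step(x: &AtomicU32, y: &AtomicU32, r2: &AtomicU32, barrier: bool) {
    y.store(1, Ordering::Relaxed);
    if barrier {
        fence(Ordering::SeqCst);
    }
    r2.store(x.load(Ordering::Relaxed), Ordering::Relaxed);
}
```

Both threads need the fence: if only one side has it, the other side's store and load can still be reordered and the both-zero outcome remains possible. A weaker `fence(Ordering::AcqRel)` does not order an earlier store before a later load, so it would not prevent the reordering either.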
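On x86, relaxed and acquire-release ordering compile to the same instructions (plain `MOV`s), because x86's TSO memory model already gives ordinary loads acquire semantics and ordinary stores release semantics. A `SeqCst` store instead compiles to `XCHG`, which has an implicit `LOCK` prefix and acts as a full barrier, so the store cannot be reordered with the later load.
The `MFENCE` instruction also prevents this reordering, but it is only emitted for a `SeqCst` fence; weaker fences compile to no instruction on x86.
ARM (AArch64), on the other hand, uses the same instructions for acquire-release ordering and sequential consistency: `STLR` for stores and `LDAR` for loads.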
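To compare the code generation for each ordering, here is a minimal sketch of Thread 1's half written three ways. The function names are hypothetical, and the instruction sequences in the comments are what rustc/LLVM typically emits for x86-64 and baseline AArch64; they may vary with compiler version and target features (for example, AArch64 targets with RCpc support may use `LDAPR` for `Acquire` loads).

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Relaxed. Expected x86-64: plain MOV store, plain MOV load.
/// Expected AArch64: STR / LDR.
pub fn thread1_relaxed(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
    x.store(1, Ordering::Relaxed);
    r1.store(y.load(Ordering::Relaxed), Ordering::Relaxed);
}

/// Release store, Acquire load. Expected x86-64: identical to the relaxed version.
/// Expected AArch64: STLR for the store, LDAR for the load.
pub fn thread1_acq_rel(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
    x.store(1, Ordering::Release);
    r1.store(y.load(Ordering::Acquire), Ordering::Relaxed);
}

/// SeqCst. Expected x86-64: `mov eax, 1` then `xchg dword ptr [rdi], eax`
/// (implicit LOCK, full barrier), followed by a plain MOV load.
/// Expected AArch64: STLR / LDAR, the same as the acquire-release version.
pub fn thread1_seq_cst(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
    x.store(1, Ordering::SeqCst);
    r1.store(y.load(Ordering::SeqCst), Ordering::Relaxed);
}
```

The store to `r1` is kept `Relaxed` in all three so that only the ordering of the `x` store and the `y` load differs between them.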
| Rust Code | X86 | Arm |
|---|---|---|
| `pub fn relaxed(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {` | `mov DWORD PTR [rdi], 1` | `mov w0, 1` |
| `pub fn acquire_release(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {` | `mov DWORD PTR [rdi], 1` | `mov w0, 1` |
| `pub fn sequential_consistent(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {` | `mov eax, 1` | `mov w0, 1` |
| `pub fn relaxed_with_barrier_seqcst(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {` | `mov DWORD PTR [rdi], 1` | `mov w0, 1` |
| `pub fn relaxed_with_barrier(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {` | `mov DWORD PTR [rdi], 1` | `mov w0, 1` |
- https://rust.godbolt.org/z/WKEeTMWGz
- https://rust.godbolt.org/z/sqo6KcW1K

- x86 TSO: A Programmer's Model for x86 Multiprocessors
- Examining ARM vs X86 Memory Models with Rust
- Rust atomics on x86: How and why