
pmcgleenon/store-load-reordering


store-load-reordering

A simple example to demonstrate store-load memory re-ordering on x86/x64 processors using Rust.
This is an example of x86 CPU instruction reordering and highlights the behaviour of the store buffer.

Inspired by Jeff Preshing's blog post Memory Reordering Caught in the Act, let's recreate the infamous x86 store/load reordering in Rust and explore what's happening.

The scenario

The scenario is described well in Paul Cavallaro's blog.

There are 2 threads, Thread 1 and Thread 2.

Thread 1                 Thread 2
MOV [x] ← 1              MOV [y] ← 1
MOV EAX ← [y]            MOV EBX ← [x]

Thread 1 Final State     Thread 2 Final State
EAX == 0                 EBX == 0

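The program runs these operations concurrently in two threads, in a loop, and counts how often this final state occurs.
R1 and R2 hold the values loaded into EAX and EBX. When both are 0, each load was performed before the other thread's store became visible, so store-load reordering has occurred.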
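
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function name count_reorderings, its parameters, and the use of std::sync::Barrier to line up the two threads are all assumptions made for this example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Barrier;
use std::thread;

/// Hypothetical harness: runs the two-thread scenario `iterations` times and
/// returns how many runs ended with both loads observing 0.
/// `store` must be a valid store ordering and `load` a valid load ordering.
fn count_reorderings(iterations: usize, store: Ordering, load: Ordering) -> usize {
    let x = AtomicU32::new(0);
    let y = AtomicU32::new(0);
    let r1 = AtomicU32::new(0);
    let r2 = AtomicU32::new(0);
    // Three participants: the two workers plus the coordinating thread.
    let start = Barrier::new(3);
    let done = Barrier::new(3);
    let mut detected = 0;

    thread::scope(|s| {
        // Thread 1: store to x, then load from y.
        s.spawn(|| {
            for _ in 0..iterations {
                start.wait();
                x.store(1, store);
                r1.store(y.load(load), Ordering::Relaxed);
                done.wait();
            }
        });
        // Thread 2: store to y, then load from x.
        s.spawn(|| {
            for _ in 0..iterations {
                start.wait();
                y.store(1, store);
                r2.store(x.load(load), Ordering::Relaxed);
                done.wait();
            }
        });
        // Coordinator: reset the shared state, release the workers, check the result.
        for _ in 0..iterations {
            x.store(0, Ordering::Relaxed);
            y.store(0, Ordering::Relaxed);
            start.wait();
            done.wait();
            if r1.load(Ordering::Relaxed) == 0 && r2.load(Ordering::Relaxed) == 0 {
                detected += 1;
            }
        }
    });
    detected
}
```

Calling count_reorderings(100_000, Ordering::Relaxed, Ordering::Relaxed) on an x86 machine will typically report a non-zero count, although no particular number is guaranteed. With Ordering::SeqCst for both the store and the load, the count is always 0.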

Building the code

Build the code and emit assembly with the command below. The assembly output is written to target/release/deps/store_load_reordering-<hash>.s

cargo rustc --release -- --emit asm

Running the code

Use the -h option to display the command line options. The available options for ordering are:

  • Relaxed (default)
  • AcquireRelease (use Acquire ordering on Load and Release ordering on Store)
  • SeqCst

store-load-reordering -h
A tool to demonstrate memory ordering effects.

Usage: store-load-reordering [OPTIONS]

Options:
  -o, --ordering <ORDERING>  Memory Ordering to use [default: Relaxed]
  -b, --barrier
  -h, --help                 Print help
  -V, --version              Print version
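
One way the three ordering options could map onto the orderings used for the store and the load is sketched below. The OrderingMode enum, its store_load method, and the hand-written parser are hypothetical names for this illustration; the actual tool may parse its arguments differently.

```rust
use std::str::FromStr;
use std::sync::atomic::Ordering;

/// Hypothetical mirror of the values accepted by the -o option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderingMode {
    Relaxed,
    AcquireRelease,
    SeqCst,
}

impl OrderingMode {
    /// Returns (store ordering, load ordering) for this mode.
    fn store_load(self) -> (Ordering, Ordering) {
        match self {
            OrderingMode::Relaxed => (Ordering::Relaxed, Ordering::Relaxed),
            // Release on the store, Acquire on the load.
            OrderingMode::AcquireRelease => (Ordering::Release, Ordering::Acquire),
            OrderingMode::SeqCst => (Ordering::SeqCst, Ordering::SeqCst),
        }
    }
}

impl FromStr for OrderingMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Relaxed" => Ok(OrderingMode::Relaxed),
            "AcquireRelease" => Ok(OrderingMode::AcquireRelease),
            "SeqCst" => Ok(OrderingMode::SeqCst),
            other => Err(format!("unknown ordering: {other}")),
        }
    }
}
```

The split matters because Rust panics if a store is given Acquire ordering or a load is given Release ordering, so a single "AcquireRelease" choice has to be translated into one ordering per operation.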

Run the code with the default relaxed ordering

store-load-reordering

Run the code with Sequentially Consistent ordering

store-load-reordering -o SeqCst

Run the code with AcquireRelease ordering

store-load-reordering -o AcquireRelease

Run the code with Relaxed ordering and a memory barrier

store-load-reordering -b

Assembly output

On X86, relaxed and acquire-release ordering compile to the same plain mov instructions. SeqCst stores compile to the xchg instruction, which has an implicit lock and so prevents the store from being reordered with the following load. The mfence instruction is emitted only for a SeqCst fence; a Release fence produces no instruction on X86.

ARM, on the other hand, uses the same instructions for acquire-release ordering and sequential consistency.

Rust Code:
    pub fn relaxed(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
        x.store(1, Ordering::Relaxed);
        r1.store(y.load(Ordering::Relaxed), Ordering::Relaxed);
    }
X86:
    mov     DWORD PTR [rdi], 1
    mov     eax, DWORD PTR [rsi]
    mov     DWORD PTR [rdx], eax
Arm:
    mov     w0, 1
    str     w0, [x0]
    ldr     w0, [x1]
    str     w0, [x2]

Rust Code:
    pub fn acquire_release(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
        x.store(1, Ordering::Release);
        r1.store(y.load(Ordering::Acquire), Ordering::Release);
    }
X86:
    mov     DWORD PTR [rdi], 1
    mov     eax, DWORD PTR [rsi]
    mov     DWORD PTR [rdx], eax
Arm:
    mov     w0, 1
    stlr    w0, [x0]
    ldar    w0, [x1]
    stlr    w0, [x2]

Rust Code:
    pub fn sequential_consistent(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
        x.store(1, Ordering::SeqCst);
        r1.store(y.load(Ordering::SeqCst), Ordering::SeqCst);
    }
X86:
    mov     eax, 1
    xchg    dword ptr [rdi], eax
    mov     eax, dword ptr [rsi]
    xchg    dword ptr [rdx], eax
Arm:
    mov     w0, 1
    stlr    w0, [x0]
    ldar    w0, [x1]
    stlr    w0, [x2]

Rust Code:
    pub fn relaxed_with_barrier_seqcst(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
        x.store(1, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        r1.store(y.load(Ordering::Relaxed), Ordering::Relaxed);
    }
X86:
    mov     DWORD PTR [rdi], 1
    mfence
    mov     eax, DWORD PTR [rsi]
    mov     DWORD PTR [rdx], eax
Arm:
    mov     w0, 1
    str     w0, [x0]
    dsb     ish
    ldr     w0, [x1]
    str     w0, [x2]

Rust Code:
    pub fn relaxed_with_barrier(x: &AtomicU32, y: &AtomicU32, r1: &AtomicU32) {
        x.store(1, Ordering::Relaxed);
        fence(Ordering::Release);
        r1.store(y.load(Ordering::Relaxed), Ordering::Relaxed);
    }
X86:
    mov     DWORD PTR [rdi], 1
    mov     eax, DWORD PTR [rsi]
    mov     DWORD PTR [rdx], eax
Arm:
    mov     w0, 1
    str     w0, [x0]
    dsb     ish
    ldr     w0, [x1]
    str     w0, [x2]
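
To build intuition for why a plain store followed by a load can yield 0 in both threads, and why a full fence removes that outcome, here is a toy single-threaded model of a store buffer. It is a deliberate simplification and not a description of real hardware: the Core type, its methods, and the run function are invented for this sketch, and the interleaving is fixed by hand.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy model of one core: stores wait in a FIFO buffer before reaching memory.
struct Core {
    buffer: VecDeque<(&'static str, u32)>,
}

impl Core {
    fn new() -> Self {
        Core { buffer: VecDeque::new() }
    }

    /// A store only enters the local buffer; other cores cannot see it yet.
    fn store(&mut self, addr: &'static str, value: u32) {
        self.buffer.push_back((addr, value));
    }

    /// A load sees the newest buffered store to `addr` from this core,
    /// otherwise it reads shared memory (unwritten locations read as 0).
    fn load(&self, mem: &HashMap<&'static str, u32>, addr: &'static str) -> u32 {
        self.buffer
            .iter()
            .rev()
            .find(|(a, _)| *a == addr)
            .map(|&(_, v)| v)
            .unwrap_or_else(|| mem.get(addr).copied().unwrap_or(0))
    }

    /// A full fence drains the buffer to shared memory in FIFO order.
    fn fence(&mut self, mem: &mut HashMap<&'static str, u32>) {
        while let Some((addr, value)) = self.buffer.pop_front() {
            mem.insert(addr, value);
        }
    }
}

/// Runs one hand-picked interleaving: both stores, then both loads.
/// Returns the values loaded by core 1 and core 2.
fn run(with_fence: bool) -> (u32, u32) {
    let mut mem = HashMap::new();
    let mut core1 = Core::new();
    let mut core2 = Core::new();

    core1.store("x", 1);
    core2.store("y", 1);
    if with_fence {
        core1.fence(&mut mem);
        core2.fence(&mut mem);
    }
    let r1 = core1.load(&mem, "y");
    let r2 = core2.load(&mem, "x");

    // The buffers eventually drain either way.
    core1.fence(&mut mem);
    core2.fence(&mut mem);
    (r1, r2)
}
```

In this model run(false) returns (0, 0), because each store is still sitting in its own core's buffer when the other core loads. run(true) returns (1, 1), because the fence forces both stores into shared memory before either load executes.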

Godbolt X86 link

https://rust.godbolt.org/z/WKEeTMWGz

Godbolt Arm link

https://rust.godbolt.org/z/sqo6KcW1K

Useful Links - Background Info

x86 TSO: A Programmer's Model for x86 Multiprocessors

Examining ARM vs X86 Memory Models with Rust

Rust atomics on x86: How and why

Explaining Atomics in Rust

Rust release and acquire memory ordering by example

Rust Atomics and Locks

Memory Reordering Caught in the Act
