Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask

This paper is an overview of synchronization methods across the stack, from hardware cache coherence and atomic instructions up to locks and message passing, with an analysis that attributes high-level performance differences to low-level hardware constraints.

Among the key takeaways:

Latency of atomic operations: There is a heavy penalty for cross-socket sharing, i.e., passing data between cores that sit on different sockets. This far exceeds the cost of sharing between cores on the same socket or between hyperthreads on the same core, and it outweighs other sources of latency by far.
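As a rough illustration of how one might observe this (a sketch, not the paper's benchmark), the following C program ping-pongs an atomic flag between two pinned threads. The core IDs on the command line are assumptions that depend on your machine's topology; compare a same-socket pair against a cross-socket pair.

```c
/* Minimal sketch: measure the round-trip cost of bouncing a cache line
 * between two pinned threads. Core IDs are machine-specific assumptions. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ROUNDS 1000000

static _Atomic int flag = 0;   /* the shared cache line the threads fight over */

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Responder: wait for flag == 1, set it back to 0, repeat. */
static void *responder(void *arg) {
    pin_to_core((int)(long)arg);
    for (int i = 0; i < ROUNDS; i++) {
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
            ;
        atomic_store_explicit(&flag, 0, memory_order_release);
    }
    return NULL;
}

int main(int argc, char **argv) {
    /* e.g. "./a.out 0 1" (same socket) vs "./a.out 0 8" (different sockets) */
    int core_a = argc > 1 ? atoi(argv[1]) : 0;
    int core_b = argc > 2 ? atoi(argv[2]) : 1;

    pthread_t t;
    pthread_create(&t, NULL, responder, (void *)(long)core_b);
    pin_to_core(core_a);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ROUNDS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
            ;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("round trip: %.1f ns (cores %d and %d)\n", ns / ROUNDS, core_a, core_b);
    return 0;
}
```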

Locking vs message passing: Message passing performs better when the data is highly contended; locking performs better under low contention.
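The sketch below (hypothetical code, not the paper's) contrasts the two styles for a shared counter. In the lock-based version every thread touches the counter's cache line; in the message-passing (delegation) version a single server thread owns the counter and clients only post requests into per-client slots, so under heavy contention the data stays in one core's cache.

```c
/* Sketch: lock-based vs message-passing (delegation) updates to one counter. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NCLIENTS 4
#define OPS_PER_CLIENT 100000

/* ---- lock-based version ---------------------------------------------- */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long locked_counter = 0;

static void *lock_client(void *arg) {
    (void)arg;
    for (int i = 0; i < OPS_PER_CLIENT; i++) {
        pthread_mutex_lock(&lock);
        locked_counter++;              /* counter line bounces between cores */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* ---- message-passing (delegation) version ----------------------------- */
static _Atomic int request[NCLIENTS];  /* 1 = "please increment", 0 = done   */
static long owned_counter = 0;         /* touched only by the server thread  */
static atomic_bool server_stop = false;

static void *mp_client(void *arg) {
    int id = (int)(long)arg;
    for (int i = 0; i < OPS_PER_CLIENT; i++) {
        atomic_store_explicit(&request[id], 1, memory_order_release);
        while (atomic_load_explicit(&request[id], memory_order_acquire) != 0)
            ;                          /* wait for the server's ack          */
    }
    return NULL;
}

static void *server(void *arg) {
    (void)arg;
    while (!atomic_load(&server_stop)) {
        for (int id = 0; id < NCLIENTS; id++) {
            if (atomic_load_explicit(&request[id], memory_order_acquire) == 1) {
                owned_counter++;       /* data stays in this core's cache    */
                atomic_store_explicit(&request[id], 0, memory_order_release);
            }
        }
    }
    return NULL;
}

int main(void) {
    pthread_t c[NCLIENTS], s;

    /* lock-based run */
    for (int i = 0; i < NCLIENTS; i++) pthread_create(&c[i], NULL, lock_client, NULL);
    for (int i = 0; i < NCLIENTS; i++) pthread_join(c[i], NULL);

    /* message-passing run */
    pthread_create(&s, NULL, server, NULL);
    for (int i = 0; i < NCLIENTS; i++)
        pthread_create(&c[i], NULL, mp_client, (void *)(long)i);
    for (int i = 0; i < NCLIENTS; i++) pthread_join(c[i], NULL);
    atomic_store(&server_stop, true);
    pthread_join(s, NULL);

    printf("lock-based: %ld, message-passing: %ld\n", locked_counter, owned_counter);
    return 0;
}
```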

Performance of different types of locks: Each of the nine locks considered is the best performer in some situation. The ticket lock, however, is the best performer in most low-contention cases. A ticket lock is a spinlock that grants the lock in FIFO order: each thread takes a ticket number and spins until the lock's "now serving" counter reaches it.
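A minimal ticket-lock sketch in C11 atomics, assuming the common two-counter formulation rather than the paper's libslock code:

```c
/* Ticket lock: acquire with a fetch-and-add on "next", then spin until
 * "serving" reaches your ticket; release by advancing "serving". */
#include <stdatomic.h>

typedef struct {
    atomic_uint next;     /* next ticket to hand out              */
    atomic_uint serving;  /* ticket currently allowed to proceed  */
} ticket_lock_t;

#define TICKET_LOCK_INIT { 0, 0 }

static inline void ticket_lock(ticket_lock_t *l) {
    /* take a ticket, then wait until it is being served */
    unsigned my = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->serving, memory_order_acquire) != my)
        ;  /* spin; a real implementation would pause or back off here */
}

static inline void ticket_unlock(ticket_lock_t *l) {
    /* hand the lock to the next ticket in FIFO order */
    atomic_store_explicit(&l->serving,
                          atomic_load_explicit(&l->serving, memory_order_relaxed) + 1,
                          memory_order_release);
}
```

The FIFO hand-off is why the ticket lock shines at low contention: acquisition is a single fetch-and-add, and no thread can starve, at the cost of all waiters spinning on the same "serving" counter when contention is high.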