
Friday, January 4, 2013

Models of memory consistency


Cache coherence ensures that multiple processors see a consistent view of memory. It does not answer the question of how consistent the view of memory must be.  By “how consistent” we mean, when must a processor see a value that has been updated by another processor? Since processors communicate through shared variables (used both for data values and for synchronization), the question boils down to this: In what order must a processor observe the data writes of another processor? Since the only way to “observe the writes of another processor” is through reads, the question becomes, What properties must be enforced among reads and writes to different locations by different processors?
Although the question of how consistent memory must be seems simple, it is remarkably complicated, as we can see with a simple example. Here are two code segments from processes P1 and P2, shown side by side:
P1:  A = 0;             P2:  B = 0;
     .....                   .....
     A = 1;                  B = 1;
L1:  if (B == 0) ...    L2:  if (A == 0) ...
Assume that the processes are running on different processors, and that locations A and B are originally cached by both processors with the initial value of 0.  The most straightforward model for memory consistency is called sequential consistency.
Sequential consistency requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were arbitrarily interleaved.
Sequential consistency eliminates the possibility of some nonobvious executions in the previous example, because the assignments must complete before the if statements are initiated. In particular, it is impossible for both if statements to find their conditions true: if P1 reads B as 0, then its write of A precedes P2's write of B in the interleaving, which in turn precedes P2's read of A, so P2 must read A as 1.
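To make this concrete, here is a minimal runnable sketch of the example using C++11 atomics (the thread names, the r1/r2 result variables, and the printout are illustrative additions, not part of the original example). With the default memory_order_seq_cst, the language guarantees sequential consistency for these accesses, so a run can never print r1=0 r2=0:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> A{0}, B{0};
int r1, r2;

int main() {
    std::thread t1([] {
        A.store(1);     // A = 1;  (memory_order_seq_cst by default)
        r1 = B.load();  // L1: read B
    });
    std::thread t2([] {
        B.store(1);     // B = 1;
        r2 = A.load();  // L2: read A
    });
    t1.join();
    t2.join();
    // Under sequential consistency, some interleaving must place one of the
    // writes before the first of the reads, so r1 == 0 && r2 == 0 is impossible.
    std::printf("r1=%d r2=%d\n", r1, r2);
}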

The simplest way to implement sequential consistency is to require a processor to delay the completion of any memory access until all the invalidations caused by that access are completed.
Of course, it is equally effective to delay the next memory access until the previous one is completed.

Remember that memory consistency involves operations among different variables: the two accesses that must be ordered are actually to different memory locations. In our example, we must delay the read of A or B (A == 0 or B == 0) until the previous write has completed (B = 1 or A = 1). Under sequential consistency, we cannot, for example, simply place the write in a write buffer and continue with the read.
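To see exactly what a write buffer would break, here is a variation of the sketch above that uses std::memory_order_relaxed as a rough stand-in for hardware that lets the write sit in a buffer while the following read completes (again an illustrative sketch, not from the original text). The outcome r1=0 r2=0, which sequential consistency forbids, is now a legal result:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> A{0}, B{0};
int r1, r2;

int main() {
    std::thread t1([] {
        A.store(1, std::memory_order_relaxed);   // write may be buffered...
        r1 = B.load(std::memory_order_relaxed);  // ...while this read completes
    });
    std::thread t2([] {
        B.store(1, std::memory_order_relaxed);
        r2 = A.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2);  // r1=0 r2=0 can now occur
}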

The Programmer’s View

Although the sequential consistency model has a performance disadvantage, from the viewpoint of the programmer it has the advantage of simplicity.
 The challenge is to develop a programming model that is simple to explain and yet allows a high-performance implementation.
One such programming model, which allows a more efficient implementation, is to assume that programs are synchronized.

A program is synchronized if all accesses to shared data are ordered by synchronization operations.

A data reference is ordered by a synchronization operation if, in every possible execution, a write of a variable by one processor and an access (either a read or a write) of that variable by another processor are separated by a pair of synchronization operations, one executed after the write by the writing processor and one executed before the access by the second processor.
Cases where variables may be updated without ordering by synchronization are called data races, because the execution outcome depends on the relative speed of the processors and, like races in hardware design, the outcome is unpredictable. This leads to another name for synchronized programs: data-race-free.
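As an illustration, here is a minimal sketch of a synchronized, data-race-free program (the mutex m, the counter variable, and the thread names are invented for this example). Every access to the shared data is ordered by the same lock, so when the writer runs first, its unlock and the reader's subsequent lock form exactly the pair of synchronization operations the definition requires:

#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;      // synchronization object ordering all accesses
int counter = 0;   // shared data, only touched while holding m

int main() {
    std::thread writer([] {
        std::lock_guard<std::mutex> lk(m);  // synchronization op before the write
        counter = 1;
    });                                     // ...and one after it (the unlock)
    std::thread reader([] {
        std::lock_guard<std::mutex> lk(m);  // synchronization op before the access
        std::printf("counter=%d\n", counter);
    });
    writer.join();
    reader.join();
}

Note that the reader may still run before the writer and print 0; synchronization constrains the possible orders of the accesses, it does not pick one.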

It is a broadly accepted observation that most programs are synchronized. This observation is true primarily because if the accesses were unsynchronized, the behavior of the program would likely be unpredictable because the speed of execution would determine which processor won a data race and thus affect the results of the program.

Even with sequential consistency, reasoning about such unsynchronized programs is very difficult. Programmers could attempt to guarantee ordering by constructing their own synchronization mechanisms, but this is extremely tricky, can lead to buggy programs, and may not be supported architecturally, meaning that they may not work in future generations of the multiprocessor.

 Instead, almost all programmers will choose to use synchronization libraries that are correct and optimized for the multiprocessor and the type of synchronization.

Finally, the use of standard synchronization primitives ensures that even if the architecture implements a more relaxed consistency model than sequential consistency, a synchronized program will behave as if the hardware implemented sequential consistency.

Relaxed Consistency Models: The Basics

The key idea in relaxed consistency models is to allow reads and writes to complete out of order, but to use synchronization operations to enforce ordering, so that a synchronized program behaves as if the processor were sequentially consistent. There are a variety of relaxed models that are classified according to what read and write orderings they relax.

We specify the orderings by a set of rules of the form X→Y, meaning that operation X must complete before operation Y is done.
Sequential consistency requires maintaining all four possible orderings: R→W, R→R, W→R, and W→W. The relaxed models are defined by which of these four sets of orderings they relax:
1. Relaxing the W→R ordering yields a model known as total store ordering or processor consistency. Because this ordering retains ordering among writes, many programs that operate under sequential consistency operate under this model without additional synchronization.
2. Relaxing the W→W ordering yields a model known as partial store order.
3. Relaxing the R→W and R→R orderings yields a variety of models, including weak ordering, the PowerPC consistency model, and release consistency, depending on the details of the ordering restrictions and how synchronization operations enforce ordering; a small release/acquire sketch follows this list.
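As a sketch of item 3, here is how explicit synchronization operations restore ordering under a relaxed model, using C++'s release/acquire operations in the spirit of release consistency (the data and flag variables are invented for this example). The release store cannot be reordered before the earlier data write, and the acquire load cannot be reordered after the later data read, so a reader that sees flag == 1 is guaranteed to see data == 42:

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;              // ordinary (non-atomic) shared data
std::atomic<int> flag{0};  // synchronization variable

int main() {
    std::thread writer([] {
        data = 42;                                 // W (data)
        flag.store(1, std::memory_order_release);  // W (sync): release
    });
    std::thread reader([] {
        while (flag.load(std::memory_order_acquire) == 0) {}  // R (sync): acquire
        assert(data == 42);  // the data write happened-before this read
    });
    writer.join();
    reader.join();
}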

By relaxing these orderings, the processor can possibly obtain significant performance advantages.
There are, however, many complexities in describing relaxed consistency models, including the advantages and complexities of relaxing different orders, defining precisely what it means for a write to complete, and deciding when processors can see values that the processor itself has written.
For more information about the complexities, implementation issues, and performance potential of relaxed models, we highly recommend the excellent tutorial by Adve and Gharachorloo.
