
Friday, January 4, 2013

Synchronisation issues

*     Synchronization mechanisms are typically built with user-level software routines that rely on hardware supplied synchronization instructions.
*     For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value.
*     In larger-scale multiprocessors or high-contention situations, synchronization can become a performance bottleneck because contention introduces additional delays and because latency is potentially greater in such a multiprocessor.

Basic Hardware Primitives
*     The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location.
*     There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically.
*     These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers.
*     One typical operation for building synchronization operations is atomic exchange, which interchanges a value in a register for a value in memory (a spin-lock sketch built on this operation appears after this list).
*     To see how to use this to build a basic synchronization operation, assume that we want to build a simple lock where the value 0 is used to indicate that the lock is free and 1 is used to indicate that the lock is unavailable.
*     A processor tries to set the lock by doing an exchange of 1, which is in a register, with the memory address corresponding to the lock. The value returned from the exchange is 1 if some other processor had already claimed access and 0 otherwise; in the latter case the value is also changed to 1, preventing any competing exchange from also retrieving a 0.
*     For example, consider two processors that each try to do the exchange simultaneously: This race is broken since exactly one of the processors will perform the exchange first, returning 0, and the second processor will return 1 when it does the exchange.
*     The key to using the exchange (or swap) primitive to implement synchronization is that the operation is atomic: The exchange is indivisible, and two simultaneous exchanges will be ordered by the write serialization mechanisms.
*     It is impossible for two processors trying to set the synchronization variable in this manner to both think they have simultaneously set the variable.
*     There are a number of other atomic primitives that can be used to implement synchronization.
*     One operation, present in many older multiprocessors, is test-and-set, which tests a value and sets it if the value passes the test.
*     For example, we could define an operation that tested for 0 and set the value to 1, which can be used in a fashion similar to how we used atomic exchange.
*     Another atomic synchronization primitive is fetch-and-increment, which returns the value of a memory location and atomically increments it (both primitives are sketched in C after this list).
*     Implementing a single atomic memory operation introduces challenges, since it requires both a memory read and a write in a single, uninterruptible instruction. This requirement complicates the implementation of coherence, since the hardware cannot allow any other operations between the read and the write, and yet must not deadlock.
*     An alternative is to have a pair of instructions where the second instruction returns a value from which it can be deduced whether the pair of instructions was executed as if the instructions were atomic.
*     The pair of instructions includes a special load called a load linked or load locked and a special store called a store conditional.
*     These instructions are used in sequence: If the contents of the memory location specified by the load linked are changed before the store conditional to the same address occurs, then the store conditional fails.
*     The store conditional is defined to return 1 if it was successful and a 0 otherwise.
*     An advantage of the load linked/store conditional mechanism is that it can be used to build other synchronization primitives. For example, here is an atomic fetch-and-increment (a portable C analogue follows the list below):
try: LL     R2,0(R1)   ;load linked
     DADDUI R3,R2,#1   ;increment
     SC     R3,0(R1)   ;store conditional
     BEQZ   R3,try     ;branch if store conditional fails
*     These instructions are typically implemented by keeping track of the address specified in the LL instruction in a register, often called the link register.
*     If an interrupt occurs, or if the cache block matching the address in the link register is invalidated (for example, by another SC), the link register is cleared.
*     In addition, the number of instructions between the load linked and the store conditional should be small, to minimize the probability that either an unrelated event or a competing processor causes the store conditional to fail frequently.
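The exchange-based lock described in the list can be sketched portably with C11 <stdatomic.h>. This is only an illustration, not code from the original text: the names spinlock_t, spin_lock, and spin_unlock are made up, and a practical lock would add back-off instead of spinning blindly.

#include <stdatomic.h>

typedef struct {
    atomic_int locked;              /* 0 = free, 1 = held */
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Atomically swap 1 into the lock word; getting 0 back means the
       lock was free and this processor now holds it. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0)
        ;                           /* spin until the holder stores 0 */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}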
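Test-and-set and fetch-and-increment also map directly onto C11 atomics. Again a sketch under assumptions: try_acquire and fetch_and_increment are illustrative names, and the memory orderings shown are one reasonable choice rather than the only one.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag flag = ATOMIC_FLAG_INIT;   /* test-and-set target */
static atomic_uint counter;                   /* fetch-and-increment target */

/* Test-and-set: returns true if the flag was clear (the test for 0
   passed) and was atomically set to 1 by this caller. */
static bool try_acquire(void)
{
    return !atomic_flag_test_and_set_explicit(&flag, memory_order_acquire);
}

/* Fetch-and-increment: returns the old value of the location and
   atomically increments it. */
static unsigned fetch_and_increment(void)
{
    return atomic_fetch_add_explicit(&counter, 1u, memory_order_relaxed);
}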
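C has no portable load linked/store conditional pair, but the weak compare-and-swap in <stdatomic.h> is the usual portable stand-in; on machines that provide LL/SC, compilers commonly implement it with that instruction pair, so the retry loop below mirrors the assembly sequence above. The function name is illustrative.

#include <stdatomic.h>

static unsigned fetch_and_increment_cas(atomic_uint *loc)
{
    unsigned old = atomic_load_explicit(loc, memory_order_relaxed);
    /* The weak compare-exchange plays the role of the SC: if another
       processor changed *loc between our read and this point, it fails
       and we go around again, just as BEQZ branches back to try. */
    while (!atomic_compare_exchange_weak_explicit(loc, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;   /* on failure, 'old' is refreshed with the current value */
    return old;
}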
