
Friday, January 4, 2013

Synchronisation issues

*     Synchronization mechanisms are typically built with user-level software routines that rely on hardware supplied synchronization instructions.
*     For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value.
*     In larger-scale multiprocessors or high-contention situations, synchronization can become a performance bottleneck because contention introduces additional delays and because latency is potentially greater in such a multiprocessor.

Basic Hardware Primitives
*     The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location.
*     There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically.
*     These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers.
*     One typical operation for building synchronization operations is atomic exchange, which interchanges a value in a register for a value in memory (a spin-lock sketch built on this operation appears after this list).
*     To see how to use this to build a basic synchronization operation, assume that we want to build a simple lock where the value 0 is used to indicate that the lock is free and 1 is used to indicate that the lock is unavailable.
*     A processor tries to set the lock by doing an exchange of 1, which is in a register, with the memory address corresponding to the lock. The value returned from the exchange is 1 if some other processor had already claimed access and 0 otherwise; in the latter case the value is also changed to 1, preventing any competing exchange from also retrieving a 0.
*     For example, consider two processors that each try to do the exchange simultaneously: This race is broken since exactly one of the processors will perform the exchange first, returning 0, and the second processor will return 1 when it does the exchange.
*     The key to using the exchange (or swap) primitive to implement synchronization is that the operation is atomic: The exchange is indivisible, and two simultaneous exchanges will be ordered by the write serialization mechanisms.
*     It is impossible for two processors trying to set the synchronization variable in this manner to both think they have simultaneously set the variable.
*     There are a number of other atomic primitives that can be used to implement synchronization.
*     One operation, present in many older multiprocessors, is test-and-set, which tests a value and sets it if the value passes the test.
*     For example, we could define an operation that tested for 0 and set the value to 1, which can be used in a fashion similar to how we used atomic exchange.
*     Another atomic synchronization primitive is fetch-and-increment, which returns the value of a memory location and atomically increments it (both primitives are sketched in C after this list).
*     Implementing a single atomic memory operation introduces challenges, since it requires both a memory read and a write in a single, uninterruptible instruction. This requirement complicates the implementation of coherence, since the hardware cannot allow any other operations between the read and the write, and yet must not deadlock.
*     An alternative is to have a pair of instructions where the second instruction returns a value from which it can be deduced whether the pair of instructions was executed as if the instructions were atomic.
*     The pair of instructions includes a special load called a load linked or load locked and a special store called a store conditional.
*     These instructions are used in sequence: If the contents of the memory location specified by the load linked are changed before the store conditional to the same address occurs, then the store conditional fails.
*     The store conditional is defined to return 1 if it was successful and a 0 otherwise.
*     An advantage of the load linked/store conditional mechanism is that it can be used to build other synchronization primitives. For example, here is an atomic fetch-and-increment (a portable C analogue follows the list below):
try: LL     R2,0(R1)   ;load linked
     DADDUI R3,R2,#1   ;increment
     SC     R3,0(R1)   ;store conditional
     BEQZ   R3,try     ;branch if store conditional fails
*     These instructions are typically implemented by keeping track of the address specified in the LL instruction in a register, often called the link register.
*     If an interrupt occurs, or if the cache block matching the address in the link register is invalidated (for example, by another SC), the link register is cleared.
*     In addition, the number of instructions between the load linked and the store conditional should be small, to minimize the probability that either an unrelated event or a competing processor causes the store conditional to fail frequently.
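The exchange-based lock described in the list can be sketched portably with C11 <stdatomic.h>. This is only an illustration, not code from the original text: the names spinlock_t, spin_lock, and spin_unlock are made up, and a practical lock would add back-off instead of spinning blindly.

#include <stdatomic.h>

typedef struct {
    atomic_int locked;              /* 0 = free, 1 = held */
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Atomically swap 1 into the lock word; getting 0 back means the
       lock was free and this processor now holds it. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0)
        ;                           /* spin until the holder stores 0 */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}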
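Test-and-set and fetch-and-increment also map directly onto C11 atomics. Again a sketch under assumptions: try_acquire and fetch_and_increment are illustrative names, and the memory orderings shown are one reasonable choice rather than the only one.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag flag = ATOMIC_FLAG_INIT;   /* test-and-set target */
static atomic_uint counter;                   /* fetch-and-increment target */

/* Test-and-set: returns true if the flag was clear (the test for 0
   passed) and was atomically set to 1 by this caller. */
static bool try_acquire(void)
{
    return !atomic_flag_test_and_set_explicit(&flag, memory_order_acquire);
}

/* Fetch-and-increment: returns the old value of the location and
   atomically increments it. */
static unsigned fetch_and_increment(void)
{
    return atomic_fetch_add_explicit(&counter, 1u, memory_order_relaxed);
}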
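C has no portable load linked/store conditional pair, but the weak compare-and-swap in <stdatomic.h> is the usual portable stand-in; on machines that provide LL/SC, compilers commonly implement it with that instruction pair, so the retry loop below mirrors the assembly sequence above. The function name is illustrative.

#include <stdatomic.h>

static unsigned fetch_and_increment_cas(atomic_uint *loc)
{
    unsigned old = atomic_load_explicit(loc, memory_order_relaxed);
    /* The weak compare-exchange plays the role of the SC: if another
       processor changed *loc between our read and this point, it fails
       and we go around again, just as BEQZ branches back to try. */
    while (!atomic_compare_exchange_weak_explicit(loc, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;   /* on failure, 'old' is refreshed with the current value */
    return old;
}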
