Friday, January 4, 2013

Distributed Shared Memory and Directory-Based Coherence:


A snooping protocol requires communication with all caches on every cache miss, including writes of potentially shared data. The absence of any centralized data structure that tracks the state of the caches is both the fundamental advantage of a snooping-based scheme, since it keeps the scheme inexpensive, and its Achilles’ heel when it comes to scalability.

Directory protocol:
The alternative to a snoop-based coherence protocol is a directory protocol. A directory keeps the state of every block that may be cached.
·         To prevent the directory from becoming the bottleneck, the directory is distributed along with the memory so that different directory accesses can go to different memories.
·         A distributed directory retains the characteristic that the sharing status of a block is always in a single known location.
·         This property is what allows the coherence protocol to avoid broadcast.
  
Directory-Based Cache Coherence Protocols: The Basics

A directory protocol must implement two primary operations:
Ø  handling a read miss and
Ø  handling a write to a shared, clean cache block.

To implement these operations, a directory must track the state of each
cache block. In a simple protocol, these states could be the following:
o   Shared—One or more processors have the block cached, and the value in
memory is up to date (as well as in all the caches).
o   Uncached—No processor has a copy of the cache block.
o   Modified—Exactly one processor has a copy of the cache block, and it has
written the block, so the memory copy is out of date. The processor is called
the owner of the block.

In addition to tracking the state of each potentially shared memory block, we
must track which processors have copies of that block, since those copies will
need to be invalidated on a write.
The simplest way to do this is to keep a bit vector for each memory block. When the block is shared, each bit of the vector indicates whether the corresponding processor has a copy of that block.
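
The bit-vector idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name DirectoryEntry, its method names, and the lowercase state strings are hypothetical and do not come from any real machine. Bit i of the vector is set when processor i holds a copy.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory block."""
    state: str = "uncached"  # assumed names: "uncached", "shared", "modified"
    sharers: int = 0         # bit vector: bit i set => processor i has a copy

    def add_sharer(self, proc):
        # Set the presence bit for this processor.
        self.sharers |= 1 << proc

    def remove_sharer(self, proc):
        # Clear the presence bit for this processor.
        self.sharers &= ~(1 << proc)

    def has_copy(self, proc):
        return bool((self.sharers >> proc) & 1)

    def sharer_list(self, num_procs):
        # Processors whose copies would need to be invalidated on a write.
        return [p for p in range(num_procs) if self.has_copy(p)]
```

With four processors, marking processors 0 and 3 as sharers leaves the vector at binary 1001, and sharer_list(4) names exactly the caches that must receive an invalidate.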

Local and home node:
·         The local node is the node where a request originates.
·         The home node is the node where the memory location and the
directory entry of an address reside.
The physical address space is statically distributed, so the node that contains the memory and directory for a given physical address is known.
The directory must be accessed even when the home node is the local node, since copies may exist in yet a third node, called a remote node.
·         A remote node is a node that has a copy of a cache block, whether exclusive or shared.
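
Because the address space is statically distributed, finding the home node is a pure calculation on the physical address. The sketch below assumes one possible layout, in which each node owns one contiguous region of memory. The constants NUM_NODES, MEM_PER_NODE and BLOCK_SIZE and both function names are hypothetical values chosen for illustration.

```python
NUM_NODES = 4            # assumed number of nodes
MEM_PER_NODE = 1 << 30   # assumed 1 GiB of memory per node
BLOCK_SIZE = 64          # assumed cache block size in bytes


def home_node(phys_addr):
    """Return the node whose memory and directory hold this address."""
    if not 0 <= phys_addr < NUM_NODES * MEM_PER_NODE:
        raise ValueError("address outside the physical address space")
    # Contiguous layout: the high-order part of the address selects the node.
    return phys_addr // MEM_PER_NODE


def directory_index(phys_addr):
    """Return the index of the block's directory entry within its home node."""
    return (phys_addr % MEM_PER_NODE) // BLOCK_SIZE
```

A requesting (local) node would compare home_node(addr) with its own identifier to decide whether the directory lookup stays on the node or needs a message to another node.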

The state transition diagram for the directory has the same states and structure as the transition diagram for an individual cache.

When a block is in the uncached state, the copy in memory is the current value, so the only possible requests for that block are:

·         Read miss: The requesting processor is sent the requested data from memory, and the requestor is made the only sharing node. The state of the block is made shared.
·         Write miss: The requesting processor is sent the value and becomes the sharing node. The block is made exclusive to indicate that the only valid copy is cached.

When the block is in the shared state, the memory value is up to date, so the same two requests can occur:
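·         Read miss: The requesting processor is sent the requested data from memory, and the requesting processor is added to the sharing set.
·         Write miss: The requesting processor is sent the value. All other processors in the sharing set are sent invalidate messages, and the sharing set is reduced to the requesting processor alone. The state of the block is made exclusive.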

When the block is in the exclusive state (the modified state defined above), the current value of the block is held in the cache of the processor identified by the set Sharers (the owner), so there are three possible directory requests:
·         Read miss: The owner processor is sent a data fetch message. This causes the state of the block in the owner’s cache to transition to shared and causes the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The requesting processor is added to the Sharers set, which still contains the former owner.
·         Data write back: The owner processor is replacing the block and therefore must write it back. This write back makes the memory copy up to date (the home directory essentially becomes the owner), the block is now uncached, and the Sharers set is empty.
·         Write miss: The block has a new owner. A message is sent to the old owner, causing the cache to invalidate the block and send the value to the directory, from which it is sent to the requesting processor, which becomes the new owner. The state of the block remains exclusive.
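
The transitions for all three directory states can be collected into one small state machine. The following is a minimal sketch and not a real controller: the class DirectoryBlock and the message names ("fetch", "invalidate", "fetch_invalidate", "data_reply") are hypothetical. Each method returns the messages the directory would send as (message, destination) pairs, and a write miss is assumed never to come from the current owner.

```python
class DirectoryBlock:
    """Hypothetical directory state for a single memory block."""

    def __init__(self):
        self.state = "uncached"   # "uncached", "shared", or "exclusive"
        self.sharers = set()      # the set Sharers from the text

    def read_miss(self, p):
        msgs = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            # Owner downgrades to shared and returns the data to the home.
            msgs.append(("fetch", owner))
        msgs.append(("data_reply", p))
        self.sharers.add(p)       # a former owner stays in the set
        self.state = "shared"
        return msgs

    def write_miss(self, p):
        msgs = []
        if self.state == "shared":
            # Every other sharer must drop its copy.
            msgs += [("invalidate", s) for s in sorted(self.sharers) if s != p]
        elif self.state == "exclusive":
            owner = next(iter(self.sharers))
            # Old owner invalidates its copy and sends the value home.
            msgs.append(("fetch_invalidate", owner))
        msgs.append(("data_reply", p))
        self.sharers = {p}        # the requester is the new owner
        self.state = "exclusive"
        return msgs

    def write_back(self, p):
        if self.state != "exclusive" or self.sharers != {p}:
            raise ValueError("only the owner can write the block back")
        # Memory is up to date again and nobody caches the block.
        self.sharers = set()
        self.state = "uncached"
        return []
```

For example, after read misses from processors 0 and 1, a write miss from processor 2 produces invalidates to 0 and 1 followed by a data reply to 2, and leaves the block exclusive with processor 2 as the only member of the set.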

The directory protocols used in real multiprocessors contain additional optimizations.
Ø  In the protocol described here, when a read or write miss occurs for a block that is exclusive, the block is first sent to the directory at the home node.
Ø  Then it is stored into the home memory and also sent to the original requesting node.

Many of the protocols in use in commercial multiprocessors forward the data from the owner node to the requesting node directly. Such optimizations often add complexity by increasing the possibility of deadlock and by increasing the types of messages that must be handled.
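
The difference between the two approaches can be illustrated by listing the messages on the path of a read miss to an exclusive block. This is a rough sketch only: the function names and message labels are hypothetical, and the forwarding variant leaves out whatever extra message a real protocol uses to bring the home memory and directory up to date.

```python
def home_routed(requester, home, owner):
    """Data travels owner -> home -> requester, as in the simple protocol."""
    return [
        (requester, home, "read_miss"),
        (home, owner, "fetch"),
        (owner, home, "data"),       # home writes the block to memory
        (home, requester, "data"),
    ]


def forwarded(requester, home, owner):
    """Owner sends the data straight to the requester (assumed variant)."""
    return [
        (requester, home, "read_miss"),
        (home, owner, "forward_request"),  # an additional message type
        (owner, requester, "data"),
        # The update of home memory is omitted here; it is off this path.
    ]
```

Each entry is a (source, destination, message) triple. With the requester, home and owner on three different nodes, the home-routed path takes four messages before the data arrives, while the forwarded path takes three.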
