Friday, January 4, 2013

Software and hardware multithreading


The software multithreading paradigm has become more popular as efforts to further exploit instruction-level parallelism have stalled since the late 1990s. This has allowed the concept of throughput computing to re-emerge to prominence from the more specialized field of transaction processing:
  • Even though it is very difficult to further speed up a single thread or single program, most computer systems actually multi-task among multiple threads or programs.
  • Techniques that speed up the overall system throughput across all tasks therefore yield a meaningful performance gain.
The two major techniques for throughput computing are multiprocessing and multithreading.
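The distinction can be sketched in Python (a hypothetical toy example, not a benchmark): threads created with the `threading` module share one process's memory directly, whereas multiprocessing (e.g., Python's `multiprocessing` module) runs separate processes with separate address spaces.

```python
import threading

# Multithreading: all threads share this list in one address space.
results = []
lock = threading.Lock()

def work(n):
    # A lock guards the shared structure against concurrent appends.
    with lock:
        results.append(n * n)

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9]
```

With `multiprocessing.Process` the same pattern would require explicit inter-process communication (queues or shared memory), since each process has its own copy of `results`.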

Advantages

Some advantages include:
  • If a thread incurs many cache misses, the other thread(s) can continue, taking advantage of the otherwise unused computing resources. This can lead to faster overall execution, since those resources would have been idle if only a single thread were executing.
  • If a thread cannot use all the computing resources of the CPU (because its instructions depend on each other's results), running another thread prevents those resources from sitting idle.
  • If several threads work on the same set of data, they can actually share the cache, leading to better cache usage and cheaper synchronization on shared values.
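The first advantage, hiding a long stall behind other work, can be illustrated with a toy Python sketch in which `time.sleep` stands in for a long-latency event such as a cache miss or I/O wait (the timings are illustrative assumptions, not measurements of real hardware):

```python
import threading
import time

def slow_operation():
    # Stand-in for a long stall (e.g., a memory or I/O wait).
    time.sleep(0.3)

start = time.time()
workers = [threading.Thread(target=slow_operation) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
elapsed = time.time() - start

# The two waits overlap, so the total time is roughly 0.3 s, not 0.6 s:
# while one thread is stalled, the other makes progress.
```

Run sequentially, the two operations would take about twice as long; the overlap is exactly the idle-resource reuse described above.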
Hardware multithreading is a well-known technique to increase the utilization of processor resources. The idea is to start executing a different thread when the current thread is stalled. All hardware multithreading schemes assume that the workload consists of several independent tasks.
Basically, three different hardware multithreading techniques can be distinguished: cycle-by-cycle interleaving, block interleaving, and simultaneous multithreading.
In cycle-by-cycle interleaving, the processor switches to a different thread each cycle. In principle, the next instruction of a thread is fed into the pipeline only after the retirement of that thread's previous instruction. This eliminates the need for forwarding datapaths, but implies that there must be at least as many threads as pipeline stages.
In block interleaving, also referred to as coarse-grain multithreading, the processor starts executing another thread when the current thread experiences an event that is predicted to have a significantly long latency. If the latency can be predicted to be larger than the cost of a thread switch, the processor can hide at least part of that latency by executing another thread.
Both cycle-by-cycle interleaving and block interleaving attempt to eliminate vertical waste. Vertical waste means that no instructions are issued during a cycle because the current thread is stalled. Simultaneous multithreading (SMT) also tries to eliminate horizontal waste (unused instruction slots within a cycle), because it fetches and issues instructions from different threads simultaneously.
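The difference between the two kinds of waste can be made concrete with a toy issue-slot model (a deliberately simplified sketch, not a real pipeline): each thread is described by how many instructions it has ready in each cycle, with 0 meaning the thread is stalled, and the processor has a fixed number of issue slots per cycle.

```python
def waste(threads, width, smt):
    """Count (vertical, horizontal) waste over a run.

    threads: list of per-cycle ready-instruction counts, one list per thread.
    width:   issue slots available each cycle.
    smt:     if True, fill slots from all threads; otherwise only thread 0 runs.
    """
    vertical = horizontal = 0
    for cycle in range(len(threads[0])):
        if smt:
            issued = min(width, sum(t[cycle] for t in threads))
        else:
            issued = min(width, threads[0][cycle])
        if issued == 0:
            vertical += 1                 # whole cycle lost: vertical waste
        else:
            horizontal += width - issued  # unused slots: horizontal waste
    return vertical, horizontal

t0 = [2, 0, 1, 0]  # thread 0 stalls in cycles 1 and 3
t1 = [1, 2, 2, 1]  # an independent second thread

print(waste([t0], 4, smt=False))     # (2, 5): two fully wasted cycles
print(waste([t0, t1], 4, smt=True))  # (0, 7): no empty cycles at all
```

In the single-threaded run, 13 of 16 issue slots go unused (two vertical cycles cost 4 slots each, plus 5 horizontal slots); with SMT filling slots from both threads, the vertical waste disappears and only 7 slots are lost.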