Parallelism Challenge and Explicit Multi-Threading (XMT): A PRAM-On-Chip
Prof. Uzi Vishkin
Computer chips comprise more and more "working power" (hardware features). This has been achieved through remarkable progress in chip fabrication technology, driven by success in miniaturization. When computer engineers talk about the speed of processing instructions, they use the following terminology. Computer chips typically operate using a clock. The time from one tick of the clock to the next defines the clock's "time unit"; the number of ticks per second is the clock "rate." Suppose that a computer executes one instruction per time unit. If the clock can tick 10 times faster (and the computer is still capable of executing one instruction between two ticks of the clock), it should be clear that the computer will run 10 times faster.
Generally speaking, improvements in performance have been mostly due to shorter time units. Alas, after decades of phenomenal improvement, time units (i.e., clock rates) have not changed since 2003. It is hard to tell whether clock rates will improve again in the future, and if they do, by how much, and whether engineers will be able to continue improving instruction processing speed to match improved clock rates. Skeptics base their doubts on current technological constraints, such as slowing signal speed due to electrical effects of miniaturization and problems due to heat dissipation, and eventually on physical limitations, such as the speed of light and limits on miniaturization due to quantum effects.
The inability of vendors to advance the fastest clock rates since 2003 has greatly enlarged the camp of skeptics in the last few years. A long record of research attempts suggests that design engineers will not be able to improve performance beyond the improvement in clock rate. This suggests the need for non-incremental changes in order to sustain growth in computer speed.
An ABC Story:
The following ABC story will help clarify the problem. Abraham (A) employs Benjamin (B) as a cleaning person for his house. B comes in alone once per week, works for several hours, and is paid by the hour by A. One day, A gets the following alternative offer from a cleaning company (C). C proposes to work at the same hourly rate as B, but is willing to provide A with 1000 cleaners for his house.
A is asked to give C a plan for how to clean the house (what to do first, second, and so on) using the 1000 cleaners. That is, each step of A's plan can use up to 1000 cleaners simultaneously. The only limitation is that A must give one plan to be used every week. The single plan should work efficiently regardless of whether, in the previous week, A's teenage child threw a 100-person party in the basement, A hosted a formal dinner, A held an open house for his friends, someone spilled coffee on the bedroom carpet, etc.
The benefit of C's offer for A could be pretty drastic, as A will pay C the same hourly rate he has been paying B, but C will use a workforce 1000 times larger than B's. Note that, since the job could finish up to 1000 times faster and A pays by the hour, this could theoretically save $999 out of every $1000 that A currently pays B. So A said yes to C's marketing person and signed an annual contract with them.
After signing the contract A was told that none of the 1000 cleaners speak his language, and that he must come up with a cleaning plan that will always work well without any further input or intervention from A. C will translate A's plan to the language the cleaners understand once and they will execute it every week.
This concludes the ABC story.
Here is why the story is relevant for Dr. Vishkin's research. The hardware features within a computer chip will play the role of the cleaners, as explained below. The numbers for the increase in working power of computer chips are even more dramatic than a 1000-fold improvement: from the 1980s to 2005, the number of hardware features that can fit on a single computer chip has increased by a factor of over 30,000. (The number of transistors on one chip has grown to one billion!)
For those who prefer numbers, here is the analogy to the ABC story: suppose that 100,000 transistors are needed to form one computer processor. One such processor would parallel one cleaner in the ABC story. The equivalent of 1000 cleaners would then need 100,000,000 transistors, which is still well below one billion.
The standard, existing way of writing computer programs is similar to preparing a cleaning plan for one cleaner. That is, in each step the cleaner performs one operation and then consults the plan to find out which operation to perform in the next step. In other words, if A were to develop a plan for a single cleaner, this would parallel the standard way in which current computers are programmed. Such a plan for one cleaner would still need to respond spontaneously (i.e., without any intervention from A) to the cleaning that needs to be done each week, as the plan guides the cleaner through the work.
In computer science and engineering, the idea behind such a plan is called an "algorithm," and the way in which it is written for the computer to "understand" is called a "program." How to conceive such a "serial algorithm" and write such a "serial program," and how to build computers on which such serial programs can run, is generally well understood.
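To make the serial notion concrete, here is a minimal sketch, not taken from Dr. Vishkin's materials, of a serial program in C. It totals a hypothetical list of "work units," one element after another, the way a single cleaner would visit one room after another:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical amount of cleaning work found in each of 8 rooms. */
        int work[8] = {3, 1, 4, 1, 5, 9, 2, 6};
        int total = 0;

        /* One worker, one step at a time: finish a room, then consult the
           plan (the loop) to find the next room. */
        for (int i = 0; i < 8; i++) {
            total += work[i];
        }

        printf("total work units: %d\n", total);
        return 0;
    }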
Dr. Vishkin has been interested in the "parallelism challenge." This challenge is about seeking an effective way of building computers and programming them so that the plurality of processors will be harnessed to complete any computational task as fast as possible.
The new way of writing computer programs will be similar to preparing a cleaning plan for many (1000) cleaners. That is, in each step, every cleaner performs one operation and then consults the plan to find out which operation to perform in the next step. In other words, A must develop a plan for 1000 cleaners that would not only respond spontaneously (i.e., without any intervention from A) to the cleaning that needs to be done each week; the plan would also spontaneously allocate the many cleaners to specific tasks in order to complete the overall house cleaning job as fast as possible.
How to conceive such a "parallel algorithm" and write such a "parallel program," and how to build "parallel computers" on which such parallel programs can run, has been the main area for Dr. Vishkin's research.
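For contrast, here is a minimal parallel sketch of the same task. It is again not taken from Dr. Vishkin's materials and is not written in the XMT programming language; it uses the widely available OpenMP extension of C purely as an illustrative stand-in. The key difference from the serial version is that the plan no longer fixes which worker handles which room; the loop iterations may be carried out by many processors at once.

    #include <stdio.h>

    int main(void) {
        /* Hypothetical amount of cleaning work found in each of 8 rooms. */
        int work[8] = {3, 1, 4, 1, 5, 9, 2, 6};
        int total = 0;

        /* Many workers at once: each loop iteration may run on a different
           processor, and the reduction combines their partial sums safely.
           Compile with a flag such as -fopenmp to enable the pragma. */
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < 8; i++) {
            total += work[i];
        }

        printf("total work units: %d\n", total);
        return 0;
    }

The point of the analogy is that the plan states what may be done simultaneously; deciding which processor does which piece is left to the system, just as A's weekly plan leaves the assignment of cleaners to rooms unspecified.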
His most recent effort (www.umiacs.umd.edu/~vishkin/XMT) addresses the new opportunity to build a parallel computer on a single chip. Perhaps the most ambitious goal is to have such a computer replace the standard desktop processor (e.g., Intel's Pentium) during the second decade of this century.