Overview: Spring 2015
Moving to exascale, i.e., building a 1 exa-FLOPS computing system
(equivalent to 1,000,000 TFLOPS or 1,000 PFLOPS),
is limited by how efficiently one can perform a staggeringly large number of operations.
It is not really a question of "can we build a machine that executes this many operations per second?" but rather
"can we build one and afford to power it?"
The power levels of existing supercomputers are already barely tolerable,
and so a 1 EFLOPS machine built tomorrow must not dissipate significantly more
power than today's low-double-digit PFLOPS machines, which draw on the order of 10 MW.
This leads to necessary conditions for exascale that are challenging -- such as approaching
1 TFLOPS per Watt at the CPU or core level, and the ability to build a 1-10 PFLOPS rack that dissipates 10-100 kW.
For perspective, typical CPUs today execute at roughly 0.01 TFLOPS per Watt, and typical cabinets (racks) today
dissipate on the order of 100 kW to produce roughly 0.1 PFLOPS of execution.
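The power arithmetic above fits in a few lines. A minimal Python sketch, using only the round numbers quoted in the text (these are stated targets and order-of-magnitude figures, not measurements):

```python
# Back-of-the-envelope exascale power arithmetic, using the round
# numbers quoted above (targets, not measured values).

def gflops_per_watt(flops, watts):
    """Sustained FLOPS per Watt of dissipated power, in GFLOPS/W."""
    return flops / watts / 1e9

# Today's cabinet: ~0.1 PFLOPS at ~100 kW
today_rack = gflops_per_watt(0.1e15, 100e3)

# Exascale system target: 1 EFLOPS within roughly today's ~10 MW budget
target_system = gflops_per_watt(1e18, 10e6)

# Exascale rack target: 1 PFLOPS at 10 kW
target_rack = gflops_per_watt(1e15, 10e3)

print(f"today's rack:    {today_rack:.1f} GFLOPS/W")     # 1.0
print(f"exascale system: {target_system:.1f} GFLOPS/W")  # 100.0
print(f"exascale rack:   {target_rack:.1f} GFLOPS/W")    # 100.0
print(f"required gain:   {target_rack / today_rack:.0f}x")  # 100x
```

At the best-case end of the stated rack target (10 PFLOPS in 10 kW), the same function yields 1000 GFLOPS/W, i.e., the 1 TFLOPS per Watt figure cited above for the CPU level.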
The desire for more performance, more capacity, and more bandwidth is ever-present, and
yet it is clear that scaling current system-design techniques -- i.e., simply doing more of the same -- will not
provide those benefits at affordable costs, especially since the desired goals
(more performance, more capacity, more bandwidth) will have to be delivered at little to no additional cost.
For perspective, the system architectures used in many of today's datacenters, enterprise computing systems,
and supercomputers suffer from significant limitations:
- They either have limited per-socket capacity, or they provide
high capacity at extremely high price points (10-100x the cost per bit of consumer
main-memory systems).
Note that 1 TB of DRAM dissipates roughly 100 W in refresh alone.
- Large systems dissipate significant power, often in the megawatt range for petascale computing
capabilities, with the most efficient high-performance machines
running at 1--5 PFLOPS per MW (see the Top500 and Green500 lists).
- The per-node power is high: for example, the POWER8 chip alone dissipates 350W, and
the per-node memory systems often dissipate power on par with that of the processing
components.
- The file systems represent a significant bottleneck, especially in those systems that use
checkpointing to extend their application runtimes.
- The programming models typically do not allow easy sharing of data across the machine, for
instance by allowing shared pointers system-wide.
- Systems are not easily partitioned in a way that lets different clients (e.g., different threads
or VMs) be assigned different amounts of memory, beyond what is on a single node.
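Two of these limitations lend themselves to quick arithmetic. A rough sketch, in which the 100 W per TB refresh figure comes from the list above but the node count, per-node memory, and file-system bandwidth are purely illustrative assumptions:

```python
# Rough cost models for two of the limitations above. The 100 W/TB
# refresh figure is from the text; everything else is an assumed,
# illustrative configuration.

# DRAM refresh power scales linearly with capacity.
REFRESH_W_PER_TB = 100.0
capacity_tb = 10.0
refresh_w = capacity_tb * REFRESH_W_PER_TB
print(f"refresh power for {capacity_tb:.0f} TB: {refresh_w:.0f} W")  # 1000 W

# Checkpointing: time to dump all node memory through a shared
# file system (hypothetical cluster parameters).
nodes = 1000
mem_per_node_gb = 256
fs_bandwidth_gbs = 100.0  # aggregate file-system bandwidth, GB/s
checkpoint_s = nodes * mem_per_node_gb / fs_bandwidth_gbs
print(f"checkpoint time: {checkpoint_s:.0f} s")  # 2560 s, ~43 minutes
```

Even under these generous assumptions, each checkpoint stalls the machine for tens of minutes, which is why the file system becomes the bottleneck for checkpoint-extended runtimes.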
We have been developing system and node architectures that address all of these issues and more.
Frequently Asked Questions
Here is a brief list of the questions we are asked most often.
Questions asked by people in industry:
- When I try to reproduce other people's results, I can't. When I try to reproduce yours, I can. Why is this?
We validate our simulators against real hardware. This takes a lot of time and a lot of effort, but it
means that what we're doing here is real.
- I like your work, but why don't you publish more?
Several answers to that (see the answer above to get an idea):
1. A real study takes a long time to plan, execute, and write up. It typically takes us about a year to
do a study and write it up. Six to nine months is about as fast as it gets, and that is only possible if there is no
simulator-development work that needs to be done.
I find it hard to believe anyone can do three, four, five studies in a year without
cutting some serious corners.
2. People in academia don't believe our results ... our papers have been rejected from conferences
with comments like, "There is absolutely no way this can be true." We find this amusing. However, the
consequence is that you don't get to see what we've done, although about eight
different unpublished papers on DRAM-systems simulation, design, and performance
characterization wound up in
our book.
3. Much of our research (perhaps half of it) goes into the hands of the people funding us and stays proprietary.
We answer questions that people want answered ... so talk to us if interested.
Questions asked by students and people in academia:
- Can I have your code?
Uh, yeah, it's on the website.
- When are you going to release the code for XYZ [insert the name of a simulator]?
Once it is ready to be released and we've gotten a paper or two out of it.
If you want our code, accept our papers. :)