Overview: Spring 2015
Moving to exascale, i.e., building a 1 exa-FLOPS computing system
(equivalent to 1,000,000 TFLOPS or 1,000 PFLOPS),
is limited by how efficiently one can perform a staggeringly large number of operations.
It is not really a question of "can we build a machine that executes this many operations per second?" but rather
"can we build one and afford to power it?"
The power levels of existing supercomputers are already barely tolerable,
and so a 1 EFLOPS machine built tomorrow must not dissipate significantly more
power than today's low-double-digit PFLOPS machines, which draw on the order of 10 MW.
This leads to necessary conditions for exascale that are challenging -- such as approaching
1 TFLOPS per Watt at the CPU or core level, and the ability to build a 1-10 PFLOPS rack that dissipates 10-100 kW.
For perspective, typical CPUs today execute at roughly 0.01 TFLOPS per Watt, and typical cabinets (racks) today
dissipate on the order of 100 kW to produce roughly 0.1 PFLOPS of execution.
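The power arithmetic above fits in a few lines. A minimal Python sketch, using only the round numbers quoted in the text (these are stated targets and order-of-magnitude figures, not measurements):

```python
# Back-of-the-envelope exascale power arithmetic, using the round
# numbers quoted above (targets, not measured values).

def gflops_per_watt(flops, watts):
    """Sustained FLOPS per Watt of dissipated power, in GFLOPS/W."""
    return flops / watts / 1e9

# Today's cabinet: ~0.1 PFLOPS at ~100 kW
today_rack = gflops_per_watt(0.1e15, 100e3)

# Exascale system target: 1 EFLOPS within roughly today's ~10 MW budget
target_system = gflops_per_watt(1e18, 10e6)

# Exascale rack target: 1 PFLOPS at 10 kW
target_rack = gflops_per_watt(1e15, 10e3)

print(f"today's rack:    {today_rack:.1f} GFLOPS/W")     # 1.0
print(f"exascale system: {target_system:.1f} GFLOPS/W")  # 100.0
print(f"exascale rack:   {target_rack:.1f} GFLOPS/W")    # 100.0
print(f"required gain:   {target_rack / today_rack:.0f}x")  # 100x
```

At the best-case end of the stated rack target (10 PFLOPS in 10 kW), the same function yields 1000 GFLOPS/W, i.e., the 1 TFLOPS per Watt figure cited above for the CPU level.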
The desire for more performance, more capacity, and more bandwidth is ever-present, and
yet it is clear that scaling current system-design techniques -- i.e., simply doing more of the same -- will not
provide those benefits at affordable costs, especially since the desired goals
(more performance, more capacity, more bandwidth) will have to be delivered at little to no additional cost.
For perspective, the system architectures used in many of today's datacenters, enterprise computing systems,
and supercomputers suffer from significant limitations:
- They either have limited per-socket capacity, or they provide
high capacity at extremely high price points (10-100x the cost per bit of consumer
main-memory systems).
Note that 1 TB of DRAM dissipates roughly 100 W in refresh alone.
- Large systems dissipate significant power, often in the megawatt range for petascale computing
capabilities, with the most efficient high-performance machines
running at 1--5 PFLOPS per MW (see the Top500 and Green500 lists).
- The per-node power is high: for example, the POWER8 chip alone dissipates 350W, and
the per-node memory systems often dissipate power on par with that of the processing
components.
- The file systems represent a significant bottleneck, especially in those systems that use
checkpointing to extend their application runtimes.
- The programming models typically do not allow easy sharing of data across the machine, for
instance by allowing shared pointers system-wide.
- Systems are not easily partitioned in a way that lets different clients (e.g., different threads
or VMs) be assigned different amounts of memory, beyond what is on a single node.
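Two of these limitations lend themselves to quick arithmetic. A rough sketch, in which the 100 W per TB refresh figure comes from the list above but the node count, per-node memory, and file-system bandwidth are purely illustrative assumptions:

```python
# Rough cost models for two of the limitations above. The 100 W/TB
# refresh figure is from the text; everything else is an assumed,
# illustrative configuration.

# DRAM refresh power scales linearly with capacity.
REFRESH_W_PER_TB = 100.0
capacity_tb = 10.0
refresh_w = capacity_tb * REFRESH_W_PER_TB
print(f"refresh power for {capacity_tb:.0f} TB: {refresh_w:.0f} W")  # 1000 W

# Checkpointing: time to dump all node memory through a shared
# file system (hypothetical cluster parameters).
nodes = 1000
mem_per_node_gb = 256
fs_bandwidth_gbs = 100.0  # aggregate file-system bandwidth, GB/s
checkpoint_s = nodes * mem_per_node_gb / fs_bandwidth_gbs
print(f"checkpoint time: {checkpoint_s:.0f} s")  # 2560 s, ~43 minutes
```

Even under these generous assumptions, each checkpoint stalls the machine for tens of minutes, which is why the file system becomes the bottleneck for checkpoint-extended runtimes.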
We have been developing system and node architectures that address all of these issues and more.
Frequently Asked Questions
Here is a brief list of the questions we are asked most often.
Questions asked by people in industry:
- When I try to reproduce other people's results, I can't. When I try to reproduce yours, I can. Why is this?
We validate our simulators against real hardware. This takes a lot of time and a lot of effort, but it
means that what we're doing here is real.
- I like your work, but why don't you publish more?
Several answers to that (see the answer above to get an idea):
1. A real study takes a long time to plan, execute, and write up. It typically takes us about a year to
do a study and write it up. Six to nine months is about as fast as it gets, and that is only possible if there is no
simulator-development work that needs to be done.
I find it hard to believe anyone can do three, four, five studies in a year without
cutting some serious corners.
2. People in academia don't believe our results ... our papers have been rejected from conferences
with comments like, "There is absolutely no way this can be true." We find this amusing. However, the
consequence is that you don't get to see what we've done, although about eight
different unpublished papers on DRAM-systems simulation, design, and performance
characterization wound up in
our book.
3. Much of our research (perhaps half of it) goes into the hands of the people funding us and stays proprietary.
We answer questions that people want answered ... so talk to us if interested.
Questions asked by students and people in academia:
- Can I have your code?
Uh, yeah, it's on the website.
- When are you going to release the code for XYZ [insert the name of a simulator]?
Once it is ready to be released and we've gotten a paper or two out of it.
If you want our code, accept our papers. :)