Ph.D. Dissertation Defense: Paul Rosenfeld
Friday, April 18, 2014
11:30 a.m. Room 2460, AVW Bldg.
For More Information:
301 405 3681 firstname.lastname@example.org
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Paul Rosenfeld
Committee: Advisory Committee:
Professor Bruce Jacob, Chair/Advisor Professor Manoj Franklin Professor Donald Yeung Professor Gang Qu Professor Jeffrey Hollingsworth, Dean's Representative
Date/Time:Friday, April 18 2014, 11:30 AM
Location: AV Williams building, 2460 (AVW2460)
Title: DESIGN SPACE AND PERFORMANCE EXPLORATION OF THE HYBRID MEMORY CUBE
Abstract: The Hybrid Memory Cube (HMC) is an emerging main memory technology that leverages advances in 3D fabrication techniques to create a memory device with several DRAM dies stacked on top of a CMOS logic layer. The logic layer at the base of each stack contains several DRAM memory controllers that communicate with the host processor over high speed serial links using an abstracted packet interface. Each memory controller is connected to several memory banks in the DRAM stack with Through-Silicon Vias (TSVs), which are metal connections that extend vertically through each chip in the die stack. Since the TSVs form a dense interconnect with short path lengths, the data bus between the controller and memory banks can be operated at higher throughput and lower energy per bit compared to traditional Double Data Rate (DDRx) memories, which uses many long and parallel wires on the motherboard to communicate with the memory controller located on the CPU die. The TSV connections combined with the presence of multiple memory controllers near the memory arrays form a device that exposes significant memory-level parallelism and is capable of delivering an order of magnitude more bandwidth than current DDRx solutions.
While the architecture of this device is still nascent, we present several parameter sweeps to highlight the performance characteristics and trade-offs in the HMC architecture. In the first part of this dissertation, we attempt to understand and optimize the architecture of a single HMC device that is not connected to any other HMCs. We begin by quantifying the impact of a packetized high-speed serial interface on the performance of the memory system and how it differs from current generation DDRx memories. Next, we perform a sensitivity analysis to gain insight into how various queue sizes, interconnect parameters, and DRAM timings affect the overall performance of the memory system. Then, we analyze several different cube configurations that are resource-constrained to illustrate the trade-offs in choosing the number of memory controllers, DRAM dies, and memory banks in the system. Finally, we use a full system simulation environment running multi-threaded workloads on top of an unmodified Linux kernel to compare the performance of HMC against DDRx and "ideal" memory systems.
In addition to being used as a single HMC device attached to a CPU socket, the HMC allows two or more devices to be "chained" together to form a diverse set of topologies with unique performance characteristics. Since each HMC regenerates the high speed signal on its links, in theory any number of cubes can be connected together to extend the capacity of the memory system. There are, however, practical limits on the number of cubes and types of topologies that can be implemented.
In the second part of this work, we describe the challenges and performance impacts of chaining multiple HMC cubes together. We implement several cube topologies of two, four, and eight cubes and apply a number of different routing heuristics of varying complexity. We discuss the effects of the topology on the overall performance of the memory system and the practical limits of chaining. Finally, we perform full-system simulation of various workloads to try to quantify the impact of chaining on full system workloads.