**Adaptable Intelligent Nanoscale Electronic Systems Lab**

**Dr Ankur Srivastava, Associate Professor**

Outlined below is a brief summary of the research projects that we are pursuing, along with some key results. Please feel free to read our papers and/or contact us if you need more information.

- Dynamic Thermal Sensing for Runtime Thermal Profile Estimation
- Autonomous Dynamic Thermal Management
- Resource Constrained Distributed Data Collection and Estimation
- Statistical Analysis and Optimization of VLSI Circuits in Presence of Fabrication Variability
- Methods for Leakage and Dynamic Power Optimization
- Predictability in Design Optimization at Higher Levels of Design Abstraction
- Computational Efficiency in Particle Filters with Applications to Computer Vision

**Dynamic Thermal Sensing for Runtime Thermal Profile Estimation:**

Temperature has become a foremost challenge affecting chip reliability and mean time before failure. On-chip temperature varies significantly in time and space because different modules on the chip have different activity patterns and different applications have different power footprints. This underlying space-time unpredictability means that a pure design-time approach to curtailing thermal effects is not sufficient. Runtime thermal management cannot be performed effectively without accurate knowledge of the chip's thermal state (the temperature at each location of interest at any given time, and the associated trends). Modern chips, especially CPUs, are being equipped with on-chip thermal sensors that can provide online runtime information. Although thermal sensors are promising, there is no scientific methodology in place that intelligently incorporates the thermal sensing infrastructure into the design (where the sensors should go, how the data should be collected, etc.). Sensors cannot be placed everywhere on the chip, and they tend to be noisy and error-prone. Generating the thermal profile of the entire chip from the noisy readings of a few sensors presents an intriguing theoretical and practical challenge.

We have investigated a statistical methodology that provides a scientific basis for deciding the locations of the thermal sensors, the number of bits assigned to each sensor, and the size of the central register that stores the readings of all the sensors. Since placing sensors on chip is costly, our methods strive to minimize the redundancy in the sensor observations. Placing two sensors at locations with highly correlated temperatures is wasteful, while placing them at locations with weaker thermal correlation gives more information about the thermal profile. Once the thermal sensing infrastructure is in place, we would like to use the noisy sensor readings to estimate the chip's thermal profile accurately. We have developed statistical schemes that exploit the correlations between the sensor and non-sensor locations to estimate the entire thermal profile. We used concepts from Kalman filters and other dynamic state estimation paradigms to perform this estimation while accounting for noisy thermal sensors.
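As a minimal illustration of how correlation lets one noisy sensor inform a location that has no sensor, the sketch below applies the standard linear minimum mean-square-error (conditional Gaussian) formula to a single sensor and a single unsensed location. It is a static simplification and not the lab's Kalman-filter-based method; the function names and all numbers are hypothetical.

```python
def estimate_unsensed(mu_s, mu_u, var_s, cov_us, noise_var, reading):
    """Linear MMSE estimate of the temperature at a location without a sensor.

    mu_s, mu_u : prior mean temperatures at the sensor and unsensed locations
    var_s      : prior variance of the true temperature at the sensor location
    cov_us     : prior covariance between the two locations
    noise_var  : variance of the additive sensor noise
    reading    : the noisy sensor reading
    """
    gain = cov_us / (var_s + noise_var)
    return mu_u + gain * (reading - mu_s)


def posterior_variance(var_u, var_s, cov_us, noise_var):
    """Uncertainty left at the unsensed location after using the reading."""
    return var_u - cov_us ** 2 / (var_s + noise_var)


# Illustrative numbers only (degrees Celsius), not measured data.
estimate = estimate_unsensed(mu_s=70.0, mu_u=65.0, var_s=16.0,
                             cov_us=12.0, noise_var=4.0, reading=80.0)
strong = posterior_variance(var_u=16.0, var_s=16.0, cov_us=12.0, noise_var=4.0)
weak = posterior_variance(var_u=16.0, var_s=16.0, cov_us=2.0, noise_var=4.0)
```

With these assumed statistics, the strongly correlated pair leaves less residual uncertainty than the weakly correlated pair. The same formula shows why a second sensor that is highly correlated with an existing one adds little: most of what it would report is already predictable from the first.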

**Autonomous Dynamic Thermal Management:**

Managing temperature surges in computer systems has become an active research topic. Much of the existing effort focuses on controlling parameters such as task scheduling and CPU voltage and speed, using a combination of reactive and predictive-proactive schemes. With the growing complexity of computer systems, radical approaches are needed for thermal management. Fundamentally, there is some debate over exactly what constitutes thermal management. Most existing schemes attempt to keep the silicon temperature within a manufacturer-specified constraint, although this may not be the optimum approach in many scenarios. We are developing thermal management schemes at different levels of abstraction of computer systems: from data centers to individual chips, and from application software to the hardware platform. The methods we are investigating strive to balance schemes at different levels of abstraction in an autonomous fashion. For example, if a simple redistribution of tasks in a data center can address the thermal issue, then we don’t need to activate costly CPU-level thermal management schemes. A key aspect of our work deals with autonomy and self-healing of systems.

**Resource Constrained Distributed Data Collection and Estimation:**

Although the management of resources (such as energy) in sensor networks has been a very active research topic for the past decade, the fundamental tradeoff between sensor computing capability and the quality and timeliness of sensing is yet to be understood. Sensor networks are examples of distributed collaborative systems that autonomously collect data and make inferences about the system state. Several fundamental algorithms, such as Kalman and particle filters and Wyner-Ziv coding, are used extensively along with protocols for data communication. Imposing resource constraints such as finite precision, limited energy, and fixed sensor node memory leads to completely different outcomes for distributed inference problems. We are investigating these questions in the context of distributed video coding, where large visual sensor networks collect real-time visual data for target tracking and perpetual surveillance.

**Statistical Analysis and Optimization of VLSI Circuits in Presence of Fabrication Variability:**

To keep up with society’s growing appetite for electronic systems such as faster computers and ultra-low-energy mobile devices, the semiconductor industry has been continuously reducing the dimensions of fabricated features on VLSI chips (which currently stand at tens of nanometers). This has enabled us to pack hundreds of millions of transistors on modern chips, thereby unlocking new applications that were unimaginable just a few years ago.

VLSI fabrication at nanoscale dimensions is very hard to control and is therefore accompanied by significant randomness. This causes several fabricated chips to deviate from their power and performance (speed) specifications, leading to loss in reliability and yield. State-of-the-art design schemes ignore this randomness, even though it is becoming increasingly important. We have developed several effective techniques for understanding the impact and reducing the effects of fabrication randomness through intelligent, automated VLSI design schemes that proactively account for this randomness statistically.

*Methods for Statistical Timing Analysis under Fabrication Variability:* Fabrication variability causes the chip’s timing and power to deviate randomly from the specifications. The state of the art either ignores this variability or makes highly simplifying assumptions about its nature and impact. We have made several significant contributions to the problem of modeling the impact of fabrication randomness on the chip’s timing characteristics. We have developed several techniques that accurately and efficiently predict the probability density function (PDF), or statistical spread, of the chip’s timing in the presence of fab-randomness. To this end we investigated two general directions: effectively modeling the non-linear and non-Gaussian nature of fab-randomness, and improving the computational efficiency of predicting the timing PDF while accounting for fab-randomness. Fabrication randomness, which exhibits non-Gaussian behavior, also has a non-linear impact on chip parameters like timing. We were among the first to propose an effective polynomial-based model for addressing this non-linear, non-Gaussian nature while predicting the timing PDF of multi-million transistor designs. Our methods proved to be significantly more accurate than existing methods that ignored such characteristics. Predicting the timing PDF of multi-million transistor designs is a computational nightmare, primarily due to the sheer size and complexity of such designs. We addressed this challenge by proposing a technique called error budgeting, which enables an effective tradeoff between computational overhead and error in predicting the timing PDF. We also investigated the problem of improving the computational efficiency of Monte Carlo based timing PDF prediction. This work is particularly relevant in situations where the true nature of fabrication randomness cannot be captured in a low-complexity model (like a polynomial) and we need to rely on Monte Carlo methods.

Overall, these efforts represent a significant step towards efficiently and accurately predicting the impact of fabrication randomness on the chip’s timing specifications.
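To make the idea of a timing PDF concrete, here is a small Monte Carlo sketch. Each path delay is a quadratic (polynomial) function of a single normalized process variation, so the resulting chip delay is non-Gaussian even though the underlying variation is sampled from a Gaussian. The path coefficients, the single shared variation source, and the function names are all assumptions made for illustration; this is not the lab's polynomial model or its error-budgeting technique.

```python
import random


def path_delay(nominal, linear, quadratic, x):
    """Quadratic (polynomial) model of one path's delay versus variation x."""
    return nominal + linear * x + quadratic * x * x


def sample_circuit_delay(paths, rng):
    """Draw one fabricated-chip sample: the slowest path sets the chip delay."""
    x = rng.gauss(0.0, 1.0)  # assumed: one shared, normalized process variation
    return max(path_delay(n, a, b, x) for n, a, b in paths)


def timing_yield(paths, clock_period, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(chip delay <= clock_period)."""
    rng = random.Random(seed)
    passed = sum(sample_circuit_delay(paths, rng) <= clock_period
                 for _ in range(n_samples))
    return passed / n_samples


# Hypothetical paths: (nominal delay in ps, linear coeff, quadratic coeff).
PATHS = [(100.0, 5.0, 1.0), (98.0, 8.0, 0.5), (95.0, 3.0, 2.0)]
```

Calling `timing_yield(PATHS, clock_period)` for a range of clock periods traces out the cumulative distribution of the chip delay. The cost grows with both the number of samples and the number of paths, which is why sampling efficiency matters for multi-million transistor designs.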

*Statistical Optimization of VLSI Designs under Fabrication Variability:* Using the knowledge of the statistical spread in design characteristics (such as timing and power), we have developed several schemes that control design parameters, such as device and wire dimensions, to reduce the detrimental impact of fabrication randomness on yield and reliability. Our approach first develops a precise mathematical understanding of how controlling design parameters impacts yield and reliability, and then applies problem-specific customizations to this theory to make it practically applicable. This two-pronged approach was applied to two distinct but related schemes for countering variability. The first attacks the problem by optimizing the circuit parameters during *the design phase* so as to have enough slack around the specifications for immunity to variability (a low probability of violating specifications). The second investigates a different paradigm in which the impact of variability on design constraints is *corrected* after fabrication, for each chip separately. Our contributions have received much recognition through *news articles in EE-Times and a Best Paper Award at ISPD 2007* (along with several publications in top journals and conferences).

The VLSI design process entails choosing, from a huge set of potential design solutions (millions), a design instance that meets the specifications. Comparing these solutions becomes problematic when the criteria involved are random variables (due to fabrication variability). We have developed a general Conditional Monte Carlo based approach for efficiently comparing competing VLSI design solutions by introducing the notion of the superiority probability of one potential design over another with respect to a set of design criteria. Our methods can efficiently traverse the solution space and prune design instances that have a low probability of superiority. *This work was cited in an EE-Times article.* We customized this general theory and applied it to controlling several circuit parameters in order to generate design instances with the best reliability. These include controlling the transistor threshold voltage and adding signal-boosting buffers to on-chip wires. Our methods resulted in significant improvements in semiconductor yields and are among the first pieces of work to reformulate core VLSI circuit optimization techniques to proactively account for fabrication randomness. Our work exposes several strong mathematical properties of the binning yield loss function (a variant of yield loss) with respect to circuit parameters like transistor sizes. By exploiting these properties, we could formulate the problem of assigning sizes to logic gates to minimize binning yield loss as a convex program. This is a very significant result because convexity can be exploited to solve the formulation optimally and efficiently. Using developments in convex optimization theory, we could solve the logic gate sizing problem to generate highly optimized designs with very small yield loss.
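The superiority probability mentioned above can be illustrated with a plain Monte Carlo estimate. This is a deliberately simplified stand-in and not the Conditional Monte Carlo method developed in this work. The delay models, the split into one shared die-wide variation plus independent local noise, the pruning threshold, and the function names are all hypothetical.

```python
import random


def superiority_probability(delay_a, delay_b, n_samples=10000, seed=0):
    """Estimate P(design A is faster than design B) by plain Monte Carlo.

    delay_a, delay_b: functions (shared, local) -> delay, where `shared` is a
    die-wide variation common to both designs and `local` is independent noise.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)  # same sample is applied to both designs
        d_a = delay_a(shared, rng.gauss(0.0, 1.0))
        d_b = delay_b(shared, rng.gauss(0.0, 1.0))
        wins += d_a < d_b
    return wins / n_samples


def prune(candidates, reference, threshold=0.05):
    """Keep only candidates with a non-negligible chance of beating reference."""
    return [name for name, delay in candidates
            if superiority_probability(delay, reference) >= threshold]
```

For example, with `fast = lambda s, l: 90 + 4 * s + l` and `slow = lambda s, l: 200 + 4 * s + l`, the call `prune([("fast", fast), ("slow", slow)], fast)` drops `slow`, because it essentially never beats the reference design under the sampled variations.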

These design-time approaches end up over-constraining the circuit and also rely on accurate knowledge of variability statistics, which is not always available. We therefore investigated a different paradigm in which the impact of variability on design constraints is *corrected* after fabrication. This is done using specific knobs installed in the design that allow detection and correction of timing constraint violations due to fab-randomness. Our work exposed several significant mathematical properties that allow these knobs to be controlled efficiently and effectively to reduce yield loss and improve reliability.

**Methods for Leakage and Dynamic Power Optimization:**

The growing desire for lightweight mobile electronics and ever-increasing on-chip transistor counts have made reducing power and energy dissipation an extremely significant part of the VLSI design process. We have developed several techniques for reducing power dissipation, thereby controlling on-chip thermal hotspots (one of the primary causes of reliability loss) and improving battery life. We followed a two-pronged approach: first developing a precise mathematical understanding of how controlling design parameters impacts power dissipation, and then applying problem-specific customizations to this theory.

*Techniques for Leakage Power Reduction:* As fabrication dimensions scale to mid/lower nanometers, leakage (static) power becomes a significant contributor to overall chip power. We have investigated several automated design techniques, including forward body biasing (FBB), MTCMOS (multiple threshold CMOS) sleep transistors, and dual-threshold technology, that can provide a 10x reduction in overall leakage power. Reducing leakage by controlling these circuit parameters increases circuit delay. By developing a rigorous mathematical understanding of the leakage versus timing tradeoff obtained by controlling these parameters, we developed effective schemes for reducing leakage under circuit timing constraints. For example, our approach investigated ways of controlling the transistor body bias such that leakage is minimized when a device (like a cell phone) is in standby (and does not require high-speed capability), and the device switches back to high-speed mode (with high leakage) whenever high performance is needed. A similar approach can be enabled by MTCMOS-based sleep transistors. Using a control signal connected to these sleep transistors, one can put the gates and functional modules of the device into a sleep state in which they dissipate extremely low leakage power. Whenever the device is needed, the same control knob can be used to wake it up. We proposed highly effective fine-grained sleep transistor placement and sizing schemes with significantly better results than traditional approaches. We also investigated leakage minimization techniques based on assigning threshold voltages to logic gates from a choice of two available thresholds. In this work, we investigated the theoretical properties of the threshold assignment problem and proved the continuous version to be a convex program.
Overall, our methods resulted in orders of magnitude reduction in leakage with minimal penalty on overall circuit timing compared with designs optimized solely for high speed operation.
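The leakage versus timing tradeoff behind dual-threshold assignment can be seen on a toy example. The sketch below exhaustively searches the low/high threshold choices for the gates on a single path and keeps the lowest-leakage assignment that still meets a delay budget. It is exponential in the number of gates and is meant only to illustrate the problem statement; it is not the convex formulation described above, and the gate data and function name are hypothetical.

```python
from itertools import product


def assign_thresholds(gates, max_delay):
    """Exhaustively pick low ('L') or high ('H') Vth per gate on one path.

    gates: list of (delay_low, delay_high, leak_low, leak_high) per gate.
    Returns (total_leakage, assignment) or None if the timing cannot be met.
    """
    best = None
    for choice in product("LH", repeat=len(gates)):
        delay = sum(g[0] if c == "L" else g[1] for g, c in zip(gates, choice))
        leak = sum(g[2] if c == "L" else g[3] for g, c in zip(gates, choice))
        if delay <= max_delay and (best is None or leak < best[0]):
            best = (leak, choice)
    return best


# Hypothetical gates: delays in ps, leakage in nW (integers, for illustration).
GATES = [(10, 15, 80, 10), (20, 26, 200, 20), (10, 13, 50, 5)]
```

With a 49 ps budget, the search keeps the first gate at the fast, leaky low threshold and moves the other two to the high threshold. Relaxing the budget lets every gate use the high threshold, and tightening it below the all-low delay of 40 ps makes the problem infeasible.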

*Controlling Dynamic Power in VLSI Systems:* Dynamic power is dissipated when transistors switch, charging and discharging the device capacitances. Typical high-performance computing workloads are associated with high dynamic power overhead. We investigated the problem of distributing tasks to the chip’s functional resources in such a way that overall switching is minimized. Since the problem is NP-complete, we proposed effective graph-theoretic heuristics. Allocating dual supply voltages to gates is another very effective way of reducing dynamic power. We also addressed the problem of clock power dissipation by judiciously synthesizing the clock tree. Experimental results highlight the effectiveness of these schemes in significantly reducing the dynamic power dissipation of modern multi-million transistor designs.
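As a rough illustration of switching-aware task distribution, the sketch below uses the number of input bits that toggle between consecutive operands on a functional unit (the Hamming distance) as a proxy for switching. It assigns each operation, in order, to the unit whose previous operand is closest. This simple greedy rule is an assumption made for illustration and is not the graph-theoretic heuristic referred to above. It also assumes one operation per time step and units that start with all-zero inputs.

```python
def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def greedy_bind(operands, n_units):
    """Bind each operand, in order, to the unit that toggles the fewest bits.

    Returns (binding, total_toggles), where binding[i] is the unit index
    chosen for operands[i]. Ties go to the lowest-numbered unit.
    """
    last = [0] * n_units  # assumed: every unit starts with all-zero inputs
    binding, total = [], 0
    for value in operands:
        unit = min(range(n_units), key=lambda u: hamming(last[u], value))
        total += hamming(last[unit], value)
        last[unit] = value
        binding.append(unit)
    return binding, total
```

For the operand sequence `[0b1111, 0b0000, 0b1110, 0b0001]`, two units let similar operands share a unit, so the total toggle count is lower than when all four operations run on a single unit.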

**Predictability in Design Optimization at Higher Levels of Design Abstraction:**

Automated VLSI design is a systematic process in which the system specifications are first converted into a “Register Transfer Level” or architectural description of the design. This is followed by logic synthesis, which generates the gate-level description, and then by placement and routing, which create the layout. Accurate speed and power characteristics cannot be determined until the final design layout is known, yet the maximum impact on design quality can only be obtained by making implementation decisions early in the design process. How do we make correct design decisions early in the circuit design process, when the true impact of those decisions on design quality cannot be accurately gauged? To address this issue, we have worked actively to define and refine the notion of predictability. Predictability allows one to make early design decisions that significantly improve design quality. There are two aspects to predictability: modeling and optimization. The modeling aspect characterizes the uncertainty associated with predicting design performance metrics early in the design process. We developed a statistical methodology for predicting the uncertainty associated with wire-length and wire-delay estimates during the architectural stage. The optimization aspect takes this uncertainty into account while making design decisions early in the design process, leading to a better understanding of how those decisions impact overall quality. We developed optimization techniques in which knowledge of the uncertainty associated with wire delays and the impact of supply voltage was used to make design decisions with better power and performance characteristics.

**Computational Efficiency in Particle Filters with Applications to Computer Vision:**

Particle filtering has been widely applied to inference problems in non-linear systems, such as target tracking and navigation. The computational cost of particle filters is a major hindrance to their practicality. We have investigated several techniques for improving their computational efficiency using algorithmic modifications, parallelization, and pipelining on multi-processor machines. Our schemes could solve several particle filtering instances (like target tracking in video) almost in real time, while traditional implementations could not come close in terms of computational efficiency.
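For readers unfamiliar with the technique, here is a minimal bootstrap particle filter for a one-dimensional random-walk target. It shows the three steps (predict, weight, resample) whose cost scales with the number of particles. The motion and observation models, the parameter values, and the function name are assumptions for illustration; this sketch does not include the algorithmic, parallel, or pipelined optimizations described above.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=1.0,
                    obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state.

    Returns the posterior-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate every particle through the random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:  # all likelihoods underflowed; fall back to uniform
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The predict and weight steps treat every particle independently, which is what makes them natural candidates for parallel execution. Resampling needs the full set of weights, so it is typically the harder step to parallelize.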

Contact

ankurs at umd dot edu

2317 AV Williams Bldg, Department of ECE, University of Maryland, College Park, Maryland.

Phone: 301 405 0434