Course Information:
| Lecture: | Mon Wed 12:30 - 1:45, EGR-1104 |
| Mailing List: | enee759m-0101-spr02@coursemail.umd.edu |
| Required Text: | Hill, Jouppi, and Sohi (Eds), Readings in Computer Architecture, Morgan Kaufmann, 2000 |
| Recommended Texts: | Weiss & Smith, POWER and PowerPC, Morgan Kaufmann, 1994; Shriver & Smith, The Anatomy of a High-Performance Microprocessor, IEEE Computer Society Press, 1998; Johnson, Superscalar Microprocessor Design, Prentice Hall, 1991 |
Instructor Information:
| Professor: | Bruce L. Jacob, Assistant Professor, Electrical & Computer Engineering |
| Office: | 1325 A.V. Williams Building |
| Phone: | (301) 405-0432 |
| Email: | |
| Office Hours: | Open-door policy, for now ... |
Course Handouts and General Information:
Reading Assignments:
| Week-1: | Overviews | Diefendorff 1999 (on-line) Smith & Sohi 1995 (on-line) |
| For fun | Wharton 1994 & Paterson 1994 (on-line) | |
| Week-2: | Overviews, cont'd | Rao & Fisher 1992 (p. 288) |
| Week-3: | Issues Then & Now | Anderson, Sparacio & Tomasulo 1967 (p. 185) Smith 1989 (on-line) Conte et al. 1995 (on-line) Henry et al. 2000 (on-line) |
| Week-4: | Dynamic Scheduling | Tomasulo 1967 (on-line) Smith & Pleszkun 1988 (p. 202) Sohi & Vajapeyam 1987 (p. 244) |
| For example | Yeager 1996 (p. 275) ... this is an extremely in-depth description of the MIPS R10000, including its implementation of a reorder buffer structure, support for precise interrupts, register renaming, etc. | |
| Week-5: | Advanced Caches | Wang, Baer & Levy 1989 (p. 434) Wheeler & Bershad 1992 (on-line) Tyson et al. 1995 (on-line) Rotenberg et al. 1996 (on-line) |
| Historical | Kroft 1981 (p. 380) Jouppi 1990 (p. 395) | |
| Week-6: | DRAM Architectures | Dipert 2000 (on-line) Cuppu et al. 1999 (on-line) Cuppu & Jacob 2001 (on-line) |
| For example | Jacob 1997 (on-line) ... this is one way to deal with virtual caches, as we discussed in class this week ... using segmentation, a high-level OS function that couples with low-level hardware support, one can have the best of both worlds: easy coherence and reasonable flexibility. | |
| Week-7: | Review & Midterm | |
| Week-8: | Low Power | Gonzalez & Horowitz 1996 (on-line) Fromm et al. 1997 (on-line) Weiser et al. 1994 (on-line) Grunwald et al. 2000 (on-line) Pering et al. 1998a (on-line) |
| For example | Scott et al. 1998 (on-line) ... this is a write-up about the design of the Motorola M-CORE, a (failed) low-power high-performance embedded microcontroller ... it failed because it went head-to-head with the StrongARM, a relatively entrenched design. Pering et al. 1998b (on-line) ... this is another paper from the Berkeley InfoPad group, like the paper listed above ... good stuff. Horowitz et al. 1994 (on-line) ... though everyone cites the journal version of this paper (above), the conference version has a lot of info that the journal version does not. The Transmeta white papers (on-line) ... interesting stuff; talks about their approach, which is DAISY-like (see the DAISY paper by Ebcioglu & Altman). | |
| Week-9: | Spring Break | |
| Week-10: | Branch Prediction, Data Prediction | Yeh & Patt 1991 (p. 228) Lee, Chen & Mudge 1997 (on-line) Sazeides & Smith 1997 (on-line) Lipasti, Wilkerson & Shen 1996 (on-line) |
| Historical, Futuristic | Smith 1981 (p. 214) ... the early work on branch prediction. Lipasti & Shen 1997 (on-line) ... an interesting, very readable, treatment of the level of speculation possible in future microprocessors. | |
| Week-11: | Fault Tolerance | austin00diva.pdf slipstream_asplos.final.pdf |
| Week-12: | Graphics Processing | openGL-pipe.pdf rixner1998-micro31.pdf khailany2001-micro21-2.pdf "Sony's emotionally charged chip" "Designing and programming the Emotion Engine" "Vector unit architecture for emotion synthesis" |
| Week-13: | DSP/Embedded/Wearable | "DSP Processors hit the mainstream" "Transparent data-memory organizations for digital signal processors" "Trends in embedded-microprocessor design" "Wearable computing: A first step toward personal imaging" "Wearable computing: One man's mission" "The challenges of wearable computing: Part I" "The challenges of wearable computing: Part II" |
| Week-14: | Project Presentations | |
| Week-15: | Future Visions, Future Products (I'll cover as many as I can ...) | MPR 1996 (on-line) Herring 2000 (on-line) Tredennick & Shimamoto 2000 (on-line) Edahiro et al. 2000 (on-line) Quach 2000 (on-line) Araki 2000 (on-line) Suga & Matsunami 2000 (on-line) Tremblay et al. 2000 (on-line) Biswas et al. 2000 (on-line) Frantz 2000 (on-line) |
Documents describing the RiSC-16:
| File Name | Document Name | Document Description |
| RiSC-isa.pdf | The RiSC-16 Instruction-Set Architecture | Describes the instruction-set architecture: machine-code forms, assembly-code forms, etc. |
| RiSC-sys.pdf | RiSC-16 System Architecture | Describes the system-level component of the instruction set, including system calls, exceptions, how interrupts should be handled, etc. |
| RiSC-seq.pdf | RiSC-16: Sequential Implementation | Describes a sequential implementation of the architecture: control flow, data flow, etc. |
| RiSC-pipe.pdf | The Pipelined RiSC-16 | Describes a pipelined implementation of the architecture: control flow, data flow, pipeline stages, pipeline hazards, data forwarding, etc. |
| RiSC-oo.1.pdf | An Out-of-Order RiSC-16: Tomasulo + Reorder Buffer = Interruptible Out-of-Order | Describes an out-of-order implementation: instruction queue (ROB/RUU), fetch buffers, forwarding logic, wakeup/scheduling logic, recovery from branch misspeculations, memory request queue, commit logic, etc. Version 1 does not implement precise interrupts (sorry; I ran out of time). Version 2 will have TLBs with software-managed TLB-refill (a la MIPS). |
| RiSC-ex.pdf | RiSC-oo.1.v Execution Example | The document An Out-of-Order RiSC-16 gives the first dozen machine cycles in the execution of a small assembly-code file. The full execution is given in this document. |
Source Code:
| File Name | Description of Contents |
| RiSC.c | A C-language implementation of the RiSC-16, useful for verification of Verilog models. |
| RiSC-oo.1.v | A Verilog implementation of an out-of-order core, with an 8-entry ROB, 3-way issue (2 ALU instructions, 1 memory instruction), 2-way fetch, 2-way enqueue, 2-way commit, etc. Detailed in the previous doc An Out-of-Order RiSC-16. This is Version 1 of the core. Version 1 does not implement precise interrupts (sorry; I ran out of time). Version 2 will have TLBs with software-managed TLB-refill (a la MIPS). |
| a.c | C code for a rudimentary RiSC-16 assembler. |
| laplace.s | RiSC-16 assembly code for a decent-sized benchmark, written by Vince Weaver and Asher Lazarus -- former enee350 and enee759m students. |
Additional Papers:
| anderson1991: | "The interaction of architecture and operating system design." T. E. Anderson, H. M. Levy, B. N. Bershad, and E. D. Lazowska. In Proc. Fourth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-4), pp. 108-120, 1991. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| araki2000: | "The Memory Stick." Shigeo Araki. IEEE Micro, vol. 20, no. 4, pp. 40-46. Jul/Aug 2000. |
| biswas2000: | "SH-5: The 64-bit SuperH architecture." Prasenjit Biswas, Atsushi Hasegawa, Srinivas Mandaville, Mark Debbage, Andy Sturges, Fumio Arakawa, Yasuhiko Saito, and Kunio Uchiyama. IEEE Micro, vol. 20, no. 4, pp. 28-39. Jul/Aug 2000. |
| conte1995: | "Optimization of instruction fetch mechanisms for high issue rates." T. M. Conte, K. N. Menezes, P. M. Mills, and B. A. Patel. In Proc. 22nd Annual International Symposium on Computer Architecture (ISCA'95), pp. 333-344, Santa Margherita Ligure, Italy, June 1995. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| cuppu1999: | "A performance comparison of contemporary DRAM architectures." Vinodh Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge. In Proc. 26th International Symposium on Computer Architecture (ISCA'99), pp. 222-233. Atlanta GA, May 1999. |
| cuppu+jacob2001: | "Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance?" Vinodh Cuppu and Bruce Jacob. In Proc. 28th International Symposium on Computer Architecture (ISCA'01), pp. 62-71. Goteborg Sweden, June 2001. |
| diefendorff1999: | "PC processor microarchitecture." Keith Diefendorff. Microprocessor Report, vol. 13, no. 9, pp. 16-22, July 12 1999. |
| dipert2000: | "The slammin, jammin, DRAM scramble." Brian Dipert. EDN, pp. 68-82, January 2000. |
| ebcioglu+altman1996: | "DAISY: Dynamic compilation for 100% architectural compatibility." Kemal Ebcioglu and Erik Altman. IBM Research Report, September 1996. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| edahiro2000: | "A single-chip multiprocessor for smart terminals." Masato Edahiro, Satoshi Matsushita, Masakazu Yamashina, and Naoki Nishi. IEEE Micro, vol. 20, no. 4, pp. 12-20. Jul/Aug 2000. |
| frantz2000: | "Digital signal processor trends." Gene Frantz. IEEE Micro, vol. 20, no. 6, pp. 52-59. Nov/Dec 2000. |
| fromm1997: | "The energy efficiency of IRAM structures." R. Fromm, S. Perissakis, N. Cardwell, C. Kozyrakis, B. McGaughy, D. Patterson, T. Anderson, and K. Yelick. In Proc. 24th Annual International Symposium on Computer Architecture (ISCA'97), pp. 327-337, June 1997. |
| gonzalez+horowitz1996: | "Energy dissipation in general purpose microprocessors." Ricardo Gonzalez and Mark Horowitz. IEEE Journal of Solid-State Circuits, 31(9), pp. 1277-1284, September 1996. |
| grunwald2000: | "Policies for Dynamic Clock Scheduling." Dirk Grunwald, Philip Levis, Charles B. Morrey III, Michael Neufeld, and Keith I. Farkas. In Proc. Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), pp. 73-86, San Diego, CA, October 2000. |
| henry2000: | "Circuits for wide-window superscalar processors." Dana S. Henry, Bradley C. Kuszmaul, Gabriel H. Loh, Rahul Sami, and Vinod Viswanath. In Proc. 27th International Symposium on Computer Architecture (ISCA '00), Vancouver, BC, June 12-14, 2000, pp. 236-247. |
| herring2000: | "Microprocessors, microcontrollers, and systems in the new millennium." Chris Herring. IEEE Micro, vol. 20, no. 6, pp. 45-51. Nov/Dec 2000. |
| horowitz1994: | "Low-power digital design." M. Horowitz, T. Indermaur, and R. Gonzalez. In Proc. Symposium on Low Power Electronics, pp. 8-11, October 1994. |
| jacob1996: | "An analytical model for designing memory hierarchies." Bruce Jacob, Peter Chen, Seth Silverman, and Trevor Mudge. IEEE Transactions on Computers, vol. 45, no. 10, pp. 1180-1194. October 1996. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| jacob1997: | "Segmented addressing solves the virtual cache synonym problem." Bruce Jacob. University of Maryland Technical Report UMD-SCA-97-01. December 1997. |
| jacob1998a: | "Virtual memory: Issues of implementation." Bruce Jacob and Trevor Mudge. IEEE Computer, vol. 31, no. 6, pp. 33-43. June 1998. |
| jacob1998b: | "Virtual memory in contemporary microprocessors." Bruce Jacob and Trevor Mudge. IEEE Micro, vol. 18, no. 4, pp. 60-75. July/August 1998. |
| jacob1998c: | "A look at several memory management units, TLB-refill mechanisms, and page table organizations." Bruce Jacob and Trevor Mudge. In Proc. Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'98), pp. 295-306. San Jose CA, October 1998. |
| johnson2000: | "Metastable persons." Howard Johnson. EDN, p. 30, March 16 2000. |
| lee+chen+mudge1997: | "The bi-mode branch predictor." Chih-Chieh Lee, I-Chen Chen, and Trevor Mudge. In Proc. 30th Annual International Symposium on Microarchitecture (MICRO-30), pp. 4-13. Research Triangle Park NC, December 1997. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| liedtke1993: | "Improving IPC by kernel design." Jochen Liedtke. In Proc. Fourteenth ACM Symposium on Operating Systems Principles (SOSP-14). pages 175-187, 1993. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| liedtke1995: | "On micro-kernel construction." Jochen Liedtke. In Proc. Fifteenth ACM Symposium on Operating Systems Principles (SOSP-15). 1995. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| lipasti1996: | "Value locality and load value prediction." M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. In Proc. Seventh Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-7). Cambridge MA, October 1996. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| lipasti1997: | "Superspeculative microarchitecture for beyond AD 2000." M. H. Lipasti and J. P. Shen. IEEE Computer, vol. 30, no. 9, pp. 59-66, September 1997. |
| mpr1996: | "Architects look to processors of future." Gordon Bell, Richard Sites, William Dally, David Ditzel, and Yale Patt. Microprocessor Report, vol. 10, no. 10, pp. 18-24, August 1996. |
| ousterhout1989: | "Why aren't operating systems getting faster as fast as hardware?" John Ousterhout. Technical Report WRL-TN-11, DEC Western Research Laboratory. October 1989. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| pering1998a: | "Dynamic voltage scaling and the design of a low-power microprocessor system." T. Pering, T. Burd, and R. Brodersen. In Proc. Power-Driven Microarchitecture Workshop, associated with ISCA98. Barcelona, Spain, June 1998. |
| pering1998b: | "The simulation and evaluation of dynamic voltage scaling algorithms." T. Pering, T. Burd, and R. Brodersen. In Proc. International Symposium on Low Power Electronics and Design (ISLPED'98). August 1998. |
| quach2000: | "High availability and reliability in the Itanium processor." Nhon Quach. IEEE Micro, vol. 20, no. 5, pp. 61-69. Sep/Oct 2000. |
| rotenberg1996: | "Trace cache: A low-latency approach to high-bandwidth instruction fetching." E. Rotenberg, S. Bennett, and J. Smith. In Proc. 29th Annual International Symposium on Microarchitecture (MICRO-29), pp. 24-34. Paris, France, December 1996. |
| sazeides+smith1997: | "The predictability of data values." Yiannakis Sazeides and Jim Smith. In Proc. 30th Annual International Symposium on Microarchitecture (MICRO-30). Research Triangle Park NC, December 1997. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |
| scott1998: | "Designing the low-power M-CORE architecture." Jeff Scott, Lea Hwang Lee, John Arends, and Bill Moyer. In Proc. IEEE Power Driven Microarchitecture Workshop, pp. 145-150. Barcelona Spain, June 1998. |
| smith+sohi1995: | "The microarchitecture of superscalar processors." Jim Smith and Guri Sohi. Proceedings of the IEEE, December 1995. |
| smith+pleszkun1988: | "Implementing precise interrupts in pipelined processors." James E. Smith and Andrew R. Pleszkun. IEEE Transactions on Computers, vol. 37, no. 5, pp. 562-573, May 1988. |
| smith1989: | "Dynamic instruction scheduling and the Astronautics ZS-1." James E. Smith. IEEE Computer, vol. 22, no. 7, pp. 21-35, July 1989. |
| suga+matsunami2000: | "Introducing the FR500 embedded microprocessor." Atsuhiro Suga and Kunihiko Matsunami. IEEE Micro, vol. 20, no. 4, pp. 21-27. Jul/Aug 2000. |
| tomasulo1967: | "An efficient algorithm for exploiting multiple arithmetic units." Robert M. Tomasulo. IBM Journal of Research and Development, vol. 11, no. 1, pp. 25-33, January 1967. |
| transmeta2000a: | "The technology behind Crusoe processors." Alexander Klaiber. Transmeta Corporation white paper, January 2000. |
| transmeta2000b: | "Mobile platform benchmarks." Daniel McKenna. Transmeta Corporation white paper, January 2000. |
| transmeta2000c: | "Crusoe processor benchmark report." Transmeta Corporation white paper, January 2000. |
| tredennick+shimamoto2000: | "Guest viewpoint: Embedded systems and the microprocessor." Nick Tredennick and Brion Shimamoto. Microprocessor Report, vol. 14, no. 4, pp. 26-33, April 2000. |
| tremblay2000: | "The MAJC architecture: A synthesis of parallelism and scalability." Marc Tremblay, Jeffrey Chan, Shailender Chaudry, Andrew Conigliaro, and Shing Sheung Tse. IEEE Micro, vol. 20, no. 6, pp. 12-25. Nov/Dec 2000. |
| tyson1995: | "A modified approach to data cache management." G. Tyson, M. Farrens, J. Matthews, and A. R. Pleszkun. In Proc. 28th Annual International Symposium on Microarchitecture (MICRO-28), pp. 93-103. Ann Arbor MI, December 1995. |
| weiser1994: | "Scheduling for reduced CPU energy." Mark Weiser, Brent Welch, Alan Demers, and Scott Shenker. In Proc. First Symposium on Operating Systems Design and Implementation (OSDI), pp. 13-23. Monterey CA, November 1994. |
| wharton1994: | "Gary Kildall, industry pioneer, dead at 52." John Wharton. Microprocessor Report, vol. 8, no. 10, August 1994. |
| paterson1994: | "The origins of DOS: DOS creator gives his view of relationship between CP/M, MS-DOS." Letter to the editor from Tim Paterson. Published in Microprocessor Report, vol. 8, no. 13, October 1994. |
| wheeler+bershad1992: | "Consistency management for virtually indexed caches." Bob Wheeler and Brian N. Bershad. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-5), pp. 124-136. Boston MA, October 1992. (click here for a postscript copy ... the PDF copy is viewable but prints out poorly) |