Course Reading List

Applications

[1] Jaswinder Pal Singh, Wolf-Dietrich Weber and Anoop Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, 1992.
(pdf)

[2] Jaswinder Pal Singh, Chris Holt, Takashi Totsuka, Anoop Gupta and John L. Hennessy. Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole and Radiosity. Journal of Parallel and Distributed Computing, June 1995.
(citeseer)

[3] Shubu Mukherjee, Shamik Sharma, Mark Hill, Jim Larus, Anne Rogers, and Joel Saltz. Efficient Support for Irregular Applications on Distributed-Memory Machines. Proceedings of the 5th Annual Symposium on Principles and Practice of Parallel Programming, July 1995.
(postscript)

[4] Frederic T. Chong and Robert Schreiber. Parallel Sparse Triangular Solution with Partitioned Inverses and Prescheduled DAGs. In the Workshop on Solving Irregular Problems on Distributed Memory Machines, April 1995.
(citeseer)

Shared Memory

[5] Ronald Bianchini, Susan Dickey, Jan Edler, Gabriel Goodman, Allan Gottlieb, Richard Kenner, and Jiarui Wang. The Ultra III Prototype. Proc. Parallel Systems Fair, 7th Int'l Parallel Processing Symposium, April 1993.
(tar'ed postscript)

[6] James Archibald and Jean-Loup Baer. Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model. ACM Transactions on Computer Systems, May 1986.

[7] David Chaiken, Craig Fields, Kiyoshi Kurihara, and Anant Agarwal. Directory-Based Cache Coherence in Large-Scale Multiprocessors. IEEE Computer, June 1990.

[8] David Chaiken, John Kubiatowicz, and Anant Agarwal. LimitLESS Directories: A Scalable Cache Coherence Scheme. Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), April 1991.
(compressed postscript)

[9] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. Proceedings of the 17th Annual International Symposium on Computer Architecture, June 1990.
(citeseer)

[10] Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, and Wolf-Dietrich Weber. Comparative Evaluation of Latency Reducing and Tolerating Techniques. Proceedings of the 18th Annual International Symposium on Computer Architecture, May 1991.
(compressed postscript)

Message Passing

[11] Charles L. Seitz. The Cosmic Cube. Communications of the ACM, January 1985.

[12] Charles E. Leiserson, Zahi S. Abuhamdeh, David C. Douglas, Carl R. Feynman, Mahesh N. Ganmukhi, Jeffrey V. Hill, W. Daniel Hillis, Bradley C. Kuszmaul, Margaret A. St. Pierre, David S. Wells, Monica C. Wong, Shaw-Wen Yang, and Robert Zak. The Network Architecture of the Connection Machine CM-5. The Journal of Parallel and Distributed Computing, Volume 33, Number 2, March 15, 1996, pp. 145-158.
(compressed postscript)

Interconnection Networks

[13] William J. Dally. Performance Analysis of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, June 1990.

[14] Anant Agarwal. Limits on Interconnection Network Performance. IEEE Transactions on Parallel and Distributed Systems, October 1991.
(postscript)

[15] William J. Dally. Virtual-Channel Flow Control. In Proceedings of the 17th International Symposium on Computer Architecture, May 1990.

[16] William J. Dally and Charles L. Seitz. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Transactions on Computers, May 1987.

Papers for End of Semester Lectures

[17] J. Gregory Steffan and Todd C. Mowry. The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization. Proceedings of the 4th International Symposium on High-Performance Computer Architecture, February 1998.
(compressed postscript)

[18] J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. A Scalable Approach to Thread-Level Speculation. Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.
(compressed postscript)

[19] Amir Roth and Gurindar S. Sohi. Speculative Data-Driven Multithreading. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture, January 2001.
(pdf)

[20] Dongkeun Kim and Donald Yeung. Design and Evaluation of Compiler Algorithms for Pre-Execution. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X). San Jose, CA. October 2002.
(pdf, gzip'd ps)

[21] Luiz André Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Proceedings of the 27th ACM International Symposium on Computer Architecture, June 2000.
(citeseer)

[22] Basem A. Nayfeh, Lance Hammond, and Kunle Olukotun. Evaluation of Design Alternatives for a Multiprocessor Microprocessor. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.

[23] Ronald N. Kalla, Balaram Sinharoy, and Joel M. Tendler. IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro, 24(2), pp. 40-47, 2004.

[24] Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. Niagara: A 32-Way Multithreaded SPARC Processor. IEEE Micro, August 2004.

[25] Ravi Rajwar and James R. Goodman. Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. In Proceedings of the 34th International Symposium on Microarchitecture, December 2001.
(pdf)