| | |
|---|---|
|Subject Area|Computer Hardware and Architecture|
|Semester|Semester 8 – Spring|
- The need for parallel architectures
- Patterns of Parallelism (ILP, DLP, TLP)
- ILP in hardware – Dynamic out-of-order execution
- Case study: Intel’s x86 Core 2 and i7 microarchitectures
- VLIW technology – Software pipelining, Modulo scheduling
- Superblocks, Hyperblocks, Predication, Speculation
- Case study: Intel’s Itanium ISA
- Case study: IBM’s Cell BE processor – Architecture, Programming model, Tools
- Simultaneous multithreading – SMT processors
- Case study: IBM’s POWER6 and POWER7 processors
- Case study: Graphics Processing Units (GPUs) – CUDA libraries
- Shared and Distributed memory systems
- Memory coherence protocols and memory consistency models
- Transactional Memory
- Case study: Sun’s T2 processor
- Streaming architecture paradigm
- Case study: Merrimac (Stanford), RSVP (Motorola)
- Interconnect networks
- DRAM technology and organization – Memory access scheduling and prefetching
- Customizable processors – Reconfigurable computing
- Class project: simulating parallel architectures with open-source simulators (gem5) running benchmark programs such as PARSEC.
This course provides a detailed study of the design, engineering, and evaluation of parallel computing systems.
The course begins by explaining why the physical limitations of single-core high-performance processors have made multi-core systems necessary.
It then describes the forms and patterns of parallelism found in modern high-performance processors: instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). Technologies for extracting and exploiting ILP, such as superscalar execution, out-of-order execution, and VLIW, are covered in detail. The course also covers the accompanying compiler optimizations, such as loop unrolling, software pipelining, predication, and speculation.
Next, the course discusses multi-core (or many-core) architectures that exploit thread-level (task-level) parallelism. It places special emphasis on problems specific to multi-core systems, such as memory coherence and memory consistency. It describes hardware and software techniques that address these issues, including cache coherence mechanisms, synchronization primitives, and recent advances such as transactional memory and streaming architectures. We also cover interconnection networks, which are critical to implementing high-performance multi-core systems.
The course emphasizes the practical application of all these technologies in real machines. Throughout the class, we describe the architectures of modern processors. Examples include Intel’s x86 i7 microarchitecture, Intel’s Itanium ISA, the Cell BE processor, GPU architectures, and Sun’s multithreaded processors. We also cover streaming architectures such as Merrimac (Stanford) and RSVP (Motorola), as well as reconfigurable architectures.
The course also covers special topics such as customizable processors, reconfigurable computing, DRAM technology, and memory controllers.
There will be several homework assignments and a final exam covering the material. There will also be weekly recitations based on the study of research papers. Finally, students will complete a term project involving the configuration, simulation, and study of a multi-core system.