| Subject Area | Software and Information System Engineering |
| Semester | Semester 7 – Fall |
The course discusses programming techniques for parallel systems, and more specifically for multicores and manycores. It spans the programming of conventional and non-conventional, homogeneous and heterogeneous architectures. Students are introduced to performance measurement and estimation techniques, application profiling, experimental performance evaluation, experimental evaluation of software/hardware interaction, and optimization techniques.
The course is complemented by a series of homework assignments that allow students to apply in practice the methods and techniques discussed in class.
The main course topics are the following:
- Introduction; the technical and economic reasons that led to the de facto prevalence of multicore systems; grand-challenge applications.
- Main metrics: Amdahl's law, the Karp-Flatt metric, the Gustafson-Barsis law.
- Elements of parallel computer architectures, parallel systems taxonomies, typical conventional and non-conventional architectures.
- Methodologies for the experimental evaluation of the performance of parallel applications on multicore systems and of their interaction with hardware.
- Patterns in parallel computing: parallelism extraction, algorithmic structure, data structures, implementation mechanisms.
- Programming models for multi- and many-core systems (OpenMP, Intel Threading Building Blocks, Cilk, OpenCL).
- GPU programming. The CUDA programming model.
- Software interaction with the underlying memory architecture, effective use of caches, data prefetching, communication/computation overlap. The CUDA memory model.
- Performance optimization on GPUs: floating-point issues, accuracy, and the accuracy/performance tradeoff.
- CUDA applications case studies: MRI reconstruction, molecular visualization and analysis.
- Synchronization implementation techniques (locks, barriers) and their interaction with hardware; alternative synchronization methods (fine-grained, speculative, lazy, non-blocking); transactional memory.
- Performance optimization techniques on a single core (branches, efficient use of the cache hierarchy, loop manipulation, slow instructions and lookup tables).
- Vectorization techniques, data alignment, automatic vectorization.
After successfully fulfilling the requirements of the course, students are capable of:
- Knowing the main parallel computing architectures.
- Knowing the basic steps required to develop parallel software and applying them on real codes.
- Understanding the interaction of software with the underlying hardware and exploiting it to better map code onto the underlying architecture.
- Knowing the architecture of basic services used by parallel codes (for example, different synchronization methods) and choosing the best algorithm/implementation according to the characteristics of their code and of the underlying architecture.
- Developing code for conventional and non-conventional multi- and many-core architectures, using the appropriate programming model for each case.
- Analyzing code performance using the respective tools and exploiting the results of the analysis to optimize the code.
- Quantifying code performance both at a macroscopic level (execution time) and at a lower level (interaction with hardware), using scientifically sound methodologies, and documenting their observations in a technical report.