High Performance Computing (532AA, 9 CFU)

Course objectives. The course deals with two interrelated issues in high-performance computing: (i) fundamental concepts and techniques in parallel computation structuring and design, including parallelization methodologies and paradigms, parallel programming models, their implementation, and related cost models; (ii) architectures of high-performance computing systems, including shared-memory multiprocessors, distributed-memory multicomputers, GPUs, clusters, and others. Both issues are studied in terms of structural models, static and dynamic support to computation and programming models, performance evaluation, and the capability of building complex and heterogeneous applications and/or enabling platforms, also through example application cases. Technological features and trends are studied, in particular multi-/many-core technology and high-performance networks.

Course topics. The course is structured into two parts:

Structuring and Design Methodology for Parallel Applications: structured parallelism at the application and process levels, cost models, impact of communications, parallel computations as queueing systems/queueing networks, parallel paradigms (Pipeline, Data-Flow, Farm, Function Partitioning, Data Parallel), parallel systems at the firmware level, instruction-level parallelism (Pipelined, Superscalar, Multithreaded CPUs).
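Among the paradigms listed above, the Farm is perhaps the simplest to see in code: an emitter distributes independent tasks to a set of identical workers, and a collector gathers the results. As a rough cost-model intuition, with n workers the farm's ideal service time approaches the per-task computation time divided by n, until the emitter or collector becomes the bottleneck. The sketch below is purely illustrative and not part of the course material; it uses Python threads and a toy squaring task (names like `farm` and `worker` are made up here), whereas a real farm run-time would use processes or distributed nodes.

```python
# Minimal sketch of the Farm paradigm: independent tasks are
# distributed to identical workers and results are collected in
# input order (an "ordered" farm collector).
from concurrent.futures import ThreadPoolExecutor

def worker(x: int) -> int:
    # Toy computation standing in for the farm's replicated function.
    return x * x

def farm(tasks, n_workers: int = 4):
    # The executor plays the role of emitter + workers; map() acts
    # as an order-preserving collector.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))

if __name__ == "__main__":
    print(farm(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the tasks are independent, the farm introduces no inter-worker communication; its overhead comes only from task distribution and result collection, which is what the course's cost models quantify.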

Parallel Architectures: shared-memory multiprocessors (SMP and NUMA architectures), distributed-memory multicomputers (Cluster and MPP architectures), SIMD and GPU architectures, run-time support for inter-process communication, interconnection networks, performance evaluation, and multicore architectures.

Textbook. M. Vanneschi, High Performance Computing: Parallel Processing Models and Architectures. Pisa University Press, 2014. The book is complemented by an errata corrige. Attending and studying the course requires proper background knowledge in Structured Computer Architecture. The appendix of the textbook contains a detailed review of basic concepts and techniques in Structured Computer Architecture.

Teaching modality. According to the current rules of the University of Pisa for the Academic Year 2021-2022 (Covid-19 pandemic), the course will be given in a blended manner, i.e., mixed mode with some students in the room (according to the University's rules for attending lectures in person) and others online. I will try to offer online streaming of the lectures, in order to stimulate real-time interaction with all the students (both those in the room and those online). However, in case of unexpected technical issues, the lecture recording will be uploaded to the Microsoft Teams platform at the end of the lecture.

Microsoft Teams. The official course team is available with the following code: oz8oa7a

Question Time. The official slot is every Tuesday, 10:00-13:00. Please send an email to the professor in advance. The preferred option is to hold the question time through Microsoft Teams.

Course material:

1. Organization of the course and exam rules

2. Errata corrige of the textbook (version 2022)

3. Collection of past exams with solutions

4. Course introduction

5. Overview of the course approach, notes

6. Level-structuring of computer systems, notes

7. Parallelism and performance metrics, notes

8. Concurrent language, notes

9. Pipeline and farm paradigms, notes part 1 and part 2

10. First homework

11. Collective communications, notes part 1 and part 2

12. Data-flow model, notes

13. Correction of the first homework

14. Analysis of acyclic computation graphs, notes part 1 and part 2

15. Basic run-time support

16. Second homework

17. Shared-Memory Systems, notes part 1, part 2, part 3, part 4, and part 5

18. Correction of the second homework

19. Third homework

20. Flow control and base latency, notes

21. Correction of the third homework

22. Synchronization mechanisms, notes

23. Data-parallel paradigm, notes

24. Fourth homework

25. Cache coherence, notes part 1 and part 2

26. Optimized run-time system, notes part 1 and part 2

27. Analysis of cache coherence overhead

28. SIMD and GPU architectures

29. Distributed-memory systems, notes

Gabriele Mencagli