DEA - Module ALPA : Algorithmique Parallèle
Instructors: Sanjay Rajopadhye (CNRS Researcher) and Tanguy Risset
Schedule: Thursdays, 10h00-12h00, Salle Jersey
Course Outline and Plan
ALPA is a postgraduate course offered by IFSIC to DEA students (in their fifth
year of post-high school education). It meets for lectures for two hours per
week for 10 weeks. The final grade is based on a 3-hour final exam plus a report
on an "individual study" component. The lectures are supplemented by a number
of exercises (not graded). In the 2000-2001 year, ALPA is a required class
for all DEA students (about 35). The independent study is on "tiling", a loop
transformation used for locality improvement in high performance codes.
As the course proceeds, this page will evolve to contain copies of the lecture
transparencies, exercises (and possibly solutions), instructions, etc.
- Weeks 1 & 2 (Rajopadhye): Basic notions of parallelism,
speedup and efficiency, Amdahl's law. The PRAM model, pointer jumping, divide
and parallelize, algorithms for list ranking, parallel prefix, Euler tours,
etc. Brent's theorem, notions of depth and size and efficient simulation.
Relative power of concurrent reads and writes. This material is based on Chapter
30 (sections 30.1 to 30.3) of the book Algorithms by Cormen, Leiserson
and Rivest, and some supplementary notes.
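As a taste of this material, here is a sequential sketch of pointer jumping for list ranking (the function name and the round-by-round simulation are our own, for illustration; a real PRAM executes each round in parallel over all nodes):

```python
import math

def list_rank(succ):
    """List ranking by pointer jumping.

    succ[i] is the successor of node i in a linked list; the tail
    points to itself.  Returns rank[i], the distance from i to the
    tail, after ceil(log2 n) "parallel" rounds.
    """
    n = len(succ)
    succ = list(succ)
    # Initially every node is 1 step from its successor (the tail: 0 steps).
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        # One PRAM round: every node adds its successor's rank and then
        # jumps over it (all reads happen before all writes, as in EREW).
        rank = [rank[i] + rank[succ[i]] for i in range(n)]
        succ = [succ[succ[i]] for i in range(n)]
    return rank
```

Each round doubles the distance every node has "jumped", which is exactly why list ranking needs only logarithmic depth, at the cost of linear work per round.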
- Week 3 (Rajopadhye): The BSP (Bulk Synchronous
Parallelism) model, proposed in A bridging model for parallel computation
by Valiant in Communications of the ACM, August 1990 (v33 n8 p103).
Our main reference will be Questions
and answers about BSP by Skillicorn, Hill and McColl (Scientific
Programming, 6(3): 249-274, Fall 1997). There are a number of sites for
additional information, such as the one maintained by McColl and another
that is more tutorial-oriented.
There is also an organization BSP
Worldwide of researchers and users of BSP.
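The BSP cost model itself is simple enough to state in a few lines of code. The sketch below (the function and parameter names are ours, not from the papers) computes the standard cost as a sum over supersteps:

```python
def bsp_cost(supersteps, g, l):
    """Total BSP cost of a program given as a list of supersteps.

    Each superstep is a pair (w, h): w is the maximum local work on any
    processor and h the maximum number of words any processor sends or
    receives (an h-relation).  g is the per-word communication gap and
    l the barrier synchronisation cost, both machine parameters in
    Valiant's model.  Cost of one superstep: w + h*g + l.
    """
    return sum(w + h * g + l for (w, h) in supersteps)
```

For instance, two supersteps with (w=100, h=10) and (w=50, h=5) on a machine with g=4 and l=20 cost (100 + 40 + 20) + (50 + 20 + 20) = 250 time units.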
- Weeks 5, 6 & 7 (Rajopadhye): Loop parallelization and
the polyhedral model. Dependence analysis, single assignment form, systems of
recurrence equations, scheduling and parallelism detection, space time
transformations, and code generation.
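To give the flavour of scheduling in this model: for a loop nest whose iteration (i, j) depends on (i-1, j) and (i, j-1), the affine schedule t(i, j) = i + j places all independent points on the same anti-diagonal. A small sketch of our own (not taken from the course notes):

```python
def wavefront_order(n, m):
    """Yield, for each time step t = i + j, the list of points (i, j)
    of an n-by-m iteration space that may execute concurrently:
    dependences on (i-1, j) and (i, j-1) come from strictly earlier
    time steps, so each anti-diagonal is a parallel wavefront."""
    for t in range(n + m - 1):
        yield [(i, t - i) for i in range(n) if 0 <= t - i < m]
```

The schedule is valid because t(i, j) exceeds both t(i-1, j) and t(i, j-1) by exactly one, so every dependence crosses at least one time step.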
- 25 November to 16 December (Risset): Parallel
algorithms under the communication network model: linear arrays, grids and
trees. Course notes for linear arrays (in French) are available here. Here are a few exercises
for this material, and the answers are (will soon be) available here.
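As an example of the kind of algorithm studied under the linear-array model, here is odd-even transposition sort, the classic n-round sorting algorithm for n processors in a row (the sequential simulation is ours, for illustration):

```python
def odd_even_sort(a):
    """Odd-even transposition sort on a linear array of n processors,
    simulated sequentially: n rounds, alternately compare-exchanging
    the even pairs (0,1), (2,3), ... and the odd pairs (1,2), (3,4), ...
    Each round corresponds to one parallel step on the array."""
    a = list(a)
    n = len(a)
    for r in range(n):
        for i in range(r % 2, n - 1, 2):   # disjoint neighbour pairs
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Since each processor only ever talks to its immediate neighbours, the algorithm maps directly onto a linear array, and n rounds are provably sufficient.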
Similar courses are taught at many other places, such as University of North
ALPA involves a detailed independent study of a current topic of research.
You will start with a few given references, but it is up to your initiative to
follow other citations in what you read, to take good notes, and to
develop a clear understanding of the subject. A part of the final exam will
be based on this study. You are very strongly encouraged to work together for
your reading, and to meet and discuss your readings with colleagues and with
the instructors outside class time. It is also crucial to develop a good plan
for the study and not leave it all to the eleventh hour.
Tiling is a regular partitioning of a uniform index space representing either
computations (e.g., the iteration space of a loop program), or data (e.g.,
arrays distributed over the processors of a parallel machine). Originally
presented in a seminal paper by Irigoin and Triolet 
for increasing the granularity of parallel computations, tiling can be used to
achieve many different performance goals: exploiting data locality in
hierarchical memory machines, communication optimisation by message
aggregation, overlap of communication and computation, latency avoidance, etc.
Two other early papers describing the transformation are due to Schreiber and
Dongarra and to Ramanujam and Sadayappan. Xue gives a nice synthesis of these
earlier results and some extensions. There has also been work on the choice
of the parameters of a tiling transformation (notably the shape and size of
the tiles) so as to optimize various performance criteria which are more or
less valid depending on the context in which the tiling is performed.
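As a concrete illustration (our own sketch, not from any of the cited papers), tiling an N-by-N iteration space with B-by-B tiles simply wraps two extra loops around the original ones; the tile size B is precisely the parameter whose choice the optimization work addresses:

```python
def tiled_sum(A, B):
    """Visit a square matrix tile by tile (B-by-B blocks) and sum it.
    The outer two loops enumerate tiles; the inner two sweep one tile,
    so consecutive accesses stay within a small block -- the locality
    effect that tiling exploits on hierarchical-memory machines."""
    n = len(A)
    total = 0
    for ti in range(0, n, B):                    # tile origin, rows
        for tj in range(0, n, B):                # tile origin, columns
            for i in range(ti, min(ti + B, n)):  # sweep inside the tile
                for j in range(tj, min(tj + B, n)):
                    total += A[i][j]
    return total
```

The result is the same for any B; what changes is the order of the memory accesses, and hence the cache behaviour or, on a parallel machine, the granularity of communication.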
Your independent study deals with the tile size optimization problem.
The most relevant results are due to Andonov and Rajopadhye, Hodzic and Shang,
Ohta et al. (who are widely cited in spite of an error in their main
theorem: one of your tasks is to find and understand the mistake) and Hogstedt
et al. In 1998, a workshop on Tiling for Optimal Resource
Utilization was held in Dagstuhl, Germany, and its followup page leads to
1. F. Thomson Leighton, Parallel Algorithms and Architectures: Arrays, Trees,
Hypercubes. Excellent reference text (700 pages).
2. M. Cosnard and Denis Trystram, Algorithmes et architectures
parallèles, InterEditions, 1993. General reference text covering most
aspects of parallelism.
3. P. Quinton and Y. Robert, Algorithmes et architectures systoliques,
Masson, 1989. Starting to become a bit old, but the initial part on systolic
algorithms is very interesting, as are the synthesis methods of Chapters 11
4. V. Kumar, A. Grama, A. Gupta and G. Karypis, Introduction to Parallel
Computing, Benjamin Cummings, 1994. General reference text.
5. P. Feautrier, Dataflow analysis of array and scalar references, in
"International Journal of Parallel Programming", February, 1991 (volume 20,
number 1, pages 23-53).
6. Cormen, Leiserson, Rivest, Introduction à l'algorithmique, Dunod 1994
(translation of the English Algorithms; chapter 30). Another
excellent reference text for all aspects of analysis of algorithms (not just
parallel ones).
- Andonov and Rajopadhye, Optimal Orthogonal Tiling of 2-D Iterations, in
Journal of Parallel and Distributed Computing, September 1997 (pages 159-165).
- Hodzic and Shang, On Supernode Transformation with Minimized Total Running
Time, in IEEE Transactions on Parallel and Distributed Systems, May 1998
(pages 417-428).
- Hogstedt, Carter and Ferrante, Determining the Idle Time of a Tiling, in
1997 ACM Symposium on the Principles of Programming Languages, January 1997.
- Irigoin and Triolet, Supernode Partitioning, in 15th ACM Symposium on
Principles of Programming Languages, January 1988.
- Ohta, Saito, Kainaga and Ono, Optimal Tile Size Adjustment in Compiling
General DOACROSS Loop Nests, in ACM International Conference on
Supercomputing, July 1995 (pages 270-279).
- Ramanujam and Sadayappan, Tiling Multidimensional Iteration Spaces for
Multicomputers, in Journal of Parallel and Distributed Computing, vol 16,
no. 2, 1992 (pages 108-120).
- Schreiber and Dongarra, Automatic Blocking of Nested Loops, Technical
Report No. 90.38, RIACS, NASA Ames Research Center.
- Xue, On Tiling as a Loop Transformation, in Parallel Processing Letters,
vol 7, no. 4, October (?) 1997 (pages