2013.13: Avoiding communication through a multilevel LU factorization
2013.13: Simplice Donfack, Laura Grigori and Amal Khabou (2012) Avoiding communication through a multilevel LU factorization. Euro-Par 2012 Parallel Processing, 7484 (2012). pp. 551-562. ISSN 0302-9743
Full text available as:
|PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader|
Due to the evolution of massively parallel computers towards deeper levels of parallelism and memory hierarchy, and due to the exponentially increasing ratio of the time required to transfer data, either through the memory hierarchy or between different compute units, to the time required to compute floating point operations, the algorithms are confronted with two challenges. They need not only to be able to exploit multiple levels of parallelism, but also to reduce the communication between the compute units at each level of the hierarchy of parallelism and between the different levels of the memory hierarchy.
In this paper we present an algorithm for performing the LU factorization of dense matrices that is suitable for computer systems with two levels of parallelism. This algorithm is able to minimize both the volume of communication and the number of messages transferred at every level of the two-level hierarchy of parallelism. We present its implementation for a cluster of multicore processors based on MPI and Pthreads. We show that this implementation leads to a better performance than routines implementing the LU factorization in well-known numerical libraries. For matrices that are tall and skinny, that is they have many more rows than columns, our algorithm outperforms the corresponding algorithm from ScaLAPACK by a factor of 4.5 on a cluster of 32 nodes, each node having two quad-core Intel Xeon EMT64 processors.
|Subjects:||MSC 2000 > 65 Numerical analysis|
|Deposited By:||Amal Khabou|
|Deposited On:||13 March 2013|