MIMS EPrints

2017.22: The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems

2017.22: Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, Pedro Valero-Lara and Mawussi Zounon (2017) The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. Procedia Computer Science, 108. pp. 495-504.


DOI: 10.1016/j.procs.2017.05.138


A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems that can be solved independently, before collating the results. To standardize the interface to these routines, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS operations in parallel whilst making efficient use of their hardware. We discuss the benefits and drawbacks of the current batched BLAS proposals and perform a number of experiments, focusing on GEMM, to explore their effect on performance. In particular, we analyze the effect of novel data layouts which, for example, interleave the matrices in memory to aid vectorization and prefetching of data. With these modifications, our code outperforms both MKL and cuBLAS by up to 6 times on the self-hosted Intel KNL (codenamed Knights Landing) and Kepler GPU architectures, for large numbers of DGEMM operations using matrices of size 2 × 2 to 20 × 20.
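The interleaved layout mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`interleave`, `deinterleave`, `batched_gemm_interleaved`) and the exact index mapping are assumptions chosen for illustration, and pure Python gains no vectorization benefit. The sketch only shows the idea that element (i, j) of every matrix in the batch is stored contiguously, so the innermost loop runs over the batch index with unit stride.

```python
def interleave(mats):
    """Pack equally sized matrices so that element (i, j) of every
    matrix in the batch is stored contiguously (hypothetical layout)."""
    nbatch, rows, cols = len(mats), len(mats[0]), len(mats[0][0])
    flat = [0.0] * (rows * cols * nbatch)
    for s, mat in enumerate(mats):
        for i in range(rows):
            for j in range(cols):
                flat[(i * cols + j) * nbatch + s] = mat[i][j]
    return flat


def deinterleave(flat, rows, cols, nbatch):
    """Inverse of interleave: recover a list of row-major matrices."""
    return [[[flat[(i * cols + j) * nbatch + s] for j in range(cols)]
             for i in range(rows)]
            for s in range(nbatch)]


def batched_gemm_interleaved(a, b, m, k, n, nbatch):
    """Compute C_s = A_s * B_s for every s in the batch, where A_s is
    m x k and B_s is k x n, all stored in the interleaved layout."""
    c = [0.0] * (m * n * nbatch)
    for i in range(m):
        for j in range(n):
            c_off = (i * n + j) * nbatch
            for p in range(k):
                a_off = (i * k + p) * nbatch
                b_off = (p * n + j) * nbatch
                # Innermost loop walks the batch with unit stride; in a
                # compiled language this is the loop one would vectorize.
                for s in range(nbatch):
                    c[c_off + s] += a[a_off + s] * b[b_off + s]
    return c


if __name__ == "__main__":
    mats_a = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
    mats_b = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]
    c_flat = batched_gemm_interleaved(
        interleave(mats_a), interleave(mats_b), 2, 2, 2, len(mats_a))
    print(deinterleave(c_flat, 2, 2, len(mats_a)))
```

In this sketch the batch index is the fastest-varying dimension, so the same scalar multiply-add is applied to adjacent memory locations across the batch. That access pattern is the kind a compiler or hand-written kernel could map onto SIMD lanes.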

Item Type: Article
Additional Information: International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland
Uncontrolled Keywords: BLAS, Batched BLAS, High-Performance Computing, Scientific Computing, Parallel Processing
Subjects: MSC 2000 > 65 Numerical analysis
MSC 2000 > 68 Computer science
MIMS number: 2017.22
Deposited By: Dr Samuel Relton
Deposited On: 28 June 2017
