MIMS EPrints

2016.42: A Comparison of Potential Interfaces for Batched BLAS Computations

2016.42: Samuel D. Relton, Pedro Valero-Lara and Mawussi Zounon (2016) A Comparison of Potential Interfaces for Batched BLAS Computations.

One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem into thousands of small problems which can be solved independently. There is a clear need for a batched BLAS standard, allowing users to perform thousands of small BLAS operations in parallel and making efficient use of their hardware. There are many possible ways in which the BLAS standard can be extended for batch operations. We discuss many of these possible designs, giving benefits and criticisms of each, along with a number of experiments designed to determine how the API may affect performance on modern HPC systems. Related issues that influence API design, such as the effect of memory layout on performance, are also discussed.

Item Type: MIMS Preprint
Uncontrolled Keywords: BLAS, batched BLAS, linear algebra, parallel computing, high-performance computing
Subjects: MSC 2000 > 68 Computer science
MIMS number: 2016.42
Deposited By: Dr Samuel Relton
Deposited On: 04 August 2016
