2006.56: Cache Efficient Bidiagonalization Using BLAS 2.5 Operators
2006.56: G. W. Howell, J. W. Demmel, C. T. Fulton, S. Hammarling and K. Marmol (2006) Cache Efficient Bidiagonalization Using BLAS 2.5 Operators.
On cache-based computer architectures using current standard algorithms, Householder bidiagonalization accounts for a significant portion of the execution time needed to compute matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix so that two vector-matrix multiplications (_GEMV) can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS 2.5 operations approximately halve the transfer of data from main memory to cache. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.
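The central idea, doing two _GEMV products in a single pass over the matrix, can be illustrated with a small pure-Python sketch. The operation form used here (x ← βAᵀy + z, then w ← αAx) is an assumption modelled on the combined matrix-vector products of the BLAS Technical Forum standard (GEMVT). It is not taken from the paper, and the function names `gemvt_two_pass` and `gemvt_fused` are hypothetical. The observation that makes fusion possible is that, with A stored by columns, x_j depends only on column j. It can therefore be folded into w while that column is still resident in cache.

```python
"""Sketch of a fused 'BLAS 2.5'-style kernel: two matrix-vector products, one pass over A.

Assumed operation (modelled on GEMVT-style combined products, not the paper's code):
    x <- beta * A^T y + z
    w <- alpha * A x
A is stored column-major as a list of n columns, each of length m.
"""
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # list of columns


def gemvt_two_pass(alpha: float, beta: float, A: Matrix, y: Vector, z: Vector) -> Tuple[Vector, Vector]:
    """Reference version: two separate GEMV-like sweeps, so every column of A is read twice."""
    m, n = len(A[0]), len(A)
    # Sweep 1 over A: x = beta * A^T y + z (one dot product per column).
    x = [beta * sum(A[j][i] * y[i] for i in range(m)) + z[j] for j in range(n)]
    # Sweep 2 over A: w = alpha * A x (one axpy per column).
    w = [0.0] * m
    for j in range(n):
        for i in range(m):
            w[i] += alpha * A[j][i] * x[j]
    return x, w


def gemvt_fused(alpha: float, beta: float, A: Matrix, y: Vector, z: Vector) -> Tuple[Vector, Vector]:
    """Fused version: each column is used for both products while it is loaded."""
    m, n = len(A[0]), len(A)
    x = [0.0] * n
    w = [0.0] * m
    for j in range(n):
        col = A[j]  # column j is fetched once
        # x[j] needs only column j, so it is final as soon as this dot product ends...
        x[j] = beta * sum(col[i] * y[i] for i in range(m)) + z[j]
        # ...and can immediately be accumulated into w while col is still at hand.
        xj = alpha * x[j]
        for i in range(m):
            w[i] += col[i] * xj
    return x, w


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 3 x 2 matrix with columns (1,2,3) and (4,5,6)
    y = [1.0, 0.0, -1.0]
    z = [0.5, 0.5]
    print(gemvt_fused(1.0, 1.0, A, y, z))  # ([-1.5, -1.5], [-7.5, -10.5, -13.5])
```

Both functions return the same vectors up to rounding. The fused loop reads each column of A once instead of twice, which is the access pattern behind the roughly halved memory-to-cache traffic described above. Pure Python will not exhibit the cache effect itself. A production kernel would be written in Fortran or C over contiguous columns, typically with several columns processed per sweep.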
Item Type: MIMS Preprint
Subjects: MSC 2000 > 65 Numerical analysis
Deposited By: Sven Hammarling
Deposited On: 07 April 2006