Software construction for scientific computing is a difficult task. Scientific codes are often
large and complex, requiring vast amounts of domain knowledge for their construction.
They also process large data sets, so there is an additional requirement for efficiency and
high performance. Considerable knowledge of modern computer architectures and compilers
is required to make the necessary optimizations, which is a time-intensive task and
further complicates the code.
The last decade has seen significant advances in the area of software engineering. New
techniques have been created for managing software complexity and building abstractions.
Underneath the layers of new terminology (object-oriented, generic [51], aspect-oriented
[40], generative [17], metaprogramming [55]) there is a core of solid work that
points the way for constructing better software for scientific computing: software that is
portable, maintainable and achieves high performance at a lower development cost.
One important key to better software is better abstractions. With the right abstractions,
each aspect of the software (domain-specific logic, performance optimization, parallel
communication, data structures, etc.) can be cleanly separated and then handled on an
individual basis. The proper abstractions reduce code complexity and help to achieve
high-quality, high-performance software.
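As a small illustration of this kind of separation, consider the following sketch in the generic programming style on which MTL builds (plain illustrative code, not an MTL component). The algorithm is written only against the iterator abstraction, so the traversal logic is completely independent of the underlying data structure:

    #include <vector>
    #include <list>

    // A generic sum written only against the iterator abstraction: the
    // traversal logic is independent of the container's storage layout.
    template <typename Iter>
    double sum(Iter first, Iter last) {
        double total = 0.0;
        for (; first != last; ++first)
            total += *first;
        return total;
    }

    int main() {
        std::vector<double> v(3, 1.0);       // dense, contiguous storage
        std::list<double>   l(3, 2.0);       // linked storage
        double a = sum(v.begin(), v.end());  // the same algorithm serves
        double b = sum(l.begin(), l.end());  // both data structures
        return (a == 3.0 && b == 6.0) ? 0 : 1;
    }

The algorithm is written once; every conforming data structure gets it for free.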
The first generation of abstractions for scientific computing came in the form of
subroutine libraries such as the Basic Linear Algebra Subroutines (BLAS) [22, 23, 36], LINPACK
[21], EISPACK [50], and LAPACK [2]. This was a good first step, but the first-generation
libraries were inflexible and difficult to use, which reduced their applicability.
Moreover, the construction of such libraries was a complex and expensive task. Many
software engineering techniques (then in their infancy) could not be applied to scientific
computing because they interfered with performance.
In the last few years significant improvements have been made in the tools used for
expressing abstractions, primarily in the maturation of the C++ language and its compilers.
The old enmity between abstraction and performance can now be put aside. In fact,
abstractions can be used to aid performance portability by making the necessary optimizations
easier to apply. With the intelligent use of modern software engineering techniques
it is now possible to create extremely flexible scientific libraries that are portable, easy
to use, highly efficient, and which can be constructed in far fewer lines of code than has
previously been possible.
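To give a flavor of why abstraction no longer implies a performance penalty, consider a hypothetical lightweight adaptor (again illustrative code, not an MTL component). Every member is a small inline function, so a modern optimizing compiler can inline through the abstraction and emit the same loop as hand-written C:

    // A thin "scaled view" over an array: presents x[i] * alpha without
    // copying data. Because each member is a trivial inline function, the
    // abstraction disappears entirely after inlining.
    class scaled_ptr {
        const double* p_;
        double alpha_;
    public:
        scaled_ptr(const double* p, double alpha) : p_(p), alpha_(alpha) {}
        double operator*() const { return *p_ * alpha_; }
        scaled_ptr& operator++() { ++p_; return *this; }
        bool operator!=(const scaled_ptr& o) const { return p_ != o.p_; }
    };

    double sum_scaled(const double* x, int n, double alpha) {
        double total = 0.0;
        scaled_ptr first(x, alpha), last(x + n, alpha);
        for (; first != last; ++first)
            total += *first;      // inlines to: total += x[i] * alpha;
        return total;
    }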
This thesis describes such a library, the Matrix Template Library (MTL), a package
for high-performance numerical linear algebra. There are four main contributions in this
thesis. The first is a breakthrough in software construction that enables the heavy use
of abstraction without inhibiting high performance. The second contribution is the development
of software designs that allow additive programming effort to produce multiplicative
amounts of functionality. This produced an order of magnitude reduction in the
code length for MTL compared to the Netlib BLAS implementation, a software library
of comparable functionality. The third contribution is the construction of flexible kernels
that simplify the automatic generation of portable optimized linear algebra routines. The
fourth contribution is the analysis and classification of the numerical linear algebra problem
domain, which is formalized in the concepts that define the interfaces of the MTL
components and algorithms.
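To make the second contribution above concrete, here is a rough sketch (hypothetical interfaces, not MTL's actual ones, which are based on iterators and described in later chapters). One matrix-vector product written against a minimal matrix concept serves every matrix type that models the concept, so M algorithms and N matrix types yield M × N combinations from only M + N pieces of code:

    #include <cstddef>
    #include <vector>

    // A minimal dense model of a hypothetical matrix concept
    // (dimensions plus element access).
    class DenseMatrix {
        std::size_t nr_, nc_;
        std::vector<double> data_;
    public:
        DenseMatrix(std::size_t nr, std::size_t nc)
            : nr_(nr), nc_(nc), data_(nr * nc, 0.0) {}
        std::size_t nrows() const { return nr_; }
        std::size_t ncols() const { return nc_; }
        double  operator()(std::size_t i, std::size_t j) const { return data_[i * nc_ + j]; }
        double& operator()(std::size_t i, std::size_t j)       { return data_[i * nc_ + j]; }
    };

    // Written once against the concept, the algorithm works for any
    // conforming matrix type (banded, symmetric, ...), not just DenseMatrix.
    template <typename Matrix, typename VecIn, typename VecOut>
    void mat_vec(const Matrix& A, const VecIn& x, VecOut& y) {
        for (std::size_t i = 0; i < A.nrows(); ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < A.ncols(); ++j)
                sum += A(i, j) * x[j];
            y[i] = sum;
        }
    }

Indexed element access glosses over sparse storage; MTL's iterator-based interface lets sparse formats traverse only their stored entries, but the multiplicative effect is the same.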
Personal Accomplishments                    | Others' Related Work
--------------------------------------------+---------------------------------------------
Implementation of all the MTL software      | BLAS [22, 23, 36] and LAPACK [2]
--------------------------------------------+---------------------------------------------
Idea to use adaptors to solve the "fat"     | Generic programming [43], aspect-oriented
interface problem; use of aspect objects    | programming [40], idea of a separation of
to handle indexing for matrices             | orientation and 2D containers [37, 38],
                                            | idea to use iterators for linear algebra
                                            | [37, 38]
--------------------------------------------+---------------------------------------------
Idea to use template metaprogramming to     | Complete unrolling for operations on small
perform register blocking in linear         | arrays [55], matrix constructor interface
algebra kernels                             | [16, 18], compile-time prime number
                                            | calculations [54]
--------------------------------------------+---------------------------------------------
Tuned MTL algorithms for high performance   | Tiling and blocking techniques [10, 11, 12,
                                            | 14, 32, 34, 35, 39, 60, 61], automatically
                                            | tuned libraries [7, 59]
--------------------------------------------+---------------------------------------------
Proved that iterators can be used in        | Optimizing compilers [33, 41], lightweight
high-performance arenas                     | object optimization, inlining
--------------------------------------------+---------------------------------------------
Created the Mayfly pattern                  | Andrew Lumsdaine thought of the name
--------------------------------------------+---------------------------------------------
Designed the ITL interface                  | ITL implementation by Andrew Lumsdaine
                                            | and Rich Lee
--------------------------------------------+---------------------------------------------

Table 1.1. Breakdown of personal accomplishments vs. others' related work and work
used in this thesis.
The work in this thesis builds on work by many other people, and parts of others'
work are described in this thesis. Table 1.1 is provided in order to clarify what work
was done by others and what work I did as part of this thesis. The related work listed
there is only the work most closely related to MTL, or that was used heavily in MTL.
Chapter 3 describes the work related to MTL in more detail.
The following is a road map for the rest of this thesis. Chapter 2 gives an introduction
to generic programming and describes how to extend generic programming to linear
algebra. Chapter 3 gives an overview of prior work by others that is related to MTL.
Chapters 4 and 5 address the design and implementation of the MTL algorithms and
components. Chapter 6 discusses performance issues such as the ability of modern C++
compilers to optimize abstractions and how template metaprogramming techniques can
be used to express loop optimizations.
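To give a flavor of what such metaprograms look like (a minimal sketch of complete unrolling in the spirit of [55], not MTL's actual kernels), the recursive template below computes a fixed-length dot product that the compiler expands entirely at compile time:

    // Compile-time unrolling of a fixed-length dot product. The template
    // recursion is expanded by the compiler, leaving straight-line code
    // with no loop overhead.
    template <int N>
    struct DotUnrolled {
        static double apply(const double* x, const double* y) {
            return x[0] * y[0] + DotUnrolled<N - 1>::apply(x + 1, y + 1);
        }
    };
    template <>
    struct DotUnrolled<0> {      // base case ends the recursion
        static double apply(const double*, const double*) { return 0.0; }
    };

    // DotUnrolled<4>::apply(a, b) expands to
    // a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3].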
Chapter 7 describes an iterative methods library — the Iterative Template Library
(ITL) — that is constructed using MTL. The ultimate purpose of the work in this thesis
is to aid the construction of higher-level scientific libraries and applications in several
respects: to reduce development costs, to improve software quality from a software
engineering standpoint, and to make high performance easier to achieve. The Iterative Template
Library is an example of how higher-level libraries can be constructed using MTL.
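As a hedged sketch of what "higher-level" means here (hypothetical code in the spirit of ITL, not its actual interface), a conjugate gradient solver can be written once against generic operations; together with the mat_vec sketch above, it works unchanged with any conforming matrix type:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    typedef std::vector<double> Vec;

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Conjugate gradient for a symmetric positive definite system A x = b.
    // The solver touches the matrix only through mat_vec, so any matrix
    // type usable with the generic product works here unchanged.
    template <typename Matrix>
    void conjugate_gradient(const Matrix& A, const Vec& b, Vec& x,
                            double tol, int max_iter) {
        const std::size_t n = b.size();
        Vec r(b), p(n), q(n);
        mat_vec(A, x, q);                        // r = b - A x
        for (std::size_t i = 0; i < n; ++i) r[i] -= q[i];
        p = r;
        double rr = dot(r, r);
        for (int k = 0; k < max_iter && std::sqrt(rr) > tol; ++k) {
            mat_vec(A, p, q);                    // q = A p
            double alpha = rr / dot(p, q);
            for (std::size_t i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * q[i];
            }
            double rr_new = dot(r, r);
            double beta = rr_new / rr;
            for (std::size_t i = 0; i < n; ++i)
                p[i] = r[i] + beta * p[i];       // next search direction
            rr = rr_new;
        }
    }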
Chapter 8 gives the real proof that our generic programming approach is viable for
scientific computing: the performance results. The performance of MTL is compared
to vendor BLAS libraries for several dense and sparse matrix computations on several
different architectures. Chapter 9 summarizes the verification and testing of the MTL
software. Chapter 10 discusses some future directions of MTL and concludes the thesis.