CHAPTER 1. INTRODUCTION

Software construction for scientific computing is a difficult task. Scientific codes are often large and complex, requiring vast amounts of domain knowledge for their construction. They also process large data sets, so there is an additional requirement for efficiency and high performance. Considerable knowledge of modern computer architectures and compilers is required to make the necessary optimizations, which is a time-intensive task and further complicates the code.

The last decade has seen significant advances in the area of software engineering. New techniques have been created for managing software complexity and building abstractions. Underneath the layers of new terminology (object-oriented, generic [51], aspect-oriented [40], generative [17], metaprogramming [55]) there is a core of solid work that points the way toward constructing better software for scientific computing: software that is portable, maintainable, and achieves high performance at a lower development cost.

One important key to better software is better abstractions. With the right abstractions, each aspect of the software (domain-specific logic, performance optimization, parallel communication, data structures, etc.) can be cleanly separated and then handled on an individual basis. The proper abstractions reduce code complexity and help to achieve high-quality, high-performance software.

The first generation of abstractions for scientific computing came in the form of subroutine libraries such as the Basic Linear Algebra Subprograms (BLAS) [22, 23, 36], LINPACK [21], EISPACK [50], and LAPACK [2]. This was a good first step, but the first-generation libraries were inflexible and difficult to use, which reduced their applicability. Moreover, the construction of such libraries was a complex and expensive task. Many software engineering techniques (then in their infancy) could not be applied to scientific computing because of their interference with performance.

In the last few years, significant improvements have been made in the tools used for expressing abstractions, primarily in the maturation of the C++ language and its compilers. The old enmity between abstraction and performance can now be put aside. In fact, abstractions can be used to aid performance portability by making the necessary optimizations easier to apply. With the intelligent use of modern software engineering techniques, it is now possible to create extremely flexible scientific libraries that are portable, easy to use, and highly efficient, and that can be constructed in far fewer lines of code than has previously been possible.

This thesis describes such a library, the Matrix Template Library (MTL), a package for high-performance numerical linear algebra. There are four main contributions in this thesis. The first is a breakthrough in software construction that enables the heavy use of abstraction without inhibiting high performance. The second is the development of software designs that allow additive programming effort to produce multiplicative amounts of functionality; this produced an order-of-magnitude reduction in code length for MTL compared to the Netlib BLAS implementation, a software library of comparable functionality. The third is the construction of flexible kernels that simplify the automatic generation of portable, optimized linear algebra routines. The fourth is the analysis and classification of the numerical linear algebra problem domain, which is formalized in the concepts that define the interfaces of the MTL

components and algorithms.

Personal Accomplishments | Others' Related Work
Implementation of all the MTL software | BLAS [22, 23, 36] and LAPACK [2]
Idea to use adaptors to solve the "fat" interface problem; use of aspect objects to handle indexing for matrices | Generic programming [43], aspect-oriented programming [40], idea of a separation of orientation and 2D containers [37, 38], idea to use iterators for linear algebra [37, 38]
Idea to use template metaprogramming to perform register blocking in linear algebra kernels | Complete unrolling for operations on small arrays [55], matrix constructor interface [16, 18], compile-time prime number calculations [54]
Tuned MTL algorithms for high performance | Tiling and blocking techniques [10, 11, 12, 14, 32, 34, 35, 39, 60, 61], automatically tuned libraries [7, 59]
Proved that iterators can be used in high-performance arenas | Optimizing compilers [33, 41], lightweight object optimization, inlining
Created the Mayfly pattern | Andrew Lumsdaine thought of the name
Designed the ITL interface | ITL implementation by Andrew Lumsdaine and Rich Lee

Table 1.1. Breakdown of personal accomplishments vs. others' related work and work used in this thesis.

The work in this thesis builds on the work of many other people, and parts of others' work are described in this thesis. Table 1.1 is provided in order to clarify what work was done by others and what work I did as part of this thesis. The related work listed here is only the work that was very closely related to MTL or that was used heavily in MTL. Chapter 3 describes the work related to MTL in more detail.

The following is a road map for the rest of this thesis. Chapter 2 gives an introduction to generic programming and describes how to extend generic programming to linear algebra. Chapter 3 gives an overview of prior work by others that is related to MTL. Chapters 4 and 5 address the design and implementation of the MTL algorithms and components. Chapter 6 discusses performance issues, such as the ability of modern C++ compilers to optimize abstractions and how template metaprogramming techniques can be used to express loop optimizations.

Chapter 7 describes an iterative methods library, the Iterative Template Library (ITL), that is constructed using MTL. The ultimate purpose of the work in this thesis is to aid the construction of higher-level scientific libraries and applications in several respects: to reduce development costs, to improve software quality from a software engineering standpoint, and to make high performance easier to achieve. The Iterative Template Library is an example of how higher-level libraries can be constructed using MTL.

Chapter 8 gives the real proof that our generic programming approach is viable for scientific computing: the performance results. The performance of MTL is compared to vendor BLAS libraries for several dense and sparse matrix computations on several different architectures. Chapter 9 summarizes the verification and testing of the MTL software. Chapter 10 discusses some future directions for MTL and concludes the thesis.

 
 
 