Reading Notes on Fault Modeling using the Program Dependence Graph (1)

MSE 2004,perdubug,2004.6.1

Note: Fault Modeling using the Program Dependence Graph is a paper on software fault classification and fault-seeding techniques.

The paper opens with the background needed to follow it, covering:

A.Faults and Fault Categories

B.The Program Dependence Graph

C.Dataflow testing

D.Mutation Testing

E.Modeling Faults using the PDG

F.PDG-based Fault Classification

A. Introduction

0. We present a fault classification schema and a fault seeding method that is based on the manifestation of faults in the program dependence graph (PDG).

The authors introduce a fault classification scheme and a fault-seeding method, both based on how faults manifest themselves in the PDG.

1. A meaningful measure of a testing technique is its fault-detection ability or effectiveness. We measure the effectiveness of a testing technique in terms of its ability to detect certain types of faults; we also measure the effectiveness of a testing technique relative to the effectiveness of other testing techniques.

A testing technique is judged by its fault-detection ability, also called its effectiveness. Effectiveness is measured in two ways: by the technique's ability to detect certain types of faults, and relative to the effectiveness of other testing techniques.

2. Performing studies on fault detection is difficult because of the lack of fault-seeding techniques that automatically and systematically insert faults into a program. One fault-seeding method, mutation testing, inserts small syntactic changes or faults into a program [10]. However, this approach is expensive and results in a large number of fault-seeded programs that must be tested. Additionally, mutation testing inserts only simple faults (simple syntactic code changes) into a program instead of complex faults (multiple syntactic changes or structural changes).

Studying fault detection is difficult, mainly because there is a lack of fault-seeding techniques that can insert faults into a program automatically and systematically. One fault-seeding method, mutation testing, inserts small syntactic changes (faults) into a program. However, the approach is expensive: it produces a large number of fault-seeded programs, and every one of them must be tested. It also inserts only simple faults (simple syntactic code changes), not complex ones such as multiple syntactic changes or structural changes.
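The cost argument is easier to see on a toy case. The following is a minimal sketch of mutation testing, not the tooling used in the paper: the function discounted_total, the operator table OPERATORS, and both test suites are hypothetical, and mutants are produced by plain text substitution rather than by a real mutation tool.

```python
ORIGINAL = """
def discounted_total(price, quantity):
    total = price * quantity
    if quantity > 10:
        total = total - 5
    return total
"""

# Hypothetical mutation operators: replace one token with another.
OPERATORS = [("*", "+"), (">", ">="), (">", "<"), ("-", "+")]


def make_mutants(source):
    """Return one mutated copy of the source per (operator, occurrence)."""
    mutants = []
    for old, new in OPERATORS:
        pos = source.find(old)
        while pos != -1:
            mutants.append(source[:pos] + new + source[pos + len(old):])
            pos = source.find(old, pos + 1)
    return mutants


def load(source):
    """Execute a source string and return its discounted_total function."""
    namespace = {}
    exec(source, namespace)
    return namespace["discounted_total"]


def passes(func, tests):
    """True if func returns the expected value for every test case."""
    return all(func(*args) == expected for args, expected in tests)


def killed_count(mutants, tests):
    """Count the mutants for which at least one test fails."""
    return sum(1 for m in mutants if not passes(load(m), tests))


mutants = make_mutants(ORIGINAL)
weak_suite = [((2, 3), 6), ((1, 20), 15)]
strong_suite = weak_suite + [((1, 10), 10)]  # adds the boundary case

print(len(mutants), killed_count(mutants, weak_suite),
      killed_count(mutants, strong_suite))  # prints: 4 3 4
```

Even this five-line function yields four mutants, and each one has to be run against the whole suite. Every mutant differs from the original by a single token, which is the "simple faults only" limitation the paper points out.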

B. Faults and Fault Categories

3. According to the IEEE standard definitions [24], a fault or "bug" is an incorrect step, instruction or data definition in a program. A fault may result in a failure, which is observed when the system exhibits incorrect external behavior. An error is an internal difference between the computed, observed, or measured values or conditions, and the true, specified, or theoretically correct values or conditions. Finally, a mistake is a human misconception that results in a fault.

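Following the IEEE standard, the authors define the basic terms:

mistake: a human misconception that results in a fault

fault: an incorrect step, instruction, or data definition in a program

failure: the incorrect external behavior that a fault may cause the system to exhibit

error: an internal difference between:

1. the computed, observed, or measured values or conditions;

2. the true, specified, or theoretically correct values or conditions;

In short, an error means that what we actually get differs from what theory says, or what we intend, it should be.

The authors then give an example to illustrate these concepts: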

To illustrate, suppose a programmer assumes that the upper bound of an array of size 50 is at index 50. For an array of size 50 whose indices start at 0, this is a mistake. A fault in the program due to this mistake may be a statement such as 'array[50] = value;'. The failure in this case may be a memory violation and core dump. If the incorrectly accessed location array[50] contains a numerical value that is used in some computation, the output will be different from the expected output, and this difference is an error.

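For example, a programmer assumes that the upper bound of an array of size 50 is at index 50. For an array whose indices start at 0, this is a mistake (a human misconception that results in a fault). A statement such as array[50] = value is then a fault (an incorrect step, instruction, or data definition in the program). The resulting failure may be a memory violation and core dump. If the location array[50] holds a numerical value that is later used in a computation, the output will differ from the expected output, and that difference is an error.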
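The same chain can be traced in a few lines. This is a sketch under stated assumptions: the paper's example is C-like, whereas Python checks bounds, so the failure appears as an IndexError instead of a core dump. The list memory and the value 999 are hypothetical stand-ins for whatever happens to sit next to the array in an unchecked language.

```python
SIZE = 50
array = list(range(1, SIZE + 1))  # 50 elements, valid indices 0..49


def last_element_doubled(storage):
    # Mistake: the programmer believes the last element is at index 50.
    # Fault: the statement below encodes that belief (correct index: 49).
    return 2 * storage[50]


# Failure: Python checks bounds, so the fault surfaces as incorrect
# external behavior (an exception), much like a core dump in C.
try:
    last_element_doubled(array)
    failure = None
except IndexError as exc:
    failure = type(exc).__name__

# Error: without bounds checks the read may silently succeed.
# Assumption: slot 50 of this simulated memory holds an unrelated value.
memory = array + [999]
computed = last_element_doubled(memory)
expected = 2 * memory[SIZE - 1]
error = computed - expected

print(failure, computed, expected, error)  # IndexError 1998 100 1898
```

One fault produces two different symptoms here: a visible failure when the access is checked, and a silent error in the computed value when it is not.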

4. Howden [22] originally classified program faults, and Zeil [40] extended Howden's classification. There have been other fault classifications based on observed instances of faults and on the complexity of faults [12]. According to the Howden/Zeil classifications, program faults are categorized as domain faults or computation faults. A domain fault occurs when, due to an error in control flow, a program generates incorrect output. A computation fault occurs when a program takes the correct path, but generates incorrect output because of faults in the computations along that path. Domain faults are further classified into two categories. A missing path fault is caused by a missing conditional or clause and the associated statements, and a path selection fault is caused by an incorrect decision at a predicate. Path-selection faults can result from an incorrect predicate (predicate fault) or from an incorrect assignment statement that propagates to a control point, leading to an incorrect decision (assignment fault).

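As the passage shows, tracing software fault classification back to its origin means consulting Howden's work. Under the Howden/Zeil classification, program faults fall into two categories:

1) domain fault: an error in control flow causes the program to generate incorrect output. Domain faults are further divided into:

a. missing path fault: caused by a missing conditional or clause, together with the statements associated with it;

b. path selection fault: caused by an incorrect decision at a predicate. It results either from an incorrect predicate (predicate fault) or from an incorrect assignment statement that propagates to a control point (assignment fault).

2) computation fault: the program takes the correct path but generates incorrect output because of faults in the computations along that path.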
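The categories can be made concrete with one small function into which each kind of fault is seeded in turn. This is a sketch, not an example from the paper: shipping_cost, its fault switch, and the classify helper are all hypothetical, and classify assumes a fault-free reference run is available for comparison.

```python
def shipping_cost(weight, fault=None):
    """Return (path, cost); `fault` selects one seeded fault or None."""
    # Missing path fault: the guard for invalid input is absent.
    if fault != "missing_path" and weight <= 0:
        return "invalid", 0
    # Assignment fault: a wrong value that later reaches a control point.
    limit = 100 if fault == "assignment" else 10
    # Predicate fault: the decision itself is written incorrectly.
    heavy = weight < limit if fault == "predicate" else weight > limit
    if heavy:
        # Computation fault: correct path, wrong arithmetic along it.
        cost = weight + 3 if fault == "computation" else weight * 3
        return "heavy", cost
    return "light", weight * 2


def classify(weight, fault):
    """Compare a faulty run with the fault-free run on the same input."""
    good_path, good_cost = shipping_cost(weight)
    path, cost = shipping_cost(weight, fault)
    if path != good_path:
        return "domain fault"       # wrong path through the program
    if cost != good_cost:
        return "computation fault"  # right path, wrong result
    return "not revealed"           # this input does not expose the fault


for fault, weight in [("missing_path", -1), ("predicate", 20),
                      ("assignment", 20), ("computation", 20)]:
    print(fault, weight, classify(weight, fault))
```

With these inputs the first three faults are reported as domain faults and the last as a computation fault. The input matters: classify(5, "computation") returns "not revealed", because a weight of 5 never reaches the faulty statement.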

5. For fault-seeding purposes, faults should be "representative" of naturally-occurring faults; otherwise, any results obtained from the seeded faults may be inaccurate or biased. Unfortunately, to date, there is no accepted model with which to determine whether seeded faults are representative.

For fault seeding, the seeded faults should be representative of naturally occurring faults; otherwise, any results obtained from them may be inaccurate or biased. Unfortunately, there is still no accepted model for deciding whether a seeded fault is representative.