Notes on software testing

 

...we do not rely very much on the immediate and proper perception of the senses, but we bring the matter to the point that the senses judge only of the experiment, the experiment judges of the thing.

- Francis Bacon, New Organon (p. 16) 1

As guilefull Goldsmith that by secret skill,
With golden foyle doth finely ouer spred
Some baser metall, which commend he will
Vnto the vulgar for good gold insted,
He much more goodly glosse thereon doth shed,
To hide his falshood, then if it were trew:
So hard, this Idole was to be ared,
That Florimell her selfe in all mens vew
She seem'd to passe: so forged things do fairest shew.

- Edmund Spenser, The Faerie Queene (Book IV, Canto V, Stanza 15)

I.

The test - The word test traces its meaning to the Latin testum, an earthen pot used in assaying precious metals. The assay is the qualitative or quantitative analysis of a metal or ore to determine its components and purity. Within the testum, gold would be melted and left to cool. The color of the cooled material indicated its purity as well as the nature of any adulteration. Gold adulterated with silver became white. Gold adulterated with lead became black 2. It is from this context that the word test later takes on the legalistic sense of a trial or examination to determine the correctness of something.

The test, then, emerges from an attitude of doubt as to quality or correctness. The primary antagonist of the test is the mimic, forgery, or pretender - that which appears to be but is not. All those to be put under test are the claimants to some title - false suitors and true. The test allows us "to screen the claims (pretensions) and to distinguish the true pretender from the false one." 3

What is implied is the existence of some criteria and a suitable procedure or act of demonstration - the test. This procedure, when carried out, has the power to "make a difference". This difference allows us to distinguish the false claimants from the true - those that meet the criteria from those that do not.

II.

The specification - A test requires that the criteria are given from somewhere. As Marick notes 4:

Tests can be derived from both the specification and the program...[this requires] either the program or the specification to identify all relevant cases. They don't, not always. The specification writer may have missed a special case...[or] the special case may be an implementation detail that isn't relevant to the specification.

Thus, in software the specification and the program are sources of these criteria. Clarifying the specification as well as identifying gaps and issues is a part of the testing process.

III.

The experiment - The word experiment comes from the Latin experimentum, a combination of the word-forming element ex- "out of" and peritus "experienced". It thus means that which takes something out of what is experienced.

Whereas the test emerges from doubt regarding the quality of things, the experiment emerges from doubt regarding the quality of human understanding. We suspect that our understanding has gaps or lacks clarity, and we propose an experiment to help identify and address these. If the test denotes a top-down process that starts with claims and devises procedures for validating them, then the experiment operates bottom-up, perfecting and generating claims from the outcomes of acts of demonstration.

IV.

The prototype - Gaudi, when designing La Sagrada Familia, knew that arches and chains have a lot in common 5 - in fact, the "chain test" is a common method to determine whether or not an arch of a given shape and thickness will stand on its own or collapse under the applied weight. Knowledge of the "chain test" led Gaudi to build an entire upside-down model of La Sagrada Familia with weights suspended from ropes in order to determine the necessary shape of each arch in the final structure.

The prototype is the paradigmatic experiment. Its express aim is to determine how something must be built. It involves building a component iteratively and in isolation until enough is learned. This necessarily involves the prototyper in a loop in which their vision of what they are building evolves through prototyping.

V.

Instrumentation and actuation - As von Neumann writes about the execution of a computer program 6:

It is worth noting...that the device will in general produce essentially more numerical material (in order to reach the results) than the (final) results mentioned. Thus only a fraction of its numerical output will be recorded...the remainder will only circulate in the interior of the device, and never be recorded for human sensing.

The recording of computational results for human sensing is fundamental to computing. Every computation and any results it produces are by their very nature hidden from human perception. Similarly, effects within the computer are beyond the capabilities of unaided humans to produce. Therefore, instrumentation and actuation are fundamental arts of software development. As a general rule, each effect produced within a program as well as each result surfaced for perception requires the modification of the program.

Testing all but the simplest programs also involves the modification of the unit under test to provide the instrumentation and actuation required to conduct the test. This will necessarily result in a different program than the one that does not expose any instrumentation or actuation.
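A small sketch may make this concrete. The Python below is illustrative only; the names `mean_hidden`, `mean_instrumented`, `source`, and `probe` are hypothetical and not drawn from any particular program. The first function lets its intermediate sums circulate internally without ever recording them. The second has been modified to accept an actuation point (`source`, through which a test supplies the values) and an instrumentation point (`probe`, which surfaces the running state). The two are no longer the same program.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def mean_hidden(values: Iterable[float]) -> float:
    """Original unit: the running total is never recorded for human sensing."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else 0.0


def mean_instrumented(
    source: Callable[[], Iterable[float]],
    probe: Optional[Callable[[int, float], None]] = None,
) -> float:
    """Modified unit: `source` is an assumed actuation point, `probe` an assumed instrumentation point."""
    total = 0.0
    count = 0
    for v in source():
        total += v
        count += 1
        if probe is not None:
            probe(count, total)  # surface the intermediate state for inspection
    return total / count if count else 0.0


# A test actuates the unit through `source` and inspects it through `probe`.
recorded: List[Tuple[int, float]] = []
result = mean_instrumented(
    lambda: [2.0, 4.0, 6.0],
    lambda n, t: recorded.append((n, t)),
)
```

After the call, `recorded` holds every intermediate (count, total) pair that `mean_hidden` would have discarded, at the cost of a changed signature and an extra branch in the loop.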

VI.

Two purposes of software tests - A software test is an act of demonstration in which a piece of software is actuated in a particular way and its instrumentation is then inspected and compared against an expected result. A software test can constitute either a test in the above sense or an experiment.
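As a rough illustration of that last distinction, the Python sketch below performs the same kind of demonstration in two ways. The function `normalize` and the helpers `run_test` and `run_experiment` are hypothetical names chosen for this example. In the first use the expected result is fixed in advance and the outcome is judged against it (a test). In the second no expectation is supplied; outcomes are only recorded so that a claim can be drawn from them afterwards (an experiment).

```python
from typing import Dict, Iterable


def normalize(text: str) -> str:
    """Hypothetical unit: collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def run_test(given: str, expected: str) -> bool:
    """Test: the claim comes first, and the demonstration passes or fails it."""
    return normalize(given) == expected


def run_experiment(inputs: Iterable[str]) -> Dict[str, str]:
    """Experiment: outcomes are recorded without any prior expectation."""
    return {text: normalize(text) for text in inputs}


passed = run_test("  a   b ", "a b")
observations = run_experiment(["", "\t", "a\nb"])
```

From `observations` one might then propose a new claim, for instance that whitespace-only input always yields the empty string, and promote that claim to a test.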

Software tests serve two primary purposes. The first is as an element of an argument for the correctness of a program. The second is as a predicate of determination for a software concept. The conflation of these purposes under the shared name "test" has been the source of much turmoil and confusion in the history of software testing.

VII.

Testing as argument for correctness - What kind of argument does a series of tests make for the correctness of our program? We know they are not (and generally cannot be) exhaustive enough to constitute a complete demonstration of correctness. They are usually too informal to be a proof. They do not typically produce a statistical proof of correctness.

In practice, tests are often intended to serve as part of an eliminative inductive proof of correctness. To construct such a proof we typically begin with a claim - for example, "The function handles all malformed strings". Given this claim we then do some thinking and research to identify ways our claim could be disproven - these are called “defeaters”. These are statements like “Unless it crashes when given a null string pointer.” If all of the defeaters for a claim can be disproven and some supporting evidence provided for it, then we can consider it provisionally proven unless new unanticipated defeaters come to light (e.g., a crash or unexpected functionality in practice).
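The shape of such an argument can be sketched in Python. Everything here is an assumption made for illustration: `parse_port` is a hypothetical function standing under the claim "returns None for every malformed string", `None` stands in for the null string pointer, and the list of defeaters is one that a developer might plausibly draw up, not a complete one.

```python
from typing import Callable, Dict, Optional


def parse_port(text: Optional[str]) -> Optional[int]:
    """Hypothetical unit: return a port number, or None for any malformed input."""
    if text is None:
        return None
    stripped = text.strip()
    # Reject non-ASCII digits and over-long input before calling int().
    if not (stripped.isascii() and stripped.isdigit()) or len(stripped) > 5:
        return None
    value = int(stripped)
    return value if 0 < value <= 65535 else None


# Each defeater names a way the claim could fail, paired with an input that probes it.
DEFEATERS: Dict[str, Optional[str]] = {
    "unless it crashes on a null string": None,
    "unless it crashes on an empty string": "",
    "unless it accepts non-numeric text": "http",
    "unless it accepts a negative number": "-1",
    "unless it accepts an out-of-range number": "70000",
    "unless it crashes on non-ASCII digits": "\u00b2",
}


def eliminate(func: Callable[[Optional[str]], Optional[int]]) -> Dict[str, bool]:
    """Return, per defeater, True if it was eliminated and False if it still stands."""
    results = {}
    for name, bad_input in DEFEATERS.items():
        try:
            results[name] = func(bad_input) is None
        except Exception:
            results[name] = False  # a crash means the defeater succeeded
    return results


report = eliminate(parse_port)
supporting = parse_port(" 8080 ") == 8080  # positive evidence for the claim
```

If every entry in `report` is True and `supporting` holds, the claim stands provisionally. A newly discovered failure in practice would be added to `DEFEATERS` and the argument re-run.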

The specification, experience of the individual software developer, and the history of software errors provide the sources of possible defeaters for any piece of software. Linters like cppcheck demonstrate this historical aspect particularly well.

VIII.

Testing as conceptual determination - Tests and test writing can also be viewed as a process for discovering or more clearly determining the object to be built. As Kant writes in his "Reflexionen" 7:

Every object is known through predicates which we think or assert of it. Before this, any representations that may be found in us are to be regarded only as material for cognition, not as themselves cognitions. An object, therefore, is only a something in general which we think to ourselves through certain predicates which constitute the concept. Every judgment, therefore, contains two predicates which we compare with one another. One of these, which constitutes the given cognition of the object, is called the logical subject; the other, which is compared with it, is called the predicate.

Under a Kantian analogy we can regard inputs as the representations / concepts given to our software. Our program then takes these "given" representations and operates on them to construct its own concepts or produce outputs for human sensing. Tests serve as the predicates of determination for these software concepts. Each test determines its subject by an act of demonstration. Each act of demonstration proves the applicability of some predicate(s) to our software. The more demonstrations we have, the more clearly determined our software is.
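One way to picture this progressive determination is the Python sketch below. It assumes a handful of hypothetical candidate programs, any of which might be what is meant by "sort", and treats each test as a predicate applied to a candidate. The names `CANDIDATES`, `PREDICATES`, and `determine` are invented for the example. As predicates accumulate, fewer candidates remain consistent with all of them.

```python
from typing import Callable, Dict, List

Impl = Callable[[List[int]], List[int]]

# Hypothetical candidate programs for the still-vague concept "sort".
CANDIDATES: Dict[str, Impl] = {
    "identity": lambda xs: list(xs),
    "reverse": lambda xs: list(reversed(xs)),
    "dedupe_sort": lambda xs: sorted(set(xs)),
    "sort": lambda xs: sorted(xs),
}

# Each predicate is one act of demonstration applied to a candidate.
PREDICATES: Dict[str, Callable[[Impl], bool]] = {
    "empty stays empty": lambda f: f([]) == [],
    "output is ordered": lambda f: f([3, 1, 2]) == [1, 2, 3],
    "duplicates are kept": lambda f: f([2, 1, 2]) == [1, 2, 2],
}


def determine(predicates: List[str]) -> List[str]:
    """Return the names of candidates that satisfy every listed predicate."""
    return [
        name
        for name, impl in CANDIDATES.items()
        if all(PREDICATES[p](impl) for p in predicates)
    ]


# Apply zero, one, two, then all three predicates in order.
names = list(PREDICATES)
steps = [determine(names[:n]) for n in range(len(names) + 1)]
```

With no predicates all four candidates remain; the first predicate excludes none of them, the second leaves only `dedupe_sort` and `sort`, and the third leaves `sort` alone.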

IX.

Formal methods vs test-driven - The divide in software between formal methods and test-driven schools of thought can, from this perspective, be seen as a debate between proponents of theoretical (formal methods 8) and empirical (test-driven) determination of software concepts. Here theoretical denotes a purely logical and mathematical approach to the determination of software objects (Alloy, VDM, Z notation, strong static typing, etc.). Empirical, in contrast, denotes an approach based on acts of demonstration (tests, linters, etc.).

What is truly at issue between the two schools is when software concepts should, ideally, be determined - a priori for the formal methods crowd and a posteriori for the test-driven crowd. Determining concepts a posteriori does not always make for rigorous or complete determination. Similarly, the possibility of determining them a priori is always limited by the quality and completeness of one's knowledge of what is to be produced.


Footnotes

Last update on 7E681F, edited 1 times. 2/2thh