This paper examines the use of genetic algorithms (GAs) to generate sets of input data for software testing. The aim is to produce test sets that maximise coverage of the software under a given metric, whilst minimising the size of the sets.
Using the well-known triangle program as an example, a representation is described that allows the GA to learn the number of test cases in a set. This is done by adding a set of flags to the encoding, which determine whether or not a gene is expressed (in this case, whether a test case is used as input to the program). A simple mechanism for biasing the search towards longer or shorter sets is described.
A study is then made of the effect of changing chromosome lengths and initialisation procedures, and of how these relate to the quality and size of the test sets evolved. The purpose is to assess the scalability of the evolutionary approach to “real-world” problems, and to identify the factors that would need to be taken into consideration when designing systems for the automatic generation of test cases.
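
As an illustration of the flag-based encoding outlined above, the following Python sketch shows one way such a chromosome might be laid out. It is a minimal sketch under stated assumptions, not the representation used in the paper: the names `classify_triangle`, `expressed`, `random_chromosome` and `outcomes_covered`, the parameter `p_on`, and the use of distinct classification outcomes as a stand-in coverage metric are all hypothetical. In particular, setting the initial flag probability `p_on` is only one conceivable way of biasing the search towards longer or shorter sets, and may differ from the mechanism the paper describes.

```python
import random
from typing import List, Optional, Tuple

TestCase = Tuple[int, int, int]   # three side lengths fed to the triangle program
Gene = Tuple[bool, TestCase]      # (expression flag, test case)


def classify_triangle(a: int, b: int, c: int) -> str:
    """A common formulation of the triangle program (assumed here)."""
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def expressed(chromosome: List[Gene]) -> List[TestCase]:
    """Return only the test cases whose flag is set, i.e. the expressed genes."""
    return [case for flag, case in chromosome if flag]


def random_chromosome(length: int, p_on: float = 0.5, max_side: int = 10,
                      rng: Optional[random.Random] = None) -> List[Gene]:
    """Build a fixed-length chromosome; p_on is a hypothetical bias parameter
    giving the probability that each gene starts out expressed."""
    rng = rng or random.Random()
    return [
        (rng.random() < p_on,
         (rng.randint(0, max_side), rng.randint(0, max_side), rng.randint(0, max_side)))
        for _ in range(length)
    ]


def outcomes_covered(chromosome: List[Gene]) -> int:
    """Stand-in coverage metric (an assumption): the number of distinct
    classification outcomes produced by the expressed test cases."""
    return len({classify_triangle(*case) for case in expressed(chromosome)})
```

Under this layout the chromosome length stays fixed while the effective test set size varies with the flags, so a fitness function could reward `outcomes_covered(chromosome)` and penalise `len(expressed(chromosome))`.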