Machine Learning (ML), a subdomain of AI, is widely
used in various stages of the software development life cycle, especially
for automating software testing processes. Machine Learning algorithms have
proven to be of great practical value in a variety of application domains. They
are particularly useful for (a) poorly understood problem domains where little
knowledge exists for humans to develop effective algorithms; (b) domains where
there are large databases containing valuable implicit regularities to be
discovered; or (c) domains where programs must adapt to changing conditions.
Not surprisingly, the field of software engineering turns out to be a fertile
ground wherein many software development tasks could be formulated as learning
problems and approached in terms of learning algorithms.
Machine Learning techniques vary a great deal in terms of
their underlying theory, assumptions, and model representations. Techniques
also differ in terms of the research community they originate from, ranging from
traditional statistics to heuristic learning algorithms and neural networks.
The key question is whether appropriate data can be collected and used by machine learning algorithms to support decision making in software engineering.
Various types of data can be collected while testing software. Execution traces and coverage information for test cases can be captured at different levels of granularity (for example, statement, branch, or function level). Failure data that captures where and why a failure occurs is also of interest.
Questions that arise when applying ML to software testing:
- What types of machine learning methods can be effective in different aspects of software testing?
- What are the strengths and weaknesses of each learning method in the context of testing?
- How can we determine the proper type of learning method for each stage of a testing process? Where are the critical points in a software testing process at which ML can contribute positively?
In the Machine Learning area, many types of learning methods have been introduced, such as Decision Trees (DT), Artificial Neural Networks (ANN), and Genetic Algorithms (GA), and these methods can also be combined with one another. Machine learning involves two main aspects: the training data and the learning technique. Learning techniques are broadly classified as supervised or unsupervised: supervised methods learn from examples labelled with the expected output, while unsupervised methods look for structure in unlabelled data.
Applying this to software testing, training data can be collected at different stages of the testing process or the development life cycle. The learning inputs may include software metrics, software specifications, test cases, execution data, failure reports, and coverage data.
Let us identify the different phases of testing and the tasks within them that ML can support.
In test planning, cost estimation can help test managers predict the cost and duration of the testing process, and so produce a plan for managing testing efficiently. Test case management includes several tasks: test case design, which aims to generate high-quality test cases; regression test selection, which reuses the existing test cases to retest a modified software system; and test case evaluation, which aims to measure the quality of the generated test cases.
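As a rough illustration of cost estimation, the sketch below fits a least-squares line to historical testing effort and uses it to estimate the effort for a planned test suite. The single predictor (number of test cases), the data points, and the function name `fit_line` are hypothetical; a practical model would likely draw on more metrics and a longer history.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical project history: (number of test cases, person-hours spent testing)
history = [(100, 52.0), (200, 101.0), (300, 149.0), (400, 202.0)]
xs, ys = zip(*history)
slope, intercept = fit_line(xs, ys)

planned = 500  # size of the test suite we want to plan for
print(f"estimated effort for {planned} test cases: {slope * planned + intercept:.1f} h")
```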
In the test execution sub-dimension, fault localization can help pinpoint the defective location in the program. In addition, bug prioritization aims to rank the revealed faults by their severity.
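Fault localization usually starts from coverage spectra: which statements each passing and each failing test executes. The sketch below ranks statements with the Ochiai suspiciousness formula, a simple statistical (not learned) baseline from spectrum-based fault localization, shown only to illustrate the kind of data a fault localization approach works from. The coverage matrix, test outcomes, and function name are invented for the example.

```python
from math import sqrt

def ochiai_ranking(coverage, outcomes):
    """Rank statements by Ochiai suspiciousness.

    coverage: {test name: set of statement ids the test executes}
    outcomes: {test name: True if the test passed, False if it failed}
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for s in statements:
        # failing (ef) and passing (ep) tests that execute statement s
        ef = sum(1 for t, cov in coverage.items() if s in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if s in cov and outcomes[t])
        denom = sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom else 0.0
    # most suspicious first; ties broken by statement id
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Invented example: statement 2 is executed only by the two failing tests.
coverage = {"t1": {1, 2, 3}, "t2": {1, 3}, "t3": {1, 4}, "t4": {2, 4}}
outcomes = {"t1": False, "t2": True, "t3": True, "t4": False}
print(ochiai_ranking(coverage, outcomes))
```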
Consider one of these areas, test case management. To determine which learning method fits a given testing task, we need to prepare the data, choose an algorithm, fit a model, and examine the fitting results, then either improve the model or try another algorithm. Repeating this cycle lets us settle on the best-performing method.
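The cycle of fitting a model, examining the result, and trying another algorithm can be sketched in plain Python. Two deliberately simple learners, a majority-class baseline and a 1-nearest-neighbour classifier, are compared by leave-one-out accuracy on a tiny made-up data set; the learners, the feature encoding, and the data are assumptions for illustration, not a recommended model for any particular testing task.

```python
from collections import Counter

def majority_learner(train):
    """Always predicts the most frequent label in the training data."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_learner(train):
    """Predicts the label of the closest training example (Hamming distance)."""
    def predict(x):
        _, label = min(train, key=lambda ex: sum(a != b for a, b in zip(ex[0], x)))
        return label
    return predict

def leave_one_out_accuracy(learner, data):
    hits = 0
    for i, (x, y) in enumerate(data):
        model = learner(data[:i] + data[i + 1:])  # fit without example i
        hits += model(x) == y                      # examine the prediction on it
    return hits / len(data)

# Hypothetical data: (binary feature vector of a test case, observed outcome)
data = [
    ((1, 0, 0), "pass"), ((1, 0, 1), "pass"), ((1, 1, 0), "pass"),
    ((0, 1, 1), "fail"), ((0, 1, 0), "fail"), ((0, 0, 1), "fail"),
]
learners = {"majority": majority_learner, "1-NN": nearest_neighbour_learner}
scores = {name: leave_one_out_accuracy(learner, data) for name, learner in learners.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```

On this data the majority baseline scores 0.0 and 1-NN about 0.67, so 1-NN would be kept for the next round; in practice one would keep iterating (more features, tuning, other algorithms) rather than stop at the first winner.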
One methodology that researchers have developed for this phase using ML algorithms is MELBA (MachinE Learning based refinement of BlAck-box test specification). MELBA is an iterative process consisting of five main activities.
In Activity 1, test cases are transformed to an abstract level using the Category-Partition (CP) strategy, so that they are ready to be used by the Machine Learning method in Activity 2. The CP method helps to model the input domain. This technique requires as input both a test suite and a test specification in the form of CP categories and choices.
The CP method seeks to generate test cases that cover the various execution conditions of a function. To apply it, one first identifies the input and environment parameters of each function, the characteristics (categories) of each parameter, and the choices of each category. Categories are properties of parameters that can influence the behavior of the software under test (e.g., the size of an array in a sorting algorithm). Choices (e.g., whether an array is empty) are the potential values of a category. Test frames and test data are then generated according to the defined categories and choices.
Since the input domain is modeled using CP categories and choices, the test suite is transformed into an abstract test suite (Activity 1) to enable effective learning. An abstract test case consists of an output equivalence class together with the (category, choice) pairs that characterize its input and environment parameters, instead of its raw inputs. In other words, the test specification is used to lift test cases to a higher level of abstraction.
Once the abstract test suite is available, a decision tree algorithm is used to classify the abstract test cases (Activity 2). In Activity 3, a number of heuristics are used to identify potential problems in the resulting trees, such as redundancy among test cases or the need for additional test cases. In Activity 4, the learned rules may indicate the need to update the CP specification. In Activity 5, the process iterates until no further problems are identified in the trees.
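The Category-Partition part of Activity 1, plus two checks in the spirit of Activity 3, can be sketched for the sorting example. The CP specification, the output equivalence classes, the raw test suite, and both heuristics (identical abstract test cases suggest redundancy; uncovered test frames suggest missing test cases) are assumptions made for illustration. MELBA's own heuristics work on the learned decision trees, and the tree learning of Activity 2 is omitted here.

```python
from collections import Counter
from itertools import product

# Hypothetical CP specification for a sorting function: for every category,
# each choice is a predicate over the raw input array.
CATEGORIES = {
    "size": {
        "empty": lambda a: len(a) == 0,
        "single": lambda a: len(a) == 1,
        "many": lambda a: len(a) > 1,
    },
    "order": {
        "sorted": lambda a: a == sorted(a),
        "unsorted": lambda a: a != sorted(a),
    },
}

def cp_frames():
    """Every combination of choices (test frames). A real CP specification would
    add constraints to exclude infeasible frames such as an empty unsorted array."""
    names = list(CATEGORIES)
    return [dict(zip(names, combo)) for combo in product(*(CATEGORIES[n] for n in names))]

def abstract(raw_input, out_class):
    """Activity 1: replace a raw input by its (category, choice) pairs."""
    pairs = tuple(
        (category, next(choice for choice, holds in choices.items() if holds(raw_input)))
        for category, choices in CATEGORIES.items()
    )
    return pairs, out_class

def output_class(a):
    """Hypothetical output equivalence class: did sorting reorder the input?"""
    return "unchanged" if sorted(a) == a else "reordered"

raw_suite = [[3, 1, 2], [5, 4], [1, 2, 3], [7], [9, 8, 7]]
abstract_suite = [abstract(a, output_class(a)) for a in raw_suite]

# Assumed heuristic 1: identical abstract test cases may be redundant.
redundant = [tc for tc, n in Counter(abstract_suite).items() if n > 1]

# Assumed heuristic 2: frames that no test case exercises may need new test cases.
covered = {pairs for pairs, _ in abstract_suite}
uncovered = [f for f in cp_frames() if tuple(f.items()) not in covered]

print("redundant:", redundant)
print("uncovered frames:", uncovered)
```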
Let us look at another testing task where ML can be applied.
Regression testing: Regression testing checks whether modified software fails test cases that its earlier versions passed.
These test cases constitute a database that can be maintained and reused over the software life cycle. Regression testing generally accounts for two thirds of the total verification cost. Hence, careful selection of the test suite is crucial for efficient regression testing.
If we can identify the parts of the system under test (SUT) affected by changes to a program, we can focus testing on those parts and potentially save cost by skipping irrelevant test cases. To address this, the program slicing technique can be combined with ML. For example, if a function x has been modified, only the test cases that exercise x need to be considered, and the rest are irrelevant.
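The selection idea can be sketched at the granularity of whole functions: keep only the regression tests whose coverage includes a modified function. This is a coarse, coverage-based approximation of what a real slicer computes (slicing tracks statement-level dependencies), and the coverage map, test names, and `select_tests` helper are hypothetical.

```python
def select_tests(coverage, changed):
    """Keep only the tests that execute at least one changed function."""
    return sorted(t for t, funcs in coverage.items() if funcs & changed)

# Hypothetical coverage map: which functions each regression test executes.
# In practice this would come from coverage instrumentation or a slicing tool.
coverage = {
    "test_login": {"authenticate", "hash_password"},
    "test_logout": {"end_session"},
    "test_report": {"build_report", "format_date"},
    "test_reset": {"hash_password", "send_mail"},
}
print(select_tests(coverage, changed={"hash_password"}))  # ['test_login', 'test_reset']
```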
Machine Learning techniques can be used to predict which test cases are likely to reveal bugs in the modified code. Suppose training data is available containing the test inputs and their outputs V. We also need the dependencies between the test cases and a given function, which we assume are available from the existing test database. Together, these form a training set that can be given to an artificial neural network.
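A minimal sketch of that training step, using the simplest possible artificial neural network, a single perceptron, in place of a multi-layer network. The two binary features, the labels, and the function names are hypothetical; in the setting described above, the features would be derived from the test inputs, the outputs V, and the test-to-function dependencies.

```python
# Toy training set (hypothetical): each test case is described by
#   x[0] = 1 if it executes the modified function, else 0
#   x[1] = 1 if its output depends on that function, else 0
# and labelled 1 if it revealed a bug in the modified version.
TRAINING_SET = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, x):
    """Step activation of a single artificial neuron."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(data, lr=1.0, max_epochs=100):
    """Classic perceptron learning rule; stops once every example is classified correctly."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            delta = y - predict(weights, bias, x)
            if delta:
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
                errors += 1
        if errors == 0:
            break
    return weights, bias

weights, bias = train_perceptron(TRAINING_SET)
print([predict(weights, bias, x) for x, _ in TRAINING_SET])  # [0, 0, 0, 1]
```

The perceptron rule only converges when the classes are linearly separable, as they are in this toy set; realistic data would call for a multi-layer network trained by backpropagation.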
With further work on the different phases of software testing, and with suitable training data, ML algorithms can be applied throughout the testing process. Beyond ML, several other AI techniques can also be used to support a range of testing tasks.