Friday, August 9, 2013

Artificial Intelligence to Software Testing

Machine Learning (ML), as a sub domain of AI, is widely used in the various stages of the software development life-cycle, especially for automating software testing processes. Machine Learning algorithms have proven to be of great practical value in a variety of application domains. They are particularly useful for (a) poorly understood problem domains where little knowledge exists for humans to develop effective algorithms; (b) domains where there are large databases containing valuable implicit regularities to be discovered; or (c) domains where programs must adapt to changing conditions. Not surprisingly, the field of software engineering turns out to be a fertile ground wherein many software development tasks could be formulated as learning problems and approached in terms of learning algorithms. 
Machine Learning techniques vary a great deal in terms of their underlying theory, assumptions, and model representations. Techniques also differ in terms of the research community they originate from, ranging from traditional statistics to heuristic learning algorithms and neural networks. The question is whether appropriate data can be collected and used by machine learning algorithms in order to help decision making in software engineering. 
Various types of data can be collected while testing software. Execution traces and coverage information for test cases can be captured at different levels. Failure data that captures where and why a failure occurs is also of interest.
Questions that arise when implementing ML in software testing:
  1. What type of machine learning methods can be effective in different aspects of software testing?
  2. What are the strengths and weaknesses of each learning method in the context of testing?
  3. How can we determine the proper type of learning method for the stages of a testing process? Where are the critical points in a software testing process in which ML can positively contribute? 
In the Machine Learning area, various types of learning methods have been introduced, such as Decision Trees (DT), Artificial Neural Networks (ANN) and Genetic Algorithms (GA), and a given approach can also combine several learning methods. The two aspects of machine learning are the training data and the learning technique. Learning techniques are classified as supervised or unsupervised.
Applying this to software testing, training data can be collected in different stages of the testing process or development life-cycle. The learning elements could be software metrics, software specifications, test cases, execution data, failure reports and/or coverage data.
Let us identify the different phases of testing and the tasks that can be implemented using ML!
In test planning, cost estimation helps test managers predict the cost and duration of the testing process and build a plan to manage it efficiently. Test case management includes several tasks: test case design, which aims to generate high-quality test cases; regression test suite management, which aims to reuse the available test cases when the software system changes; and test case evaluation, which aims to measure the quality of the generated test cases.
In the test execution sub-dimension, fault localization can help find the exact location of the defect in the program. In addition, bug prioritization ranks the revealed faults by severity.
Consider one of these phases, test case management. To find out which learning method fits a testing task, we need to prepare the data, choose an algorithm, fit a model, examine the fitting results, and then either improve the model or choose another algorithm. Repeating this cycle leads to the best-performing model.
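The prepare, fit, examine and choose-again loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set, the two candidate learners (a majority-vote baseline and a 1-nearest-neighbour classifier) and every function name are made up for this post and do not come from any particular tool.

```python
from collections import Counter

# Hypothetical prepared data: (statements_covered, branches_covered) per test
# case, labelled with whether that test case revealed a fault.
DATA = [
    ((10, 2), False),
    ((12, 3), False),
    ((11, 2), False),
    ((40, 9), True),
    ((42, 8), True),
    ((38, 10), True),
]

def majority_classifier(train):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_classifier(train):
    """1-NN learner: predict the label of the closest training example."""
    def predict(x):
        def squared_distance(item):
            return sum((a - b) ** 2 for a, b in zip(item[0], x))
        return min(train, key=squared_distance)[1]
    return predict

def leave_one_out_accuracy(fit, data):
    """Examine the fit: train on all but one example, test on the one left out."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += fit(train)(x) == y
    return hits / len(data)

def select_best(candidates, data):
    """Score every candidate algorithm and keep the best-performing one."""
    scores = {name: leave_one_out_accuracy(fit, data)
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

CANDIDATES = {
    "majority": majority_classifier,
    "nearest_neighbour": nearest_neighbour_classifier,
}
best, scores = select_best(CANDIDATES, DATA)
print(best, scores)
```

On this toy data the baseline does badly and the nearest-neighbour learner is selected. With real test data, the same loop would simply be run over more candidates and a larger sample.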
The methodology that researchers have developed for applying ML algorithms to this phase of testing is MELBA. The MELBA (MachinE Learning based refinement of BlAck-box test specification) methodology is an iterative process consisting of five main activities.

In Activity 1, test cases are transformed to an abstract level using the Category-Partition (CP) strategy, so that they are ready for the Machine Learning method in Activity 2. This technique requires as input both a test suite and a test specification in the form of CP categories and choices. The CP method models the input domain and seeks to generate test cases that cover the various execution conditions of a function. To apply it, one first identifies the input and environment parameters of each function, the characteristics (categories) of each parameter, and the choices of each category. Categories are properties of parameters that can influence the behavior of the software under test (e.g., the size of an array in a sorting algorithm). Choices (e.g., whether an array is empty) are the potential values of a category. Test frames and test data are generated according to the categories and choices defined.
Once the input domain is modeled using CP categories and choices, the test suite is transformed into an abstract test suite in order to enable effective learning. An abstract test case consists of an output equivalence class and a set of (category, choice) pairs that characterize its inputs and environment parameters, in place of the raw inputs. In other words, the test specification is used to move test cases to a higher level of abstraction.
In Activity 2, once the abstract test suite is available, a decision tree learner is used to classify the abstract test cases.
In Activity 3, a number of heuristics are applied to the trees to detect potential problems, which may indicate redundancy among test cases or the need for additional test cases.
In Activity 4, the learnt rules may indicate the need to update the CP specification.
In Activity 5, the iteration continues until no problems are identified in the trees.
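To make the abstraction step concrete, here is a minimal Python sketch for the sorting example mentioned above. The categories ("size", "order"), their choices, the output equivalence classes and all function names are assumptions chosen for illustration. The two checks at the end are simple stand-ins, not MELBA's actual decision-tree heuristics: they flag test cases that collapse to the same abstract test case, and choices that no test case uses.

```python
from collections import defaultdict

def size_choice(arr):
    """Category 'size': choices are empty, single, many."""
    if not arr:
        return "empty"
    return "single" if len(arr) == 1 else "many"

def order_choice(arr):
    """Category 'order': choices are sorted, reversed, unsorted."""
    if arr == sorted(arr):
        return "sorted"
    if arr == sorted(arr, reverse=True):
        return "reversed"
    return "unsorted"

# Hypothetical CP specification for a sort function.
CATEGORIES = {"size": size_choice, "order": order_choice}
ALL_CHOICES = {
    "size": {"empty", "single", "many"},
    "order": {"sorted", "reversed", "unsorted"},
}

def output_class(arr):
    """Assumed output equivalence classes: does sorting change the input?"""
    return "unchanged" if sorted(arr) == arr else "reordered"

def abstract(arr):
    """Replace raw input by (category, choice) pairs plus the output class."""
    pairs = tuple((name, choose(arr)) for name, choose in CATEGORIES.items())
    return pairs, output_class(arr)

def duplicates(test_suite):
    """Groups of raw test cases that map to the same abstract test case."""
    groups = defaultdict(list)
    for arr in test_suite:
        groups[abstract(arr)].append(arr)
    return [group for group in groups.values() if len(group) > 1]

def unused_choices(test_suite):
    """Choices of each category that no test case in the suite exercises."""
    used = defaultdict(set)
    for arr in test_suite:
        for name, choice in abstract(arr)[0]:
            used[name].add(choice)
    return {name: ALL_CHOICES[name] - used[name] for name in ALL_CHOICES}

SUITE = [[], [3, 1, 2], [5, 4, 9], [1, 2, 3]]
print(abstract([3, 1, 2]))
print(duplicates(SUITE))
print(unused_choices(SUITE))
```

For this small suite, `[3, 1, 2]` and `[5, 4, 9]` end up as the same abstract test case (a hint of redundancy), while the "single" and "reversed" choices are never exercised (a hint that test cases are missing).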

Let us look at another testing task where ML can be applied.
Regression testing: Regression testing checks whether modified software fails test cases that its earlier versions passed.
These test cases constitute a database that can be drawn on throughout the software life cycle. Regression testing generally accounts for two thirds of the total verification cost. Hence, careful selection of the test suite is crucial for efficient regression testing.
If we can identify the parts of the SUT (software under test) affected by changes to a program, we can focus testing on those parts and save testing costs by skipping irrelevant test cases. To address this, program slicing can be combined with ML. A simple instance of the slicing problem: suppose a function x has been modified; then only the test cases that exercise x need to be considered, and the rest are irrelevant.
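The idea of keeping only the test cases that touch a changed function can be sketched as follows. This is a coarse call-graph approximation, not a real program slicer, and the call graph, test names and function names are all hypothetical.

```python
# Hypothetical call graph: function name -> functions it calls.
CALLS = {
    "checkout": ["compute_total", "send_receipt"],
    "compute_total": ["apply_discount"],
    "apply_discount": [],
    "send_receipt": [],
    "login": ["hash_password"],
    "hash_password": [],
}

# Hypothetical test database: test case name -> entry function it exercises.
TESTS = {
    "test_checkout": "checkout",
    "test_login": "login",
    "test_receipt": "send_receipt",
}

def reachable(entry, calls):
    """All functions that can run when `entry` is called (depth-first walk)."""
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(calls.get(fn, []))
    return seen

def select_tests(changed, tests, calls):
    """Keep only the test cases whose reachable code overlaps the change set."""
    return sorted(name for name, entry in tests.items()
                  if changed & reachable(entry, calls))

print(select_tests({"apply_discount"}, TESTS, CALLS))
print(select_tests({"send_receipt"}, TESTS, CALLS))
```

A change to `apply_discount` selects only `test_checkout`, so the other two test cases can be skipped for that modification.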

Machine Learning techniques can be used to predict which test cases may reveal bugs in the modification. Assume that training data is available consisting of the test inputs and their outputs V, and that the dependency of each test case on a given function is available from the existing test database. Together these give a training set that can be fed to an artificial neural network.
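As a rough sketch of what feeding such a training set to a network could look like, the Python below trains a single sigmoid unit (the smallest possible neural network) with plain gradient descent. The two features, the four training rows and the hyperparameters are invented for illustration; a real study would use far more data and a larger network.

```python
import math

# Hypothetical history, one row per test case:
# (depends_on_changed_function, failed_in_previous_run) -> revealed a bug?
TRAINING_SET = [
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 1.0),
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 0.0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Batch gradient descent on the log loss of one sigmoid unit."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            output = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = output - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        # Step against the mean gradient over the training set.
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

def predict(model, x):
    """True if the unit predicts that the test case will reveal a bug."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5

model = train(TRAINING_SET)
print(predict(model, (1.0, 0.0)), predict(model, (0.0, 1.0)))
```

In this toy history the label depends only on the first feature, so the trained unit learns to flag test cases that depend on the changed function and to ignore the rest.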

With further work on the different phases of software testing, and with suitable training data, ML algorithms can be applied to testing. In addition, several other AI techniques can be used to handle various testing tasks.

