Automatically finding the control variables for complex system behavior

Published in: Automated Software Engineering

Abstract

Testing large-scale systems is expensive in terms of both time and money. Running simulations early in the process is a proven method of finding the design faults likely to lead to critical system failures, but determining the exact cause of those faults is still time-consuming and requires access to a limited number of domain experts. It is therefore desirable to have an automated method that explores the large space of parameter combinations and isolates likely fault points.

Treatment learning is a subset of minimal contrast-set learning that, rather than classifying data into distinct categories, focuses on finding the unique factors that lead to a particular classification. That is, a treatment learner finds the smallest change to the data that causes the largest change in the class distribution. When imposed, these treatments identify the factors most likely to cause a mission-critical failure. The goal of this research is to comparatively assess treatment learning against state-of-the-art numerical optimization techniques. To that end, this paper benchmarks the TAR3 and TAR4.1 treatment learners against optimization techniques across three complex systems, including two projects from the Robust Software Engineering (RSE) group at the National Aeronautics and Space Administration (NASA) Ames Research Center. The results clearly show that treatment learning is both faster and more accurate than traditional optimization methods.
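The core idea above can be sketched in a few lines: score each candidate constraint by how much selecting only the matching rows shifts the class distribution toward preferred outcomes. This toy scorer is illustrative only, not the TAR3/TAR4.1 implementation; the real learners discretize numeric attributes, build multi-attribute treatments, and use a lift-based sampling strategy.

```python
def class_score(rows, weights):
    """Mean desirability of the class labels (last field) in `rows`."""
    if not rows:
        return 0.0
    return sum(weights[cls] for *_, cls in rows) / len(rows)

def best_treatment(rows, weights):
    """Return the single (attribute_index, value) filter whose selected
    subset most improves on the baseline class distribution."""
    baseline = class_score(rows, weights)
    candidates = {(i, row[i]) for row in rows for i in range(len(row) - 1)}
    best, best_gain = None, 0.0
    for attr, value in candidates:
        subset = [r for r in rows if r[attr] == value]
        gain = class_score(subset, weights) - baseline
        if gain > best_gain:
            best, best_gain = (attr, value), gain
    return best, best_gain

# Toy data: (setting_a, setting_b, outcome); half the runs fail.
rows = [
    ("lo", "x", "fail"), ("lo", "y", "fail"),
    ("hi", "x", "pass"), ("hi", "y", "pass"),
    ("hi", "x", "pass"), ("lo", "x", "fail"),
]
weights = {"pass": 1.0, "fail": 0.0}
treatment, gain = best_treatment(rows, weights)
# The winning treatment constrains attribute 0 to "hi": a small change
# to the inputs that maximally shifts the outcome distribution.
```

Run in reverse, the same scoring isolates the constraint most associated with failure, which is how a treatment pinpoints a likely fault point.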



Author information


Correspondence to Gregory Gay.


Cite this article

Gay, G., Menzies, T., Davies, M. et al. Automatically finding the control variables for complex system behavior. Autom Softw Eng 17, 439–468 (2010). https://doi.org/10.1007/s10515-010-0072-x
