Towards Flexible Task Environments for Comprehensive Evaluation of Artificial Intelligent Systems and Automatic Learners

  • Conference paper
Artificial General Intelligence (AGI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9205)

Abstract

Evaluation of artificial intelligence (AI) systems is a prerequisite for comparing them on the many dimensions they are intended to perform on. Design of task-environments for this purpose is often ad hoc, focusing on some limited aspects of the systems under evaluation. Testing on a wide range of tasks and environments would better facilitate comparisons and understanding of a system’s performance, but this requires that manipulation of relevant dimensions causes predictable changes in the structure, behavior, and nature of the task-environments. What is needed is a framework that enables easy composition, decomposition, scaling, and configuration of task-environments. Such a framework would not only facilitate evaluation of the performance of current and future AI systems, but would go beyond it by allowing evaluation of knowledge acquisition, cognitive growth, lifelong learning, and transfer learning. In this paper we list requirements that we think such a framework should meet to facilitate the evaluation of intelligence, and present preliminary ideas on how this could be realized.

Author information

Corresponding author

Correspondence to Jordi Bieger.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Thórisson, K.R., Bieger, J., Schiffel, S., Garrett, D. (2015). Towards Flexible Task Environments for Comprehensive Evaluation of Artificial Intelligent Systems and Automatic Learners. In: Bieger, J., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2015. Lecture Notes in Computer Science, vol 9205. Springer, Cham. https://doi.org/10.1007/978-3-319-21365-1_20

  • DOI: https://doi.org/10.1007/978-3-319-21365-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21364-4

  • Online ISBN: 978-3-319-21365-1

  • eBook Packages: Computer Science, Computer Science (R0)
