Skip to main content

Foundations of XML Based on Logic and Automata: A Snapshot

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7153))

Abstract

XML query and schema languages have some obvious connections to Formal Language Theory. For example, Document Type Definitions (DTDs) can be viewed as tree grammars and use regular expressions, XML Schemas resemble tree automata. Likewise, there are immediate links to Logic, e.g., through the classical characterization of regular tree languages by monadic second-order logic.

It is therefore not surprising that concepts from Logic and Formal Language Theory played an important role in the development of the theoretical foundations of XML query and schema languages. For example, they helped to analyze the expressiveness of languages, to understand the restrictions posed by the W3C standards, and to develop algorithms for various purposes, including efficient evaluation and static analysis.

However, methods from Logic and Formal Languages have not merely been applied to XML theory, the fertilization took place both ways. XML theory posed a lot of new challenges for Logic and Formal Language Theory and triggered various new research lines, e.g., the study of deterministic regular expressions and the development of automata for trees with data values.

The aim of the talk at FoIKS 2012 is to present some of the fundamental connections between XML query and schema languages and Logic and Formal Language Theory, to report on recent developments in the area, and to highlight some current directions of research. This accompanying paper is a kind of annotated bibliography for that talk.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Ullman, J.D.: Translations on a context-free grammar. Information and Control 19(5), 439–475 (1971)

    Article  MathSciNet  MATH  Google Scholar 

  2. Balmin, A., Papakonstantinou, Y., Vianu, V.: Incremental validation of xml documents. ACM Trans. Database Syst. 29(4), 710–751 (2004)

    Article  Google Scholar 

  3. Barbosa, D., Mendelzon, A.O., Libkin, L., Mignet, L., Arenas, M.: Efficient incremental validation of XML documents. In: ICDE, pp. 671–682. IEEE Computer Society (2004)

    Google Scholar 

  4. Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2) (2008)

    Google Scholar 

  5. Benedikt, M., Koch, C.: Interpreting Tree-to-Tree Queries. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part II. LNCS, vol. 4052, pp. 552–564. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  6. Benedikt, M., Koch, C.: From XQuery to relational logics. ACM Trans. Database Syst. 34(4) (2009)

    Google Scholar 

  7. Berstel, J., Boasson, L.: Formal properties of XML grammars and languages. Acta Inf. 38(9), 649–671 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  8. Bex, G.J., Maneth, S., Neven, F.: A Formal Model for an Expressive Fragment of XSLT. In: Palamidessi, C., Moniz Pereira, L., Lloyd, J.W., Dahl, V., Furbach, U., Kerber, M., Lau, K.-K., Sagiv, Y., Stuckey, P.J. (eds.) CL 2000. LNCS (LNAI), vol. 1861, pp. 1137–1151. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  9. Bex, G.J., Gelade, W., Martens, W., Neven, F.: Simplifying xml schema: effortless handling of nondeterministic regular expressions. In: Çetintemel, U., Zdonik, S.B., Kossmann, D., Tatbul, N. (eds.) SIGMOD Conference, pp. 731–744. ACM (2009)

    Google Scholar 

  10. Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning deterministic regular expressions for the inference of schemas from XML data. TWEB 4(4) (2010)

    Google Scholar 

  11. Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Trans. Database Syst. 35(2) (2010)

    Google Scholar 

  12. Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: VLDB, pp. 998–1009 (2007)

    Google Scholar 

  13. Bex, G.J., Neven, F., Vansummeren, S.: Schemascope: a system for inferring and cleaning xml schemas. In: Wang, J.T.-L. (ed.) SIGMOD Conference, pp. 1259–1262. ACM (2008)

    Google Scholar 

  14. Björklund, H., Bojańczyk, M.: Bounded Depth Data Trees. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 862–874. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  15. Björklund, H., Gelade, W., Martens, W.: Incremental xpath evaluation. ACM Trans. Database Syst. 35(4), 29 (2010)

    Article  Google Scholar 

  16. Björklund, H., Schwentick, T.: On notions of regularity for data languages. Theor. Comput. Sci. 411(4-5), 702–715 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  17. Bojanczyk, M.: Data monoids. In: Schwentick, Dürr [82], pp. 105–116

    Google Scholar 

  18. Bojanczyk, M., Colcombet, T.: Tree-walking automata do not recognize all regular languages. SIAM J. Comput. 38(2), 658–701 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  19. Bojanczyk, M., Lasota, S.: An extension of data automata that captures xpath. In: LICS, pp. 243–252. IEEE Computer Society (2010)

    Google Scholar 

  20. Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and xml reasoning. J. ACM 56(3) (2009)

    Google Scholar 

  21. Bojanczyk, M., Parys, P.: XPath evaluation in linear time. In: Lenzerini, M., Lembo, D. (eds.) PODS, pp. 241–250. ACM (2008)

    Google Scholar 

  22. Bojanczyk, M., Parys, P.: XPath evaluation in linear time. J. ACM 58(4), 17 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  23. Bojańczyk, M., Samuelides, M., Schwentick, T., Segoufin, L.: Expressive Power of Pebble Automata. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part I. LNCS, vol. 4051, pp. 157–168. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  24. Bouajjani, A., Habermehl, P., Jurski, Y., Sighireanu, M.: Rewriting Systems with Data. In: Csuhaj-Varjú, E., Ésik, Z. (eds.) FCT 2007. LNCS, vol. 4639, pp. 1–22. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  25. Bouyer, P., Petit, A., Therien, D.: An algebraic approach to data languages and timed languages. Inf. Comput. 182(2), 137–162 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  26. Bray, T., Paoli, J., Sperberg-McQueen, C.M.: Extensible markup language (XML) working draft. Technical report, W3C (1996), www.w3.org/TR/WD-xml-961114

  27. Bray, T., Paoli, J., Sperberg-McQueen, C.M., Maler, E., Yergeau, F.: Extensible markup language (XML) 1.0 (fifth edition). Technical report, W3C (2008), www.w3.org/TR/2008/REC-xml-20081126

  28. Brüggemann-Klein, A., Wood, D.: Deterministic regular languages. Technical Report Bericht 38, Universität Freiburg, Institut für Informatik (1991)

    Google Scholar 

  29. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Information and Computation 140(2), 229–253 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  30. Brüggemann-Klein, A., Murata, M., Wood, D.: Regular tree languages over non-ranked alphabets (1998), http://hdl.handle.net/1783.1/738

  31. Büchi, J.R.: Weak second-order arithmetic and finite automata. Z. Math. Logik Grundl. Math. 6, 66–92 (1960)

    Article  MathSciNet  MATH  Google Scholar 

  32. Chamberlin, D., Clark, J., Florescu, D., Robie, J., Simeon, J., Stefanascu, M.: XQuery 1.0: An XML query language (2002), http://www.w3.org/TR/xquery/

  33. Clark, J., Murata, M.: Relax ng specification. Technical report, OASIS Committee Specification Document (2001)

    Google Scholar 

  34. Clark, J.: XSL transformations version 1.0 (August 1999), http://www.w3.org/TR/xslt

  35. Colcombet, T., Ley, C., Puppis, G.: On the Use of Guards for Logics with Data. In: Murlak, F., Sankowski, P. (eds.) MFCS 2011. LNCS, vol. 6907, pp. 243–255. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  36. World Wide Web Consortium. XML Path Language (XPath), Version 1.0. W3C Recommendation, November 16 (1999), http://www.w3.org/TR/xpath

  37. World Wide Web Consortium. XML Path Language (XPath), Version 2.0. W3C Recommendation, http://www.w3.org/TR/xpath20

  38. World Wide Web Consortium. XML schema, http://www.w3.org/XML/Schema

  39. Demri, S., Lazic, R.: LTL with the freeze quantifier and register automata. ACM Trans. Comput. Log. 10(3) (2009)

    Google Scholar 

  40. Demri, S., Sangnier, A.: When Model-Checking Freeze ltl Over Counter Machines Becomes Decidable. In: Ong, L. (ed.) FOSSACS 2010. LNCS, vol. 6014, pp. 176–190. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  41. Doner, J.: Tree acceptors and some of their applications. J. Comput. Syst. Sci. 4(5), 406–451 (1970)

    Article  MathSciNet  MATH  Google Scholar 

  42. Elgot, C.C.: Decision problems of finite automata design and related arithmetics. Transactions of The American Mathematical Society 98, 21–21 (1961)

    Article  MathSciNet  MATH  Google Scholar 

  43. Engelfriet, J., Hoogeboom, H.J.: Automata with nested pebbles capture first-order logic with transitive closure. Logical Methods in Computer Science 3(2) (2007)

    Google Scholar 

  44. Engelfriet, J., Hoogeboom, H.J., Samwel, B.: Xml transformation by tree-walking transducers with invisible pebbles. In: Libkin, L. (ed.) PODS, pp. 63–72. ACM (2007)

    Google Scholar 

  45. Engelfriet, J., Maneth, S.: Macro tree transducers, attribute grammars, and mso definable tree translations. Inf. Comput. 154(1), 34–91 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  46. Figueira, D.: Satisfiability of downward XPath with data equality tests. In: PODS, pp. 197–206 (2009)

    Google Scholar 

  47. Figueira, D.: Forward-XPath and extended register automata on data-trees. In: ICDT, pp. 231–241 (2010)

    Google Scholar 

  48. Figueira, D., Segoufin, L.: Bottom-up automata on data trees and vertical xpath. In: Schwentick, Dürr [82], pp. 93–104

    Google Scholar 

  49. Garofalakis, M.N., Gionis, A., Rastogi, R., Seshadri, S., Shim, K.: Xtract: Learning document type descriptors from xml document collections. Data Min. Knowl. Discov. 7(1), 23–56 (2003)

    Article  MathSciNet  Google Scholar 

  50. Van den Bussche, J., Bex, G.J., Neven, F.: DTDs versus XML schema: A practical study. In: International Workshop on the World Wide Web and Databases, pp. 79–84 (2004)

    Google Scholar 

  51. Gelade, W., Gyssens, M., Martens, W.: Regular Expressions with Counting: Weak versus Strong Determinism. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 369–381. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  52. Gelade, W., Neven, F.: Succinctness of the complement and intersection of regular expressions. In: Albers, S., Weil, P. (eds.) STACS. LIPIcs, vol. 1, pp. 325–336. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany (2008)

    Google Scholar 

  53. Glushkov, V.M.: The abstract theory of automata. Russian Math. Surveys 16, 1–53 (1961)

    Article  MathSciNet  MATH  Google Scholar 

  54. Goris, E., Marx, M.: Looping caterpillars. In: LICS 2005 (2005)

    Google Scholar 

  55. Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing xpath queries. ACM Trans. Database Syst. 30(2), 444–491 (2005)

    Article  Google Scholar 

  56. Green, T.J., Miklau, G., Onizuka, M., Suciu, D.: Processing XML Streams with Deterministic Automata. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 173–189. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  57. Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329–363 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  58. Koch, C., Scherzinger, S., Schweikardt, N., Stegmaier, B.: FluXQuery: An optimizing XQuery processor for streaming XML data. In: Proc. 30th International Conference on Very Large Data Bases (VLDB), Toronto, pp. 1309–1312 (2004)

    Google Scholar 

  59. Libkin, L.: Logics for unranked trees: An overview. Logical Methods in Computer Science 2(3) (2006)

    Google Scholar 

  60. Maneth, S., Berlea, A., Perst, T., Seidl, H.: XML type checking with macro tree transducers. In: PODS, pp. 283–294 (2005)

    Google Scholar 

  61. Maneth, S., Perst, T., Seidl, H.: Exact XML Type Checking in Polynomial Time. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 254–268. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  62. Martens, W., Neven, F.: Frontiers of tractability for typechecking simple xml transformations. J. Comput. Syst. Sci. 73(3), 362–390 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  63. Martens, W., Neven, F., Schwentick, T.: Simple off the shelf abstractions for XML schema. SIGMOD Record 36(3), 15–22 (2007)

    Article  Google Scholar 

  64. Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  65. Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)

    Article  Google Scholar 

  66. Marx, M., de Rijke, M.: Semantic characterizations of XPath. In: TDM 2004 Workshop on XML Databases and Information Retrieval, Twente, The Netherlands, June 21 (2004)

    Google Scholar 

  67. Marx, M.: Conditional xpath. ACM Trans. Database Syst. 30(4), 929–959 (2005)

    Article  Google Scholar 

  68. Marx, M.: Logical Foundations of XML and XQuery. In: Tessaris, S., Franconi, E., Eiter, T., Gutierrez, C., Handschuh, S., Rousset, M.-C., Schmidt, R.A. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 111–157. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  69. Marx, M., de Rijke, M.: Semantic characterizations of navigational xpath. SIGMOD Record 34(2), 41–46 (2005)

    Article  Google Scholar 

  70. Milo, T., Suciu, D., Vianu, V.: Typechecking for xml transformers. J. Comput. Syst. Sci. 66(1), 66–97 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  71. Møller, A., Schwartzbach, M.I.: The Design Space of Type Checkers for XML Transformation Languages. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 17–36. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  72. Murata, M.: Forest-regular languages and tree-regular languages. Technical report, Fuji Xerox, Japan. Technical Report (1995)

    Google Scholar 

  73. Murata, M., Lee, D., Mani, M., Kawaguchi, K.: Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn. 5(4), 660–704 (2005)

    Article  Google Scholar 

  74. Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275, 633–674 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  75. Neven, F.: Automata, logic, and XML. In: CSL, pp. 2–26 (2002)

    Google Scholar 

  76. Neven, F.: Automata theory for XML researchers. SIGMOD Record 31(3), 39–46 (2002)

    Article  Google Scholar 

  77. Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275(1-2), 633–674 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  78. Neven, F., Schwentick, T.: On the power of tree-walking automata. Information and Control 183(1), 86–103 (2003)

    MathSciNet  MATH  Google Scholar 

  79. Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Methods in Computer Science 2(3) (2006)

    Google Scholar 

  80. Potthoff, A.: Logische Klassifizierung regulärer Baumsprachen. Doctor’s thesis, Institut für Informatik u. Prakt. Math., Universität Kiel (1994)

    Google Scholar 

  81. Schwentick, T.: Automata for XML - a survey. J. Comput. Syst. Sci. 73(3), 289–315 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  82. Schwentick, T., Dürr, C. (eds.): 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011, Dortmund, Germany, March 10-12. LIPIcs, vol. 9. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2011)

    Google Scholar 

  83. Segoufin, L.: Automata and Logics for Words and Trees over an Infinite Alphabet. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 41–57. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  84. Segoufin, L.: Static analysis of xml processing with data values. SIGMOD Record 36(1), 31–38 (2007)

    Article  Google Scholar 

  85. Segoufin, L., Vianu, V.: Validating streaming XML documents. In: PODS 2002, pp. 53–64 (2002)

    Google Scholar 

  86. ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of xpath 2.0. J. ACM 56(6) (2009)

    Google Scholar 

  87. ten Cate, B., Marx, M.: Axiomatizing the logical core of xpath 2.0. Theory Comput. Syst. 44(4), 561–589 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  88. ten Cate, B., Segoufin, L.: Transitive closure logic, nested tree walking automata, and xpath. J. ACM 57(3) (2010)

    Google Scholar 

  89. Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory 2(1), 57–81 (1968)

    Article  MathSciNet  MATH  Google Scholar 

  90. Thomas, W.: Languages, Automata, and Logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. III, pp. 389–455. Springer, Heidelberg (1997)

    Chapter  Google Scholar 

  91. Trakhtenbrot, B.A.: Finite automata and logic of monadic predicates. Doklady Akademii Nauk SSSR 140, 326–329 (1961)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schwentick, T. (2012). Foundations of XML Based on Logic and Automata: A Snapshot. In: Lukasiewicz, T., Sali, A. (eds) Foundations of Information and Knowledge Systems. FoIKS 2012. Lecture Notes in Computer Science, vol 7153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28472-4_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-28472-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28471-7

  • Online ISBN: 978-3-642-28472-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics