Foundations of Fast Communication via XML

Annals of Software Engineering

Abstract

Communication with XML often involves pre-agreed document types. In this paper, we propose an offline parser generation approach to enhance online processing performance for documents conforming to a given DTD. Our examination of DTDs and the languages they define demonstrates the existence of ambiguities. We present an algorithm that maps DTDs to deterministic context-free grammars defining the same languages. We prove the grammars to be LL(1) and LALR(1), making them suitable for standard parser generators. Our experiments show the superior performance of generated optimized parsers. Our results generalize from DTDs to XML schema specifications with certain restrictions, most notably the absence of namespaces, which exceed the scope of context-free grammars.
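To make the idea of a DTD-specific parser concrete, the following is a minimal hand-written sketch, not the authors' generator or its output. It assumes a hypothetical three-element DTD (`note`, `to`, `body`), treats start and end tags as grammar terminals, ignores attributes, entities, and comments, and decides every alternative with a single token of lookahead, as an LL(1) grammar permits. All names (`tokenize`, `Parser`, `parse_note`) are illustrative assumptions.

```python
import re

# Hypothetical DTD, used only for illustration:
#   <!ELEMENT note (to, body?)>
#   <!ELEMENT to   (#PCDATA)>
#   <!ELEMENT body (#PCDATA)>
# A grammar for the same language, with tags as terminals:
#   note     -> <note> to body_opt </note>
#   to       -> <to> TEXT? </to>
#   body_opt -> <body> TEXT? </body> | epsilon

TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(doc):
    """Split a document into ('open', name), ('close', name), ('text', s) tokens."""
    tokens = []
    pos = 0
    while pos < len(doc):
        m = TOKEN.match(doc, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        slash, name, text = m.groups()
        if name is not None:
            tokens.append(("close" if slash else "open", name))
        elif text.strip():  # whitespace between tags is skipped
            tokens.append(("text", text))
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser specialised to the hypothetical DTD above."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("eof", "")

    def expect(self, kind, name):
        if self.peek() != (kind, name):
            raise ValueError(f"expected {kind} {name!r}, got {self.peek()}")
        self.i += 1

    def pcdata(self, name):
        # Parses <name> TEXT? </name> and returns the text (possibly empty).
        self.expect("open", name)
        text = ""
        if self.peek()[0] == "text":
            text = self.peek()[1]
            self.i += 1
        self.expect("close", name)
        return text

    def note(self):
        self.expect("open", "note")
        result = {"to": self.pcdata("to")}
        # One token of lookahead selects between body_opt's two alternatives.
        if self.peek() == ("open", "body"):
            result["body"] = self.pcdata("body")
        self.expect("close", "note")
        self.expect("eof", "")
        return result


def parse_note(doc):
    return Parser(tokenize(doc)).note()
```

For example, `parse_note("<note><to>Ann</to></note>")` accepts the document with the optional `body` omitted, while a document that starts with `<body>` inside `<note>` is rejected with a `ValueError`. Because the element names are fixed ahead of time, the parser compares tokens against known constants instead of consulting a generic document model at run time, which is the kind of specialisation the offline approach relies on.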

About this article

Cite this article

Löwe, W.M., Noga, M.L. & Gaul, T.S. Foundations of Fast Communication via XML. Annals of Software Engineering 13, 357–379 (2002). https://doi.org/10.1023/A:1016566031114

  • DOI: https://doi.org/10.1023/A:1016566031114
