File-Based Storage of Digital Objects and Constituent Datastreams: XMLtapes and Internet Archive ARC Files

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3652)

Abstract

This paper introduces the write-once/read-many XMLtape/ARC storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a protocol-based manner. First, XML-based representations of multiple Digital Objects are concatenated into a single file named an XMLtape. An XMLtape is a valid XML file; its format definition is independent of the choice of the XML-based complex object format by which Digital Objects are represented. The creation of indexes for both the identifier and the creation datetime of the XML-based representation of the Digital Objects facilitates OAI-PMH-based access to Digital Objects stored in an XMLtape. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects in a concatenated manner. An index for the identifier of the datastream facilitates OpenURL-based access to an ARC file. The interconnection between XMLtapes and ARC files is provided by conveying the identifiers of ARC files associated with an XMLtape as administrative information in the XMLtape, and by including OpenURL references to constituent datastreams of a Digital Object in the XML-based representation of that Digital Object.
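The indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the XMLtape or ARC format itself: it omits the XML wrapper that makes an XMLtape a valid XML file, the ARC record headers, and the OAI-PMH and OpenURL protocol layers. The function names, the in-memory index structures, and the `info:example/...` identifiers are assumptions made for illustration only. The sketch shows records concatenated into one file, an identifier index mapping to byte offsets, and a datestamp index supporting date-selective listing.

```python
import io
from bisect import bisect_left


def append_records(tape, records):
    """Concatenate (identifier, datestamp, payload) records into a binary file.

    Returns two hypothetical indexes:
      by_id   -- identifier -> (byte offset, byte length)
      by_date -- list of (datestamp, identifier), sorted by datestamp
    Datestamps are assumed to be UTC ISO 8601 strings in one fixed format,
    so that string order matches chronological order.
    """
    by_id = {}
    by_date = []
    for identifier, datestamp, payload in records:
        offset = tape.tell()          # position where this record starts
        tape.write(payload)           # append only; earlier bytes are never rewritten
        by_id[identifier] = (offset, len(payload))
        by_date.append((datestamp, identifier))
    by_date.sort()
    return by_id, by_date


def read_record(tape, by_id, identifier):
    """Fetch one record by identifier with a single seek and read."""
    offset, length = by_id[identifier]
    tape.seek(offset)
    return tape.read(length)


def identifiers_from(by_date, from_datestamp):
    """List identifiers whose datestamp is at or after from_datestamp."""
    start = bisect_left(by_date, (from_datestamp, ""))
    return [identifier for _, identifier in by_date[start:]]


# Small demonstration with an in-memory file standing in for the tape.
demo_tape = io.BytesIO()
demo_by_id, demo_by_date = append_records(demo_tape, [
    ("info:example/1", "2005-01-10T00:00:00Z", b"<record>one</record>"),
    ("info:example/2", "2005-02-15T00:00:00Z", b"<record>two</record>"),
])
print(read_record(demo_tape, demo_by_id, "info:example/2"))
print(identifiers_from(demo_by_date, "2005-02-01T00:00:00Z"))
```

Because the file is only ever appended to, the offsets recorded in the index remain valid for the life of the file, which is what makes a write-once/read-many layout straightforward to index.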



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, X., Balakireva, L., Hochstenbach, P., Van de Sompel, H. (2005). File-Based Storage of Digital Objects and Constituent Datastreams: XMLtapes and Internet Archive ARC Files. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_23

  • DOI: https://doi.org/10.1007/11551362_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28767-4

  • Online ISBN: 978-3-540-31931-3

  • eBook Packages: Computer Science, Computer Science (R0)
