Trie methods for representing text

Merrett, T. H.; Shang, Heping

doi:10.1007/3-540-57301-1_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 730)

Included in the following conference series:

International Conference on Foundations of Data Organization and Algorithms

Abstract

We propose a new trie organization for large text documents requiring secondary storage. Index size is critical in all trie representations of text, and our organization is smaller than all known methods. Access time is as good as the best known method. Tries can be constructed in good time. For an index of 100 million entries, our experiments show size factors of less than 3, as compared with 3.4 for the best previous method. Our measurements show expected access costs of 0.1 sec., and construction times of 18 to 55 hours, depending on the text characteristics.

Our organization can also handle dynamic data, and we give new algorithms for inserting and deleting. It supports searches for general patterns, as well as a variety of special searches, such as proximity, range, longest repetitions and most frequent occurrences.
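
To make the kinds of search listed above concrete, here is a minimal in-memory sketch of a trie built over every suffix of a text. It is not the organization proposed in the paper, whose layout on secondary storage is not described in this abstract: it is an uncompressed toy whose memory use grows quadratically with the text length. The names `Node`, `build_suffix_trie`, `find`, `near` and `longest_repetition` are hypothetical and chosen only for illustration.

```python
class Node:
    """One trie node: child links plus the start offsets of suffixes passing through it."""
    __slots__ = ("children", "starts")

    def __init__(self):
        self.children = {}
        self.starts = []


def build_suffix_trie(text):
    """Insert every suffix of `text`; quadratic space, so only suitable for tiny inputs."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def find(root, pattern):
    """Return the start offsets of all occurrences of a non-empty `pattern`."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.starts)


def near(root, a, b, k):
    """Proximity search: pairs (i, j) where `a` starts at i, `b` at j, and |i - j| <= k."""
    return [(i, j) for i in find(root, a) for j in find(root, b) if abs(i - j) <= k]


def longest_repetition(root):
    """Longest substring that occurs at least twice (occurrences may overlap)."""
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if len(child.starts) >= 2:  # only repeated prefixes are worth extending
                candidate = prefix + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(find(trie, "ana"))         # [1, 3]
    print(near(trie, "ba", "na", 2)) # [(0, 2)]
    print(longest_repetition(trie))  # ana
```

Each lookup walks one path from the root, so its cost depends on the pattern length rather than on the text length. Practical text indices generally compress unary paths and store pointers into the text rather than full suffixes, which is where index size becomes the critical design concern.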

Author information

Authors and Affiliations

School of Computer Science, McGill University, H3A 2A7, Montréal, Qué.
T. H. Merrett & Heping Shang

Editor information

David B. Lomet

Cite this paper

Merrett, T.H., Shang, H. (1993). Trie methods for representing text. In: Lomet, D.B. (eds) Foundations of Data Organization and Algorithms. FODO 1993. Lecture Notes in Computer Science, vol 730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57301-1_9

DOI: https://doi.org/10.1007/3-540-57301-1_9
Published: 04 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57301-2
Online ISBN: 978-3-540-48047-1
eBook Packages: Springer Book Archive
