Skip to main content

Cache-Conscious Collision Resolution in String Hash Tables

  • Conference paper
String Processing and Information Retrieval (SPIRE 2005)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3772))

Included in the following conference series:

Abstract

In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skew. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading (1974)

    MATH  Google Scholar 

  2. Baer, J., Chen, T.: Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers 44(5), 609–623 (1995)

    Article  MATH  Google Scholar 

  3. Callahan, D., Kennedy, K., Porterfield, A.: Software prefetching. In: Proc. ASPLOS Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 40–52. ACM Press, New York (1991)

    Chapter  Google Scholar 

  4. Chilimbi, T.M., Hill, M.D., Larus, J.R.: Cache-conscious structure layout. In: Proc. ACM SIGPLAN conf. on Programming Language Design and Implementation, pp. 1–12. ACM Press, New York (1999)

    Google Scholar 

  5. Collins, J., Sair, S., Calder, B., Tullsen, D.M.: Pointer cache assisted prefetching. In: Proc. Annual ACM/IEEE MICRO Int. Symp. on Microarchitecture, pp. 62–73. IEEE Computer Society Press, Los Alamitos (2002)

    Google Scholar 

  6. Fu, J.W.C., Patel, J.H., Janssens, B.L.: Stride directed prefetching in scalar processors. SIGMICRO Newsletter 23(1-2), 102–110 (1992)

    Article  Google Scholar 

  7. Halatsis, C., Philokyprou, G.: Pseudochaining in hash tables. Communications of the ACM 21(7), 554–557 (1978)

    Article  MATH  MathSciNet  Google Scholar 

  8. Harman, D.: Overview of the second text retrieval conference (TREC-2). In: Information Processing & Management, pp. 271–289. Pergamon Press, Inc., Oxford (1995)

    Google Scholar 

  9. Heileman, G.L., Luo, W.: How caching affects hashing. In: Proc. ALENEX Workshop on Algorithm Engineering and Experiments (January 2005)

    Google Scholar 

  10. Heinz, S., Zobel, J., Williams, H.E.: Self-adjusting trees in practice for large text collections. Software—Practice and Experience 31(10), 925–939 (2001)

    Article  MATH  Google Scholar 

  11. Karlsson, M., Dahlgren, F., Stenstrom, P.: A prefetching technique for irregular accesses to linked data structures. In: Proc. Symp. on High-Performance Computer Architecture, January 2000, pp. 206–217 (2000)

    Google Scholar 

  12. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, 2nd edn., vol. 3. Addison-Wesley Longman, Amsterdam (1998)

    Google Scholar 

  13. Larson, P.: Performance analysis of linear hashing with partial expansions. ACM Transactions on Database Systems 7(4), 566–587 (1982)

    Article  MATH  Google Scholar 

  14. Lebeck, A.R.: Cache conscious programming in undergraduate computer science. In: Proc. SIGCSE Technical Symp. on Computer Science Education, pp. 247–251. ACM Press, New York (1999)

    Google Scholar 

  15. McCabe, J.: On serial files with relocatable records. Operations Research 13, 609–618 (1965)

    Article  MATH  MathSciNet  Google Scholar 

  16. Munro, J.I., Celis, P.: Techniques for collision resolution in hash tables with open addressing. In: Proc. ACM Fall Joint Computer Conf., pp. 601–610. IEEE Computer Society Press, Los Alamitos (1986)

    Google Scholar 

  17. Peterson, W.W.: Open addressing. IBM J. Research & Development 1, 130–146 (1957)

    Article  Google Scholar 

  18. Ramakrishna, M.V., Zobel, J.: Performance in practice of string hashing functions. In: Proc. DASFAA Symp. on Databases Systems for Advanced Applications, vol. 6, pp. 215–224. World Scientific, Singapore (1997)

    Google Scholar 

  19. Rathi, A., Lu, H., Hedrick, G.E.: Performance comparison of extendible hashing and linear hashing techniques. In: Proc. ACM SIGSMALL/PC Symp. on Small Systems, pp. 178–185. ACM Press, New York (1990)

    Chapter  Google Scholar 

  20. Rosenberg, A.L., Stockmeyer, L.J.: Hashing schemes for extendible arrays. Jour. of the ACM 24(2), 199–221 (1977)

    Article  MATH  MathSciNet  Google Scholar 

  21. Roth, A., Sohi, G.S.: Effective jump-pointer prefetching for linked data structures. In: Proc. Int. Symp. on Computer Architecture, pp. 111–121. IEEE Computer Society Press, Los Alamitos (1999)

    Google Scholar 

  22. Sarwate, D.V.: A note on universal classes of hash functions. Information Processing Letters 10(1), 41–45 (1980)

    Article  MATH  MathSciNet  Google Scholar 

  23. Sinha, R., Ring, D., Zobel, J.: Cache-efficient string sorting using copying. In: submission

    Google Scholar 

  24. Sinha, R., Zobel, J.: Cache-conscious sorting of large sets of strings with dynamic tries. ACM Jour. of Exp. Algorithmics 9 (2005)

    Google Scholar 

  25. Vitter, J.S.: Analysis of the search performance of coalesced hashing. Jour. of the ACM 30(2), 231–258 (1983)

    Article  MATH  MathSciNet  Google Scholar 

  26. Yang, C., Lebeck, A.R., Tseng, H., Lee, C.: Tolerating memory latency through push prefetching for pointer-intensive applications. ACM Trans. Architecture Code Optimisation 1(4), 445–475 (2004)

    Article  Google Scholar 

  27. Zobel, J., Williams, H.E., Heinz, S.: In-memory hash tables for accumulating text vocabularies. Information Processing Letters 80(6), 271–277 (2001)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Askitis, N., Zobel, J. (2005). Cache-Conscious Collision Resolution in String Hash Tables. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_11

Download citation

  • DOI: https://doi.org/10.1007/11575832_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29740-6

  • Online ISBN: 978-3-540-32241-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics