A Fast Generative Spell Corrector Based on Edit Distance

  • Conference paper
Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7814)

Abstract

One of the main challenges in implementing web-scale online search systems is disambiguating user input when portions of the input queries may be misspelt. Spell correctors integrated with such systems face very stringent requirements: primarily, they must handle a large volume of concurrent queries and generate relevant spelling suggestions at very high speed. These systems often run on high-end server machines with ample memory and processing power, so the main requirement for such spell correctors is to minimize the latency of generating suggestions.

In this paper, we present a spell corrector that we developed to serve a high volume of incoming queries for an online search service. It consists of a fast, per-token candidate generator that generates spelling suggestions within a distance of two edit operations of an input token. We compare its performance against an n-gram-based spell corrector and show that the presented candidate generation approach has lower response times.
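The abstract does not describe how the candidate generator works internally, so the following is only a minimal sketch of the general idea of generating candidates within two edit operations of a token. It is not the authors' implementation. It assumes a generate-and-filter scheme over a small in-memory lexicon, lowercase ASCII tokens, and four edit operations (deletion, transposition, substitution, insertion). The names `edits1`, `candidates`, and `lexicon` are hypothetical.

```python
import string

# Assumption: tokens are lowercase ASCII; a real system would use its own alphabet.
ALPHABET = string.ascii_lowercase


def edits1(token: str) -> set[str]:
    """Return every string exactly one edit operation away from token."""
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {
        left + c + right[1:] for left, right in splits if right for c in ALPHABET
    }
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def candidates(token: str, lexicon: set[str]) -> dict[str, int]:
    """Map each lexicon word reachable in at most two edits to its edit count."""
    found: dict[str, int] = {}
    if token in lexicon:
        found[token] = 0
    one_away = edits1(token)
    # Words one edit away are recorded before words two edits away,
    # so setdefault keeps the smallest edit count for each word.
    for word in one_away & lexicon:
        found.setdefault(word, 1)
    for near in one_away:
        for word in edits1(near) & lexicon:
            found.setdefault(word, 2)
    return found


if __name__ == "__main__":
    # Hypothetical toy lexicon, for illustration only.
    lexicon = {"search", "reach", "speech"}
    print(candidates("serach", lexicon))
```

In this sketch the cost of a lookup depends on the token length and alphabet size rather than on the lexicon size, which is one reason a generative approach can suit low-latency settings. Whether the paper's generator uses the same operations or data structures is not stated in the abstract.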


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chattopadhyaya, I., Sirchabesan, K., Seal, K. (2013). A Fast Generative Spell Corrector Based on Edit Distance. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_34

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science, Computer Science (R0)
