Abstract
Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query term, the results are incorrect or inconclusive. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
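To make the data structure concrete, the following is a minimal Python sketch of a ternary search tree with a near-neighbor lookup, in the style described by Bentley and Sedgewick. It is not the algorithm from the paper, whose details the abstract does not give. The names `TernarySearchTree`, `near`, and `correct`, the use of word frequency to rank candidates, and the restriction to same-length candidates (Hamming distance, so only substituted letters are handled) are all assumptions made for illustration.

```python
class Node:
    """One character of a stored word, with three child links."""
    __slots__ = ("ch", "lo", "eq", "hi", "freq")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.freq = 0  # > 0 marks the end of a stored word


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word, freq=1):
        if word:
            self.root = self._insert(self.root, word, 0, freq)

    def _insert(self, node, word, i, freq):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, freq)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, freq)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, freq)
        else:
            node.freq = freq
        return node

    def frequency(self, word):
        """Return the stored frequency of word, or 0 if it is absent."""
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.freq
        return 0

    def near(self, word, max_mismatches):
        """Stored words of the same length as word that differ from it
        in at most max_mismatches positions, as (word, freq) pairs."""
        found = []
        self._near(self.root, word, 0, max_mismatches, [], found)
        return found

    def _near(self, node, word, i, budget, prefix, found):
        if node is None or i >= len(word):
            return
        # Siblings hold other characters for position i; with no budget
        # left, only the side that can still match exactly is visited.
        if budget > 0 or word[i] < node.ch:
            self._near(node.lo, word, i, budget, prefix, found)
        if budget > 0 or word[i] > node.ch:
            self._near(node.hi, word, i, budget, prefix, found)
        cost = 0 if word[i] == node.ch else 1
        if budget - cost >= 0:
            prefix.append(node.ch)
            if i + 1 == len(word):
                if node.freq > 0:
                    found.append(("".join(prefix), node.freq))
            else:
                self._near(node.eq, word, i + 1, budget - cost, prefix, found)
            prefix.pop()


def correct(tree, term, max_mismatches=1):
    """Hypothetical ranking rule: keep known terms, otherwise return the
    most frequent near neighbor, or the term itself if there is none."""
    if tree.frequency(term) > 0:
        return term
    candidates = tree.near(term, max_mismatches)
    if not candidates:
        return term
    return max(candidates, key=lambda pair: pair[1])[0]
```

With a tree holding "search" at a higher frequency than "starch", `correct(tree, "saarch")` picks "search", since both candidates are one substitution away and frequency breaks the tie. A fuller corrector would also need to handle inserted, deleted, and transposed letters, which this sketch leaves out.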
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martins, B., Silva, M.J. (2004). Spelling Correction for Search Engine Queries. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_33
DOI: https://doi.org/10.1007/978-3-540-30228-5_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive