Definition
Probability smoothing is a language modeling technique that assigns some nonzero probability to events that were unseen in the training data. As a result, the probability mass is divided over more events, and the probability distribution becomes smoother.
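As an illustration, here is a minimal sketch of additive (add-one, or Laplace) smoothing, one of the simplest smoothing methods. The entry does not prescribe a particular method, and the vocabulary, counts, and the `alpha` pseudo-count parameter are hypothetical, chosen only to show that unseen events receive nonzero probability while the distribution still sums to one.

```python
from collections import Counter


def mle(counts: Counter, vocabulary: list[str]) -> dict[str, float]:
    """Maximum likelihood estimate: the relative frequency of each event."""
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so unseen events get probability 0.0.
    return {w: counts[w] / total for w in vocabulary}


def laplace(counts: Counter, vocabulary: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Additive smoothing: add `alpha` pseudo-counts to every event in the vocabulary."""
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


# Hypothetical toy data.
vocabulary = ["information", "retrieval", "search", "database"]
counts = Counter(["information", "retrieval", "retrieval", "information", "retrieval"])

p_mle = mle(counts, vocabulary)
p_smooth = laplace(counts, vocabulary)
print(p_mle)     # "search" and "database" get probability 0.0
print(p_smooth)  # every word gets a nonzero probability
```

The largest probability drops from 3/5 under the maximum likelihood estimate to 4/9 after smoothing: part of the mass has moved to the unseen words, which is what makes the distribution "smoother."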
Key Points
Smoothing overcomes the so-called sparse data problem: many events that are plausible in reality are not found in the data used to estimate probabilities. When using maximum likelihood estimates, unseen events are assigned a zero probability. In information retrieval, most events are unseen in the data, even if simple unigram language models are used: documents are relatively short (say, on average several hundred words), whereas the vocabulary is typically large (perhaps millions of words), so the vast majority of words do not occur in a given document. A short document about “information retrieval” might not mention the word “search,” but that does not mean it is not relevant...
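The zero-probability problem can be made concrete with query likelihood ranking, where a document is scored by the probability that its unigram model generates the query. The sketch below uses linear interpolation with a collection model (Jelinek-Mercer smoothing), one common remedy; it is an illustrative choice, not necessarily the method the entry has in mind. The toy document, collection, and the interpolation weight `lam` are hypothetical, and every query word is assumed to occur somewhere in the collection.

```python
from collections import Counter


def jelinek_mercer(word: str, doc: Counter, collection: Counter, lam: float) -> float:
    """P(word | doc) = (1 - lam) * P_ml(word | doc) + lam * P(word | collection)."""
    p_doc = doc[word] / sum(doc.values())                 # maximum likelihood estimate
    p_coll = collection[word] / sum(collection.values())  # background (collection) model
    return (1 - lam) * p_doc + lam * p_coll


def query_likelihood(query: list[str], doc: Counter, collection: Counter, lam: float) -> float:
    """Probability that the document's unigram model generates the query terms independently."""
    score = 1.0
    for word in query:
        score *= jelinek_mercer(word, doc, collection, lam)
    return score


# Hypothetical toy data: a short document and a tiny collection that contains it.
doc = Counter("information retrieval systems rank documents for information needs".split())
collection = doc + Counter("web search engines support search over documents".split())
query = ["information", "search"]

print(query_likelihood(query, doc, collection, lam=0.0))  # no smoothing: 0.0, "search" is unseen in doc
print(query_likelihood(query, doc, collection, lam=0.5))  # smoothed: a small nonzero score
```

With `lam=0.0` the model reduces to the maximum likelihood estimate, and a single unseen query word drives the whole score to zero; any `lam` greater than zero lets the document still be ranked. Real systems usually sum log probabilities instead of multiplying, to avoid numerical underflow on longer queries.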
Copyright information
© 2018 Springer Science+Business Media LLC
About this entry
Cite this entry
Hiemstra, D. (2018). Probability Smoothing. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_936-3
DOI: https://doi.org/10.1007/978-1-4899-7993-3_936-3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4899-7993-3
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering
Chapter history
Latest: Probability Smoothing. Published: 25 October 2017. DOI: https://doi.org/10.1007/978-1-4899-7993-3_936-3
Original: Probability Smoothing. Published: 29 August 2017. DOI: https://doi.org/10.1007/978-1-4899-7993-3_936-2