Definition
Repeatedly choosing random numbers according to a given distribution is generally referred to as sampling. It is a popular technique for data reduction and approximate query processing. It allows a large data set to be summarized as a much smaller data set, the sampling synopsis, which usually provides an estimate of the original data with provable error guarantees. One advantage of the sampling synopsis is that it is easy and efficient to construct: the cost of building it is proportional only to the synopsis size, so the cost of sampling can be sublinear in the size of the original data. Another advantage is that the sampling synopsis consists of items drawn from the original data itself. Thus, many query processing and data manipulation techniques that apply to the original data can be applied directly to the synopsis.
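The paragraph above names three properties of a sampling synopsis. Its construction cost depends only on the synopsis size, a query written for the original data can run on the sample unchanged, and the result carries an error guarantee. The following is a minimal Python sketch of all three under stated assumptions: the data is an in-memory indexable sequence, items are drawn uniformly with replacement, and every value lies in a known range [lo, hi]. The function names are hypothetical. Hoeffding's inequality is used here as one standard way to obtain a provable bound; it is not necessarily the guarantee any particular system uses.

```python
import math
import random


def sample_with_replacement(data, k, rng):
    """Draw k items uniformly at random (with replacement) from an indexable sequence.

    Only k random index lookups are made, so the cost grows with the
    synopsis size k, not with len(data).
    """
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(k)]


def estimate_sum(data_size, synopsis):
    """Approximate SUM over the original data: run AVG on the synopsis, then scale by N."""
    avg = sum(synopsis) / len(synopsis)
    return data_size * avg


def hoeffding_halfwidth(k, lo, hi, delta):
    """Half-width eps such that the mean of k i.i.d. draws bounded in [lo, hi]
    lies within eps of the true mean with probability >= 1 - delta
    (Hoeffding's inequality)."""
    return (hi - lo) * math.sqrt(math.log(2 / delta) / (2 * k))


if __name__ == "__main__":
    rng = random.Random(42)
    data = list(range(1_000_000))  # hypothetical "original data": values 0..999999
    k = 10_000                     # synopsis size
    synopsis = sample_with_replacement(data, k, rng)

    est = estimate_sum(len(data), synopsis)
    eps = hoeffding_halfwidth(k, 0, len(data) - 1, delta=0.05)
    print(f"estimated SUM = {est:.4e}, exact SUM = {sum(data):.4e}")
    print(f"guarantee: SUM within +/- {len(data) * eps:.4e} with probability >= 0.95")
```

Computing the exact SUM in the demo is only for comparison. The synopsis itself is built from k index lookups, and the same AVG expression that would run on the full data runs on the sample. The bound shrinks as 1/sqrt(k), so halving the error requires roughly four times as many samples.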
Historical Background
The notion of representing large data sets through small samples dates back to the end of the nineteenth century and has led to...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Zhang, Q. (2018). Data Sampling. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_535
DOI: https://doi.org/10.1007/978-1-4614-8265-9_535
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering