Abstract
Given a relative rank r ∈ (0,1) (e.g., r = 1/2 refers to the median), we show how to efficiently sample, with high probability, an element whose rank is very close to r from any probability distribution that supports efficient sampling (e.g., elements stored in an array). A primary feature of our methods is their elegance and ease of implementation: they can be coded in less space than is occupied by this abstract, and their lightweight footprint makes them ideally suited for highly resource-constrained computing environments. We demonstrate through empirical testing that these methods perform well in practice, and we provide a complete theoretical analysis that offers valuable insight into the performance of a natural class of approximate selection algorithms based on hierarchical random sampling.
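To make "hierarchical random sampling" concrete, the sketch below shows one natural scheme for the median case (r = 1/2): a recursive median-of-three over independent random draws. It is an illustration under stated assumptions and not necessarily the method analyzed in the paper; the names `approx_median`, `sample`, and `depth` are hypothetical, and the sketch does not handle ranks other than 1/2.

```python
import random


def approx_median(sample, depth):
    """Approximate the median by a recursive median-of-three.

    sample: zero-argument callable returning one random element
            (assumed interface; any efficient sampler would do).
    depth:  number of levels in the sampling tree; the call makes
            3**depth draws in total and uses O(depth) stack space.
    """
    if depth == 0:
        # Leaf of the tree: a single random draw.
        return sample()
    # Internal node: the median of three independent sub-results.
    a = approx_median(sample, depth - 1)
    b = approx_median(sample, depth - 1)
    c = approx_median(sample, depth - 1)
    return sorted((a, b, c))[1]


if __name__ == "__main__":
    data = list(range(1001))          # true median is 500
    rng = random.Random(2014)         # seeded only for repeatability
    estimate = approx_median(lambda: rng.choice(data), depth=6)
    print("estimate:", estimate)
```

Each additional level pulls the distribution of the returned element's rank closer to 1/2, at the cost of three times as many draws. No array of samples is ever stored, which is the sense in which such a scheme has a small footprint.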
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dean, B.C., Jalasutram, R., Waters, C. (2014). Lightweight Approximate Selection. In: Schulz, A.S., Wagner, D. (eds) Algorithms - ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44777-2_26
DOI: https://doi.org/10.1007/978-3-662-44777-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44776-5
Online ISBN: 978-3-662-44777-2
eBook Packages: Computer Science (R0)