Abstract
Clustering interval data has been studied for decades. When the data have d real-valued attributes, high-dimensional interval data can be expressed as hyperrectangles (d-orthotopes) in \(\mathbb{R}^d\). This paper investigates such data, in which each observation is a Cartesian product of intervals, or equivalently a vector of intervals. For the efficient computation of related Boolean functions of events, we report several properties discovered using the vertices and edges of a graph generated from the given events. We also study lower- and upper-bounded orthants in \(\mathbb{R}^d\) as events, and show that a polynomial-time algorithm exists for calculating the probability of the union of such events. The algorithm is obtained by constructing a suitable partial order relation based on recursive projection onto lower-dimensional spaces. Illustrative real-life applications are presented.
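To make the objects in the abstract concrete, the following minimal Python sketch represents a hyperrectangle as a vector of intervals and computes the probability of a union of such events by plain inclusion-exclusion. It is not the polynomial-time algorithm of the paper: it enumerates every subset of events and is therefore exponential in the number of events. The function names (`intersect`, `volume`, `union_volume`, `union_probability`) and the assumption of a uniform distribution over a bounding box `domain` are illustrative choices, not taken from the paper.

```python
from itertools import combinations
from math import prod

# Assumption: a hyperrectangle is a list of (low, high) pairs, one per coordinate.

def intersect(a, b):
    """Coordinate-wise intersection of two boxes; None if it has zero volume."""
    out = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
        if lo >= hi:
            return None
        out.append((lo, hi))
    return out

def volume(box):
    """Product of the interval lengths."""
    return prod(hi - lo for lo, hi in box)

def union_volume(boxes):
    """Volume of the union by inclusion-exclusion (exponential in len(boxes))."""
    total = 0.0
    for k in range(1, len(boxes) + 1):
        sign = 1 if k % 2 else -1
        for subset in combinations(boxes, k):
            inter = subset[0]
            for box in subset[1:]:
                inter = intersect(inter, box)
                if inter is None:
                    break
            if inter is not None:
                total += sign * volume(inter)
    return total

def union_probability(boxes, domain):
    """Probability of the union, assuming a uniform distribution on `domain`
    and that every box lies inside `domain` (both are assumptions)."""
    return union_volume(boxes) / volume(domain)

# Hypothetical example: three overlapping rectangles in the plane.
boxes = [
    [(0, 2), (0, 2)],
    [(1, 3), (1, 3)],
    [(0, 1), (0, 1)],  # contained in the first box
]
domain = [(0, 4), (0, 4)]
p = union_probability(boxes, domain)
```

Because the intersection of hyperrectangles is again a hyperrectangle (or empty), each inclusion-exclusion term is cheap to evaluate; the cost lies in the number of terms, which is what a polynomial-time method has to avoid.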
Acknowledgements
It is an honor for the first author to have his academic father, Professor András Prékopa (1929–2016), as the second author of this paper. The main topic of this paper, the probability of Boolean functions of high-dimensional interval data, was studied in 2019–2020 solely by the first author, who presented the main idea at ISAIM (International Symposium on Artificial Intelligence and Mathematics) in January 2020 in Fort Lauderdale, Florida. Work on Boolean functions of hyperrectangles and the related binomial moment problem formulation was initially suggested by Professor Prékopa in May 2016. The first author dearly misses him.
Additional information
András Prékopa: Deceased 18 September 2016.
Cite this article
Lee, J., Prékopa, A. Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space. Ann Oper Res (2021). https://doi.org/10.1007/s10479-021-03951-2
DOI: https://doi.org/10.1007/s10479-021-03951-2