Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space

  • Original Research
Annals of Operations Research

Abstract

Clustering interval data has been studied for decades. High-dimensional interval data can be expressed as hyperrectangles in \(\mathbb {R}^d\) (or d-orthotopes) when the data have d real-valued attributes. This paper investigates such high-dimensional interval data, in which each observation is a Cartesian product of intervals, or equivalently a vector of intervals. For the efficient computation of related Boolean functions, several useful properties are identified using the vertices and edges of the graph generated from the given events. We also study lower- and upper-bounded orthants in \(\mathbb {R}^d\) as events, for which we show the existence of a polynomial-time algorithm to calculate the probability of the union of such events. This efficient algorithm is obtained by constructing a suitable partial order relation based on a recursive projection onto lower-dimensional spaces. Illustrative real-life applications are presented.
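To make the objects in the abstract concrete, the following is a minimal sketch of the brute-force baseline for the union of hyperrectangles: inclusion-exclusion over all subsets of events. It is not the polynomial-time algorithm of the paper, and its cost grows exponentially with the number of boxes. The function names and the box representation (a list of (low, high) intervals, one per dimension) are assumptions made for illustration only.

```python
from itertools import combinations


def intersection_volume(boxes):
    """Volume of the intersection of axis-aligned boxes.

    Each box is a list of (low, high) intervals, one per dimension
    (an assumed representation, not taken from the paper).
    """
    d = len(boxes[0])
    volume = 1.0
    for k in range(d):
        low = max(box[k][0] for box in boxes)
        high = min(box[k][1] for box in boxes)
        if high <= low:
            return 0.0  # empty intersection along dimension k
        volume *= high - low
    return volume


def union_volume(boxes):
    """Volume of the union via inclusion-exclusion.

    Exponential in len(boxes); shown only as a reference baseline.
    """
    total = 0.0
    for r in range(1, len(boxes) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for subset in combinations(boxes, r):
            total += sign * intersection_volume(subset)
    return total


# Two overlapping squares in R^2: areas 4 and 4, overlap 1.
boxes = [
    [(0.0, 2.0), (0.0, 2.0)],
    [(1.0, 3.0), (1.0, 3.0)],
]
print(union_volume(boxes))
```

If the underlying distribution were uniform on a bounding box, dividing the union volume by the volume of that box would give the probability of the union. For other distributions, the intersection volume would be replaced by the probability of the intersection.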

Acknowledgements

It is an honor for the first author to have his academic father, Professor András Prékopa (1929–2016), as the second author of this paper. The paper's main topic, the probability of Boolean functions of high-dimensional interval data, was studied in 2019–2020 solely by the first author, who presented the main idea at ISAIM (International Symposium on Artificial Intelligence and Mathematics) in January 2020 in Fort Lauderdale, Florida. Work on Boolean functions of hyperrectangles and the related binomial moment problem formulation was initially suggested by Professor Prékopa in May 2016. The first author dearly misses him.

Author information

Corresponding author

Correspondence to Jinwook Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

András Prékopa: Deceased 18 September 2016.

About this article

Cite this article

Lee, J., Prékopa, A. Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space. Ann Oper Res (2021). https://doi.org/10.1007/s10479-021-03951-2

  • DOI: https://doi.org/10.1007/s10479-021-03951-2
