Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space

Lee, Jinwook; Prékopa, András

doi:10.1007/s10479-021-03951-2


Original Research
Published: 23 January 2021

Annals of Operations Research (2021)


Abstract

Clustering interval data has been studied for decades. High-dimensional interval data can be expressed as hyperrectangles in \(\mathbb {R}^d\) (or d-orthotopes) when each observation has d real-valued attributes. This paper investigates such high-dimensional interval data: the Cartesian product of intervals, or a vector of intervals. For the efficient computation of related Boolean functions, we identify several useful properties by using the vertices and edges of the graph generated from the given events. We also study lower- and upper-bounded orthants in \(\mathbb {R}^d\) as events, and we show the existence of a polynomial-time algorithm for calculating the probability of the union of such events. The algorithm is obtained by constructing a suitable partial order relation based on a recursive projection onto lower-dimensional spaces. Illustrative real-life applications are presented.
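To make the objects in the abstract concrete, the following minimal sketch represents each d-dimensional interval datum as a list of (low, high) pairs, builds a graph whose vertices are the events and whose edges join pairs of overlapping hyperrectangles (an assumed reading of "the graph generated from given events"), and computes the probability of a union of orthants \(\{x \ge a_i\}\) by plain inclusion-exclusion. It assumes X is uniformly distributed on \([0,1]^d\), and all function names are hypothetical. This brute-force baseline sums \(2^n - 1\) terms and is not the polynomial-time algorithm described in the paper; it relies only on the fact that the intersection of the orthants \(\{x \ge a_i\}\), \(i \in S\), is the orthant at the componentwise maximum of those \(a_i\).

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]   # closed interval [low, high]
Box = List[Interval]             # d-orthotope: one interval per attribute


def box_intersection(a: Box, b: Box) -> Optional[Box]:
    """Coordinate-wise intersection of two boxes; None if it is empty."""
    result: Box = []
    for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b):
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:          # the intervals are disjoint in this coordinate
            return None
        result.append((lo, hi))
    return result


def intersection_graph(boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Hypothetical helper: edges (i, j) for every pair of overlapping boxes."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if box_intersection(boxes[i], boxes[j]) is not None
    ]


def union_prob_orthants(corners: Sequence[Sequence[float]]) -> float:
    """P(X >= a_1 or ... or X >= a_n) for X uniform on [0, 1]^d (assumption).

    Brute-force inclusion-exclusion over all 2^n - 1 non-empty subsets.
    The intersection of {X >= a_i}, i in S, equals {X >= m_S}, where m_S is
    the componentwise maximum of the a_i, so its probability is
    prod_j (1 - m_S[j]). Assumes every corner lies in [0, 1]^d.
    """
    total = 0.0
    for k in range(1, len(corners) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(corners, k):
            prob = 1.0
            for m_j in (max(coords) for coords in zip(*subset)):
                prob *= 1.0 - m_j
            total += sign * prob
    return total


if __name__ == "__main__":
    boxes = [
        [(0.0, 2.0), (0.0, 2.0)],
        [(1.0, 3.0), (1.0, 3.0)],
        [(5.0, 6.0), (5.0, 6.0)],
    ]
    print(intersection_graph(boxes))                        # [(0, 1)]
    print(union_prob_orthants([(0.5, 0.0), (0.0, 0.5)]))    # 0.75
```

The cost of the inclusion-exclusion loop grows exponentially with the number of events, which is why an algorithm that is polynomial in the number of orthants, as claimed in the abstract, matters in practice; the sketch is useful mainly as a correctness check on small instances.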



Acknowledgements

It is an honor for the first author to have his academic father, Professor András Prékopa (1929–2016), as the second author of this paper. The main topic of this paper, the probability of Boolean functions of high-dimensional interval data, was studied in 2019–2020 solely by the first author, who presented its main idea at ISAIM (International Symposium on Artificial Intelligence and Mathematics) in January 2020 in Fort Lauderdale, Florida. Professor Prékopa initially suggested working on Boolean functions of hyperrectangles and the related binomial moment problem formulation in May 2016. The first author dearly misses him.

Author information

Authors and Affiliations

Decision Sciences & MIS, LeBow College of Business, Drexel University, 3220 Market Street, Philadelphia, PA, USA
Jinwook Lee
RUTCOR (Center for Operations Research), Rutgers University, Piscataway, NJ, 08854, USA
András Prékopa


Corresponding author

Correspondence to Jinwook Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

András Prékopa: Deceased 18 September 2016.


About this article

Cite this article

Lee, J., Prékopa, A. Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space. Ann Oper Res (2021). https://doi.org/10.1007/s10479-021-03951-2


Accepted: 13 January 2021
Published: 23 January 2021
DOI: https://doi.org/10.1007/s10479-021-03951-2
