Characteristics of Local Intrinsic Dimensionality (LID) in Subspaces: Local Neighbourhood Analysis

  • Conference paper
Similarity Search and Applications (SISAP 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11807)

Abstract

The local intrinsic dimensionality (LID) model enables assessment of the complexity of the local neighbourhood around a specific query object of interest. In this paper, we study variations in the LID of a query, with respect to different subspaces and local neighbourhoods. We illustrate the surprising phenomenon of how the LID of a query can substantially decrease as further features are included in a dataset. We identify the role of two key feature properties in influencing the LID for feature combinations: correlation and dominance. Our investigation provides new insights into the impact of different feature combinations on local regions of the data.
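As a rough illustration of how the LID of one query can be compared across subspaces, the sketch below uses the standard maximum-likelihood (Hill-type) estimator over the k nearest-neighbour distances. The abstract does not say which estimator the authors use, so the estimator, the function name `lid_mle`, the synthetic uniform data and the choice of `k` are all assumptions made for illustration. The example only contrasts an independent added feature with a fully correlated one; it does not attempt to reproduce the decrease in LID reported in the paper.

```python
import numpy as np

def lid_mle(query, data, k=500):
    """MLE (Hill-type) estimate of LID at `query` from its k nearest neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    dists = np.linalg.norm(data - query, axis=1)
    dists = np.sort(dists[dists > 0])[:k]    # ignore zero distances (e.g. the query itself)
    r_k = dists[-1]                          # distance to the k-th neighbour
    return -1.0 / np.mean(np.log(dists / r_k))

rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
z = rng.uniform(-1, 1, n)

query = np.zeros(3)
xy = np.column_stack([x, y])                 # two independent features
xyz_ind = np.column_stack([x, y, z])         # add an independent feature
xyz_cor = np.column_stack([x, y, x + y])     # add a feature fully determined by x and y

lid_xy = lid_mle(query[:2], xy)
lid_ind = lid_mle(query, xyz_ind)
lid_cor = lid_mle(query, xyz_cor)
print(f"LID(x,y)={lid_xy:.2f}  LID(x,y,z)={lid_ind:.2f}  LID(x,y,x+y)={lid_cor:.2f}")
```

Under these assumptions the independent feature should raise the estimate by roughly one, while the fully correlated feature should leave it close to the two-feature value, which is one simple way correlation can shape the LID of a feature combination.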


Notes

  1. https://aminer.org/lab-datasets/soinf/

  2. https://aminer.org/data

  3. Suppose \(q = 0 \in \mathbb{R}\) and \(x_1 = 2 \in X\) are one-dimensional data values. Then \(x_1\) directly represents the distance from \(q\) to \(x_1\) along the X axis.

  4. In fact, our model allows \(F_X\) (or \(F_Y\)) to be a set of features rather than a single feature, but for simplicity we present it for the single-feature case.

  5. https://au.mathworks.com/help/stats/generate-correlated-data-using-rank-correlation.html
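Note 5 points to a recipe for generating correlated data from a target rank correlation. A minimal Python sketch of that general idea, using a Gaussian copula, is given below. The Weibull marginals, the shape values, the target Spearman correlation of 0.7 and the helper names `rank_correlated_pair` and `spearman` are assumptions chosen for illustration and are not stated in the notes. Following Note 3, each generated value can be read as a distance from a query at 0 along its own axis.

```python
import math
import numpy as np

def rank_correlated_pair(n, spearman_rho, shape_x, shape_y, seed=0):
    """Draw two non-negative features with Weibull marginals (scale 1) whose
    Spearman rank correlation is approximately `spearman_rho`.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Pearson correlation of the latent Gaussians that yields the target Spearman rho
    pearson = 2.0 * math.sin(math.pi * spearman_rho / 6.0)
    cov = [[1.0, pearson], [pearson, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Standard normal CDF maps each column to Uniform(0, 1)
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    u = np.clip(u, 1e-12, 1.0 - 1e-12)       # keep the logarithm below finite
    # Inverse Weibull CDF: F^{-1}(u) = (-ln(1 - u))^(1 / shape)
    x = (-np.log1p(-u[:, 0])) ** (1.0 / shape_x)
    y = (-np.log1p(-u[:, 1])) ** (1.0 / shape_y)
    return x, y

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no ties expected)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

x, y = rank_correlated_pair(20000, 0.7, shape_x=2.0, shape_y=5.0)
rho_hat = spearman(x, y)
print(f"sample Spearman correlation: {rho_hat:.3f}")
```

Because the marginal transforms are monotone, they leave the ranks unchanged, so the rank correlation is set entirely by the latent Gaussian correlation while the marginal shapes can be varied independently.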


Author information

Corresponding author

Correspondence to Tahrima Hashem.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hashem, T., Rashidi, L., Bailey, J., Kulik, L. (2019). Characteristics of Local Intrinsic Dimensionality (LID) in Subspaces: Local Neighbourhood Analysis. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_22

  • DOI: https://doi.org/10.1007/978-3-030-32047-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32046-1

  • Online ISBN: 978-3-030-32047-8

  • eBook Packages: Computer Science, Computer Science (R0)
