Conclusion

  • Chapter
Data Mining in Large Sets of Complex Data

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This book was motivated by the increasing amount and complexity of the data collected by digital systems in several areas, which makes knowledge discovery an essential step in businesses’ strategic decisions. The mining techniques used in the process usually have high computational costs and force the analyst to make complex choices. The complexity stems from the diversity of tasks that may be used in the analysis and from the large number of alternatives for executing each task. The most common data mining tasks include data classification, labeling and clustering, outlier detection, and missing data prediction. The high computational cost comes from the need to explore several alternative solutions, in different combinations, to obtain the desired information. Although the tasks applied to traditional data are also necessary for more complex data, such as images, graphs, audio and long texts, the complexity and the computational costs of handling large amounts of such data increase considerably, making the traditional techniques impractical. Therefore, special data mining techniques for this kind of data need to be developed. We discussed new data mining techniques for large sets of complex data, especially for the clustering task, which is tightly associated with other mining tasks performed together with it. Specifically, this book described in detail three novel data mining algorithms well suited to analyzing large sets of complex data: the method Halite for correlation clustering [11, 13]; the method BoW for clustering Terabyte-scale datasets [14]; and the method QMAS for labeling and summarization [12].

Notes

  1. www.yahoo.com

  2. twitter.com

  3. Table 7.1 summarizes one table found in [17]; that is, it includes a selection of the most relevant desirable properties and the most closely related works from the original table. Table 7.1 also includes two novel desirable properties not found in [17]: linear or quasi-linear complexity and Terabyte-scale data analysis.

References

  1. Achtert, E., Böhm, C., David, J., Kröger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1, 111–127 (2008). doi:10.1002/sam.v1:3

  2. Achtert, E., Böhm, C., Kriegel, H.P., Kröger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. SDM, USA (2007)

  3. Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. SIGMOD Rec. 29(2), 70–81 (2000). doi:10.1145/335191.335383

  4. Aggarwal, C., Yu, P.: Redefining clustering for high-dimensional applications. IEEE TKDE 14(2), 210–225 (2002). doi:10.1109/69.991713

  5. Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999). doi:10.1145/304181.304188

  6. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec. 27(2), 94–105 (1998). doi:10.1145/276305.276314

  7. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5–33 (2005). doi:10.1007/s10618-005-1396-1

  8. Böhm, C., Kailing, K., Kriegel, H.P., Kröger, P.: Density connected clustering with local subspace preferences. In: ICDM ’04: Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 27–34. IEEE Computer Society, USA (2004).

  9. Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466. USA (2004). http://doi.acm.org/10.1145/1007568.1007620

  10. Cheng, C.H., Fu, A.W., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: KDD, pp. 84–93. NY, USA (1999). http://doi.acm.org/10.1145/312129.312199

  11. Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Finding clusters in subspaces of very large, multi-dimensional datasets. In: Li, F., Moro, M.M., Ghandeharizadeh, S., Haritsa, J.R., Weikum, G., Carey, M.J., Casati, F., Chang, E.Y., Manolescu, I., Mehrotra, S., Dayal, U., Tsotras, V.J. (eds.) ICDE, pp. 625–636. IEEE (2010).

  12. Cordeiro, R.L.F., Guo, F., Haverkamp, D.S., Horne, J.H., Hughes, E.K., Kim, G., Traina, A.J.M., Traina Jr., C., Faloutsos, C.: QMAS: Querying, mining and summarization of multi-modal databases. In: Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X. (eds.) ICDM, pp. 785–790. IEEE Computer Society (2010).

  13. Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Halite: Fast and scalable multi-resolution local-correlation clustering. IEEE Trans. Knowl. Data Eng. 99(PrePrints) (2011). doi:10.1109/TKDE.2011.176.

  14. Cordeiro, R.L.F., Traina Jr., C., Traina, A.J.M., López, J., Kang, U., Faloutsos, C.: Clustering very large multi-dimensional datasets with MapReduce. In: Apté, C., Ghosh, J., Smyth, P. (eds.) KDD, pp. 690–698. ACM (2011).

  15. Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. Roy. Stat. Soc. Ser. B 66(4), 815–849 (2004). doi:a/bla/jorssb/v66y2004i4p815-849.html

  16. Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 250–257. Washington, USA (2005). http://dx.doi.org/10.1109/ICDM.2005.5

  17. Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009). doi:10.1145/1497577.1497578

  18. Kröger, P., Kriegel, H.P., Kailing, K.: Density-connected subspace clustering for high-dimensional data. SDM, USA (2004)

  19. Moise, G., Sander, J., Ester, M.: P3C: A robust projected clustering algorithm. In: ICDM, pp. 414–425. IEEE Computer Society (2006).

  20. Moise, G., Sander, J., Ester, M.: Robust projected clustering. Knowl. Inf. Syst. 14(3), 273–298 (2008). doi:10.1007/s10115-007-0090-6

  21. Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M.: A Monte Carlo algorithm for fast projective clustering. In: SIGMOD, pp. 418–427. USA (2002). http://doi.acm.org/10.1145/564691.564739

Author information

Corresponding author

Correspondence to Robson L. F. Cordeiro.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Cordeiro, R.L., Faloutsos, C., Traina Júnior, C. (2013). Conclusion. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_7

  • DOI: https://doi.org/10.1007/978-1-4471-4890-6_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4889-0

  • Online ISBN: 978-1-4471-4890-6

  • eBook Packages: Computer Science, Computer Science (R0)
