Conclusion

Cordeiro, Robson L. F.; Faloutsos, Christos; Traina Júnior, Caetano

doi:10.1007/978-1-4471-4890-6_7

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This book was motivated by the increasing amount and complexity of the data collected by digital systems in several areas, which makes knowledge discovery an essential step in businesses' strategic decisions. The mining techniques used in the process usually have high computational costs and force the analyst to make complex choices. The complexity stems from the diversity of tasks that may be used in the analysis and from the large number of alternatives available to execute each task. The most common data mining tasks include data classification, labeling and clustering, outlier detection, and missing data prediction. The large computational cost comes from the need to explore several alternative solutions, in different combinations, to obtain the desired information. Although the tasks applied to traditional data are also necessary for more complex data, such as images, graphs, audio and long texts, the complexity and the computational costs of handling large amounts of such data increase considerably, making the traditional techniques impractical. Therefore, special data mining techniques for this kind of data need to be developed.

We discussed new data mining techniques for large sets of complex data, especially for the clustering task, which is tightly associated with other mining tasks that are performed together with it. Specifically, this book described in detail three novel data mining algorithms well suited to analyzing large sets of complex data: the method Halite for correlation clustering [11, 13]; the method BoW for clustering Terabyte-scale datasets [14]; and the method QMAS for labeling and summarization [12].

Notes

1. www.yahoo.com
2. twitter.com
3. Table 7.1 summarizes one table found in [17]: it includes a selection of the most relevant desirable properties and the most closely related works from the original table. Table 7.1 also includes two novel desirable properties not found in [17]: linear or quasi-linear complexity and Terabyte-scale data analysis.

References

1. Achtert, E., Böhm, C., David, J., Kröger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1, 111–127 (2008). doi:10.1002/sam.v1:3
2. Achtert, E., Böhm, C., Kriegel, H.P., Kröger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. SDM, USA (2007)
3. Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. SIGMOD Rec. 29(2), 70–81 (2000). doi:10.1145/335191.335383
4. Aggarwal, C., Yu, P.: Redefining clustering for high-dimensional applications. IEEE TKDE 14(2), 210–225 (2002). doi:10.1109/69.991713
5. Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999). doi:10.1145/304181.304188
6. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec. 27(2), 94–105 (1998). doi:10.1145/276305.276314
7. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5–33 (2005). doi:10.1007/s10618-005-1396-1
8. Böhm, C., Kailing, K., Kriegel, H.P., Kröger, P.: Density connected clustering with local subspace preferences. In: ICDM '04: Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 27–34. IEEE Computer Society, USA (2004)
9. Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466. USA (2004). http://doi.acm.org/10.1145/1007568.1007620
10. Cheng, C.H., Fu, A.W., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: KDD, pp. 84–93. NY, USA (1999). http://doi.acm.org/10.1145/312129.312199
11. Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Finding clusters in subspaces of very large, multi-dimensional datasets. In: Li, F., Moro, M.M., Ghandeharizadeh, S., Haritsa, J.R., Weikum, G., Carey, M.J., Casati, F., Chang, E.Y., Manolescu, I., Mehrotra, S., Dayal, U., Tsotras, V.J. (eds.) ICDE, pp. 625–636. IEEE (2010)
12. Cordeiro, R.L.F., Guo, F., Haverkamp, D.S., Horne, J.H., Hughes, E.K., Kim, G., Traina, A.J.M., Traina Jr., C., Faloutsos, C.: QMAS: Querying, mining and summarization of multi-modal databases. In: Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X. (eds.) ICDM, pp. 785–790. IEEE Computer Society (2010)
13. Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Halite: Fast and scalable multi-resolution local-correlation clustering. IEEE Trans. Knowl. Data Eng. 99(PrePrints) (2011). doi:10.1109/TKDE.2011.176
14. Cordeiro, R.L.F., Traina Jr., C., Traina, A.J.M., López, J., Kang, U., Faloutsos, C.: Clustering very large multi-dimensional datasets with MapReduce. In: Apté, C., Ghosh, J., Smyth, P. (eds.) KDD, pp. 690–698. ACM (2011)
15. Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. Roy. Stat. Soc. Ser. B 66(4), 815–849 (2004). doi:a/bla/jorssb/v66y2004i4p815-849.html
16. Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 250–257. Washington, USA (2005). http://dx.doi.org/10.1109/ICDM.2005.5
17. Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009). doi:10.1145/1497577.1497578
18. Kröger, P., Kriegel, H.P., Kailing, K.: Density-connected subspace clustering for high-dimensional data. SDM, USA (2004)
19. Moise, G., Sander, J., Ester, M.: P3C: A robust projected clustering algorithm. In: ICDM, pp. 414–425. IEEE Computer Society (2006)
20. Moise, G., Sander, J., Ester, M.: Robust projected clustering. Knowl. Inf. Syst. 14(3), 273–298 (2008). doi:10.1007/s10115-007-0090-6
21. Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M.: A Monte Carlo algorithm for fast projective clustering. In: SIGMOD, pp. 418–427. USA (2002). http://doi.acm.org/10.1145/564691.564739

Author information

Authors and Affiliations

Computer Science Department (ICMC), University of Sao Paulo, Avenue do Trabalhador Saocarlense 400, Sao Carlos, Sao Paulo, 13566-590, Brazil
Robson L. F. Cordeiro & Caetano Traina Júnior
Department of Computer Science, Carnegie Mellon University, Forbes Avenue 5000, Pittsburgh, Pennsylvania, 15213, USA
Christos Faloutsos

Corresponding author

Correspondence to Robson L. F. Cordeiro.

About this chapter

Cite this chapter

Cordeiro, R.L., Faloutsos, C., Traina Júnior, C. (2013). Conclusion. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_7

DOI: https://doi.org/10.1007/978-1-4471-4890-6_7
Published: 11 January 2013
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4889-0
Online ISBN: 978-1-4471-4890-6
eBook Packages: Computer Science, Computer Science (R0)
