Summarization-Guided Greedy Optimization of Machine Learning Model

Ruta, Dymitr; Cen, Ling; Damiani, Ernesto

doi:10.1007/978-3-319-62416-7_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition


Abstract

Immense amounts of unstructured data account for up to 90% of all human-generated data, yet attempts to extract significant value from it with Machine Learning (ML) and Big Data (BD) technologies yield limited success. We propose a generic approach to deep data summarization and subsequent automated ML design optimization to extract maximum predictive value from big data. Knowledge summarization is a central component of the proposed methodology, and we argue that coupled with strictly linear modeling complexity, hierarchical decomposition and optimized model design may define a backbone of the new platform for automated and scalable construction of robust ML models. We consider the ML build process as data journeys through the layers of modeling that consistently follow the same patterns of data summarization and transformation at the subsequent layers of abstraction. In such a framework we argue that the robust construction of the ML model can be achieved through hierarchical greedy optimization of the links between connected ML model components. We demonstrate several case studies of deep data summarization and automated ML model design on text, numerical time series and image data. We point out that application awareness allows data summarizations to be deepened while maintaining or improving their predictive value.
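
The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of what a layer-by-layer greedy design loop could look like, not the authors' method. It assumes each layer offers a few candidate summarization or transformation components, that one component is picked per layer while earlier picks stay fixed, and that candidates are ranked by the training accuracy of a nearest-centroid classifier. All names (`block_means`, `centroid_accuracy`, `greedy_design`), the toy data and the scoring rule are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, int]]                 # (feature vector, class label)
Component = Tuple[str, Callable[[Vector], Vector]]  # (name, transformation)


def block_means(width: int) -> Callable[[Vector], Vector]:
    """Summarize a series by averaging consecutive blocks of `width` values."""
    def summarize(x: Vector) -> Vector:
        blocks = [x[i:i + width] for i in range(0, len(x), width)]
        return [sum(b) / len(b) for b in blocks]
    return summarize


def centroid_accuracy(data: Dataset) -> float:
    """Training accuracy of a nearest-centroid classifier (assumed score)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def nearest(x: Vector) -> int:
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return sum(nearest(x) == y for x, y in data) / len(data)


def greedy_design(data: Dataset,
                  layers: List[List[Component]],
                  score: Callable[[Dataset], float]):
    """Pick the best-scoring component in each layer, keeping earlier picks fixed."""
    chosen = []
    for candidates in layers:
        scored = []
        for name, fn in candidates:
            transformed = [(fn(x), y) for x, y in data]
            scored.append((score(transformed), name, transformed))
        # Ties go to the first candidate listed; the winner's output feeds the next layer.
        best_score, best_name, data = max(scored, key=lambda t: t[0])
        chosen.append((best_name, best_score))
    return chosen, data


# Toy two-class time series, invented purely for this example.
toy: Dataset = [
    ([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3], 0),
    ([0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.1], 0),
    ([0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9], 1),
    ([1.0, 0.9, 1.2, 1.1, 1.0, 0.8, 0.9, 1.1], 1),
]
layers: List[List[Component]] = [
    [("mean_w2", block_means(2)), ("mean_w4", block_means(4))],
    [("identity", lambda x: x), ("total", lambda x: [sum(x)]), ("peak", lambda x: [max(x)])],
]
chosen, summary = greedy_design(toy, layers, centroid_accuracy)
print(chosen)
```

In this sketch each candidate is evaluated once, so the number of evaluations grows with the sum of the layer sizes rather than with their product, which is the usual trade-off of a greedy search against an exhaustive one. A held-out validation score would be a more defensible ranking criterion than training accuracy.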



Author information

Authors and Affiliations

Emirates ICT Innovation Center (EBTIC), Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, United Arab Emirates
Dymitr Ruta, Ling Cen & Ernesto Damiani


Corresponding author

Correspondence to Dymitr Ruta.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner


About this paper

Cite this paper

Ruta, D., Cen, L., Damiani, E. (2017). Summarization-Guided Greedy Optimization of Machine Learning Model. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_22


DOI: https://doi.org/10.1007/978-3-319-62416-7_22
Published: 02 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)
