Abstract
In an era of explosive data growth, data preprocessing is particularly important for extracting information from massive data sets. In this paper, a preprocessing pipeline comprising missing-data imputation, duplicate removal, outlier detection, data standardization, and data reduction was built on the wine data from the UCI repository. The preprocessed data were then compared with the raw data using the K-means algorithm, a linear regression model, and a decision tree classification algorithm. The experimental results showed that, after preprocessing, the clustering error was significantly reduced, the fit of the linear regression model improved, and the classification accuracy of the decision tree was higher. These results illustrate the importance of data preprocessing and may serve as a reference for optimizing data processing.
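The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of column means for imputation, the 1.5 × IQR rule for outliers, and z-score standardization are all assumptions chosen for illustration. The data reduction step and the downstream models (K-means, linear regression, decision tree) are omitted.

```python
from statistics import mean, pstdev, quantiles


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def remove_outliers(rows, k=1.5):
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR] of its column."""
    bounds = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4, method="inclusive")
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]


def standardize(rows):
    """Z-score each column: (x - mean) / std. Constant columns map to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(r, stats)]
            for r in rows]


def preprocess(rows):
    """Apply the steps in the order listed in the abstract (assumed order)."""
    rows = impute_missing(rows)
    rows = drop_duplicates(rows)
    rows = remove_outliers(rows)
    return standardize(rows)


if __name__ == "__main__":
    # Toy numeric table (hypothetical values, not the UCI wine data).
    raw = [[1.0, 5.0], [2.0, None], [2.0, 5.0],
           [3.0, 5.0], [100.0, 5.0], [3.0, 5.0]]
    for row in preprocess(raw):
        print(row)
```

The order of the steps matters in this sketch: imputing before deduplication lets a row with a filled-in value collapse into an identical complete row, and standardizing last keeps the outlier thresholds in the original units.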
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61563044, 61866031); the National Natural Science Foundation of Qinghai Province (No. 2017-ZJ-902); the Applied Basic Research Programs of the Science and Technology Department of Sichuan Province (No. 2019YJ0110); the Youth Foundation of Qinghai University (No. 2017-QGY-4, 2018-QGY-7); the Teaching Research Project of Qinghai University (KC18038, SZ18015, JY201805); and the Open Research Fund Program of the State Key Laboratory of Hydroscience and Engineering (No. sklhse-2017-A-05).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Meng, X., Zhu, X., Yang, S., Wang, L., Qi, J., Yang, P. (2020). Research on Wine Analysis Based on Data Preprocessing. In: Liu, Y., Wang, L., Zhao, L., Yu, Z. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2019. Advances in Intelligent Systems and Computing, vol 1075. Springer, Cham. https://doi.org/10.1007/978-3-030-32591-6_63
DOI: https://doi.org/10.1007/978-3-030-32591-6_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32590-9
Online ISBN: 978-3-030-32591-6
eBook Packages: Intelligent Technologies and Robotics (R0)