Abstract
Among neural network architectures for prediction, the multi-layer perceptron (MLP), radial basis function (RBF) network, wavelet neural network (WNN), general regression neural network (GRNN), and group method of data handling (GMDH) are popular. Of these, GRNN is attractive because it learns in a single pass and produces reasonably good results. Despite its single-pass learning, GRNN cannot handle big datasets, because its pattern layer must store all the cluster centers obtained after clustering all the samples. This paper therefore proposes a hybrid architecture, GRNN++, which makes GRNN scalable for big data by invoking a parallel distributed version of K-means++, namely K-means||, in the pattern layer of GRNN. The whole architecture is implemented in the distributed parallel computational framework of Apache Spark with HDFS. The performance of GRNN++ was measured on a gas sensor dataset of 613 MB under a ten-fold cross-validation setup. The proposed GRNN++ produces a very low mean squared error (MSE). The primary motivation of this article is to present a distributed and parallel version of the traditional GRNN.
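To make the role of the pattern layer concrete, here is a minimal single-machine sketch of a GRNN-style prediction over cluster centers. It is not the chapter's Spark implementation. The function name `grnn_predict`, the Gaussian kernel with a single smoothing parameter `sigma`, and the assumption that each center carries one representative target value are illustrative choices. The centers are taken as given, where the chapter would obtain them with K-means||.

```python
import math


def grnn_predict(x, centers, targets, sigma=1.0):
    """Return a kernel-weighted average of the targets attached to the centers.

    x        -- query point (sequence of floats)
    centers  -- pattern-layer centers, assumed to come from a prior clustering step
    targets  -- one representative target per center (an assumption of this sketch)
    sigma    -- Gaussian smoothing parameter (hypothetical default)
    """
    sq_dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    # Pattern layer: one Gaussian activation per stored center.
    weights = [math.exp(-d2 / (2.0 * sigma ** 2)) for d2 in sq_dists]
    denom = sum(weights)
    if denom == 0.0:
        # Every kernel underflowed; fall back to the nearest center's target.
        nearest = min(range(len(centers)), key=lambda i: sq_dists[i])
        return targets[nearest]
    # Summation and output layers: weighted sum divided by the sum of weights.
    numer = sum(w * t for w, t in zip(weights, targets))
    return numer / denom


centers = [(0.0, 0.0), (10.0, 10.0)]
targets = [1.0, 5.0]
near_first = grnn_predict((0.0, 0.0), centers, targets)
midpoint = grnn_predict((5.0, 5.0), centers, targets)
```

A query that coincides with the first center is dominated by that center's target, while a query equidistant from both centers yields the mean of the two targets. Because the cost of a prediction grows with the number of stored centers, reducing samples to centers is what keeps the pattern layer tractable.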
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kamaruddin, S., Ravi, V. (2020). GRNN++: A Parallel and Distributed Version of GRNN Under Apache Spark for Big Data Regression. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-32-9949-8_16
DOI: https://doi.org/10.1007/978-981-32-9949-8_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9948-1
Online ISBN: 978-981-32-9949-8
eBook Packages: Engineering, Engineering (R0)