Abstract
This paper explores decision fusion for phoneme recognition through an intelligent combination of Naive Bayes and Learning Vector Quantization (LVQ) classifiers, together with feature fusion using Mel-frequency Cepstral Coefficients (MFCC), Relative Spectral Transform-Perceptual Linear Prediction (Rasta-PLP) and Perceptual Linear Prediction (PLP). The work emphasizes optimal decision making from the decisions of classifiers trained on different features. The proposed architecture comprises three decision fusion approaches: weighted mean, deep belief networks (DBN) and fuzzy logic. We compare their performance experimentally on a dataset of phonemes from Fongbe, an African language. The best overall decision fusion performance was obtained with the fuzzy logic approach, with classification accuracies of 95.54 % for consonants and 83.97 % for vowels, although the DBN approach had the lower execution time.
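To make the weighted-mean variant of decision fusion concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes each classifier (for instance Naive Bayes and LVQ) emits one normalized score per phoneme class, and that a single scalar weight balances the two. The function name `weighted_mean_fusion`, the weight value and the example scores are hypothetical and chosen only for illustration.

```python
def weighted_mean_fusion(scores_a, scores_b, weight_a=0.5):
    """Fuse two per-class score vectors by a weighted mean.

    scores_a, scores_b: per-class scores from two classifiers (assumed to be
    normalized to sum to 1; this is an assumption, not stated in the abstract).
    weight_a: weight given to the first classifier; the second gets 1 - weight_a.
    Returns the fused score vector and the index of the winning class.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    fused = [weight_a * a + weight_b * b for a, b in zip(scores_a, scores_b)]
    # The fused decision is the class with the highest combined score.
    best_class = max(range(len(fused)), key=fused.__getitem__)
    return fused, best_class


# Hypothetical scores over three phoneme classes.
nb_scores = [0.6, 0.3, 0.1]   # illustrative Naive Bayes output
lvq_scores = [0.2, 0.7, 0.1]  # illustrative LVQ output
fused, best_class = weighted_mean_fusion(nb_scores, lvq_scores, weight_a=0.4)
print(fused, best_class)
```

In this illustration the two classifiers disagree, and the fused decision follows the classifier with the larger weight. How the weights are actually chosen is not specified in the abstract and would need to come from the full paper.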
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Laleye, F.A.A., Ezin, E.C., Motamed, C. (2016). Speech Phoneme Classification by Intelligent Decision-Level Fusion. In: Filipe, J., Madani, K., Gusikhin, O., Sasiadek, J. (eds) Informatics in Control, Automation and Robotics 12th International Conference, ICINCO 2015 Colmar, France, July 21-23, 2015 Revised Selected Papers. Lecture Notes in Electrical Engineering, vol 383. Springer, Cham. https://doi.org/10.1007/978-3-319-31898-1_4
DOI: https://doi.org/10.1007/978-3-319-31898-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31896-7
Online ISBN: 978-3-319-31898-1
eBook Packages: Engineering, Engineering (R0)