Abstract
This work addresses the application of neural networks to the recognition and categorization of non-fluent and fluent utterance recordings. The material comprised fifty-five 4-s speech samples in which blocks on plosives (p, b, t, d, k and g) occurred and 55 recordings of fluent speakers containing the same fragments. Two Kohonen networks were used. The first network reduced the dimension of the vector describing the input signals. Its output was a matrix of the neurons that won in each time frame, and this matrix served as the input to the second self-organizing map network. Various types of Kohonen networks were examined with respect to their ability to classify utterances correctly into two groups, non-fluent and fluent. The results were good: classification correctness exceeded 76%.
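The two-stage arrangement described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `train_som` and `winners`, the one-dimensional map layout, the map sizes, the learning schedule, and the random stand-in data are all assumptions made for illustration. The abstract does not specify the input features or network parameters, and here each utterance is reduced to a plain vector of winning-neuron indices, one per time frame.

```python
import numpy as np

def train_som(data, n_units, epochs=10, lr0=0.5, seed=0):
    """Train a small 1-D Kohonen map; returns weights of shape (n_units, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise the weights from randomly chosen training vectors.
    weights = data[rng.integers(0, len(data), size=n_units)].copy()
    positions = np.arange(n_units)
    sigma0 = max(n_units / 2.0, 1.0)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def winners(weights, frames):
    """Index of the winning (closest) neuron for every row of `frames`."""
    d = np.linalg.norm(frames[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Hypothetical stand-in data: 20 utterances, 40 time frames, 8 features each.
rng = np.random.default_rng(1)
utterances = rng.normal(size=(20, 40, 8))

# Stage 1: map every frame to its winning neuron (dimension reduction).
som1 = train_som(utterances.reshape(-1, 8), n_units=10)
stage1 = np.array([winners(som1, u) for u in utterances], dtype=float)

# Stage 2: a second map with two units groups the winner sequences.
som2 = train_som(stage1, n_units=2)
groups = winners(som2, stage1)
```

In this sketch the second map only separates the utterances into two unsupervised clusters. Deciding which cluster corresponds to fluent or non-fluent speech, and measuring classification correctness, would require labelled recordings such as those described in the abstract.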
Cite this article
Szczurowska, I., Kuniszyk-Jóźkowiak, W. & Smołka, E. Speech nonfluency detection using Kohonen networks. Neural Comput & Applic 18, 677–687 (2009). https://doi.org/10.1007/s00521-009-0261-3
DOI: https://doi.org/10.1007/s00521-009-0261-3