Clustering Approach in Speech Phoneme Recognition Based on Statistical Analysis

  • Conference paper
Recent Trends in Network Security and Applications (CNSA 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 89)

Abstract

In general, speech recognition is the process of converting a spoken string into a machine-understandable string. Speech recognition consists of two processes: (i) removal of background noise (generated by a stressful noise environment) and (ii) word-by-word phoneme separation (which also involves phoneme recognition). In real-time situations, sound signals contain both kinds of sound: the target sound as well as background noise.

This paper critically evaluates currently available signal analysis techniques and phoneme modeling, as applied to isolated and context-independent phoneme recognition. The proposed methodology introduces a technique for determining the pure speech signal (without background noise) in a noisy environment and for isolating phonemes word by word using a clustering approach. With the proposed methodology, high accuracy has been achieved both in background-noise isolation (obtaining a clean speech signal without background noise) and in phoneme isolation from the clean speech signal, which can be qualitatively compared to previous research on continuous phoneme recognition. Performance evaluation also shows improved speech recognition in stressful noise situations and a better-quality phoneme separation process.
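
The abstract does not say which clustering algorithm or features the authors use. As a minimal sketch only, assuming a two-cluster k-means over per-frame log-energy (an illustrative choice, not the paper's confirmed method), separating speech frames from background-noise frames might look like the following. The function names `frame_energies` and `two_means` and the synthetic signal are hypothetical.

```python
import math

def frame_energies(signal, frame_len):
    """Split a signal into non-overlapping frames and return each frame's log-energy."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(math.log(power + 1e-12))  # small offset avoids log(0)
    return energies

def two_means(values, iterations=20):
    """1-D k-means with k=2. Returns labels: 0 = low-energy cluster, 1 = high-energy cluster."""
    lo, hi = min(values), max(values)  # deterministic initial centroids
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        # Update step: move each centroid to the mean of its members.
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels

# Synthetic stand-in for a recording: quiet noise, a loud "speech" burst, quiet noise.
noise = [0.01 * math.sin(0.3 * n) for n in range(400)]
speech = [0.8 * math.sin(0.05 * n) for n in range(400)]
signal = noise + speech + noise

labels = two_means(frame_energies(signal, frame_len=100))
print(labels)  # frames labelled 1 are treated as speech in this toy setup
```

A real system would cluster richer spectral features rather than raw energy, and energy alone cannot distinguish one phoneme from another; the sketch only illustrates the idea of grouping frames by statistical similarity.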

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tak, G.K., Bhargava, V. (2010). Clustering Approach in Speech Phoneme Recognition Based on Statistical Analysis. In: Meghanathan, N., Boumerdassi, S., Chaki, N., Nagamalai, D. (eds) Recent Trends in Network Security and Applications. CNSA 2010. Communications in Computer and Information Science, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14478-3_48

  • DOI: https://doi.org/10.1007/978-3-642-14478-3_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14477-6

  • Online ISBN: 978-3-642-14478-3

  • eBook Packages: Computer Science, Computer Science (R0)
