A Novel K-Means Voice Activity Detection Algorithm Using Linear Cross Correlation on the Standard Deviation of Linear Predictive Coding

  • Conference paper
Research and Development in Intelligent Systems XXXII (SGAI 2015)

Abstract

This paper presents a novel Voice Activity Detection (VAD) technique that can easily be applied to on-device isolated word recognition on a mobile device. The main speech features are Linear Predictive Coding (LPC) features, which were correlated using the standard deviation of the signal, and the output was then clustered with a modified K-means algorithm. The results show a significant improvement over a previous algorithm based on the LPC residual signal: on the same data, the previous algorithm achieved an 86.6 % recognition rate, whereas the new technique achieved 90 %. In some experiments, the technique achieved up to 97.7 % recognition for female users. Its fast processing time makes it viable for mobile devices.
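The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general pipeline it names, under stated assumptions. The signal is split into fixed, non-overlapping frames. LPC coefficients are computed per frame with the standard autocorrelation method and Levinson-Durbin recursion, and the standard deviation of each frame's coefficients is used as the clustering feature. Plain two-cluster K-means stands in for the paper's modified K-means, and the paper's linear cross-correlation step is not reproduced. The frame length (160 samples), LPC order (8), test signal, and helper names (`frames`, `lpc`, `two_means`, `detect_voice`, `synthetic_signal`) are hypothetical choices for illustration.

```python
import math
import random
import statistics


def frames(signal, frame_len):
    """Split a signal into consecutive, non-overlapping frames; a short tail is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def lpc(frame, order):
    """LPC predictor coefficients a[1..order] (x[n] ~ sum_k a[k] * x[n-k]),
    computed with the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    if err <= 0.0:
        return a[1:]  # all-zero frame: nothing to predict
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction-error energy shrinks with each order
        if err <= 0.0:
            break
    return a[1:]


def two_means(values, iterations=50):
    """Plain 1-D K-means with k = 2 (a stand-in for the paper's modified K-means).
    Returns 1 for values in the higher-centroid cluster, 0 otherwise."""
    lo, hi = min(values), max(values)
    labels = []
    for _ in range(iterations):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low_group = [v for v, lab in zip(values, labels) if lab == 0]
        high_group = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = statistics.mean(low_group) if low_group else lo
        new_hi = statistics.mean(high_group) if high_group else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


def detect_voice(signal, frame_len=160, order=8):
    """Label each frame 1 (speech) or 0 (non-speech) by clustering the
    standard deviation of its LPC coefficients."""
    features = [statistics.pstdev(lpc(f, order)) for f in frames(signal, frame_len)]
    return two_means(features)


def synthetic_signal(seed=0, frame_len=160, frames_per_part=10):
    """Hypothetical test input: noise, then a two-tone 'voiced' segment, then noise."""
    rng = random.Random(seed)
    n = frame_len * frames_per_part

    def noise():
        return [rng.gauss(0.0, 0.01) for _ in range(n)]

    tone = [0.5 * math.sin(2 * math.pi * 0.05 * i) + 0.3 * math.sin(2 * math.pi * 0.12 * i)
            for i in range(n)]
    voiced = [t + e for t, e in zip(tone, noise())]
    return noise() + voiced + noise()


if __name__ == "__main__":
    print(detect_voice(synthetic_signal()))
```

Labelling the higher-centroid cluster as speech assumes that voiced frames produce strongly structured LPC coefficients with a large spread, while background noise yields coefficients close to zero.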

Acknowledgements

The authors wish to thank the Petroleum Technology Development Fund (PTDF) for its continued support and sponsorship of this research, and Dr. S. Mustafa, Aishatu Mustafa and the colleagues who helped conduct the experiments.

Author information

Correspondence to M. K. Mustafa.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mustafa, M.K., Allen, T., Appiah, K. (2015). A Novel K-Means Voice Activity Detection Algorithm Using Linear Cross Correlation on the Standard Deviation of Linear Predictive Coding. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXII. SGAI 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-25032-8_24

  • DOI: https://doi.org/10.1007/978-3-319-25032-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25030-4

  • Online ISBN: 978-3-319-25032-8

  • eBook Packages: Computer Science, Computer Science (R0)
