An efficient text-independent speaker verification for short utterance data from Mobile devices

Arora, Sanghamitra V.; Vig, Rekha

doi:10.1007/s11042-019-08196-7

Published: 05 December 2019

Volume 79, pages 3049–3074, (2020)

Multimedia Tools and Applications

Sanghamitra V. Arora¹ &
Rekha Vig²

Abstract

Speaker verification is the process of recognizing a speaker from his or her voice characteristics by extracting features from the speech signal. Text-independent speaker verification confirms the speaker's identity without placing any restriction on the speech content. Long utterances are normally used for verification, but they contain many silences, which add complexity and disruptions. We therefore perform speaker verification on short-utterance data. The main objective of this work is to extract, characterize, and recognize information about speaker identity. The proposed approach has four stages: 1) utterance partitioning, 2) feature extraction, 3) feature selection, and 4) classification. Utterance partitioning splits the full-length speech into numerous short utterances before the pre-processing stage. In the feature extraction phase, a pre-emphasis filter removes noise during pre-processing, and the Mel Advanced Hilbert-Huang Cepstral Coefficients (MAHCC) technique then extracts features from the input speech signal. Feature selection uses a Crow Search Algorithm (CSA), which ranks the feature set to obtain the optimal features for classification. In the classification stage, a Deep Hidden Markov Model (DHMM) with discriminative pre-training classifies the features for speaker verification. The implementation results show that the proposed approach classifies accurately and performs better than existing methods.
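The first two steps described in the abstract, utterance partitioning followed by pre-emphasis, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2-second segment length, the coefficient 0.97, and the choice to drop a trailing partial segment are all assumptions made for illustration, since the abstract does not give these details. The filter shown is the common first-order pre-emphasis form y[n] = x[n] - alpha * x[n-1].

```python
import numpy as np


def partition_utterance(signal, sample_rate, segment_seconds=2.0):
    """Split a full-length utterance into consecutive short segments.

    Assumption: fixed-length, non-overlapping segments; a trailing
    remainder shorter than one segment is dropped.
    """
    signal = np.asarray(signal, dtype=np.float64)
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_segments = signal.size // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def pre_emphasis(signal, alpha=0.97):
    """Apply the first-order filter y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged. alpha=0.97 is a
    commonly used value, assumed here rather than taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))


# Usage on a synthetic 5-second tone sampled at 8 kHz (hypothetical input).
sample_rate = 8000
t = np.arange(5 * sample_rate) / sample_rate
speech = np.sin(2 * np.pi * 220.0 * t)

segments = partition_utterance(speech, sample_rate, segment_seconds=2.0)
emphasized = [pre_emphasis(seg) for seg in segments]
```

Partitioning comes first here because the abstract places it before pre-processing; each short segment would then go on to feature extraction.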

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, DCE, Gurgaon, India
Sanghamitra V. Arora
Department of Electrical Electronics and Communication Engineering (EECE), NCU, Gurgaon, India
Rekha Vig

Corresponding author

Correspondence to Sanghamitra V. Arora.

About this article

Cite this article

Arora, S.V., Vig, R. An efficient text-independent speaker verification for short utterance data from Mobile devices. Multimed Tools Appl 79, 3049–3074 (2020). https://doi.org/10.1007/s11042-019-08196-7

Received: 22 November 2018
Revised: 06 July 2019
Accepted: 06 September 2019
Published: 05 December 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11042-019-08196-7
