Frame level sparse representation classification for speaker verification

Hasheminejad, Mohammad; Farsi, Hassan

doi:10.1007/s11042-016-4071-1

Published: 24 October 2016

Volume 76, pages 21211–21224, (2017)

Multimedia Tools and Applications

Mohammad Hasheminejad¹ &
Hassan Farsi¹


Abstract

In this paper, we analyze the application of sparse representation of speech frames to speaker verification. Sparse Representation Classification (SRC) has recently been shown to be promising for speaker recognition. We present evidence that frame-level sparse representation classification resembles the process of speech recognition in the human sensory system. Because recognizing different sounds (noises) helps listeners immediately distinguish noise from the original speech signal, we design a noise-aware system. A principal consideration in sparse representation is the mutual coherence of the dictionary columns, called dictionary atoms, which has not been adequately addressed in previously published SRC-based speaker verification research. To suppress the mutual coherence, we use a dictionary learning method to construct a dictionary with effective atoms. Our proposed Frame Level Sparse Representation Classification (FSRC) provides new insights into SRC-based speaker verification. We demonstrate that, in SRC-based speaker verification, a dictionary whose atoms are orthogonal can be more extensible than one whose atoms are highly correlated, and that suppressing mutual coherence is even more effective than imposing strong orthogonality on the dictionary atoms. We evaluate state-of-the-art speaker recognition systems and the proposed method on NIST SRE 2004 data. Experimental results show that, compared with baseline methods, the proposed method improves speaker verification performance in noisy conditions when sufficient information is available during the registration (enrollment) of target speakers.
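The two quantities the abstract relies on, the mutual coherence of a dictionary and the class-wise residual used by SRC, can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the sparse solver, the features, or the dictionary learning method, so the greedy orthogonal matching pursuit below is a stand-in, and the function names (`mutual_coherence`, `omp`, `src_frame_residuals`) and the toy dictionary are hypothetical.

```python
import numpy as np

def normalize_columns(D):
    """Scale every atom (column) of D to unit l2 norm."""
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)

def mutual_coherence(D):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    Dn = normalize_columns(D)
    gram = np.abs(Dn.T @ Dn)
    np.fill_diagonal(gram, 0.0)  # ignore each atom's product with itself
    return float(gram.max())

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed solver, not from the paper)."""
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # no new atom helps, stop early
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef
    return x

def src_frame_residuals(D, labels, frame, n_nonzero=3):
    """Per-class reconstruction error for one frame (smaller = better match)."""
    x = omp(D, frame, n_nonzero)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        residuals[int(c)] = float(np.linalg.norm(frame - D[:, mask] @ x[mask]))
    return residuals

# Toy dictionary: atoms 0-1 belong to class 0, atoms 2-3 to class 1.
D_toy = np.eye(4)
labels_toy = np.array([0, 0, 1, 1])
frame_toy = np.array([2.0, 0.5, 0.0, 0.0])

print(mutual_coherence(D_toy))  # orthogonal atoms: 0.0
print(src_frame_residuals(D_toy, labels_toy, frame_toy))
```

In this toy case the frame lies entirely in the span of the class 0 atoms, so its class 0 residual is zero and the class 1 residual equals the norm of the frame. A verification decision would then compare such residuals, accumulated over the frames of an utterance, against a threshold; how the paper aggregates frame-level scores is not stated in the abstract.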

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Birjand, Shahid Avini Highway, Birjand, Iran
Mohammad Hasheminejad & Hassan Farsi

Corresponding author

Correspondence to Hassan Farsi.

About this article

Cite this article

Hasheminejad, M., Farsi, H. Frame level sparse representation classification for speaker verification. Multimed Tools Appl 76, 21211–21224 (2017). https://doi.org/10.1007/s11042-016-4071-1

Received: 19 April 2016
Revised: 13 September 2016
Accepted: 13 October 2016
Published: 24 October 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11042-016-4071-1
