Abstract
In this paper, we analyze the application of sparse representation of speech-signal frames to speaker verification. It has recently been shown that sparse representation classification (SRC) is promising for speaker recognition. We bring evidence that frame-level sparse representation classification resembles the process of speech recognition in the human sensory system. Since recognizing different voices (noises) helps individuals immediately distinguish between noise and the original speech signal, we designed a noise-aware system. As a key principle of sparse representation, we examine the mutual coherence of the dictionary columns, called dictionary atoms, which has not been adequately considered in previously published SRC-based speaker verification research. To suppress the mutual coherence, we use a dictionary learning method to construct a dictionary with effective atoms. Our proposed Frame Level Sparse Representation Classification (FSRC) provides new insights into SRC-based speaker verification. We demonstrate that, in SRC-based speaker verification, a dictionary whose atoms are orthogonal can be more extensible than a dictionary whose atoms are highly correlated, and that suppressing the mutual coherence is even more effective than imposing strong orthogonality on the dictionary atoms. We evaluate the performance of state-of-the-art speaker recognition systems and of the proposed method on the NIST SRE 2004 data. Experimental results show that, compared with the baseline methods, the proposed method improves the performance of the speaker verification system in noisy conditions when enough information is available for enrolling the target speakers.
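The abstract relies on two ideas it does not spell out: the mutual coherence of a dictionary and the class-wise residual rule that SRC uses to choose between speakers. The sketch below illustrates both in plain Python under stated assumptions. Mutual coherence is taken as the largest absolute normalized inner product between two distinct atoms, mu(D) = max over i != j of |d_i . d_j| / (||d_i|| ||d_j||). Sparse codes are computed with orthogonal matching pursuit (OMP), a standard greedy solver chosen here for brevity and not necessarily the solver used in the paper. The frame-level decision sums per-class reconstruction residuals over all frames, which is a hypothetical aggregation rule, not the published FSRC scoring. The function names (`mutual_coherence`, `omp`, `classify_frames`) are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def mutual_coherence(atoms):
    """Largest absolute normalized inner product between two distinct atoms."""
    mu = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            c = abs(dot(atoms[i], atoms[j])) / (norm(atoms[i]) * norm(atoms[j]))
            mu = max(mu, c)
    return mu


def solve(a, b):
    """Solve the small square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms, y, sparsity):
    """Orthogonal matching pursuit: greedy sparse code of y, returned as {atom index: coefficient}."""
    support, coef = [], []
    residual = list(y)
    for _ in range(sparsity):
        if norm(residual) < 1e-12:
            break  # y is already reconstructed exactly
        # Pick the atom most correlated with the current residual.
        k = max(range(len(atoms)), key=lambda j: abs(dot(atoms[j], residual)) / norm(atoms[j]))
        if k in support:
            break
        support.append(k)
        # Least-squares refit of y on all selected atoms via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coef = solve(gram, rhs)
        approx = [sum(c * atoms[i][t] for c, i in zip(coef, support)) for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
    return dict(zip(support, coef))


def classify_frames(class_atoms, frames, sparsity=2):
    """Hypothetical frame-level SRC: sum per-class residuals over frames, return the minimum."""
    atoms, owner = [], []
    for label, cls in class_atoms.items():
        for a in cls:
            atoms.append(a)
            owner.append(label)
    totals = {label: 0.0 for label in class_atoms}
    for y in frames:
        x = omp(atoms, y, sparsity)  # one sparse code per frame over the whole dictionary
        for label in class_atoms:
            # Keep only the coefficients of this class's atoms, then reconstruct.
            recon = [sum(c * atoms[j][t] for j, c in x.items() if owner[j] == label)
                     for t in range(len(y))]
            totals[label] += norm([yt - rt for yt, rt in zip(y, recon)])
    return min(totals, key=totals.get), totals
```

For example, `classify_frames({"A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "B": [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]]}, [[1.0, 0.5, 0.0], [0.9, 1.0, 0.1]])` assigns both toy frames to class "A", while `mutual_coherence` reports 0.8 for that four-atom dictionary because the two "B" atoms are strongly correlated. One way to read the motivation for suppressing coherence is visible here: the more correlated the atoms of different speakers are, the easier it is for the greedy selection step to pick an atom from the wrong speaker. In a real system the atoms would be learned from feature frames of each enrolled speaker, and verification would compare the claimed speaker's score against a threshold rather than only taking the minimum.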
Cite this article
Hasheminejad, M., Farsi, H. Frame level sparse representation classification for speaker verification. Multimed Tools Appl 76, 21211–21224 (2017). https://doi.org/10.1007/s11042-016-4071-1