Search for speaker identity in historical oral archives

Silovsky, Jan; Nouza, Jan; Kucharova, Michaela

doi:10.1007/s11042-014-2067-2

Published: 06 July 2014

Volume 75, pages 3767–3786 (2016)

Multimedia Tools and Applications

Abstract

We present our ongoing research on speaker recognition in historical oral archives. This research is part of our long-term effort to enable versatile access to the archive of the Czech Radio (CRo). From a manually annotated partition of the archive, we compiled a database covering a time span of more than 30 years for our experimental study. This allowed us to investigate the impact of various factors that make historical data challenging to process. We show how the scores of target (genuine) speaker trials shift with the aging effect, with the signal-to-noise ratio, and with the variable amount of enrollment and test data. Scores for speaker detection trials were computed by a system based on the i-vector paradigm and probabilistic linear discriminant analysis (PLDA). We also assessed the performance of this system on an evaluation database containing contemporary recordings collected over a time span of approximately 4 years. Although the system uses state-of-the-art techniques capable of dealing with nuisance inter-session variability, we demonstrate a substantial degradation in its performance on the evaluation containing historical data compared with the one containing contemporary data only. Specifically, the Equal Error Rate (EER) of the system rose from 1.93 % to 8.27 %. This difference shows that compensation techniques are needed to cope with the additional variability that various sources introduce into historical data.
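
The EER quoted above is the operating point at which the miss rate equals the false-alarm rate. The following is a minimal sketch of how it can be estimated from target and non-target trial scores, assuming that higher scores favor the target hypothesis and using a plain threshold sweep over the observed scores. The function name and toy scores are hypothetical, and this is not the procedure used by the authors; dedicated evaluation toolkits may interpolate the crossing point more carefully.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. For each candidate threshold
    we compute the miss rate (target trials rejected) and the false-alarm rate
    (non-target trials accepted), and return their average at the threshold
    where the two rates are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    n_tar = len(target_scores)
    n_non = len(nontarget_scores)
    best_gap = float("inf")
    best_eer = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < threshold for s in target_scores) / n_tar
        p_fa = sum(s >= threshold for s in nontarget_scores) / n_non
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap = gap
            best_eer = (p_miss + p_fa) / 2.0
    return best_eer


# Toy scores (hypothetical): one target trial scores below one non-target trial.
print(equal_error_rate([2.0, 3.0, 1.5, 0.5], [-1.0, 0.0, 0.8, -2.0]))  # prints 0.25
```

The quadratic sweep keeps the sketch readable; for evaluations with many thousands of trials, sorting the pooled scores once and accumulating counts would be the natural refinement.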

Similar content being viewed by others

Efficient speaker identification using spectral entropy

Article 02 January 2019

Intra-Speaker Variability Assessment for Speaker Recognition in Degraded Conditions: A Case of African Tone Languages

Human Speaker Recognition Based Database Method

Notes

The company was originally named Radiojournal.
http://www.nist.gov/itl/iad/mig/sre.cfm
In [19], the nuisance variability is expressed as the sum of an intra-speaker variability term confined to a lower-dimensional subspace and a residual noise term, which is assumed to be Gaussian with a diagonal covariance matrix. The full covariance matrix used in our case is thus simply a generalization of that model.
We used a relevance factor of 16.0 in our experiments.
We used the Bosaris toolkit, available at https://sites.google.com/site/bosaristoolkit/, to plot our DET curves.
Please note that the results presented in [23] and in this work are not directly comparable. In [23], we used the test database in a two-fold cross-validation setup: one fold was used for calibration training and the other for testing, and vice versa. Here, we pooled all the test data together. Furthermore, different development data sets were used in [23] and in this work. In the former study, the available data were much more limited, particularly for estimating intra-speaker variability, which requires recordings from multiple sessions per speaker.
Let us stress that we strictly distinguished between different excerpts and different sessions. Hence, no model was trained for a speaker who had multiple excerpts available if they were all drawn from a single session.
The linear regression fit lines displayed in the figure are not intended to represent a true dependency, only its trend.
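
The note on [19] above describes modelling the nuisance variability with a full-covariance Gaussian. The following is a minimal sketch of how a verification score can be obtained under such a model, not the authors' implementation. It assumes the form x = m + Φy + ε with y ~ N(0, I) and ε ~ N(0, Σ), where Σ is a full covariance matrix (a form sometimes referred to as simplified PLDA), and a single enrollment and single test i-vector per trial. The names `plda_llr` and `sample_ivector` and all parameter values are hypothetical; in practice m, Φ and Σ would be estimated from development data, whereas here they are random and serve only as an illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of the multivariate normal N(mean, cov) evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def plda_llr(x1, x2, m, Phi, Sigma):
    """Same-speaker vs. different-speaker log-likelihood ratio.

    Assumed model: x = m + Phi @ y + eps, y ~ N(0, I), eps ~ N(0, Sigma),
    with Sigma a full covariance matrix. Under the same-speaker hypothesis
    both i-vectors share y, so they are jointly Gaussian with cross-covariance
    B = Phi Phi^T; under the different-speaker hypothesis they are independent.
    """
    B = Phi @ Phi.T   # between-speaker covariance
    T = B + Sigma     # total covariance of a single i-vector
    joint_cov = np.block([[T, B], [B, T]])
    joint_x = np.concatenate([x1, x2])
    joint_mean = np.concatenate([m, m])
    log_same = gaussian_logpdf(joint_x, joint_mean, joint_cov)
    log_diff = gaussian_logpdf(x1, m, T) + gaussian_logpdf(x2, m, T)
    return log_same - log_diff


# Hypothetical toy parameters (not trained on any real data).
rng = np.random.default_rng(0)
dim, rank = 10, 3
m = np.zeros(dim)
Phi = rng.normal(size=(dim, rank))
A = rng.normal(size=(dim, dim))
Sigma = 0.5 * (A @ A.T) / dim + 0.1 * np.eye(dim)  # full, positive definite


def sample_ivector(y):
    """Draw one synthetic i-vector for the speaker factor y."""
    return m + Phi @ y + rng.multivariate_normal(np.zeros(dim), Sigma)


y_spk = rng.normal(size=rank)
enroll, same_test = sample_ivector(y_spk), sample_ivector(y_spk)
other_test = sample_ivector(rng.normal(size=rank))
print("same-speaker LLR:", plda_llr(enroll, same_test, m, Phi, Sigma))
print("different-speaker LLR:", plda_llr(enroll, other_test, m, Phi, Sigma))
```

Restricting Σ to a diagonal matrix would recover the residual-noise assumption described for [19]; the scoring function itself does not change, which reflects the note's point that the full-covariance variant is a generalization.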

References

Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’07. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1027–1035
Boháč M, Blavka K (2013) Text-to-speech alignment for imperfect transcriptions. In: Habernal I, Matoušek V (eds) Text, speech, and dialogue, lecture notes in computer science, vol 8082. Springer, Berlin Heidelberg, pp 536–543
Brümmer N (2009) EM for JFA. Tech. rep., South Africa. Available at https://sites.google.com/site/nikobrummer/EMforJFA.pdf?attredirects=0
Brümmer N, Burget L, Kenny P, Matějka P, de Villiers E, Karafiát M, Kockmann M, Glembek O, Plchot O, Baum D, Senoussaoui M (2010) ABC system description for NIST SRE 2010. In: Proc. NIST 2010 speaker recognition evaluation. Brno University of Technology, pp 1–20
Chaloupka J, Nouza J, Červa P, Málek J (2013) Downdating lexicon and language model for automatic transcription of Czech historical spoken documents. In: Habernal I, Matoušek V (eds) Text, speech, and dialogue, lecture notes in computer science, vol 8082. Springer, Berlin Heidelberg, pp 201–208
Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19 (4):788–798
Doddington GR, Przybocki MA, Martin AF, Reynolds DA (2000) The NIST speaker recognition evaluation - overview, methodology, systems, results, perspective. Speech Commun 31 (2–3):225–254
Ferrer L, Graciarena M, Zymnis A, Shriberg E (2008) System combination using auxiliary information for speaker verification. In: IEEE international conference on acoustics, speech and signal processing - ICASSP 2008, Las Vegas, pp 4853–4856
Garcia-Romero D, Espy-Wilson CY (2011) Analysis of i-vector length normalization in speaker recognition systems. In: Interspeech’11. Florence, pp 249–252
Kanagasundaram A, Dean D, Gonzalez-Dominguez J, Sridharan S, Ramos D, Gonzalez-Rodriguez J (2013) Improving the PLDA based speaker verification in limited microphone data conditions. In: Interspeech 2013. International Speech Communication Association (ISCA), Lyon, pp 3674–3678
Kelly F, Drygajlo A, Harte N (2012) Speaker verification with long-term ageing data. In: 2012 5th IAPR international conference on biometrics (ICB), pp 478–483
Kelly F, Harte N (2011) Effects of long-term ageing on speaker verification. In: Vielhauer C, Dittmann J, Drygajlo A, Juul N, Fairhurst M (eds) Biometrics and ID management, lecture notes in computer science, vol 6583. Springer, Berlin Heidelberg, pp 113–124
Kenny P, Boulianne G, Dumouchel P (2005) Eigenvoice modeling with sparse training data. IEEE Trans Speech Audio Process 13 (3):345–354
Kenny P, Stafylakis T, Ouellet P, Alam M, Dumouchel P (2013) PLDA for speaker verification with utterances of arbitrary duration. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 7649–7653
Kim C, Stern RM (2008) Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis. In: INTERSPEECH. ISCA, pp 2598–2601
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52:12–40
Matveev Y (2013) The problem of voice template aging in speaker recognition systems. In: Železný M, Habernal I, Ronzhin A (eds) Speech and computer, lecture notes in computer science, vol 8113. Springer International Publishing, pp 345–353
Nouza J, Blavka K, Bohac M, Cerva P, Zdansky J, Silovsky J, Prazak J (2012) Voice technology to enable sophisticated access to historical audio archive of the Czech Radio. In: Grana C, Cucchiara R (eds) Multimedia for cultural heritage, communications in computer and information science, vol 247. Springer, Berlin Heidelberg, pp 27–38
Prince SJD, Elder JH (2007) Probabilistic linear discriminant analysis for inferences about identity. In: Proceedings ICCV 2007, Rio de Janeiro, Brazil, pp 1–8
Rajan P, Kinnunen T, Hautamäki V (2013) Effect of multicondition training on i-vector PLDA configurations for speaker recognition. In: Interspeech’13. Lyon, pp 3694–3697
Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted Gaussian mixture models. Digit Signal Process 10 (1–3):19–41
Sarkar AK, Matrouf D, Bousquet P-M, Bonastre J-F (2012) Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification. In: INTERSPEECH’12. ISCA, Portland, OR, USA
Silovsky J, Cerva P, Zdansky J (2009) Comparison of generative and discriminative approaches for speaker recognition with limited data. Radioengineering 18 (3):307–316
Silovsky J, Zdansky J, Nouza J, Cerva P, Prazak J (2012) Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams. In: MMSP’12. Banff, pp 118–123
van Leeuwen DA, Brummer N (2007) An introduction to application-independent evaluation of speaker recognition systems. Lect Notes Comput Sci 4343/2007:330–353

Acknowledgments

This research work was supported by the Czech Ministry of Culture (project no. DF11P01OVV013 in program NAKI).

Author information

Authors and Affiliations

Technical University of Liberec, Studentska 1402/2, 461 17, Liberec, Czech Republic
Jan Silovsky, Jan Nouza & Michaela Kucharova

Corresponding author

Correspondence to Jan Silovsky.

About this article

Cite this article

Silovsky, J., Nouza, J. & Kucharova, M. Search for speaker identity in historical oral archives. Multimed Tools Appl 75, 3767–3786 (2016). https://doi.org/10.1007/s11042-014-2067-2

Received: 05 December 2013
Revised: 03 April 2014
Accepted: 29 April 2014
Published: 06 July 2014
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11042-014-2067-2
