A Decision Fusion System Across Time and Classifiers for Audio-Visual Person Identification

  • Conference paper
Multimodal Technologies for Perception of Humans (CLEAR 2006)

Abstract

This paper presents the person identification system developed at Athens Information Technology. It comprises an audio-only (speech) subsystem, a video-only (face) subsystem and an audiovisual fusion subsystem. Audio recognition is based on Gaussian mixture modeling of the principal components of the Mel-Frequency Cepstral Coefficients of speech. Video recognition is based on linear subspace projection methods, with temporal fusion by weighted voting on the results. Audiovisual fusion combines the unimodal identities into a multimodal one, using a suitable confidence metric for the results of the unimodal classifiers.
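The two fusion steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the voting weights or the confidence metric, so everything here is an assumption made for illustration: the function names `weighted_vote` and `fuse_modalities` are hypothetical, the confidence is taken to be the winner's share of the total vote weight, and the multimodal decision simply keeps the more confident unimodal result. It is not the authors' method.

```python
from collections import defaultdict


def weighted_vote(decisions):
    """Fuse per-observation decisions over time by weighted voting.

    decisions: list of (identity, weight) pairs with non-negative weights.
    Returns (winning identity, confidence), where confidence is the winner's
    share of the total weight (an assumed metric, not taken from the paper).
    """
    if not decisions:
        raise ValueError("at least one decision is required")
    totals = defaultdict(float)
    for identity, weight in decisions:
        totals[identity] += weight
    winner = max(totals, key=totals.get)
    total_weight = sum(totals.values())
    confidence = totals[winner] / total_weight if total_weight > 0 else 0.0
    return winner, confidence


def fuse_modalities(audio, video):
    """Combine two unimodal (identity, confidence) results.

    Assumed rule: keep the result of the more confident classifier,
    preferring audio on a tie.
    """
    return audio if audio[1] >= video[1] else video


if __name__ == "__main__":
    # Hypothetical face decisions over four frames, with made-up weights.
    frames = [("alice", 0.9), ("bob", 0.4), ("alice", 0.7), ("bob", 0.5)]
    video_result = weighted_vote(frames)
    audio_result = ("bob", 0.8)  # hypothetical speech classifier output
    print(video_result, fuse_modalities(audio_result, video_result))
```

Picking the more confident modality is only one possible rule; a real system might instead weight and combine the scores of both classifiers.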

Editor information

Rainer Stiefelhagen John Garofolo

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Stergiou, A., Pnevmatikakis, A., Polymenakos, L. (2007). A Decision Fusion System Across Time and Classifiers for Audio-Visual Person Identification. In: Stiefelhagen, R., Garofolo, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2006. Lecture Notes in Computer Science, vol 4122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69568-4_19

  • DOI: https://doi.org/10.1007/978-3-540-69568-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69567-7

  • Online ISBN: 978-3-540-69568-4

  • eBook Packages: Computer Science, Computer Science (R0)
