Automated Alignment and Annotation of Audio-Visual Presentations

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Abstract

Recordings of audio-visual presentations are a potentially valuable component of digital libraries. These recordings can be archived to enable remote access to audio presentations including lectures and seminars. Recordings of presentations often contain multiple information streams involving visual and audio data. If the full benefit of these recordings is to be realised, these media streams must be properly integrated to enable rapid navigation. This paper describes the application of information retrieval techniques within a system that automatically synchronises an audio soundtrack with the electronic slides from a presentation. A novel component of the system is the detection of sections of the presentation that are unsupported by prepared slides, such as discussion and question answering, and the automatic development of keypoint slides for these elements of the presentation.
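
The abstract does not spell out the matching procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the soundtrack has already been transcribed and cut into text segments, scores each segment against the text of each slide using TF-IDF cosine similarity, and treats a segment whose best score falls below a cut-off as unsupported by any slide. The names `align`, `tfidf_vectors`, and `threshold`, the choice of TF-IDF weighting, and the sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    # Smoothed idf stays positive even for terms found in every document.
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def align(segments, slides, threshold=0.1):
    """Map each transcript segment to its best slide index.

    A segment whose best similarity is below `threshold` (an assumed,
    untuned cut-off) maps to None, meaning no prepared slide supports it.
    """
    if not slides:
        return [None] * len(segments)
    vectors = tfidf_vectors(slides + segments)
    slide_vecs = vectors[:len(slides)]
    seg_vecs = vectors[len(slides):]
    result = []
    for seg in seg_vecs:
        scores = [cosine(seg, s) for s in slide_vecs]
        best = max(range(len(slides)), key=scores.__getitem__)
        result.append(best if scores[best] >= threshold else None)
    return result


if __name__ == "__main__":
    # Hypothetical slide texts and transcript segments.
    slides = [
        "Speech recognition word error rate",
        "Information retrieval term weighting and ranking",
    ]
    segments = [
        "the recognition system has a high word error rate on this speech",
        "term weighting improves ranking in retrieval",
        "does anyone have a question before we break for coffee",
    ]
    print(align(segments, slides))  # prints [0, 1, None]
```

In this toy input the third segment shares no vocabulary with either slide, so it is flagged as unsupported. A real system would also have to cope with speech recognition errors and would probably exploit the temporal order of the slides, neither of which this sketch attempts.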

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jones, G.J.F., Edens, R.J. (2002). Automated Alignment and Annotation of Audio-Visual Presentations. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_21

  • DOI: https://doi.org/10.1007/3-540-45747-X_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
