Data Analysis for a Multimedia Library

Hauptmann, Alexander; Jin, Rong; Wactlar, Howard

doi:10.1007/978-3-540-45115-0_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2705)

Abstract

This article describes the indexing, search, and retrieval of various combinations of audio, video, text, and image media, and the automated content processing that enables them. The intent is to provide a framework for data analysis in multimedia digital libraries. The article is organized as follows. The introduction briefly distinguishes digital from traditional libraries and touches on the issues specific to searching the content of multimedia libraries. Section two introduces the Informedia Digital Video Library as an example of a multimedia library, including a quick tour of its functionality. Section three discusses the processing of audio and image information as it relates to a multimedia library. Section four illustrates the interplay between audio and video information, using a video information retrieval experiment as an example. Section five discusses the exporting and sharing of metadata in a digital library using MPEG-7. Finally, section six presents one vision of a future digital library, in which all personal memory can be recorded and accessed.

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Alexander Hauptmann, Rong Jin & Howard Wactlar

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
CEA-LIST, Fontenay-aux-Roses, France
Gregory Grefenstette

About this chapter

Cite this chapter

Hauptmann, A., Jin, R., Wactlar, H. (2003). Data Analysis for a Multimedia Library. In: Renals, S., Grefenstette, G. (eds) Text- and Speech-Triggered Information Access. Lecture Notes in Computer Science, vol 2705. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45115-0_2

DOI: https://doi.org/10.1007/978-3-540-45115-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40635-8
Online ISBN: 978-3-540-45115-0
