Integrating vision and language: Towards automatic description of human movements

  • Spatial Reasoning
  • Conference paper
KI-95: Advances in Artificial Intelligence (KI 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 981)

Abstract

The integration of vision and natural language processing increasingly attracts attention in different areas of AI research. Up to now, however, there have only been a few attempts at connecting vision systems with natural language access systems. Within the SFB 314, special collaborative program on AI and knowledge-based systems, the automatic natural language description of real world image sequences constitutes a major research goal, which has been pursued during the last ten years. The aim of our approach is to obtain an incremental evaluation and simultaneous description of the perceived time-varying scenes. In this contribution we will report on new results of our joint efforts at combining the natural language access system Vitra with a vision system. We have investigated the problem of describing the movements of articulated bodies in image sequences within an integrated natural language and computer vision system. The paper will focus on our model-based approach for the recognition of pedestrians and on the further evaluation and language production in Vitra.

Editor information

Ipke Wachsmuth, Claus-Rainer Rollinger, Wilfried Brauer

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Herzog, G., Rohr, K. (1995). Integrating vision and language: Towards automatic description of human movements. In: Wachsmuth, I., Rollinger, CR., Brauer, W. (eds) KI-95: Advances in Artificial Intelligence. KI 1995. Lecture Notes in Computer Science, vol 981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60343-3_42

  • DOI: https://doi.org/10.1007/3-540-60343-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60343-6

  • Online ISBN: 978-3-540-44944-7

  • eBook Packages: Springer Book Archive
