A Multi-modal Data-Set for Systematic Analyses of Linguistic Ambiguities in Situated Contexts

  • Conference paper
  • In: Information Management and Big Data (SIMBig 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 795)

Abstract

Human situated language processing involves the interaction of linguistic and visual processing, and this cross-modal integration helps to resolve ambiguities and to predict what will be revealed next in an unfolding spoken sentence. Most state-of-the-art parsing approaches, however, rely solely on the language modality. This paper introduces a multi-modal data-set that covers challenging linguistic structures and visual complexities which state-of-the-art parsers should be able to handle. It also briefly describes a multi-modal parsing approach and a proof-of-concept study that demonstrates the contribution of visual information to disambiguation.
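
To make the role of visual information during disambiguation more concrete, the sketch below shows one possible way to fold scene information into parse selection: a parser's n-best dependency analyses are re-ranked by how well their relations agree with relations extracted from the depicted scene. This is an illustrative sketch only, not the implementation described in the paper; all names in it (ParseCandidate, scene_relations, alpha) are hypothetical.

```python
# Illustrative sketch only, NOT the implementation described in the paper:
# it shows how a parser's n-best dependency analyses could be re-ranked with
# a visual-context score so that scene information helps resolve ambiguity.
# All names below (ParseCandidate, scene_relations, alpha) are hypothetical.

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ParseCandidate:
    """One candidate analysis: (head, dependent, label) triples and a parser score."""
    edges: List[Tuple[str, str, str]]
    parser_score: float


def visual_score(candidate: ParseCandidate,
                 scene_relations: Dict[Tuple[str, str], str]) -> float:
    """Fraction of syntactic relations that match a relation depicted in the scene."""
    hits = sum(1 for head, dep, label in candidate.edges
               if scene_relations.get((head, dep)) == label)
    return hits / max(len(candidate.edges), 1)


def rerank(candidates: List[ParseCandidate],
           scene_relations: Dict[Tuple[str, str], str],
           alpha: float = 0.5) -> ParseCandidate:
    """Pick the analysis with the best mix of parser and visual evidence."""
    return max(candidates,
               key=lambda c: (1 - alpha) * c.parser_score
                             + alpha * visual_score(c, scene_relations))


if __name__ == "__main__":
    # Toy scene: the depicted event has the dog as agent and the cat as patient.
    scene = {("chases", "dog"): "SUBJ", ("chases", "cat"): "OBJ"}
    a = ParseCandidate([("chases", "cat", "SUBJ"), ("chases", "dog", "OBJ")], 0.9)
    b = ParseCandidate([("chases", "dog", "SUBJ"), ("chases", "cat", "OBJ")], 0.7)
    print(rerank([a, b], scene).edges)  # the scene-compatible analysis b wins
```

In the toy call at the bottom, the analysis preferred by the parser alone loses to the alternative that matches the depicted event, which is the kind of effect the proof-of-concept study examines.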

Notes

  1. Knoeferle's sentence set [3] was used as a baseline because the co-occurrence frequencies between the actions and the Agents in the sentences, as well as between the actions and the Patients, were controlled in order to single out the effects of semantic associations or preferences during parsing. For a syntactic parser this may seem irrelevant; however, the parameter has to be taken into account in order to obtain an experimental setup that is comparable to studies of human comprehension.

  2. Relative Pronoun.

  3. Int. = Interpretation.

  4. The original German sentence is in active voice with OVS word order.

  5. http://www.sketchup.com/ - retrieved on 03.08.2016.

  6. The data-set can be accessed from https://gitlab.com/natsCML/LASC_v1; a minimal access sketch follows these notes.

  7. See [25] for a study that focuses on experiments with this Subset across all three languages: German, English and Turkish.

  8. https://www.heise.de.
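
Note 6 points to the repository hosting the data-set. The snippet below is only a convenience sketch for fetching a local copy, assuming git is installed; it makes no assumptions about the repository's internal layout and simply lists whatever files are present.

```python
# Convenience sketch for Note 6: clone the data-set repository and list its
# files. Assumes only that git is installed; the directory layout is not
# documented here, so nothing beyond the raw file listing is inferred.

import subprocess
from pathlib import Path

REPO_URL = "https://gitlab.com/natsCML/LASC_v1"
TARGET = Path("LASC_v1")

if not TARGET.exists():
    subprocess.run(["git", "clone", REPO_URL, str(TARGET)], check=True)

for path in sorted(TARGET.rglob("*")):
    if path.is_file():
        print(path.relative_to(TARGET))
```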

References

  1. Tanenhaus, M.K., Spivey-Knowlton, M.J., Eberhard, K.M., Sedivy, J.C.: Integration of visual and linguistic information in spoken language comprehension. Science 268(5217), 1632 (1995)

  2. Altmann, G.T., Kamide, Y.: Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition 73(3), 247–264 (1999)

  3. Knoeferle, P.S.: The role of visual scenes in spoken language comprehension: evidence from eye-tracking. Ph.D. thesis, Universitätsbibliothek (2005)

  4. Ferreira, F., Foucart, A., Engelhardt, P.E.: Language processing in the visual world: effects of preview, visual complexity, and prediction. J. Mem. Lang. 69(3), 165–182 (2013)

  5. McRae, K., Hare, M., Ferretti, T., Elman, J.L.: Activating verbs from typical agents, patients, instruments, and locations via event schemas. In: Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society, Erlbaum Mahwah, NJ, pp. 617–622 (2001)

  6. Van Berkum, J.J.A., Brown, C.M., Zwitserlood, P., Kooijman, V., Hagoort, P.: Anticipating upcoming words in discourse: evidence from ERPs and reading times. J. Exp. Psychol. Learn. Mem. Cogn. 31(3), 443 (2005)

  7. Coco, M.I., Keller, F.: The interaction of visual and linguistic saliency during syntactic ambiguity resolution. Q. J. Exp. Psychol. 68(1), 46–74 (2015)

  8. Berzak, Y., Barbu, A., Harari, D., Katz, B., Ullman, S.: Do you see what I mean? Visual resolution of linguistic ambiguities. arXiv preprint arXiv:1603.08079 (2016)

  9. McCrae, P.: A computational model for the influence of cross-modal context upon syntactic parsing (2010)

  10. Mayberry, M.R., Crocker, M.W., Knoeferle, P.: A connectionist model of the coordinated interplay of scene, utterance, and world knowledge. In: Proceedings of the 28th Annual Conference of the Cognitive Science Society, pp. 567–572 (2006)

  11. McCrae, P.: A model for the cross-modal influence of visual context upon language processing. In: Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2009), Borovets, Bulgaria, pp. 230–235 (2009)

  12. Baumgärtner, C., Beuck, N., Menzel, W.: An architecture for incremental information fusion of cross-modal representations. In: IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Hamburg, Germany, pp. 498–503. IEEE (2012)

  13. Beuck, N., Köhn, A., Menzel, W.: Incremental parsing and the evaluation of partial dependency analyses. In: DepLing 2011, Proceedings of the 1st International Conference on Dependency Linguistics (2011)

  14. Beuck, N., Köhn, A., Menzel, W.: Predictive incremental parsing and its evaluation. In: Computational Dependency Theory. Frontiers in Artificial Intelligence and Applications, vol. 258, pp. 186–206. IOS Press (2013)

  15. Camerini, P.M., Fratta, L., Maffioli, F.: The k best spanning arborescences of a network. Networks 10(2), 91–109 (1980)

  16. Charniak, E., Johnson, M.: Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 173–180. Association for Computational Linguistics, June 2005

  17. Salama, A.R., Menzel, W.: Multimodal graph-based dependency parsing of natural language. In: Hassanien, A.E., Shaalan, K., Gaber, T., Azar, A.T., Tolba, M.F. (eds.) AISI 2016. AISC, vol. 533, pp. 22–31. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-48308-5_3

  18. Zhang, Y., Lei, T., Barzilay, R., Jaakkola, T., Globerson, A.: Steps to excellence: simple inference with refined scoring of dependency trees. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland, pp. 197–207. Association for Computational Linguistics (2014)

  19. Lei, T., Xin, Y., Zhang, Y., Barzilay, R., Jaakkola, T.: Low-rank tensors for scoring dependency structures. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland, pp. 1381–1391. Association for Computational Linguistics, June 2014

  20. Tarjan, R.E.: Finding optimum branchings. Networks 7(1), 25–35 (1977)

  21. Hall, K.: k-best spanning tree parsing. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 392–399 (2007)

  22. Foth, K.A., Köhn, A., Beuck, N., Menzel, W.: Because size does matter: the Hamburg dependency treebank. In: Proceedings of the Language Resources and Evaluation Conference 2014, LREC, European Language Resources Association (ELRA), Reykjavik, Iceland (2014)

  23. Schiller, A., Teufel, S., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS. Universität Stuttgart und Universität Tübingen (1995)

  24. Martins, A.F.T., Almeida, M.B., Smith, N.A.: Turning on the turbo: fast third-order non-projective turbo parsers. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 617–622 (2013)

  25. Staron, T., Alacam, O., Menzel, W.: Incorporating contextual information for language-independent, dynamic disambiguation tasks. In: Proceedings of the 11th Language Resources and Evaluation Conference (LREC) (2018)

Acknowledgments

This research was funded by the German Research Foundation (DFG) in project ‘Crossmodal Learning’, TRR-169.

Author information

Corresponding author

Correspondence to Özge Alaçam.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Alaçam, Ö., Staron, T., Menzel, W. (2018). A Multi-modal Data-Set for Systematic Analyses of Linguistic Ambiguities in Situated Contexts. In: Lossio-Ventura, J., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2017. Communications in Computer and Information Science, vol 795. Springer, Cham. https://doi.org/10.1007/978-3-319-90596-9_8

  • DOI: https://doi.org/10.1007/978-3-319-90596-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90595-2

  • Online ISBN: 978-3-319-90596-9

  • eBook Packages: Computer Science (R0)
