Where Corpus Linguistics and Artificial Intelligence (AI) Meet

Pace-Sigge, Michael

doi:10.1007/978-3-319-90719-2_3

Abstract

This chapter showcases the more recent developments that have grown out of the groundwork laid earlier. It presents the latest theories in linguistics, which are based on empirical data taken from naturally occurring language. In particular, lexical priming theory is introduced as a way to explain the structures of language that corpus linguists have uncovered. The chapter also discusses the development of increasingly sophisticated algorithms that deal with language use. Here, the focus is on IBM's key achievements in the 1980s, which created a solid foundation for applications now widely used on mobile and desktop devices, namely “assistants” such as Amazon’s Echo, Apple’s Siri or Google’s (and Android’s) Google Go.

Author information

Authors and Affiliations

Department of English Language and Culture, University of Eastern Finland, Joensuu, Finland
Michael Pace-Sigge

Corresponding author

Correspondence to Michael Pace-Sigge.

About this chapter

Cite this chapter

Pace-Sigge, M. (2018). Where Corpus Linguistics and Artificial Intelligence (AI) Meet. In: Spreading Activation, Lexical Priming and the Semantic Web. Palgrave Pivot, Cham. https://doi.org/10.1007/978-3-319-90719-2_3

DOI: https://doi.org/10.1007/978-3-319-90719-2_3
Published: 05 June 2018
Publisher Name: Palgrave Pivot, Cham
Print ISBN: 978-3-319-90718-5
Online ISBN: 978-3-319-90719-2
eBook Packages: Social Sciences, Social Sciences (R0)