
Abstract

In recent years, advances in machine learning have led to significant and widespread improvements in how we interact with our world. One of the most consequential of these advances is the field of deep learning. Based on artificial neural networks loosely inspired by those in the human brain, deep learning is a set of methods that lets computers learn from data with minimal human intervention. These methods can also adapt to changing environments and continuously improve their learned abilities. Today, deep learning is prevalent in everyday life in the form of Google’s search, Apple’s Siri, and Amazon’s and Netflix’s recommendation engines, to name but a few examples. When we interact with email systems, online chatbots, and the voice and image recognition systems deployed in industries ranging from healthcare to financial services, we see robust applications of deep learning in action.

Notes

  1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.

  2. Theano is another popular framework; however, major development on it has been discontinued given the popularity of more recent frameworks, and it is therefore not included in this book.

  3. Although this is a single, statistically insignificant data point, the Google Trends measure is useful and correlates roughly with other indicators such as the number of contributors, GitHub popularity, and the number of articles and books written about the various frameworks.

  4. Code repositories containing specific implementations that do not provide a full framework may be used, but are not included in this list.

  5. Additional text tasks and datasets, along with associated papers, are catalogued at https://nlpprogress.com/.

  6. https://www.kaggle.com.

  7. https://www.ldc.upenn.edu/.

  8. https://nlp.stanford.edu/data/.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Introduction. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_1

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
