Abstract
In recent years, advances in machine learning have led to significant and widespread improvements in how we interact with our world. One of the most consequential of these advances is deep learning. Based on artificial neural networks loosely inspired by those in the human brain, deep learning is a set of methods that permits computers to learn from data with little human intervention. Furthermore, these methods can adapt to changing environments and continuously improve learned abilities. Today, deep learning is prevalent in our everyday lives in the form of Google’s search, Apple’s Siri, and Amazon’s and Netflix’s recommendation engines, to name but a few examples. When we interact with email systems, online chatbots, and voice or image recognition systems deployed at businesses ranging from healthcare to financial services, we see robust applications of deep learning in action.
Notes
- 1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.
- 2. Theano is another popular framework; however, major development has been discontinued given the popularity of more recent frameworks. It is therefore not included in this book.
- 3. Although this is a single, statistically insignificant data point, the Google Trends mechanism is useful and roughly correlates with other evaluations such as the number of contributors, GitHub popularity, and the number of articles and books written for the various frameworks.
- 4. Code repositories containing specific implementations that do not provide a full framework may be used, but are not included on this list.
- 5. Additional text tasks and datasets are captured with associated papers at https://nlpprogress.com/.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kamath, U., Liu, J., Whitaker, J. (2019). Introduction. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_1
DOI: https://doi.org/10.1007/978-3-030-14596-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14595-8
Online ISBN: 978-3-030-14596-5
eBook Packages: Computer Science, Computer Science (R0)