Abstract
This paper reviews our research group's work on text mining for automating the construction of term systems, i.e., term identification and categorization. The automation relies on a neural network, specifically a multilayer version of Rosenblatt's perceptron. The algorithm uses supervised machine learning with predefined classes. The paper describes in detail the methods and digital resources (online dictionaries, taggers, and corpora) used to manually compile the database on which the neural network is subsequently trained. Finally, we outline our plans for modifying the developed software in the transition to deep learning.
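To make the supervised setup concrete, the following is a minimal sketch of a multilayer perceptron that assigns candidate words to predefined classes. It is not the authors' implementation: the class labels, the three input features (is-noun flag, presence in a domain glossary, scaled corpus frequency), the toy training rows, and all names such as `TinyMLP` are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical class labels; the paper's actual categories are not given here.
CLASSES = ["non-term", "term"]


def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """One tanh hidden layer followed by a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def loss(self, data):
        """Mean cross-entropy over (features, class_index) pairs."""
        return -sum(math.log(self.forward(x)[1][y]) for x, y in data) / len(data)

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Gradient of cross-entropy w.r.t. the output pre-activations.
        d_out = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate to the hidden layer (tanh derivative is 1 - h^2).
        d_hid = [(1.0 - hj * hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        p = self.forward(x)[1]
        return CLASSES[p.index(max(p))]


# Invented toy rows: [is_noun, in_domain_glossary, scaled_corpus_frequency] -> class index.
TRAIN = [
    ([1.0, 1.0, 0.1], 1),
    ([1.0, 1.0, 0.3], 1),
    ([0.0, 1.0, 0.2], 1),
    ([1.0, 0.0, 0.9], 0),
    ([0.0, 0.0, 0.8], 0),
    ([0.0, 0.0, 0.5], 0),
]

model = TinyMLP(n_in=3, n_hidden=4, n_out=len(CLASSES))
initial_loss = model.loss(TRAIN)
for _ in range(200):            # plain stochastic gradient descent
    for features, label in TRAIN:
        model.train_step(features, label, lr=0.1)
final_loss = model.loss(TRAIN)
```

In a real pipeline the rows of `TRAIN` would come from the manually compiled database rather than being typed in by hand, and the feature set would reflect whatever the annotators actually recorded.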
Acknowledgments
The reported study was funded by RFBR according to the research project № 18-012-00825 A.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Isaeva, E., Bakhtin, V., Tararkov, A. (2019). Collecting the Database for the Neural Network Deep Learning Implementation. In: Antipova, T., Rocha, A. (eds) Digital Science. DSIC18 2018. Advances in Intelligent Systems and Computing, vol 850. Springer, Cham. https://doi.org/10.1007/978-3-030-02351-5_2
DOI: https://doi.org/10.1007/978-3-030-02351-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02350-8
Online ISBN: 978-3-030-02351-5
eBook Packages: Intelligent Technologies and Robotics (R0)