Abstract
Machine learning and deep learning are widely applied, particularly in speech recognition. The authors combine several machine learning algorithms with deep learning to recognize speech for device control, and apply this speech recognition approach to an education-enrollment advising robot. The result is a three-step machine learning model: data preprocessing, speech recognition using neural networks, and question answering based on the recognized keywords. In the preprocessing step, the authors convert the sound wave into a spectrogram image. The speech recognition step uses a CNN for noise filtering and feature extraction and an LSTM network for keyword recognition. Tests under varying conditions, including different speaking rates, loudness levels, and environmental noise levels, demonstrate the effectiveness of the proposed model and algorithms.
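The preprocessing step turns the waveform into a spectrogram image. Below is a minimal sketch of that conversion in Python with NumPy. It assumes a short-time Fourier transform with 25 ms Hann-windowed frames and a 10 ms hop; these parameters and the function name `log_spectrogram` are illustrative choices, not values reported by the authors.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_len=0.025, hop_len=0.010, eps=1e-10):
    """Convert a 1-D waveform into a log-magnitude spectrogram (frames x frequency bins).

    Assumed parameters (not from the paper): 25 ms Hann-windowed frames, 10 ms hop.
    """
    n_fft = int(round(frame_len * sample_rate))   # samples per frame
    hop = int(round(hop_len * sample_rate))       # samples between frame starts
    if len(signal) < n_fft:
        # Zero-pad very short clips so that at least one frame exists
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    # Stack overlapping, windowed frames into a (n_frames, n_fft) matrix
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # Magnitude of the one-sided FFT: shape (n_frames, n_fft // 2 + 1)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; eps avoids log(0) on silent frames
    return np.log(magnitude + eps)


# Usage: one second of a 1 kHz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t), sr)
```

For this tone the result has 98 frames of 201 frequency bins, and every frame peaks at bin 25 (1000 Hz divided by the 40 Hz bin spacing of a 400-sample frame). The 2-D array can then be fed to a network as a single-channel image.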
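To illustrate how a convolutional front end can feed an LSTM keyword classifier, here is a forward-pass-only sketch in NumPy. The layer sizes, the 3×3 kernels, the gate ordering, and the class name `KeywordSpotter` are assumptions for illustration; the weights are random and untrained, and this is not the authors' architecture. In practice the same structure would be built and trained with a framework such as Keras (a `Conv2D` layer followed by an `LSTM` layer and a softmax `Dense` layer).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_relu(x, kernels, bias):
    """Valid 2-D cross-correlation of a (T, F) spectrogram with K kernels, then ReLU."""
    n_k, kh, kw = kernels.shape
    t_out, f_out = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((n_k, t_out, f_out))
    for di in range(kh):
        for dj in range(kw):
            patch = x[di:di + t_out, dj:dj + f_out]            # shifted view, (t_out, f_out)
            out += kernels[:, di, dj][:, None, None] * patch   # broadcast over the K kernels
    out += bias[:, None, None]
    return np.maximum(out, 0.0)                                # (K, t_out, f_out)


def lstm_last_hidden(seq, W, U, b, hidden):
    """Run a single-layer LSTM over seq (T, D) and return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


class KeywordSpotter:
    """Hypothetical CNN + LSTM keyword classifier (forward pass only, random untrained weights)."""

    def __init__(self, n_freq, n_keywords, n_filters=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.kernels = rng.normal(0.0, 0.1, (n_filters, 3, 3))
        self.conv_bias = np.zeros(n_filters)
        feat_dim = n_filters * (n_freq - 2)      # channels x valid frequency positions
        self.W = rng.normal(0.0, 0.1, (4 * hidden, feat_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hidden, hidden))
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_keywords, hidden))
        self.b_out = np.zeros(n_keywords)

    def predict_proba(self, spec):
        fmap = conv2d_relu(spec, self.kernels, self.conv_bias)   # (K, T-2, F-2)
        # One feature vector per time step: (T-2, K * (F-2))
        seq = fmap.transpose(1, 0, 2).reshape(fmap.shape[1], -1)
        h = lstm_last_hidden(seq, self.W, self.U, self.b, self.hidden)
        logits = self.W_out @ h + self.b_out
        exp = np.exp(logits - logits.max())      # numerically stable softmax
        return exp / exp.sum()


# Usage with a random stand-in for a real (30 frames x 20 bins) spectrogram
spec = np.random.default_rng(1).normal(size=(30, 20))
model = KeywordSpotter(n_freq=20, n_keywords=5)
probs = model.predict_proba(spec)
```

Because the convolution slides over both time and frequency, each vector handed to the LSTM summarizes a local time-frequency patch, and the LSTM's final hidden state condenses the whole utterance before classification.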
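The final step maps a recognized keyword to a reply. Below is a minimal sketch, assuming the classifier outputs one probability per keyword. The keyword list, the placeholder replies, and the 0.6 confidence threshold are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Sequence

# Hypothetical keyword vocabulary and placeholder replies (not from the paper)
ANSWERS: Mapping[str, str] = {
    "admission": "Placeholder reply about admission requirements.",
    "major": "Placeholder reply about available majors.",
    "tuition": "Placeholder reply about tuition fees.",
}
FALLBACK = "Sorry, I did not catch that. Could you repeat the question?"


def answer(probs: Sequence[float], keywords: Sequence[str], threshold: float = 0.6) -> str:
    """Pick the most probable keyword and return its canned reply.

    Re-prompts the user when the classifier is not confident enough
    (assumed threshold) or when the keyword has no stored reply.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < threshold:
        return FALLBACK
    return ANSWERS.get(keywords[best], FALLBACK)


keywords = ["admission", "major", "tuition", "unknown"]
reply = answer([0.05, 0.10, 0.80, 0.05], keywords)
```

Re-prompting on low confidence is one simple way to avoid answering the wrong question when the recognizer is unsure, for example in a noisy room.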
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, V.T. et al. (2021). Using Machine Learning Algorithms Combined with Deep Learning in Speech Recognition. In: Dang, T.K., Küng, J., Chung, T.M., Takizawa, M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2021. Communications in Computer and Information Science, vol 1500. Springer, Singapore. https://doi.org/10.1007/978-981-16-8062-5_35
DOI: https://doi.org/10.1007/978-981-16-8062-5_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8061-8
Online ISBN: 978-981-16-8062-5
eBook Packages: Computer Science, Computer Science (R0)