Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models

  • Conference paper
PRICAI 2021: Trends in Artificial Intelligence (PRICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13032)

Abstract

Punctuation prediction is the task of inserting punctuation marks such as periods, commas, and exclamation marks at the appropriate positions in the text transcribed by ASR systems. Doing so improves readability for users and the performance of many downstream tasks. While most related studies address widely used languages such as English and Chinese, very little work has been done for low-resource languages. To stimulate research on these languages, this paper aims to improve the quality of punctuation prediction for Vietnamese ASRs. Specifically, we propose a method based on recent advances in general-purpose pre-trained language models (LMs) such as BERT and ELECTRA. The benefit of these models is that they can be fine-tuned effectively on the punctuation prediction task even when only a small amount of training data is available. To further enhance performance, we also propose a simple yet effective technique that provides more context information when predicting punctuation marks for the leftmost and rightmost words of each segment. Experimental results on public benchmark datasets are promising: the proposed architecture improved prediction performance by a large margin and yielded a new state-of-the-art result on these datasets. Specifically, we achieved \(F_1\) scores of 71.49% and 80.38% on the Novel and Newspaper public datasets, respectively.
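As an illustration only, the two ideas in the abstract can be sketched in Python. The abstract does not specify the label set or how the extra context is supplied, so everything here is an assumption: the label names in `PUNCT`, the helper names `to_tagged` and `overlapping_segments`, and the use of overlapping windows with a `margin` of context words on each side. The first helper casts punctuation prediction as word-level tagging; the second splits a long word sequence into segments so that every word whose prediction is kept has neighbours on both sides, except at the ends of the text.

```python
from typing import List, Tuple

# Hypothetical label set; the paper's actual inventory is not given in the abstract.
PUNCT = {".": "PERIOD", ",": "COMMA", "?": "QUESTION", "!": "EXCLAM"}


def to_tagged(text: str) -> List[Tuple[str, str]]:
    """Turn punctuated text into (word, label) pairs.

    The label is the mark that follows the word, or "O" if there is none.
    Words are lowercased to mimic unpunctuated ASR output.
    """
    pairs = []
    for tok in text.split():
        label = "O"
        if tok[-1] in PUNCT:
            label = PUNCT[tok[-1]]
            tok = tok[:-1]
        if tok:  # skip tokens that were punctuation only
            pairs.append((tok.lower(), label))
    return pairs


def overlapping_segments(n_words: int, size: int, margin: int) -> List[Tuple[int, int, int, int]]:
    """Plan overlapping segments over a sequence of n_words words.

    Each tuple is (start, end, keep_start, keep_end): a tagger would read
    words[start:end] but only its predictions for words[keep_start:keep_end]
    would be kept, so edge words of a segment are never scored without context.
    """
    step = size - 2 * margin
    if step <= 0:
        raise ValueError("size must be larger than 2 * margin")
    plan = []
    keep_start = 0
    while keep_start < n_words:
        keep_end = min(keep_start + step, n_words)
        start = max(0, keep_start - margin)
        end = min(n_words, keep_end + margin)
        plan.append((start, end, keep_start, keep_end))
        keep_start = keep_end
    return plan
```

For example, `to_tagged("Xin chào, bạn khỏe không?")` pairs `chào` with `COMMA` and `không` with `QUESTION`, and `overlapping_segments(10, size=6, margin=1)` plans three segments whose kept ranges cover all ten words exactly once. A fine-tuned BERT or ELECTRA tagger would then be run on each planned segment; that step is omitted here.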

Notes

  1. https://sites.google.com/site/iwsltevaluation2016/

  2. http://ssli.ee.washington.edu/people/leixin/TDT4.html

  3. https://github.com/google-research/bert/blob/master/multilingual.md

  4. https://github.com/fpt-corp/viBERT

  5. https://github.com/fpt-corp/vELECTRA

  6. https://pytorch.org/

Author information

Corresponding author

Correspondence to Oanh Thi Tran.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bui, V.T., Tran, O.T. (2021). Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13032. Springer, Cham. https://doi.org/10.1007/978-3-030-89363-7_15

  • DOI: https://doi.org/10.1007/978-3-030-89363-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89362-0

  • Online ISBN: 978-3-030-89363-7

  • eBook Packages: Computer Science, Computer Science (R0)
