A Study on Dialog Act Recognition Using Character-Level Tokenization

Ribeiro, Eugénio; Ribeiro, Ricardo; de Matos, David Martins

doi:10.1007/978-3-319-99344-7_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11089)

Included in the following conference series:

International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Abstract

Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches to the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function of the words and, thus, their intention. We also explore the use of different context windows around each token, which are able to capture important elements, such as affixes. Furthermore, we assess the importance of punctuation and capitalization. We performed experiments on both the Switchboard Dialog Act Corpus and the DIHANA Corpus. In both cases, the experiments show not only that character-level tokenization leads to better performance than the typical word-level approaches, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
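As a rough illustration of the ideas named in the abstract, the following Python sketch splits an utterance into character tokens, optionally discards capitalization and punctuation, and builds a fixed-width context window around each character. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the `_` padding symbol, and the explicit string windows are hypothetical choices made here for readability, and the abstract does not say how the model itself consumes these windows.

```python
def char_tokens(utterance, keep_case=True, keep_punct=True):
    """Split an utterance into single-character tokens.

    The two flags mimic the capitalization and punctuation conditions
    mentioned in the abstract (hypothetical interface).
    """
    if not keep_case:
        utterance = utterance.lower()
    if not keep_punct:
        # Keep only letters, digits, and whitespace.
        utterance = "".join(c for c in utterance if c.isalnum() or c.isspace())
    return list(utterance)


def context_windows(tokens, width, pad="_"):
    """Return, for each token, the `width` tokens centered on it.

    `pad` fills positions that fall outside the utterance; "_" is an
    arbitrary placeholder chosen for this sketch.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = [pad] * half + list(tokens) + [pad] * half
    return ["".join(padded[i:i + width]) for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = char_tokens("Really?")
    # Width-3 windows can cover short affixes such as "-ly".
    print(context_windows(tokens, 3))
```

Wider windows cover longer sub-word units, which is one way to read the abstract's remark that context windows can capture affixes.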

This work was supported by national funds through Fundação para a Ciência e a Tecnologia with reference UID/CEC/50021/2013 and by Universidade de Lisboa.

Author information

Authors and Affiliations

L²F – Spoken Language Systems Laboratory, INESC-ID, Lisboa, Portugal
Eugénio Ribeiro, Ricardo Ribeiro & David Martins de Matos
Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Eugénio Ribeiro & David Martins de Matos
Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
Ricardo Ribeiro

Corresponding author

Correspondence to Eugénio Ribeiro.

Editor information

Editors and Affiliations

Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Gennady Agre
Universität des Saarlandes, Saarbrücken, Germany
Josef van Genabith
DFKI GmbH, Saarbrücken, Germany
Thierry Declerck

About this paper

Cite this paper

Ribeiro, E., Ribeiro, R., de Matos, D.M. (2018). A Study on Dialog Act Recognition Using Character-Level Tokenization. In: Agre, G., van Genabith, J., Declerck, T. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2018. Lecture Notes in Computer Science, vol 11089. Springer, Cham. https://doi.org/10.1007/978-3-319-99344-7_9

DOI: https://doi.org/10.1007/978-3-319-99344-7_9
Published: 29 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99343-0
Online ISBN: 978-3-319-99344-7
eBook Packages: Computer Science, Computer Science (R0)
