Adopting Semantic Similarity for Utterance Candidates Discovery from Human-to-Human Dialogue Corpus

Shtykh, Roman Y.; Makita, Mitsuharu

doi:10.1007/978-3-319-33500-1_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9577)

Included in the following conference series:

International Workshop on Future and Emerging Trends in Language Technology

Abstract

Selecting an appropriate utterance in response to user input is essential for sustaining the flow of conversation in dialogue systems, and the basic unit for maintaining such conversational coherence is the adjacency pair. To find appropriate candidates for completing adjacency pairs, and thus help avoid conversational breakdowns in casual chatbot systems, we propose an approach that uses human-to-human chat logs and combines standard Information Retrieval (IR) methods with semantic similarity measures based on distributed word representations. The experimental results show that the approach improves the quality of utterance pairs compared to standard IR-based methods.
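
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring, so the TF-IDF cosine used as the lexical (IR) score, the averaged word vectors used as the semantic score, the linear mixing weight `alpha`, the toy chat log, and the hand-made two-dimensional vectors are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy chat log of (utterance, response) adjacency pairs.
CHAT_LOG = [
    ("i love ramen", "me too , ramen is great"),
    ("do you like noodles", "yes , i eat them often"),
    ("the weather is nice today", "let us go for a walk"),
]

# Hypothetical 2-d word vectors; a real system would learn distributed
# representations from a large corpus instead of hard-coding them.
VECTORS = {
    "ramen": (0.9, 0.1), "noodles": (0.85, 0.2), "love": (0.6, 0.5),
    "like": (0.6, 0.45), "weather": (0.05, 0.95), "nice": (0.3, 0.8),
}


def tokenize(text):
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency for every token in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def tfidf(tokens, idf):
    # Tokens unseen in the chat log get a neutral weight of 1.0.
    return {t: c * idf.get(t, 1.0) for t, c in Counter(tokens).items()}


def mean_vector(tokens):
    """Average the vectors of known tokens; empty dict if none are known."""
    known = [VECTORS[t] for t in tokens if t in VECTORS]
    if not known:
        return {}
    return {i: sum(col) / len(known) for i, col in enumerate(zip(*known))}


def rank_responses(user_input, alpha=0.5):
    """Score each logged pair by alpha * lexical + (1 - alpha) * semantic
    similarity between the input and the pair's first utterance."""
    query = tokenize(user_input)
    docs = [tokenize(utt) for utt, _ in CHAT_LOG]
    idf = idf_table(docs)
    q_lex, q_sem = tfidf(query, idf), mean_vector(query)
    scored = []
    for (_, response), doc in zip(CHAT_LOG, docs):
        lexical = cosine(q_lex, tfidf(doc, idf))
        semantic = cosine(q_sem, mean_vector(doc))
        scored.append((alpha * lexical + (1 - alpha) * semantic, response))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, response in rank_responses("i like ramen"):
        print(f"{score:.3f}  {response}")
```

Setting `alpha=1.0` reduces the sketch to a purely lexical baseline, and `alpha=0.0` to a purely semantic one, which makes it easy to compare the two signals on the same input.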

Notes

1. As described in Sect. 3.
2. CyberAgent, Inc. https://www.cyberagent.co.jp/en/.
3. Apache Lucene is used for the experiments.
4. gensim implementation: https://pypi.python.org/pypi/gensim.
5. We don't consider an extra crowdsourcing step proposed in the paper though.

Author information

Authors and Affiliations

CyberAgent, Inc., Akihabara Dai Bld., Sotokanda 1-18-13, Chiyoda-ku, Tokyo, 101-8608, Japan
Roman Y. Shtykh & Mitsuharu Makita
Media Research Institute, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192, Japan
Roman Y. Shtykh

Corresponding author

Correspondence to Roman Y. Shtykh.

Editor information

Editors and Affiliations

University of Seville, Seville, Spain
José F. Quesada
University of Seville, Seville, Spain
Francisco-Jesús Martín Mateos
University of Seville, Seville, Spain
Teresa Lopez-Soto

About this paper

Cite this paper

Shtykh, R.Y., Makita, M. (2016). Adopting Semantic Similarity for Utterance Candidates Discovery from Human-to-Human Dialogue Corpus. In: Quesada, J., Martín Mateos, FJ., Lopez-Soto, T. (eds) Future and Emergent Trends in Language Technology. FETLT 2015. Lecture Notes in Computer Science, vol 9577. Springer, Cham. https://doi.org/10.1007/978-3-319-33500-1_12

DOI: https://doi.org/10.1007/978-3-319-33500-1_12
Published: 26 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33499-8
Online ISBN: 978-3-319-33500-1
eBook Packages: Computer Science, Computer Science (R0)
