Abstract
Producing appropriate utterances in response to user input is essential for sustaining the flow of conversation in dialogue systems, and the adjacency pair is a fundamental unit for maintaining such conversational coherence. To find appropriate candidates for adjacency pair completion, and thereby help avoid conversational disruption in casual chatbot systems, we propose an approach that uses human-to-human chat logs and combines standard Information Retrieval (IR) methods with semantic similarity measures based on distributed word representations. The experimental results show that the approach improves the quality of utterance pairs compared with standard IR-based methods.
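The general idea of combining a lexical retrieval score with an embedding-based similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap score stands in for a real IR engine, the three-dimensional word vectors are made-up toy values, and the names `TOY_VECTORS`, `rank_candidates`, and the mixing weight `alpha` are hypothetical. It also assumes that similarity is measured between the user input and the first part of each logged pair, which the abstract does not specify.

```python
import math
import re

# Toy word vectors with made-up values (assumption, for illustration only).
TOY_VECTORS = {
    "hungry": [0.9, 0.1, 0.0],
    "eat": [0.8, 0.2, 0.0],
    "lunch": [0.85, 0.15, 0.1],
    "rain": [0.0, 0.9, 0.1],
    "umbrella": [0.1, 0.8, 0.2],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_score(query, text):
    """Fraction of distinct query tokens found in text (stand-in for an IR score)."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(text))) / len(query_tokens)


def mean_vector(text):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [TOY_VECTORS[t] for t in tokenize(text) if t in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(user_input, pairs, alpha=0.5):
    """Rank second parts of logged (first_part, second_part) pairs.

    Each pair is scored by a weighted sum of the lexical score and the
    cosine similarity between the user input and the logged first part.
    Returns a list of (score, second_part), best first.
    """
    input_vector = mean_vector(user_input)
    scored = []
    for first_part, second_part in pairs:
        lexical = lexical_score(user_input, first_part)
        semantic = cosine(input_vector, mean_vector(first_part))
        scored.append((alpha * lexical + (1 - alpha) * semantic, second_part))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In this toy setting, two logged utterances that share the same single word with the user input receive the same lexical score, so the semantic term alone decides which response is ranked first.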
Notes
1. As described in Sect. 3.
2. CyberAgent, Inc. https://www.cyberagent.co.jp/en/.
3. Apache Lucene is used for the experiments.
4. https://pypi.python.org/pypi/gensim implementation.
5. We don’t consider an extra crowdsourcing step proposed in the paper though.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Shtykh, R.Y., Makita, M. (2016). Adopting Semantic Similarity for Utterance Candidates Discovery from Human-to-Human Dialogue Corpus. In: Quesada, J., Martín Mateos, FJ., Lopez-Soto, T. (eds) Future and Emergent Trends in Language Technology. FETLT 2015. Lecture Notes in Computer Science, vol 9577. Springer, Cham. https://doi.org/10.1007/978-3-319-33500-1_12
DOI: https://doi.org/10.1007/978-3-319-33500-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33499-8
Online ISBN: 978-3-319-33500-1
eBook Packages: Computer Science (R0)