
LARQ: Learning to Ask and Rewrite Questions for Community Question Answering

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Abstract

Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system consists of two key components, a retrieval model and a ranking model, which search for similar questions and select the most relevant ones, respectively. In this paper, we propose LARQ (Learning to Ask and Rewrite Questions), a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation approaches, we take advantage of a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, augmenting the indexed collection with QG outputs significantly improves the response rate of CQA systems. We incorporate LARQ into an online CQA system and evaluate it on the Bank Question (BQ) Corpus, measuring the enhancements to both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).
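As a rough illustration of the pipeline the abstract describes, the sketch below augments an indexed question collection with QG rewrites and then retrieves and reranks candidates. This is a minimal sketch, not the authors' implementation: `qg_model.generate` and `ranker.score` are hypothetical stand-ins for a question-generation model and a BERT-style semantic matcher, and the term-overlap retrieval is a toy substitute for a real engine such as Elasticsearch with BM25.

```python
# Minimal sketch of a LARQ-style CQA pipeline.
# Hypothetical components: qg_model (seq2seq question generator) and
# ranker (semantic-matching reranker); neither is the paper's actual API.
from collections import defaultdict

def generate_rewrites(question, qg_model, n=3):
    """Sentence-level augmentation: produce n paraphrased rewrites
    of an indexed question with a question-generation model."""
    return qg_model.generate(question, num_return_sequences=n)

def build_index(questions, qg_model):
    """Index each original question together with its QG rewrites;
    every rewrite resolves back to the same source question."""
    index = defaultdict(list)   # token -> list of variant ids
    catalog = {}                # variant id -> source question
    vid = 0
    for q in questions:
        for variant in [q] + generate_rewrites(q, qg_model):
            catalog[vid] = q
            for tok in variant.lower().split():
                index[tok].append(vid)
            vid += 1
    return index, catalog

def retrieve(query, index, catalog, k=10):
    """Toy lexical retrieval by term overlap (a real system would
    use BM25 / Elasticsearch). Deduplicates variants that map to
    the same source question."""
    scores = defaultdict(int)
    for tok in query.lower().split():
        for vid in index.get(tok, []):
            scores[vid] += 1
    results = []
    for vid in sorted(scores, key=scores.get, reverse=True):
        q = catalog[vid]
        if q not in results:
            results.append(q)
        if len(results) == k:
            break
    return results

def answer(query, index, catalog, ranker):
    """Retrieve candidates, then rerank with a semantic matcher
    (e.g. a BERT-style cross-encoder) and return the best one."""
    candidates = retrieve(query, index, catalog)
    if not candidates:
        return None
    return max(candidates, key=lambda c: ranker.score(query, c))
```

The key point of the sketch is that augmentation happens at indexing time: generated rewrites widen the lexical surface of the collection, so a cold-start query phrased differently from the stored question can still hit its entry.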

H. Zhou and H. Liu—Equal Contributions.

H. Liu—Work done during an internship at Tencent.

Notes

  1. https://github.com/ymcui/Chinese-BERT-wwm
  2. http://ltp.ai/
  3. https://github.com/hanxiao/bert-as-service
  4. https://ai.baidu.com/broad/introduction
  5. https://github.com/elastic/elasticsearch

Author information

Correspondence to Huiyang Zhou.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, H., Liu, H., Yan, Z., Cao, Y., Li, Z. (2020). LARQ: Learning to Ask and Rewrite Questions for Community Question Answering. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_26

  • DOI: https://doi.org/10.1007/978-3-030-60457-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60456-1

  • Online ISBN: 978-3-030-60457-8
