
LARQ: Learning to Ask and Rewrite Questions for Community Question Answering

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Abstract

Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system consists of two key components, a retrieval model and a ranking model, which search for similar questions and select the most relevant ones, respectively. In this paper, we propose LARQ (Learning to Ask and Rewrite Questions), a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation approaches, we take advantage of a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, augmenting the indexed collection with QG outputs significantly improves the response rate of CQA systems. We incorporate LARQ into an online CQA system and evaluate it on the Bank Question (BQ) Corpus, measuring the enhancements to both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).
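As a rough illustration of the pipeline the abstract describes, the sketch below augments an indexed question collection with QG rewrites and then retrieves and reranks candidates. This is a minimal sketch, not the authors' implementation: `qg_model.generate` and `ranker.score` are hypothetical stand-ins for a question-generation model and a BERT-style semantic matcher, and the term-overlap retrieval is a toy substitute for a real engine such as Elasticsearch with BM25.

```python
# Minimal sketch of a LARQ-style CQA pipeline.
# Hypothetical components: qg_model (seq2seq question generator) and
# ranker (semantic-matching reranker); neither is the paper's actual API.
from collections import defaultdict

def generate_rewrites(question, qg_model, n=3):
    """Sentence-level augmentation: produce n paraphrased rewrites
    of an indexed question with a question-generation model."""
    return qg_model.generate(question, num_return_sequences=n)

def build_index(questions, qg_model):
    """Index each original question together with its QG rewrites;
    every rewrite resolves back to the same source question."""
    index = defaultdict(list)   # token -> list of variant ids
    catalog = {}                # variant id -> source question
    vid = 0
    for q in questions:
        for variant in [q] + generate_rewrites(q, qg_model):
            catalog[vid] = q
            for tok in variant.lower().split():
                index[tok].append(vid)
            vid += 1
    return index, catalog

def retrieve(query, index, catalog, k=10):
    """Toy lexical retrieval by term overlap (a real system would
    use BM25 / Elasticsearch). Deduplicates variants that map to
    the same source question."""
    scores = defaultdict(int)
    for tok in query.lower().split():
        for vid in index.get(tok, []):
            scores[vid] += 1
    results = []
    for vid in sorted(scores, key=scores.get, reverse=True):
        q = catalog[vid]
        if q not in results:
            results.append(q)
        if len(results) == k:
            break
    return results

def answer(query, index, catalog, ranker):
    """Retrieve candidates, then rerank with a semantic matcher
    (e.g. a BERT-style cross-encoder) and return the best one."""
    candidates = retrieve(query, index, catalog)
    if not candidates:
        return None
    return max(candidates, key=lambda c: ranker.score(query, c))
```

The key point of the sketch is that augmentation happens at indexing time: generated rewrites widen the lexical surface of the collection, so a cold-start query phrased differently from the stored question can still hit its entry.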

H. Zhou and H. Liu—Equal Contributions.

H. Liu—Work done during an internship at Tencent.

Notes

  1. https://github.com/ymcui/Chinese-BERT-wwm
  2. http://ltp.ai/
  3. https://github.com/hanxiao/bert-as-service
  4. https://ai.baidu.com/broad/introduction
  5. https://github.com/elastic/elasticsearch

Author information

Correspondence to Huiyang Zhou.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, H., Liu, H., Yan, Z., Cao, Y., Li, Z. (2020). LARQ: Learning to Ask and Rewrite Questions for Community Question Answering. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_26

  • DOI: https://doi.org/10.1007/978-3-030-60457-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60456-1

  • Online ISBN: 978-3-030-60457-8
