Abstract
Machine reading comprehension (MRC) is a fundamental task for evaluating the natural language understanding ability of a model; it requires complex reasoning over the knowledge in the given context as well as world knowledge. However, most existing approaches ignore this multi-step reasoning process, solving the task with a one-step "black box" model and massive data augmentation. In this paper, we therefore propose a modular knowledge reasoning approach built from neural network modules that explicitly model each step of the reasoning process. Five reasoning modules are designed and trained in an end-to-end manner, yielding a more interpretable model. Experiments on the reasoning over paragraph effects in situations (ROPES) dataset, a challenging benchmark that requires reasoning about how effects described in a background paragraph apply to a new situation, demonstrate the effectiveness and explainability of our proposed approach. Moreover, transferring our reasoning modules to the WinoGrande dataset in a zero-shot setting achieves results competitive with a data-augmented model, demonstrating their generalization capability.
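The core idea in the abstract — decomposing one opaque prediction into a pipeline of small modules whose intermediate outputs can be inspected — can be illustrated with a minimal sketch. The module names (`find_cause`, `find_effect`, `predict_answer`) and the toy cause–effect logic below are hypothetical illustrations of the modular style, not the five modules actually proposed in the paper:

```python
from typing import Callable, Dict, List, Tuple

# Each module reads and extends a shared "state" dict, so every
# intermediate reasoning step is visible rather than hidden in one
# end-to-end black box.
State = Dict[str, object]

def find_cause(state: State) -> State:
    # Identify which cause from the background the question asks about.
    state["cause"] = ("more sunlight"
                      if "more sunlight" in state["question"]
                      else "less sunlight")
    return state

def find_effect(state: State) -> State:
    # Look up the effect paired with that cause in the (toy) background.
    effects = {"more sunlight": "faster growth",
               "less sunlight": "slower growth"}
    state["effect"] = effects[state["cause"]]
    return state

def predict_answer(state: State) -> State:
    # Map the inferred effect onto the candidate answers in the situation.
    candidates = state["candidates"]
    state["answer"] = (candidates[0] if state["effect"] == "faster growth"
                       else candidates[1])
    return state

def run_pipeline(modules: List[Callable[[State], State]],
                 state: State) -> Tuple[State, List[str]]:
    """Apply modules in order, recording each step for interpretability."""
    trace = []
    for module in modules:
        state = module(state)
        trace.append(f"{module.__name__}: {state}")
    return state, trace

state = {
    "question": "Which plant grows faster, the one given more sunlight or less?",
    "candidates": ["plant A (more sunlight)", "plant B (less sunlight)"],
}
final, trace = run_pipeline([find_cause, find_effect, predict_answer], state)
print(final["answer"])   # prints: plant A (more sunlight)
for step in trace:       # the trace is what makes the reasoning inspectable
    print(step)
```

In the paper's setting the modules are neural networks trained jointly end-to-end rather than hand-written rules, but the compositional structure and the inspectable per-module trace are the same.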
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61751201) and the National Key R&D Plan (No. 2016QY03D0602).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Ren, M., Huang, H. & Gao, Y. Interpretable modular knowledge reasoning for machine reading comprehension. Neural Comput & Applic 34, 9901–9918 (2022). https://doi.org/10.1007/s00521-022-06975-2