Sentence Paraphrase Detection Using Classification Models

Tian, Liuyang; Ning, Hui; Kong, Leilei; Chen, Kaisheng; Qi, Haoliang; Han, Zhongyuan

doi:10.1007/978-3-319-73606-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10478)

Included in the following conference series:

Forum for Information Retrieval Evaluation

Abstract

In this paper, we address the task of sentence paraphrase detection, which decides whether two sentences stand in a paraphrase relationship. We describe a supervised learning strategy in which the sentence pair is classified to decide its paraphrase relationship, using only lexical features computed over n-grams as the classification features. Gradient Boosting, K-Nearest Neighbor, Decision Tree, and Support Vector Machine are chosen as the classifiers. We compare the performance of these classifiers and analyze the features to determine which are most important for paraphrase detection. Evaluation is performed on the corpus of the 2016 Detecting Paraphrases in Indian Languages task organized by the Forum for Information Retrieval Evaluation (DPIL-FIRE2016). The experimental results show that Gradient Boosting achieves the highest Overall Score. Using the learned classifier, we obtained the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure for Task2 on Punjabi in DPIL-FIRE2016.
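To make the idea of n-gram lexical features concrete, the following is a minimal sketch, not the authors' feature set. The abstract does not say which overlap measures were used, so the Jaccard coefficient, the whitespace tokenizer, and the names `ngrams`, `jaccard`, and `lexical_features` are all assumptions introduced for illustration.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list (empty if too short)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(s1: str, s2: str, max_n: int = 3) -> List[float]:
    """One overlap score per n-gram order, from unigrams up to max_n-grams.

    Assumption: lowercasing plus whitespace splitting stands in for real
    tokenization, which Indian-language text would need in practice.
    """
    t1, t2 = s1.lower().split(), s2.lower().split()
    return [jaccard(ngrams(t1, n), ngrams(t2, n)) for n in range(1, max_n + 1)]


# Toy English pair, for illustration only (not from the DPIL corpus).
features = lexical_features("the cat sat on the mat", "the cat lay on the mat")
print(features)
```

A vector like `features` would be computed for every sentence pair and passed, with its label, to a classifier such as the ones named in the abstract; scikit-learn's `GradientBoostingClassifier` would be one possible choice, though the abstract does not say which toolkit was used.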

Notes

1. http://nlp.amrita.edu/dpil_cen/

Acknowledgments

This work is supported by the Social Science Fund of Heilongjiang Province (No. 16XWB02).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, China
Liuyang Tian & Hui Ning
School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
Leilei Kong, Kaisheng Chen, Haoliang Qi & Zhongyuan Han

Corresponding author

Correspondence to Leilei Kong.

Editor information

Editors and Affiliations

DAIICT, Gujarat, India
Prasenjit Majumder
Indian Statistical Institute, Kolkata, India
Mandar Mitra
DAIICT, Gujarat, India
Parth Mehta
DAIICT, Gujarat, India
Jainisha Sankhavara

About this paper

Cite this paper

Tian, L., Ning, H., Kong, L., Chen, K., Qi, H., Han, Z. (2018). Sentence Paraphrase Detection Using Classification Models. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_13

DOI: https://doi.org/10.1007/978-3-319-73606-8_13
Published: 04 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science, Computer Science (R0)
