Section-wise indexing and retrieval of research articles

Shahid, Abdul; Afzal, Muhammad Tanvir

doi:10.1007/s10586-017-0914-4

Published: 22 May 2017

Volume 21, pages 481–492, (2018)

Cluster Computing

Abdul Shahid¹ &
Muhammad Tanvir Afzal²

Abstract

Extracting relevant information is a pressing need of the scholarly community. A number of systems are available for finding relevant information in the scientific literature, such as search engines, citation indexes, and digital libraries. For a given search query, however, users are often presented with a long list of irrelevant documents, mainly because of the huge number of full-text documents available and, furthermore, because of the unstructured nature of the indexed scientific resources. Contemporary systems have formally defined the structure of scientific documents. However, populating these already available enriched structures from unstructured or semi-structured scientific documents has not been addressed previously. In this paper, we design, implement, and evaluate an automated technique that tags each paper's content with the logical sections appearing in the scientific document. The proposed system has been evaluated against a benchmark and subsequently compared with machine learning techniques that may be used for the same task. It has been empirically shown that the overall correctness and completeness of the proposed technique are 0.78 and 0.79, respectively, giving an overall accuracy of about 0.78. These results compare favorably with machine-learning-based classification. The developed system may help future information retrieval systems, digital libraries, and citation indexes to index, retrieve, rank, and visualize the most relevant scientific documents for the scientific community.
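The abstract does not say how the overall figure is derived from correctness and completeness. As a minimal sketch, assuming that correctness corresponds to precision, completeness to recall, and that the overall figure is their harmonic mean (F1), the arithmetic could look as follows. The function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only. Assumption: "correctness" = precision,
# "completeness" = recall, and the overall score is their harmonic mean (F1).
# None of these function names come from the paper.

def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of assigned section tags that are correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of benchmark section tags that were recovered."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Values reported in the abstract, combined under the assumption above.
    print(round(f1_score(0.78, 0.79), 3))
```

Under this assumption, combining 0.78 and 0.79 gives roughly 0.785, which is in line with the reported figure of about 0.78.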

Author information

Authors and Affiliations

Kohat University of Science and Technology, Kohat, Pakistan
Abdul Shahid
Capital University of Science and Technology, Islamabad, Pakistan
Muhammad Tanvir Afzal

Corresponding author

Correspondence to Abdul Shahid.

About this article

Cite this article

Shahid, A., Afzal, M.T. Section-wise indexing and retrieval of research articles. Cluster Comput 21, 481–492 (2018). https://doi.org/10.1007/s10586-017-0914-4

Received: 20 March 2017
Revised: 22 April 2017
Accepted: 05 May 2017
Published: 22 May 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10586-017-0914-4
