Cost Evaluation of CRF-Based Bibliography Extraction from Reference Strings

  • Conference paper
The Emergence of Digital Libraries – Research and Practices (ICADL 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8839)

Abstract

The effective use of digital libraries requires that bibliographic databases be maintained. The reference fields of academic papers, in particular, are full of useful bibliographic information such as authors’ names and paper titles. We therefore propose a method of automatically extracting bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to train the CRF to a high extraction accuracy. To reduce the amount of training data required, we propose the use of active sampling and pseudo-training data, and we evaluate the associated training costs experimentally.
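
As a rough illustration of the active sampling idea mentioned in the abstract, the following minimal sketch picks the reference strings that a model is least confident about, so that only those are labeled by hand. The abstract does not state the selection criterion; least-confidence ranking is an assumption here, and `select_for_annotation`, `toy_confidence`, and the placeholder strings are hypothetical names and data, not taken from the paper. In practice the confidence function would come from a trained CRF (for example, the probability of its best label sequence).

```python
from typing import Callable, List, Sequence


def select_for_annotation(
    pool: Sequence[str],
    confidence: Callable[[str], float],
    batch_size: int,
) -> List[str]:
    """Return the `batch_size` strings with the lowest model confidence.

    Assumption: `confidence` maps a reference string to a score in [0, 1],
    where a lower score means the model is less sure of its labeling.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Sort ascending by confidence so the least certain strings come first.
    ranked = sorted(pool, key=confidence)
    return ranked[:batch_size]


# Hypothetical scores standing in for a trained CRF's confidence values.
toy_confidence = {
    "reference string A": 0.95,
    "reference string B": 0.40,
    "reference string C": 0.72,
    "reference string D": 0.15,
}

to_label = select_for_annotation(
    list(toy_confidence), lambda s: toy_confidence[s], batch_size=2
)
print(to_label)
```

The selected strings would then be annotated manually and added to the training set before retraining, which is the step where the reduction in labeling cost is expected to come from.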

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kawakami, N., Ohta, M., Takasu, A., Adachi, J. (2014). Cost Evaluation of CRF-Based Bibliography Extraction from Reference Strings. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_28

  • DOI: https://doi.org/10.1007/978-3-319-12823-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12822-1

  • Online ISBN: 978-3-319-12823-8

  • eBook Packages: Computer Science, Computer Science (R0)
