A Keyword Recommendation Method Using CorKeD Words and Its Application to Earth Science Data

  • Conference paper
Information Retrieval Technology (AIRS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

In various research domains, data providers annotate their own data with keywords from a controlled vocabulary. However, because selecting keywords requires extensive knowledge of both the domain and the controlled vocabulary, even data providers have difficulty selecting appropriate keywords. We therefore propose a method for recommending relevant keywords from a controlled vocabulary to data providers. The method focuses on keyword definitions: it calculates the similarity between the abstract text of a dataset and the definition of each keyword. Moreover, because some words are unnecessary for this calculation, we extract CorKeD (Corpus-based Keyword Decisive) words from a target-domain corpus so that the similarity can be measured appropriately. We conduct an experiment on earth science data and verify the effectiveness of extracting CorKeD words, which are the terms that better characterize the domain.
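
The similarity step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity over raw term counts is assumed because the abstract does not name the measure, the set of decisive words is supplied by hand rather than extracted from a domain corpus, and all function names, keywords, and definitions below are hypothetical toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text, decisive_words):
    """Count only the tokens that belong to the decisive-word set."""
    return Counter(t for t in tokenize(text) if t in decisive_words)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(abstract, definitions, decisive_words, top_k=3):
    """Rank keywords by similarity between the abstract and each definition."""
    a = term_vector(abstract, decisive_words)
    scored = [
        (cosine(a, term_vector(definition, decisive_words)), keyword)
        for keyword, definition in definitions.items()
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for _, keyword in scored[:top_k]]


# Hypothetical toy data: neither the definitions nor the word set come from the paper.
definitions = {
    "PRECIPITATION": "water falling from the atmosphere as rain or snow",
    "SEA SURFACE TEMPERATURE": "temperature of the ocean water at the surface",
    "SOIL MOISTURE": "water content held in the soil",
}
decisive_words = {
    "rain", "snow", "atmosphere", "ocean", "temperature", "surface", "soil", "moisture",
}
abstract = "This data set contains daily rain and snow observations of the atmosphere"

print(recommend(abstract, definitions, decisive_words, top_k=1))  # -> ['PRECIPITATION']
```

In this toy setup, generic words such as "water" and "data" are left out of the decisive-word set, so they do not contribute to the similarity score. That is the role the abstract assigns to CorKeD words.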

Notes

  1. http://pro.europeana.eu/publication/metadata-quality-task-force-report.

  2. http://www.diasjp.net/.

  3. http://gcmd.nasa.gov/.

  4. https://delicious.com/.

  5. http://sites.agu.org/.

  6. Chemistry: Journal of the American Chemical Society; Physics: The European Physical Journal; Biology: International Journal of Biological Sciences, Journal of Evolutionary Biology.

  7. http://abstractsearch.agu.org/keywords.

Author information

Corresponding author

Correspondence to Youichi Ishida.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ishida, Y., Shimizu, T., Yoshikawa, M. (2015). A Keyword Recommendation Method Using CorKeD Words and Its Application to Earth Science Data. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_8

  • DOI: https://doi.org/10.1007/978-3-319-28940-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28939-7

  • Online ISBN: 978-3-319-28940-3

  • eBook Packages: Computer Science, Computer Science (R0)
