End-to-End Neural Relation Extraction Using Deep Biaffine Attention

  • Conference paper

Advances in Information Retrieval (ECIR 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11437)

Abstract

We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer to model second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. On the benchmark “relation and entity recognition” dataset CoNLL04, experimental results show that our model outperforms previous models, achieving new state-of-the-art performance.

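To make the central operation concrete, the sketch below mocks up deep biaffine attention over BiLSTM token states in NumPy, in the spirit of Dozat and Manning [7]. All dimensions, initializations, and variable names here are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_lstm, d_ffnn, n_labels = 6, 100, 100, 5    # toy sizes: 6 tokens

# Stand-in for BiLSTM output states, one vector per token
H = rng.normal(size=(n, d_lstm))

# Two FFNNs project each token into "head" and "tail" role spaces
W_head = rng.normal(size=(d_lstm, d_ffnn)) * 0.1
W_tail = rng.normal(size=(d_lstm, d_ffnn)) * 0.1
H_head = np.tanh(H @ W_head)                    # (n, d_ffnn)
H_tail = np.tanh(H @ W_tail)                    # (n, d_ffnn)

# Biaffine parameters: a bilinear tensor U plus a linear term W and a bias b
U = rng.normal(size=(n_labels, d_ffnn, d_ffnn)) * 0.1
W = rng.normal(size=(n_labels, 2 * d_ffnn)) * 0.1
b = np.zeros(n_labels)

# Score every directed (head i, tail j) token pair for every relation label:
#   s[i, j, r] = H_head[i]^T U[r] H_tail[j] + W[r] [H_head[i]; H_tail[j]] + b[r]
bilinear = np.einsum('id,rde,je->ijr', H_head, U, H_tail)
pair = np.concatenate(
    [np.broadcast_to(H_head[:, None, :], (n, n, d_ffnn)),
     np.broadcast_to(H_tail[None, :, :], (n, n, d_ffnn))], axis=-1)
scores = bilinear + pair @ W.T + b              # (n, n, n_labels)
print(scores.shape)                             # (6, 6, 5)
```

The bilinear term U captures the second-order interactions between latent features, while the separate head and tail projections let the score depend on the direction of the relationship.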

References

  1. Adel, H., Schütze, H.: Global normalization of convolutional neural networks for joint entity and relation classification. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1723–1729 (2017)

  2. Bach, N., Badaskar, S.: A review of relation extraction. Carnegie Mellon University, Technical Report (2007)

  3. Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 349–359 (2015)

  4. Bekoulis, G., Deleu, J., Demeester, T., Develder, C.: Adversarial training for multi-context joint entity and relation extraction. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2830–2836 (2018)

  5. Bekoulis, G., Deleu, J., Demeester, T., Develder, C.: Joint entity recognition and relation extraction as a multi-head selection problem. Expert Syst. Appl. 114, 34–45 (2018)

  6. Blanco, R., Cambazoglu, B.B., Mika, P., Torzec, N.: Entity recommendations in web search. In: Alani, H., et al. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 33–48. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_3

  7. Dozat, T., Manning, C.D.: Deep Biaffine attention for neural dependency parsing. In: Proceedings of the 5th International Conference on Learning Representations (2017)

  8. Dozat, T., Qi, P., Manning, C.D.: Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 20–30 (2017)

  9. Gupta, P., Schütze, H., Andrassy, B.: Table filling multi-task recurrent neural network for joint entity and relation extraction. In: Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2537–2547 (2016)

  10. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)

  11. Jiang, J.: Information extraction from text. In: Aggarwal, C.C., Zhai, C. (eds.) Mining Text Data, pp. 11–41. Springer, New York (2012). https://doi.org/10.1007/978-1-4614-3223-4

  12. Kate, R.J., Mooney, R.J.: Joint entity and relation extraction using card-pyramid parsing. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 203–212 (2010)

  13. Katiyar, A., Cardie, C.: Going out on a limb: joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 917–928 (2017)

  14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  15. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. Trans. ACL 4, 313–327 (2016)

  16. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289 (2001)

  17. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 260–270 (2016)

  18. Li, Q., Ji, H.: Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 402–412 (2014)

  19. Miwa, M., Bansal, M.: End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1105–1116 (2016)

  20. Miwa, M., Sasaki, Y.: Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1858–1869 (2014)

  21. Neubig, G., et al.: DyNet: the dynamic neural network toolkit. arXiv preprint arXiv:1701.03980 (2017)

  22. Nguyen, T.H., Grishman, R.: Combining neural networks and log-linear models to improve relation extraction. In: Proceedings of IJCAI Workshop on Deep Learning for Artificial Intelligence (2016)

  23. Pawar, S., Bhattacharyya, P., Palshikar, G.: End-to-end relation extraction using neural networks and Markov logic networks. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 818–827 (2017)

  24. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543 (2014)

  25. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 147–155 (2009)

  26. Roth, D., Yih, W.T.: Global inference for entity and relation identification via a linear programming formulation. In: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)

  27. Roth, D., Yih, W.T.: A linear programming formulation for global inference in natural language tasks. In: Proceedings of the 8th Conference on Computational Natural Language Learning, pp. 1–8 (2004)

  28. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

  29. Thomas, P., Starlinger, J., Vowinkel, A., Arzt, S., Leser, U.: GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res. 40(W1), W585–W591 (2012)

  30. Wang, S., Zhang, Y., Che, W., Liu, T.: Joint extraction of entities and relations based on a novel graph scheme. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 4461–4467 (2018)

  31. Zhang, M., Zhang, Y., Fu, G.: End-to-end neural relation extraction with global optimization. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1730–1740 (2017)

  32. Zheng, S., et al.: Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 257, 59–66 (2017)

  33. Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., Xu, B.: Joint extraction of entities and relations based on a novel tagging scheme. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1227–1236 (2017)


Acknowledgments

This work was supported by the ARC projects DP150101550 and LP160101469.

Author information

Correspondence to Dat Quoc Nguyen.

Appendix

Implementation Details: We apply dropout [28] with a 67% keep probability to the inputs of BiLSTMs and FFNNs. Following [15], we also use word dropout to learn an embedding for unknown words: we replace each word token w appearing \(\#(w)\) times in the training set with a special “unk” symbol with probability \(\mathsf {p}_{unk}(w) = \frac{0.25}{0.25 + \#(w)}\).

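As a concrete illustration, here is a minimal sketch of that word-dropout rule; the function name, data layout, and toy corpus are ours, not from the paper.

```python
import random
from collections import Counter

def word_dropout(sentences, alpha=0.25, seed=1):
    """Replace token w by "unk" with probability alpha / (alpha + #(w))."""
    counts = Counter(w for sent in sentences for w in sent)  # #(w) over the training set
    rnd = random.Random(seed)
    return [["unk" if rnd.random() < alpha / (alpha + counts[w]) else w
             for w in sent]
            for sent in sentences]

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(word_dropout(train))  # rarer words ("cat", "dog") are replaced more often
```
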
Word embeddings are initialized with the 100-dimensional pre-trained GloVe word vectors [24], while character and NER label embeddings are initialized randomly; all these embeddings are then updated during training. For learning character-level word embeddings, we set the size of the LSTM hidden states in \(\mathrm {BiLSTM}_{\text {char}}\) equal to the size of the character embeddings. We perform a minimal grid search of hyper-parameters for Setup 1, resulting in: an Adam initial learning rate of 0.0005; a character embedding size of 25; a NER label embedding size of 100; an output layer size of 100 for both \(\mathrm {FFNN}_{\text {head}}\) and \(\mathrm {FFNN}_{\text {tail}}\); and 2 layers for both \(\mathrm {BiLSTM}_{\text {NER}}\) and \(\mathrm {BiLSTM}_{\text {RC}}\), each with an LSTM hidden state size of 100. These optimal Setup 1 hyper-parameters are then reused for Setup 2, where we additionally use a boundary tag embedding size of 100.

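Collected in one place, the reported Setup 1 values look like the following configuration sketch (the key names are ours; the values are those listed above):

```python
config = {
    "optimizer": "Adam",
    "initial_learning_rate": 0.0005,
    "word_embedding": "GloVe, 100-dim, pre-trained [24]",
    "char_embedding_size": 25,           # BiLSTM_char hidden size matches this
    "ner_label_embedding_size": 100,
    "ffnn_head_tail_output_size": 100,   # FFNN_head and FFNN_tail
    "bilstm_layers": 2,                  # both BiLSTM_NER and BiLSTM_RC
    "lstm_hidden_size": 100,             # per layer
    "dropout_keep_prob": 0.67,
    "boundary_tag_embedding_size": 100,  # Setup 2 only
}
```
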
Metric: Following previous work, when computing the macro-averaged F1 scores we omit the entity label “Other” and the negative relation “NEG”. For NER, an entity is predicted correctly only if both its boundaries and its type are correct, while for entity classification (EC) a multi-token entity is considered correct if at least one of its constituent tokens is predicted correctly. In all cases, a relation is scored as correct if both the argument entities and the relation type are correct.

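A minimal sketch of these correctness criteria, assuming entities are represented as (start, end, type) spans; the representation and function names are ours:

```python
def ner_correct(pred_span, gold_span):
    # NER: both the entity boundaries and the entity type must match exactly
    return pred_span == gold_span                  # (start, end, type) tuples

def ec_correct(pred_token_types, gold_span):
    # EC: a multi-token entity counts as correct if at least one of its
    # tokens is predicted with the gold entity type
    start, end, etype = gold_span
    return any(t == etype for t in pred_token_types[start:end + 1])

def relation_correct(pred_triple, gold_triple):
    # Relation: both argument entities and the relation type must match
    return pred_triple == gold_triple              # (head_span, tail_span, rtype)
```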

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, D.Q., Verspoor, K. (2019). End-to-End Neural Relation Extraction Using Deep Biaffine Attention. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol. 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_47

  • DOI: https://doi.org/10.1007/978-3-030-15712-8_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15711-1

  • Online ISBN: 978-3-030-15712-8

  • eBook Packages: Computer Science, Computer Science (R0)
