Extracting Business Process Entities and Relations from Text Using Pre-trained Language Models and In-Context Learning

  • Conference paper
  • In: Enterprise Design, Operations, and Computing (EDOC 2022)

Abstract

The extraction of business process elements from textual documents is a research area that still lacks the ability to scale to the variety of real-world texts. In this paper we investigate the use of pre-trained language models and in-context learning to address the problem of information extraction from process description documents, as a way to exploit the power of deep learning approaches while relying on little annotated data. In particular, we investigate the use of the native GPT-3 model and a few in-context learning customizations that rely on conceptual definitions and a very limited number of examples for the extraction of typical business process entities and relationships. The experiments we have conducted provide two types of insights. First, the results demonstrate the feasibility of the proposed approach, especially for the extraction of activities, participants, and the performs relation between a participant and the activity it performs; they also highlight the challenge posed by control flow relations. Second, the experiments yield a first set of lessons learned on how to interact with these kinds of models, which can facilitate future investigations on this subject.
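The in-context learning setup the abstract describes (a conceptual definition plus a very limited number of annotated examples prepended to the text to be processed) can be sketched as a simple prompt builder. This is a minimal illustrative sketch, assuming a hypothetical definition, example texts, and function name; it is not the authors' actual prompt format:

```python
# Illustrative sketch of few-shot prompt construction for in-context
# learning. The definition, examples, and labels below are hypothetical,
# not the prompts or PET annotations used in the paper.

def build_prompt(definition, examples, query_text):
    """Assemble a few-shot prompt: a concept definition, then a few
    annotated examples, then the new text to be processed."""
    parts = [f"Definition: {definition}", ""]
    for text, answer in examples:
        parts.append(f"Text: {text}")
        parts.append(f"Activities: {answer}")
        parts.append("")
    parts.append(f"Text: {query_text}")
    parts.append("Activities:")  # the model completes from here
    return "\n".join(parts)

definition = ("An activity is a unit of work performed within a "
              "business process.")
examples = [
    ("The clerk checks the invoice and then archives it.",
     "checks; archives"),
]
prompt = build_prompt(definition, examples,
                      "The manager approves the request.")
print(prompt)
```

The resulting string would then be sent as a single completion prompt to the language model; varying the definition and the number of examples corresponds to the customizations the paper investigates.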

Notes

  1. We have chosen BPMN as an illustrative example, but the approach is clearly agnostic to the specific modeling language.

  2. The terminology for these instructions varies from paper to paper.

  3. The interested reader can find all the PET-related resources at http://huggingface.co/datasets/patriziobellan/PET.

  4. The “activity” label is used in PET only to represent the verbal component of what is usually denoted as a business process activity.

  5. Several definitions exist for many business process elements (see, e.g., www.businessprocessglossary.com), but they often present different wordings and even conflicting characteristics [4]. A thorough investigation of the impact of different definitions of business process elements is beyond the scope of this paper and is left for future work.

  6. In a few cases the model was able to provide semantically correct answers that did not match the exact PET labels. A paradigmatic case is the answer “check and repair the computer” as a single activity, instead of the two separate activities reported in PET, as required by its specific annotation guidelines. We have carefully considered these few cases and decided to evaluate such semantically correct answers as correct.

References

  1. van der Aa, H., Carmona, J., Leopold, H., Mendling, J., Padró, L.: Challenges and opportunities of applying natural language processing in business process management. In: COLING 2018 Proceedings of 27th International Conference on Computational Linguistics, pp. 2791–2801. ACL (2018)

  2. van der Aa, H., Di Ciccio, C., Leopold, H., Reijers, H.A.: Extracting declarative process models from natural language. In: Giorgini, P., Weber, B. (eds.) CAiSE 2019. LNCS, vol. 11483, pp. 365–382. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21290-2_23

  3. Ackermann, L., Volz, B.: model[NL]generation: natural language model extraction. In: Proceedings of the 2013 ACM workshop DSM@SPLASH 2013, pp. 45–50. ACM (2013)

  4. Adamo, G., Di Francescomarino, C., Ghidini, C.: Digging into business process meta-models: a first ontological analysis. In: Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V. (eds.) CAiSE 2020. LNCS, vol. 12127, pp. 384–400. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49435-3_24

  5. Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., Ponzetto, S.P.: PET: an annotated dataset for process extraction from natural language text tasks. In: Proceedings of the BPM 2022 First Workshop on Natural Language Processing for Business Process Management (NLP4BPM) co-located with the 20th conference Business Process Management, CEUR Workshop Proceedings. CEUR-WS.org (2022)

  6. Bellan, P., Dragoni, M., Ghidini, C.: Process extraction from text: state of the art and challenges for the future. CoRR abs/2110.03754 (2021)

  7. Boratko, M., Li, X., O’Gorman, T., Das, R., Le, D., McCallum, A.: ProtoQA: a question answering dataset for prototypical common-sense reasoning. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, pp. 1122–1136. ACL (2020)

  8. Brown, T.B., et al.: Language models are few-shot learners. In: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020 (2020)

  9. Chintagunta, B., Katariya, N., Amatriain, X., Kannan, A.: Medically aware GPT-3 as a data generator for medical dialogue summarization. In: Proceedings of the 6th Machine Learning for Healthcare Conference, Proceedings of Machine Learning Research, vol. 149, pp. 354–372. PMLR (2021)

  10. Chiu, K., Alexander, R.: Detecting hate speech with GPT-3. CoRR abs/2103.12407 (2021)

  11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019, Vol. 1, pp. 4171–4186. ACL (2019)

  12. Epure, E.V., Martín-Rodilla, P., Hug, C., Deneckère, R., Salinesi, C.: Automatic process model discovery from textual methodologies. In: 9th IEEE International Conference on Research Challenges in Information Science, RCIS 2015, pp. 19–30. IEEE (2015)

  13. Ferreira, R.C.B., Thom, L.H., Fantinato, M.: A Semi-automatic approach to identify business process elements in natural language texts. In: ICEIS 2017 - Proceedings of the 19th International Conference on Enterprise Information Systems, Vol. 3, pp. 250–261. SciTePress (2017)

  14. Friedrich, F., Mendling, J., Puhlmann, F.: Process model generation from natural language text. In: Mouratidis, H., Rolland, C. (eds.) CAiSE 2011. LNCS, vol. 6741, pp. 482–496. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21640-4_36

  15. Gao, T., Fisch, A., Chen, D.: Making pre-trained language models better few-shot learners. In: Proceedings of ACL/IJCNLP 2021, pp. 3816–3830. ACL (2021)

  16. Han, X., et al.: A-BPS: automatic business process discovery service using ordered neurons LSTM. In: 2020 IEEE International Conference on Web Services, ICWS 2020, pp. 428–432. IEEE (2020)

  17. Honkisz, K., Kluza, K., Wiśniewski, P.: A concept for generating business process models from natural language description. In: Liu, W., Giunchiglia, F., Yang, B. (eds.) KSEM 2018. LNCS (LNAI), vol. 11061, pp. 91–103. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99365-2_8

  18. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. ArXiv abs/1907.11692 (2019)

  19. López, H.A., Debois, S., Hildebrandt, T.T., Marquard, M.: The process highlighter: from texts to declarative processes and back. In: Proceedings of Dissertation Award, Demo, and Industrial Track, BPM 2018. CEUR Workshop Proceedings, vol. 2196, pp. 66–70. CEUR-WS.org (2018)

  20. Maqbool, B., et al.: A comprehensive investigation of BPMN models generation from textual requirements—techniques, tools and trends. In: Kim, K.J., Baek, N. (eds.) ICISA 2018. LNEE, vol. 514, pp. 543–557. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-1056-0_54

  21. Petrucci, G., Rospocher, M., Ghidini, C.: Expressive ontology learning as neural machine translation. J. Web Semant. 52–53, 66–82 (2018)

  22. Qian, C., et al.: An approach for process model extraction by multi-grained text classification. In: Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V. (eds.) CAiSE 2020. LNCS, vol. 12127, pp. 268–282. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49435-3_17

  23. Quishpi, L., Carmona, J., Padró, L.: Extracting annotations from textual descriptions of processes. In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds.) BPM 2020. LNCS, vol. 12168, pp. 184–201. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58666-9_11

  24. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140:1–140:67 (2020)

  25. Sànchez-Ferreres, J., Burattin, A., Carmona, J., Montali, M., Padró, L., Quishpi, L.: Unleashing textual descriptions of business processes. Softw. Syst. Model. 20(6), 2131–2153 (2021). https://doi.org/10.1007/s10270-021-00886-x

  26. Sawant, K.P., Roy, S., Sripathi, S., Plesse, F., Sajeev, A.S.M.: Deriving requirements model from textual use cases. In: 36th International Conference on Software Engineering, ICSE 2014, Proceedings, pp. 235–244. ACM (2014)

  27. Scao, T.L., Rush, A.M.: How many data points is a prompt worth? In: Proceedings of NAACL-HLT 2021, pp. 2627–2636. ACL (2021)

  28. Wang, S., Liu, Y., Xu, Y., Zhu, C., Zeng, M.: Want to reduce labeling cost? GPT-3 can help. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4195–4205. ACL (2021)

Author information

Correspondence to Patrizio Bellan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bellan, P., Dragoni, M., Ghidini, C. (2022). Extracting Business Process Entities and Relations from Text Using Pre-trained Language Models and In-Context Learning. In: Almeida, J.P.A., Karastoyanova, D., Guizzardi, G., Montali, M., Maggi, F.M., Fonseca, C.M. (eds) Enterprise Design, Operations, and Computing. EDOC 2022. Lecture Notes in Computer Science, vol 13585. Springer, Cham. https://doi.org/10.1007/978-3-031-17604-3_11

  • DOI: https://doi.org/10.1007/978-3-031-17604-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17603-6

  • Online ISBN: 978-3-031-17604-3
