Extracting Business Process Entities and Relations from Text Using Pre-trained Language Models and In-Context Learning

  • Conference paper
  • In: Enterprise Design, Operations, and Computing (EDOC 2022)

Abstract

The extraction of business process elements from textual documents is a research area that still lacks the ability to scale to the variety of real-world texts. In this paper we investigate the use of pre-trained language models and in-context learning to address the problem of information extraction from process description documents, as a way to exploit the power of deep learning approaches while relying on little annotated data. In particular, we investigate the use of the native GPT-3 model and a few in-context learning customizations that rely on conceptual definitions and a very limited number of examples for the extraction of typical business process entities and relationships. The experiments we have conducted provide two types of insights. First, the results demonstrate the feasibility of the proposed approach, especially for the extraction of activities, participants, and the performs relation between a participant and the activity it performs; they also highlight the challenge posed by control flow relations. Second, the experiments yield a first set of lessons learned on how to interact with these kinds of models, which can facilitate future investigations on this subject.
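The in-context learning setup the abstract describes (a conceptual definition plus a very limited number of annotated examples prepended to the text to be processed) can be sketched as a simple prompt builder. This is a minimal illustrative sketch, assuming a hypothetical definition, example texts, and function name; it is not the authors' actual prompt format:

```python
# Illustrative sketch of few-shot prompt construction for in-context
# learning. The definition, examples, and labels below are hypothetical,
# not the prompts or PET annotations used in the paper.

def build_prompt(definition, examples, query_text):
    """Assemble a few-shot prompt: a concept definition, then a few
    annotated examples, then the new text to be processed."""
    parts = [f"Definition: {definition}", ""]
    for text, answer in examples:
        parts.append(f"Text: {text}")
        parts.append(f"Activities: {answer}")
        parts.append("")
    parts.append(f"Text: {query_text}")
    parts.append("Activities:")  # the model completes from here
    return "\n".join(parts)

definition = ("An activity is a unit of work performed within a "
              "business process.")
examples = [
    ("The clerk checks the invoice and then archives it.",
     "checks; archives"),
]
prompt = build_prompt(definition, examples,
                      "The manager approves the request.")
print(prompt)
```

The resulting string would then be sent as a single completion prompt to the language model; varying the definition and the number of examples corresponds to the customizations the paper investigates.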

Notes

  1. We have chosen BPMN as an illustrative example, but the approach is clearly agnostic to the specific modeling language.

  2. The terminology for these instructions varies from paper to paper.

  3. The interested reader can find all the PET-related resources at http://huggingface.co/datasets/patriziobellan/PET.

  4. The “activity” label is used in PET only to represent the verbal component of what is usually denoted as a business process activity.

  5. Several definitions exist for many business process elements (see, e.g., www.businessprocessglossary.com), but they often present different wordings and even conflicting characteristics [4]. A thorough investigation of the impact of different definitions of business process elements is beyond the scope of this paper and is left for future work.

  6. In a few cases the model was able to provide semantically correct answers that did not match the exact PET labels. A paradigmatic case is the answer “check and repair the computer” as a single activity, instead of the two separate activities reported in PET, as required by its specific annotation guidelines. We have carefully considered these few cases and decided to evaluate such semantically correct answers as correct.

References

  1. van der Aa, H., Carmona, J., Leopold, H., Mendling, J., Padró, L.: Challenges and opportunities of applying natural language processing in business process management. In: COLING 2018 Proceedings of 27th International Conference on Computational Linguistics, pp. 2791–2801. ACL (2018)

  2. van der Aa, H., Di Ciccio, C., Leopold, H., Reijers, H.A.: Extracting declarative process models from natural language. In: Giorgini, P., Weber, B. (eds.) CAiSE 2019. LNCS, vol. 11483, pp. 365–382. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21290-2_23

  3. Ackermann, L., Volz, B.: model[NL]generation: natural language model extraction. In: Proceedings of the 2013 ACM workshop DSM@SPLASH 2013, pp. 45–50. ACM (2013)

  4. Adamo, G., Di Francescomarino, C., Ghidini, C.: Digging into business process meta-models: a first ontological analysis. In: Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V. (eds.) CAiSE 2020. LNCS, vol. 12127, pp. 384–400. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49435-3_24

  5. Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., Ponzetto, S.P.: PET: an annotated dataset for process extraction from natural language text tasks. In: Proceedings of the BPM 2022 First Workshop on Natural Language Processing for Business Process Management (NLP4BPM) co-located with the 20th conference Business Process Management, CEUR Workshop Proceedings. CEUR-WS.org (2022)

  6. Bellan, P., Dragoni, M., Ghidini, C.: Process extraction from text: state of the art and challenges for the future. CoRR abs/2110.03754 (2021)

  7. Boratko, M., Li, X., O’Gorman, T., Das, R., Le, D., McCallum, A.: ProtoQA: a question answering dataset for prototypical common-sense reasoning. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, pp. 1122–1136. ACL (2020)

  8. Brown, T.B., et al.: Language models are few-shot learners. In: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020 (2020)

  9. Chintagunta, B., Katariya, N., Amatriain, X., Kannan, A.: Medically aware GPT-3 as a data generator for medical dialogue summarization. In: Proceedings of the 6th Machine Learning for Healthcare Conference, Proceedings of Machine Learning Research, vol. 149, pp. 354–372. PMLR (2021)

  10. Chiu, K., Alexander, R.: Detecting hate speech with GPT-3. CoRR abs/2103.12407 (2021)

  11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019, Vol. 1, pp. 4171–4186. ACL (2019)

  12. Epure, E.V., Martín-Rodilla, P., Hug, C., Deneckère, R., Salinesi, C.: Automatic process model discovery from textual methodologies. In: 9th IEEE International Conference on Research Challenges in Information Science, RCIS 2015, pp. 19–30. IEEE (2015)

  13. Ferreira, R.C.B., Thom, L.H., Fantinato, M.: A Semi-automatic approach to identify business process elements in natural language texts. In: ICEIS 2017 - Proceedings of the 19th International Conference on Enterprise Information Systems, Vol. 3, pp. 250–261. SciTePress (2017)

  14. Friedrich, F., Mendling, J., Puhlmann, F.: Process model generation from natural language text. In: Mouratidis, H., Rolland, C. (eds.) CAiSE 2011. LNCS, vol. 6741, pp. 482–496. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21640-4_36

  15. Gao, T., Fisch, A., Chen, D.: Making pre-trained language models better few-shot learners. In: Proceedings of ACL/IJCNLP 2021, pp. 3816–3830. ACL (2021)

  16. Han, X., et al.: A-BPS: automatic business process discovery service using ordered neurons LSTM. In: 2020 IEEE International Conference on Web Services, ICWS 2020, pp. 428–432. IEEE (2020)

  17. Honkisz, K., Kluza, K., Wiśniewski, P.: A concept for generating business process models from natural language description. In: Liu, W., Giunchiglia, F., Yang, B. (eds.) KSEM 2018. LNCS (LNAI), vol. 11061, pp. 91–103. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99365-2_8

  18. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. ArXiv abs/1907.11692 (2019)

  19. López, H.A., Debois, S., Hildebrandt, T.T., Marquard, M.: The process highlighter: from texts to declarative processes and back. In: Proceedings of Dissertation Award, Demo, and Industrial Track, BPM 2018. CEUR Workshop Proceedings, vol. 2196, pp. 66–70. CEUR-WS.org (2018)

  20. Maqbool, B., et al.: A comprehensive investigation of BPMN models generation from textual requirements—techniques, tools and trends. In: Kim, K.J., Baek, N. (eds.) ICISA 2018. LNEE, vol. 514, pp. 543–557. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-1056-0_54

  21. Petrucci, G., Rospocher, M., Ghidini, C.: Expressive ontology learning as neural machine translation. J. Web Semant. 52–53, 66–82 (2018)

  22. Qian, C., et al.: An approach for process model extraction by multi-grained text classification. In: Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V. (eds.) CAiSE 2020. LNCS, vol. 12127, pp. 268–282. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49435-3_17

  23. Quishpi, L., Carmona, J., Padró, L.: Extracting annotations from textual descriptions of processes. In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds.) BPM 2020. LNCS, vol. 12168, pp. 184–201. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58666-9_11

  24. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140:1–140:67 (2020)

  25. Sànchez-Ferreres, J., Burattin, A., Carmona, J., Montali, M., Padró, L., Quishpi, L.: Unleashing textual descriptions of business processes. Softw. Syst. Model. 20(6), 2131–2153 (2021). https://doi.org/10.1007/s10270-021-00886-x

  26. Sawant, K.P., Roy, S., Sripathi, S., Plesse, F., Sajeev, A.S.M.: Deriving requirements model from textual use cases. In: 36th International Conference on Software Engineering, ICSE 2014, Proceedings, pp. 235–244. ACM (2014)

  27. Scao, T.L., Rush, A.M.: How many data points is a prompt worth? In: Proceedings of NAACL-HLT 2021, pp. 2627–2636. ACL (2021)

  28. Wang, S., Liu, Y., Xu, Y., Zhu, C., Zeng, M.: Want to reduce labeling cost? GPT-3 can help. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4195–4205. ACL (2021)

Author information

Correspondence to Patrizio Bellan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bellan, P., Dragoni, M., Ghidini, C. (2022). Extracting Business Process Entities and Relations from Text Using Pre-trained Language Models and In-Context Learning. In: Almeida, J.P.A., Karastoyanova, D., Guizzardi, G., Montali, M., Maggi, F.M., Fonseca, C.M. (eds) Enterprise Design, Operations, and Computing. EDOC 2022. Lecture Notes in Computer Science, vol 13585. Springer, Cham. https://doi.org/10.1007/978-3-031-17604-3_11

  • DOI: https://doi.org/10.1007/978-3-031-17604-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17603-6

  • Online ISBN: 978-3-031-17604-3
