1 Introduction

In today’s dynamic, agile and collaborative environments, software design and development has metamorphosed into a complex social activity, involving teams of software architects, developers, testers and maintainers. In response to this need, several collaborative and social development frameworks for software engineering have been implemented and are in wide use [2].

In collaborative software development, given a set of requirements, typically, several iterations, deliberations and informal discussions are undertaken among the design team members, before a final consensus can be reached, on the features that are to be included in the concrete realisation of the artifact. The requirements and discussions, if at all documented, are recorded as unstructured text, that makes their search and retrieval a cumbersome process. Further, during and after implementation and deployment of the software, several issues may typically arise and get recorded either as part of the maintenance process or in response to new requirements. One critical shortcoming of collaborative environments in wide use today such as Atlassian ConfluenceFootnote 1, JIRAFootnote 2 and GithubFootnote 3 is that they provide generic fields such as “issue” and “comment” which encapsulate all discussion types. This makes it extremely difficult to retrieve information relevant to a specific aspect of a requirement. Further, they also do not provide the interfaces needed to associate design requirements with implementation issues in a structured and systematic way. Integrating the requirements with the issues which should be a core functional capability of most collaborative design environments, is not yet well supported.

In this paper we present an ontology-driven approach that captures the knowledge emerging during software design, development, implementation and maintenance and exploits it for unified governance of the engineering process. The term “unified governance” was defined in the ALIGNEDFootnote 4 project and denotes capturing knowledge of the software development lifecycle from various sources and expressing it using a common ontology so that it can be governed, i.e., queried and new knowledge inferred, in a unified way. We exploit DIO (Design Intent Ontology)Footnote 5, a content ontology design pattern, that provides a generic mechanism for formally describing the intents or the rationales that emerge or are generated during the processes that underlie modern design decision phases. Our approach conforms to one of the main criteria for systems that record design rationale and decisions, mainly that it should be least disruptive to the designers and the actual design process itself [3, 5].

We evaluate our approach within the settings of an industry-driven use case from the Semantic Web Company (SWC henceforth)Footnote 6, where customers and developers of the PoolParty Thesaurus Server (PPT henceforth) document their requirements in a tailored version of Confluence and file bug reports or raise design issues using JIRA. By aligning and creating merged repositories of requirements, customer feedback, bug reports, project documentation and mining of web resources we are able to exploit the integrated knowledge to develop a semantic search mechanism for unified governance.

The remainder of this paper is structured as follows: Sect. 2 discusses related work. Section 3 describes our use case scenario. Section 4 presents the requirements that need to be addressed by a unified governance framework. Section 5 describes our knowledge mining and representation approach. Section 6 presents the architecture and implementation. Section 7 outlines our evaluation strategy and finally Sect. 8 presents conclusions.

2 Related Work

Extensive research [4] has shown that capturing design intents is a very difficult problem. Surveys [8, 11] on the capture of design rationale and barriers to their uptake have also been reported. IBIS [9], one of the most widely adopted argumentation models, is a network-oriented model based on capturing the issues, positions and arguments underlying a design deliberation.

Representing design intents or design rationales as ontologies have been explored for various specialised domains such as software engineering [6], ontology engineering (OE) [13], product engineering [14] and aerospace engineering [10]. However there is no generic, domain-independent design intent capture model available as a design pattern that can be specialised for any design rationale capture scenario.

Based on IBIS, one of the first attempts to provide an ontological representation of design rationale was by Medeiros et al. [6]. The authors proposed the Kuaba ontology for the representation of model-based software design. A major limitation of this approach is that the reuse of the design rationale is strongly dependent on the formal model used for artifact design, thereby introducing encoding bias [7].

An argumentation ontology has been proposed as part of the DILIGENT [13] framework for the collaborative building of ontologies. The basic conceptualisation in the ontology has been derived from IBIS and extended to include argumentation-specific entities, particularly those suitable for ontology engineering. The critical limitations of this ontology are that the issues are not interlinked to the requirements.

The fundamental difference between the various semantic and non-semantic models for representing design rationale and DIO, lies in the granularity of the representation. IBIS-based models provide a single conceptualisation for many aspects of a design intent, while DIO provides a finer level of representation, thereby making the specification more expressive, without compromising computability. As an example, typically the terms, “Idea”, “Position” and “Solution” have been used to represent the solution to a “Question” or an “Issue”, while the term “Argument” has been used for representing the rationales for and against a proposed solution. On the other hand, DIO incorporates the mutually disjoint concepts of “Argument” and “Justification” to represent the distinct rationales as well as the mutually disjoint concepts of “AlternativeSolution” and “MandatedSolution” to represent all solutions and accepted solutions respectively.

Table 1 compares the conceptual abstractions defined in DIO with those in some of the approaches discussed above.

Table 1. A comparison of the ontological approaches for capturing design intents

Wikidsmart by zAgileFootnote 7 claims to provide “semantic enablement” to Atlassian Confluence by capturing the data in a semantic repository, using “a set of ontologies and metamodels specific to a domain of interest”. It is, however, unclear what ontologies are used by the framework’s semantic repository to model the information contained in Confluence. Our work follows a similar approach to Wikidsmart in the way that we also extract information from existing Confluence and JIRA installations and express it using semantic technologies. However, we contribute and publish reusable OWL-based ontologies for modeling this knowledge (DIO and DIOPP), ready to be processed by a wide range of tools of the Linked Data technology stack.

3 Motivating Scenario

The motivation for our work is the current setup at SWC, where Atlassian Confluence is used to support requirements engineering and JIRA is used by team members and SWC customers for issue and change tracking, organising ideas from team members as well as collecting them from customers. Following the agile methodology of software development, the data is recorded in Confluence under headings such as “Requirements”, “Goal”,“User Story”, “Epic” and “Stakeholders”. Additional fields such as “Precondition”, “Detailed description”, “Acceptance criteria & Test scenario” are included to provide further context to the requirements. A single field, “Comment”, captures the opinions/discussion carried out by human agents.

SWC collects the requirements for each version of PPT in the designated Confluence space. Requirements are then linked to pages containing epics and user stories. Most of these pages are structured based on standard templates defined by SWC. The outputs from these template-based pages are largely document-centric and require extensive human intervention to synthesise and synchronise them with PoolParty development tasks.

By using ontologies to annotate and provide metadata to the content extracted from Confluence and JIRA, SWC would be able to create merged repositories of requirements, customer feedback, bug reports and project documentation thereby consolidating PoolParty experiences, customer ideas and market needs in order to integrate them into products. This is a key factor for successful development of SWC products and for raising customer satisfaction and enterprise agility. The integrated information would enable the mining of intents that lead to the development in PoolParty. Questions asked by customers will flow faster into the requirements engineering system. The process will help to generate concise reports on distributed business objects and entities relevant for the development processes, and to coordinate the data management and development workflows required to deliver new versions of the evolving PoolParty product.

4 Requirements for Unified Governance

Based on the challenges that need to be addressed to enable the mapping of design intents for unified governance, where intents are derived from requirements in Confluence and issues in JIRA, we identified the following requirements:

  • Enabling Knowledge reuse: As new agents — humans and software get involved during various phases of design, new processes are introduced. The key motivation behind recording the intents underlying decisions is that the knowledge can be used to inform future agents and processes of the discussions that have happened so far.

  • Shared semantics via annotations: The framework must facilitate a common understanding and uniform interpretation of the design intents among the participants of the decision making process. The unified representation format could act as a lingua franca between various tools and resources that assist in the design and implementation process.

  • Flexibility: The ontology conceptualisation should be flexible enough to be mapped to current requirement specification environments and primitives. It should facilitate the integration of interfaces for capturing design requirements and the design issues.

  • Traceability: It must allow forward and backward traceability of final designs as well as initial design requirements from design intents and issues.

5 The Knowledge Representation Framework

A high level representation of the architecture implemented for integrating the requirements and the issues is illustrated in Fig. 1. The core components of the architecture are the ontologies, the Unified Views extraction pipeline and the unified governance dashboard that provides visual analytics for the results of searches over the integrated datasets.

The DIO content design pattern provides a minimalistic abstraction and defines conceptual, generic entities for the modelling of semantically enriched knowledge required to capture the intents or rationale behind the design of an artifact. DIO is a domain agnostic pattern and most domain specific design rationale models will specialise from it.

Fig. 1.
figure 1

The knowledge representation and extraction architecture

Figure 2 illustrates the graphical representation of DIO. It depicts the entities defined for the pattern and their relationships with entities from PROV-OFootnote 8.

Fig. 2.
figure 2

Graphical representation of DIO

The axiomatisation of a DesignIntentArtifact can be represented in OWL using the Description Logic [1] notation as,

$$\begin{aligned} \mathtt {DesignIntentArtifact} \sqsubseteq ((\exists \mathtt {wasAttributedTo}.\mathtt {DesignIntent}) \\ \sqcap (= 1\mathtt {wasAttributedTo}.\mathtt {Agent}) \sqcap (\ge 1\mathtt {description}) \\ \sqcap (=1\mathtt {version}) \sqcap (=1\mathtt {generatedAtTime})) \end{aligned}$$

The axiomatisation for a DesignIssue captures the intuition that a design issue can have several alternative solution, but just one mandated solution.

$$\begin{aligned} \mathtt {DesignIssue} \sqsubseteq (\mathtt {DesignIntentArtifact} \\ \sqcap (\forall \mathtt {hasAlternativeSolution}.\mathtt {AlternativeSolution}) \\ \sqcap (\forall \mathtt {hasMandatedSolution}.\mathtt {MandatedSolution}) \\ \sqcap (=1\mathtt {hasMandatedSolution}.\mathtt {MandatedSolution})) \end{aligned}$$

For further details, the interested reader is referred to [12].

As DIO is a general purpose design intent ontology, it does not capture attributes specific to PoolParty development. To bridge this gap, we define DIO-PPFootnote 9 - an extension to DIO that is specifically aimed at capturing conceptualisations from Confluence and JIRA. Besides DIO-PP, we also define bespoke mappings to DIO entities, for the data recorded during requirements capture.

5.1 Mapping Epic to Design Requirements

The SWC Confluence environment consist of two main parts: The Epic and the User story.

The Epic captures the high level description of the requirements and the stakeholders affected or involved in addressing the requirements. An Epic consists of: goal (the high level explanation of the requirement), requirements, the set of user stories and stakeholders. The mappingFootnote 10 from the informal requirement specification in Epic to DIO is done as follows: Every Epic, corresponds to a dio:DesignRequirement.

$$\begin{aligned} Epic \rightarrow \mathtt {dio:DesignRequirement} \end{aligned}$$

The epic’s title is mapped as a dc:title for the dio:DesignRequirement

$$\begin{aligned} Epic/title \rightarrow (\mathtt {dio:DesignRequirement}\;\; \\\mathtt {dc:title}\;\; \mathtt {xsd:String}) \end{aligned}$$

The epic’s goal is mapped as a dc:description for the dio:DesignRequirement

$$\begin{aligned} Epic/goal \rightarrow (\mathtt {dio:DesignRequirement}\;\;\\ \mathtt {dc:description}\;\; \mathtt {xsd:String}) \end{aligned}$$

5.2 Mapping User Story to Design Intent

Each Epic includes a set of requirements, each of which are represented as a “User story” in Confluence. Each user story captures the design intent. Every User story, corresponds to a dio:DesignIntent.

$$\begin{aligned} User\;story \rightarrow \mathtt {dio:DesignIntent} \end{aligned}$$

The user story’s title is mapped as a dc:title for the dio:DesignIntent

$$\begin{aligned} User\; story/title \rightarrow (\mathtt {dio:DesignIntent}\;\; \\ \mathtt {dc:title}\;\; \mathtt {xsd:String}) \end{aligned}$$

The user story’s description is mapped as a dc:description for the dio:DesignIntent

$$\begin{aligned} User\; story/description \rightarrow (\mathtt {dio:DesignIntent}\;\; \\ \mathtt {dc:description}\;\; \mathtt {xsd:String}) \end{aligned}$$

The use story may include a link to a JIRA issue, if the issue relates to certain aspects of the requirement. This is captured using the predicate dio:generatesIssue

$$\begin{aligned} User\; story/jira\; issue \rightarrow (\mathtt {dio:DesignIntent}\;\; \\ \mathtt {dio:generatesIssue}\;\; \mathtt {dio:DesignIssue}) \end{aligned}$$

The user story includes part of the solution, that addresses the requirement and is the most significant part of the mapping. The user story captures the following elements that contribute to an AlternativeSolution and MandatedSolution for a DesignIssue in DIO: Preconditions, Detailed Description, Affected Components, Acceptance criteria and test scenarios, Variations, Comments. The DIO-PP ontology conceptualises these elements.

$$\begin{aligned} \mathtt {diopp:Precondition} \sqsubseteq \mathtt {dio:Assumption}\\ \mathtt {diopp:AffectedComponent} \sqsubseteq \mathtt {dio:Heuristic}\\ \mathtt {diopp:AcceptanceCriteria} \sqsubseteq \mathtt {dio:Evaluation}\\ \mathtt {dio:Comment} \sqsubseteq \mathtt {dio:Argument} \\ \sqcup ~\mathtt {dio:Justification} \end{aligned}$$

The user story’s Detailed Description element is mapped as a dc:description for the dio:AlternativeSolution

$$\begin{aligned} User\;story/detailed\;description \rightarrow \\ (\mathtt {dio:AlternativeSolution}\;\; \mathtt {dc:description}\;\; \\ \mathtt {xsd:String}) \end{aligned}$$

5.3 Mapping JIRA Design Issues

JIRA is used as the platform by PoolParty customers to highlight design issues related to the software and request new features. Typical properties that need to be ascribed to a design issue documented in JIRA are the issue type, IssueType, its status, Status, its priority, PriorityType, its resolution status, ResolutionType and the various agents that are associated with the issue: Watcher,Assignee and Reporter. The DIO-PP ontology defines these entities. The axiomatisation of a DesignIssue for PPT can be represented as,

$$\begin{aligned} \mathtt {DesignIssue} \sqsubseteq ((=1\,\mathtt {isOfIssueType}.\mathtt {IssueType}) \sqcap \\ (=1\,\mathtt {hasVersionType}.\mathtt {VersionType}) \sqcap \\ (=1\,\mathtt {hasPriority}.\mathtt {PriorityType}) \sqcap \\ (=1\,\mathtt {hasStatus}.\mathtt {IssueStatusType}) \sqcap \\ (=1\,\mathtt {hasResolution}.\mathtt {IssueResolution}) \sqcap \\ (=1\,\mathtt {hasReporter}.\mathtt {Agent}) \sqcap \\ (=1\,\mathtt {hasAssignee}.\mathtt {Agent}) \sqcap \\ (\ge 1\,\mathtt {hasAffectedComponent}.\mathtt {Component}) \sqcap \\ (\ge 1\,\mathtt {hasWatcher}.\mathtt {Agent}) \sqcap \\ (\ge 1\,\mathtt {hasEnvironment}.\top ) \sqcap \\ (\le 1\,\mathtt {updateDate}) ) \end{aligned}$$

6 Data Extraction Workflow

As highlighted in Sect. 3, SWC collects the requirements and issues for each version of PPT in the designated Confluence space and JIRA respectively. This data needs to be extracted and semantically annotated with DIO and DIO-PP. To perform the data extraction and conversion process, we provide three contributions: (1) an extraction tool which connects to the Confluence and JIRA instances for PPT and converts the contained data into RDF. (2) a Data Processing Unit (DPU) for UnifiedViewsFootnote 11 which wraps the extraction tool so that it can be included into custom data processing pipelines. (3) a UnifiedViews pipeline, which encompasses data extraction, data annotation using a PPT thesaurus and loading the annotated data into a remote Virtuosos triple store.

6.1 Extraction Tool

The extraction tool connects to the Confluence and JIRA REST APIs and retrieves (meta-)data that should be converted into the RDF output format using DIO-PP. While data from JIRA can be retrieved in structured JSON format, certain information we need to formalize must be retrieved by parsing HTML pages from the Confluence installation. The tool consists of two independent extraction components - one extracts data from Confluence, the other one from JIRA. The Confluence extractor iterates through all pages in a hierarchy below a page named “Requirements PP” in the PPT development space, retrieves the JSON metadata (e.g., history information or creation date) of these pages and parses the HTML content. This data is then used to build an in-memory object model which is serialized as RDF data. Extraction and conversion of JIRA data works in a similar way, except that no HTML parsing needs to be performed because the API provides the information in JSON format.

6.2 Integration into a Processing Pipeline

As the extraction tool produces an RDF document, we can leverage the potential of Linked Data to enrich the extracted data with information coming from other sources. Specifically, we enrich data from Confluence and JIRA with annotation (tagging) information coming from an enterprise thesaurus, allowing to, e.g., improve the search functionality. For the Unified Governance use case we show how to add tagging information by providing a pipeline for the UnifiedViews tool, which is illustrated in Fig. 3. The pipeline consists of seven DPUs: the extraction tool (red box) retrieves data from Confluence and JIRA in RDF format. It is then filtered (blue box below) so that it only contains text literals which serve as an input for annotating. This annotation stage (box below filtering step) connects to the PoolParty Extractor annotation API and returns a RDF graph containing the identified annotations for each resource. Afterwards, the original extracted data is combined with the filtered annotated data and both written to the local file system as well as uploaded to a Virtuoso server.

We ran the extraction process on a subset of the requirements (5394) and issues (184534). This resulted in 215401 triples. The extraction is now run at regular intervals with the resulting triples added to the triple store.

Fig. 3.
figure 3

UnifiedViews extraction pipeline

7 Evaluation

In order to assess the benefits of using semantically annotated and integrated requirements engineering datasets for unified governance, we carried out an evaluation that compared the time taken in retrieving requirements and issues from Confluence and JIRA against the time taken when using the graph-search powered unified governance dashboard. Our evaluation involved six participants (members of the consulting team at SWC) with a background in software engineering. All the recruited personnel had reasonable experience in the use of Confluence, JIRA and ontology interpretation.

Confluence and JIRA provide a keyword-driven search facility to retrieve information specific to requirements and issues respectively. The results are returned as natural text. The unified governance dashboard is based on the entities captured in DIO, whose design is governed by competency questions [12], that cover the spectrum of requirements engineering knowledge which should be recorded. In order to measure the accuracy, ease and efficiency of retrieving structured and semantically enriched requirements engineering datasets against the information retrieved through Confluence and JIRA search, our evaluationFootnote 12 consisted of two main tasks:

  • Task 1: Searching Confluence and JIRA to retrieve requirements and answer the corresponding competency questions.

  • Task 2: Using the unified governance dashboard to answer the same competency questions.

Six competency questions were designed using inputs from a wide variety of use cases routinely encountered by the customers and developers of PPT. These questions were chosen after consultation with social scientists, developers and consultants of PPT. In Table 2 we present fourFootnote 13 of these questions, along with the use case and the mapped construct from the DIO and DIOPP ontology which were used to annotate the highlighted entities in the question. Questions 1–3 are specifically targeted towards retrieving information annotated with DIO and DIOPP while question 4 is based on doing a full text search within both the systems.

Table 2. Competency questions used for evaluation

Task 1 involved the following steps:

  • Reading and interpreting the competency questions.

  • Searching Confluence and JIRA.

  • Retrieving information leading to answers to the questions.

  • Documenting the search strategy used in retrieving the answers and analysing the results.

Task 2 involved the following steps:

  • Reading and interpreting the competency questions.

  • Using the facet browsing and full text search features of the unified governance dashboard to retrieve potential answers.

  • Documenting the search strategy used in retrieving the answers and analysing the results.

The volunteers were divided into two groups, A and B. Group A executed task 1, followed by task 2, while for Group B, it was the other way around. All volunteers were provided with Turtle serialisation of the DIO ontology a week before the evaluation. On the day of the evaluation they were provided with the six competency questions, access to Confluence and JIRA and access to the unified governance dashboard. The screenshots of the dashboard interfaces are presented in Fig. 4. In order to maintain consistency across volunteers, they were advised to answer the questions in the prescribed order and the evaluation was undertaken under supervision. For each of the tasks and for each question, the following observations were recorded: (1) The time taken to answer each question in task 1 and 2. This was the total time taken for searching and analysing the results. (2) The answers for each question in task 1 and 2. (3) The strategy used to answer the question.

Figure 5(a – d) illustrates the time taken to answer each of the six questions by four of the participants across the two tasks, while Fig. 5(d) illustrates the average time taken, after discarding the extreme outliers for each participant. Our evaluation shows that the overall time taken to search the integrated datasets annotated with DIO and DIOPP and answer the six competency questions across all participants is 50% less when compared to the time taken to search Confluence and JIRA for the same questions. This is a significant and impressive saving in resources for the SWC, where consultants and developers need to manage a large number of customer requirements and issues on a regular basis. The evaluation also highlighted the weakness of the full text search implementation used in the unified governance dashboard, where for atleast three of the participants, to search answers to questions 5 and 6 took significantly longer as compared to using the full text search provided by Confluence and JIRA. As we had asked the participants to record their search strategies and use of the interface, we were also able to identify pitfalls and bottlenecks in the current design of the unified governance dashboard as an additional evaluation result.

Fig. 4.
figure 4

Unified governance dashboard with full text search and faceted browsing

Fig. 5.
figure 5

Timings recorded by four participants and the average time efficiency for all participants and all questions.

8 Conclusions

In this paper, we have proposed a framework that exploits the DIO ontology for integrating the requirements and issues arising during software design, implementation and maintenance. We have evaluated our approach on an industrial software engineering case study for the PoolParty Thesaurus server development and maintenance. We have shown how a minimalistic extension of DIO can be utilised to associate and extract software design rationales with requirements recorded in Confluence and issues submitted in JIRA for PPT. Our evaluation shows an impressive 50% reduction in the time taken to search semantically annotated and integrated datasets when compared to the search provided by Confluence and JIRA. The evaluation also revealed the limitation of the full text search feature currently implemented in the unified governance dashboard. Improving this will be the focus of our immediate future work. We have presented an end-to-end solution that provides significant improvements in productivity, agility and efficiency in software development environments.

In future we aim to build on the results emerging from periodic and detailed empirical evaluation carried out on the semantic search interface for queries with various levels of complexity and implement an ontology-driven, design-intent powered, self learning system for software engineering environments.