
1 Introduction

Cognitive systems are systems built with artificial intelligence (AI) resources (thus, AI-powered systems) that learn at scale from their interaction with humans and the environment. These systems evolve naturally from such learning, rather than being explicitly programmed. This approach allows people to harvest insights from huge quantities of data to understand complex situations, make accurate predictions about the future, and anticipate the unintended consequences of actions [11, 13]. In it, humans and computers work in a more interconnected way to achieve unexpected insights.

In order to be useful, a cognitive system must be aware of its users’ goals, so it can help him/her by bringing contextual information from multiple sources and guiding him/her through the series of tasks associated with those goals. Recognizing, structuring, and representing users’ goals, however, is not an easy or straightforward process. For example, users might need to perform more exploratory tasks, where goals might vary according to interaction results (e.g.: visual analytics systems [10]). Moreover, if a user is performing a knowledge-intensive task that requires access, production, and consumption of a large amount of knowledge, his/her goal might be too abstract to structure as a sequence of clear and well-defined steps. Taking the scientific paper writing process as an example, even if you have a document template to fill in, the content is defined by the combination of several knowledge sources: your previous knowledge, new reference papers that you have just looked up, inputs from discussions with colleagues, and so on.

Structuring knowledge (i.e.: creating a knowledge base) is a challenge in itself and has been the focus of Knowledge Engineering research [22, pp. 33, 104][23]. Once the knowledge is structured, the challenge falls to UX researchers to investigate users and their goals regarding that structured knowledge. Questions like who the users are, what they want or need to do, in which preferred ways, and what their goals are can guide UX research on this matter.

We argue that an AI-powered system could infer the user’s goals by observing his/her interaction with different applications and considering its knowledge base – about the user, the group(s) s/he is part of, the applications’ domains, the overall context, etc. With this information, this cognitive system is also able to support users in achieving those goals, since it has knowledge not only from previous times when that user performed a specific task, but also from multiple other users performing similar tasks. This system may tailor all this knowledge for a given user, creating a personal and unique dynamic with that user. Those abilities differentiate AI-powered systems from expert systems [8] – which focus on a particular domain of knowledge – and recommender systems [16] – which focus on the needs of a user.

With that in mind and focusing on the UX challenges for AI-powered systems, we present the Cognitive Reasoning Interface (CoRgI) framework. CoRgI’s development has been guided by fieldwork studies performed on a project where a knowledge-intensive process is analyzed and discussed considering the support of a cognitive system.

We start by discussing previous works that we found related to the development of a cognitive system in Sect. 2. Following, in Sect. 3, we describe our fieldwork and discuss our initial findings on how a cognitive system could support knowledge-intensive tasks. These findings led us to decisions regarding CoRgI’s architecture and preliminary user experience design, shown in Sects. 4 and 5 respectively. Section 6 discusses some user studies we plan to conduct to evaluate our solution. We conclude with Sect. 7, presenting some future directions of our work.

2 Related Work

Intelligent agents [22, 30–32] and cognitive assistants [14] have been a trending topic in AI research recently. They are not, however, a new subject in that field [6, 7]. Early research in AI revolved around the study of high-level cognition, a feature that separated it from fields like pattern recognition and robotics. Its original research goal was to augment human intellect to address complex problems. However, AI research distanced itself from that goal and focused on a more specific issue – developing algorithms and technologies – without taking into account the context where those algorithms and technologies engage with humans [12].

A cognitive agent is defined as a software tool that augments human intelligence [6]. Langley [12] says: “(...) intelligence is the capacity to engage in abstract thought that goes beyond immediate perceptions and actions. In humans, this includes the ability to carry out complex, multi-step reasoning, understand the meaning of natural language, design innovative artifacts, generate plans that achieve goals, and even reason about their own reasoning.” There is no doubt that AI resources are powerful and innovative, but they need to be part of a bigger scene, part of a human-motivated context where those resources can really augment human intelligence. Today’s AI resources are fundamental for cognitive systems’ infrastructure and development [12].

Intelligent agents have been used to support different tasks, like learning [15, 17] and knowledge work in general [14]. The idea of supporting a knowledge worker brings the context AI needs to return to its original research goal – augmenting human intelligence. The knowledge related to that worker frames the space the cognitive assistant needs to be aware of and learn from. That knowledge is dynamic, as any knowledge base that humans interact with needs to be.

The big technology companies have presented their assistants to the public (e.g.: Microsoft’s Cortana, Apple’s Siri, and Google Assistant), and even sell products based on such assistants (e.g.: Amazon’s Alexa and Google Home). A few glitches have been observed while interacting with those assistants, however, like adult content being offered to children and unsolicited purchases. Those cases brought up important and relevant questions about how aware those devices need to be of users and their contexts. A lot has been done to avoid those glitches (e.g.: user recognition and parental control settings), but that kind of system is becoming more omnipresent and omniscient in our everyday lives, bringing good and possibly bad impacts. The arrival of cognitive assistants in our daily lives could be compared to personal computers being made available to everyone in the 70’s [1]. Every person in the world is a potential assistant user, even if the person is not completely aware of it or its implications. Those social implications, however, are a theme for another work.

In this paper, we focus on the UX challenges of this new scenario where people and cognitive assistants (AI-powered systems) are interacting in a symbiotic way, collaborating and learning with each other and about each other [14]. Our research is motivated by a real problem scenario related to a knowledge-intensive process [5]. Those processes are human-centered, depend on people’s experience (such as in decision-making scenarios [22]), normally involve tacit tasks, and do not have a pre-established sequence of activities. Our work here considers the user and his/her interaction context, the tasks performed, the related domain, and all the knowledge involved in that interaction. Focusing on a knowledge worker scenario, we have more pointers and domain knowledge bases to start the cognitive assistant development [14].

3 Fieldwork

Understanding a knowledge-intensive process requires efficient and extensive fieldwork. Beyond that, investigating how and where an AI-powered system can be helpful in a knowledge-intensive process adds another layer of complexity on top of understanding the process itself. This work began with several face-to-face activities, designed to comprehend the workflow of the target users and to detect UX improvement and research opportunities. Two researchers were responsible for conducting fieldwork activities and collecting data, which were later shared with the entire UX team working on this research project. The first activities were a series of semi-structured interviews and user observation in their work environment. We primarily focused on understanding their daily routine and all technologies supporting their work practices, including, but not limited to, software applications, auxiliary devices, and workstation configuration. Notes taken during the interviews and observations were classified in two main categories: “Tools” (applications currently being used) and “Work process.” We then created clusters in a focused approach, defining additional semantic categories for a better analysis (boxes 1 and 2 of Fig. 1).

A more structured activity called “case presentation” was conducted to gather a few examples of different projects and possible applications of our future solution. To this end, we created a set of cards designed and organized in several categories. Each card represented a different set of specific domain information that experts use in their everyday work, as well as actions they might take to solve a problem (e.g.: search Google). As such, these cards represented the domain language utilized by experts when making decisions. The participants were then invited to present a scenario from a project on which they were currently working, using the cards to construct a narrative. The cards were placed on a board, creating a visual storyboard that combined data exposed on sticky notes with comments, drawings, and the cards. The interviews and card activities were captured using a tripod and a camera. We also collected videos and photos of the board where the activity was taking place. This information was later analyzed and five important use cases were considered, allowing us to extract decision moments, pain points, and common practices (box 3 of Fig. 1).

Fig. 1. Fieldwork activities

With all fieldwork data collected and analyzed, we gained a better understanding of user’s activities and workflows, identifying opportunities for a cognitive assistant solution. Three main insights were the focus of our UX design:

  1. The importance of the paper notebook (from field observations): We observed that users in the target context have a very strong connection with paper notebooks (usually small-sized), where they write meeting notes, to-do lists, hypotheses, and literature data. The paper notebooks support their everyday routine, revealing themselves as an important instrument of their knowledge-intensive activities.

  2. The impact of exploring and reviewing the rationale behind each decision (from interviews): Decision points were perceived as core moments that require strongly grounded justification. Usually, users face strategic meetings and have to present their reasoning to stakeholders with diverse backgrounds. Thus, there is a clear need for reviewing and understanding the trail that led to each conclusion and decision.

  3. The complex and diverse search path to answering contextual questions (from card activity): Answering contextual questions may not have a straightforward process in knowledge-intensive practices. Cases described in the card activity showed that users frequently struggle to discover the right source and even the right procedure to understand complex contextual questions.

In subsequent design sessions, we devised a solution that, for (1), included an annotation tool. We believe this could complement their paper notebooks, while allowing better semantic connections and quick information retrieval. For (2), it included a history feature that aims to enable users to articulate their rationale while making decisions, keeping track of the interaction events and the insights associated with each point. The approach for (3) is a dialog interface, where the user can use natural language to ask complex questions to the cognitive advisor, as an easy-to-access touchpoint.

4 CoRgI

From what we learned with fieldwork observation and data, we discussed and designed a technical solution for a cognitive assistant that aims to support knowledge-intensive and context-aware activities: the Cognitive Reasoning Interface (CoRgI). It is a framework (seen in Fig. 2) comprised of three main components: advisor, brain, and cogs. The advisor acts as a hub where applications can register interaction events and the user can interact with the cognitive system (as will be detailed in Sect. 5). The user may interact with different systems (e.g.: use two different computers for the same goal); however, only a single advisor exists in each system.

The advisor in itself is just a front-end to the brain hosted in the cloud. The brain is responsible for storing information in the associated knowledge base regarding user preferences and events. The same brain may be connected to different advisors. Therefore, when the user is interacting with multiple systems, each advisor is connected to the same brain, creating a seamless integrated experience.

Symmetrically, the brain also acts as a hub, to which many cogs subscribe. Each cog is responsible for a single piece of logic and, collectively, the cogs form the “brain’s intelligence.” They subscribe to application-specific interaction events and provide insights considering the user’s current context, the user’s preferences, the application-specific databases, and the knowledge base.
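The advisor-brain-cogs flow described above can be sketched as a publish/subscribe hub. This is a minimal illustration only; all class and method names here are assumptions for the sketch, not CoRgI’s actual API.

```python
# Hypothetical sketch of the advisor -> brain -> cogs event flow.
from collections import defaultdict


class Brain:
    """Hub that stores interaction events and forwards them to subscribed cogs."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # event type -> list of cogs
        self.knowledge_base = []              # stored interaction events

    def subscribe(self, event_type, cog):
        self.subscribers[event_type].append(cog)

    def publish(self, event):
        """Called by an advisor when an application registers an event."""
        self.knowledge_base.append(event)
        insights = []
        for cog in self.subscribers[event["type"]]:
            insight = cog.process(event, self.knowledge_base)
            if insight is not None:
                insights.append(insight)
        return insights  # relayed back to the advisor for display


class PaperSearchCog:
    """A cog with a single responsibility: reacting to 'paper.opened' events."""

    def process(self, event, knowledge_base):
        return f"Related surveys for '{event['title']}'"


brain = Brain()
brain.subscribe("paper.opened", PaperSearchCog())
print(brain.publish({"type": "paper.opened", "title": "HCII survey"}))
```

Because each advisor holds only a reference to the shared brain, adding a new cog extends the “brain’s intelligence” without changing any advisor.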

Returning to the example of writing a scientific paper, imagine that CoRgI observes the user writing a survey paper for the 2018 edition of HCI International (HCII). With that input, the advisor could bring an insight about survey papers accepted in previous years at HCII, providing a navigational link to the 10 most recent ones. As an actionable invitation, the advisor could invite the user to see those papers’ structural similarities (e.g.: half of the papers have a section with a comparative table that is explained along the paper text, while the other half just presents the survey topics, without comparison). Finally, as reasoning, the advisor could show the benefits of understanding what a survey is, presenting hints and insights related to the subject of the user’s survey and common survey structures in that given conference. By providing this visualization, CoRgI could help the user improve his/her research agenda and writing strategies, adapting the content and style to the current target conference.

Inspired by Semiotic Engineering - an HCI theory that views human-computer interaction as a form of human communication between designers and users mediated by a computer system - we consider that the user’s semiosis (i.e.: the process of sign interpretation that leads to the continuous production of meaning) while interacting with a computer system occurs in different abstraction levels [4]. It begins with the strategic level, in which the user establishes his/her goals. It is followed by the tactical level, when the user devises plans to achieve the goals considering the possibilities available in the system. Finally, at the operational level, the user performs a series of actions (operations) that are needed to execute his/her plans and, ultimately, achieve his/her goals.

Fig. 2. CoRgI advisor basic architecture

Thus, the logging of interaction events should follow these three abstraction levels [9]. The operational log level deals with the low-level sequence of operations, typically local interactions (e.g.: user typed text “security incident”/user clicked the button “save”). These logs are usually domain-independent and can be extracted automatically, possibly using some sort of meta-data to increase their meaningfulness. The tactical log level adds a signification layer to the logged data, adding meaning spread over longer interactive paths (i.e.: considering many operational events). These logs are domain-dependent and specific to the system (e.g.: user saved new report containing message “security incident”). The system’s designer can express this log level, since it is related to the features made available by the system (therefore planned by the designer) and closely related to its domain. Lastly, the strategic log level is related to a broader context of the user’s plans. It adds even more signification to log data, relating to longer interactive paths and presenting knowledge about who the users are, what they want or need to do, how, and why [3] (e.g.: manager John reported security incident). The strategic level is closely related to the user’s goal and can involve multiple systems, or even previous knowledge from the user. Therefore, it exists only in the user’s mental model.
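The three log levels above can be made concrete with a small data sketch. The event schema and field names are assumptions for illustration, not a specification of CoRgI’s actual log format.

```python
# Illustrative encoding of the three log abstraction levels.

operational = [  # low-level, domain-independent, captured automatically
    {"level": "operational", "action": "type", "target": "text_field",
     "value": "security incident"},
    {"level": "operational", "action": "click", "target": "button:save"},
]

# The system's designer enriches operational sequences with domain meaning.
tactical = {
    "level": "tactical",
    "meaning": "user saved new report containing 'security incident'",
    "derived_from": operational,
}

# The strategic level exists only in the user's mental model; CoRgI must
# infer it from the tactical and operational data available across systems.
strategic_hypothesis = {
    "level": "strategic",
    "inferred": "manager John reported security incident",
    "evidence": [tactical],
}

print(strategic_hypothesis["inferred"])
```

Note the direction of the links: each higher level points down to the evidence it was derived from, which is what lets an insight later expose its rationale.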

Only with knowledge of the strategic level can we help users with insights that are contextualized and meaningful. The challenge for CoRgI is to infer the strategic level from the available tactical and operational data. To obtain the tactical log level, we rely on designers instrumenting their systems and providing the correct contextual information. Although requiring the systems’ instrumentation is not ideal, this was a conscious choice for the time being, so we can focus on other challenges, such as conceptualizing the user experience of a cognitive advisor (as we discuss in the next section) and developing the intelligence behind the advisor.

5 CoRgI’s Advisor Features

As previously mentioned, the advisor is the main interface for CoRgI. In the next sections we will go through some of the planned features of the advisor.

Fig. 3. CoRgI advisor main features.

5.1 Dialog

Besides passive observation of interaction events, we plan to provide a dialog interface (shown in Fig. 3a) so the user can interact with the cogs using natural language. This feature may act as a shortcut and easy-to-access interface for the knowledge base. It will allow a rapid touchpoint between the user and the cognitive system, reinforcing interaction aspects. As with similar assistants, we plan to offer, besides text input, speech-to-text and text-to-speech functionalities.

5.2 Annotations

Annotations (Fig. 3b) allow the user to create notes related to the current context. The user may add text, images, links, documents, and tags to an annotation. All the content of the annotation is then understood by the framework (using text/visual extraction/recognition techniques), associating concepts from the underlying knowledge base to the annotation. The annotation is then added to the knowledge base and may be considered in future insights.

Besides allowing the creation of new annotations, the advisor also allows the visualization of previously stored annotations associated with the current context. The advisor also offers a link to a separate environment to explore all the annotations, providing new ways to explore the annotations knowledge base. For example, the user could navigate through the knowledge base concept graph in order to search for annotations.
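The association between an annotation and the underlying knowledge base can be sketched with simple keyword matching. A real implementation would rely on the text/visual extraction and recognition techniques mentioned above; the function and concept names here are hypothetical.

```python
# Minimal sketch of linking an annotation to knowledge-base concepts.
# Real concept extraction would use NLP and visual recognition techniques.

knowledge_base_concepts = {"security", "incident", "report", "survey"}


def annotate(text, tags):
    """Create an annotation and associate known concepts with it."""
    words = {w.strip(".,").lower() for w in text.split()} | set(tags)
    return {
        "text": text,
        "tags": tags,
        "concepts": sorted(words & knowledge_base_concepts),
    }


note = annotate("Discuss the security incident with the team", ["report"])
print(note["concepts"])  # concepts found in the knowledge base
```

Once stored, the `concepts` list is what would let the user navigate the concept graph to retrieve related annotations.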

5.3 Events History

The events’ history (shown in Fig. 3c) allows the user to check what CoRgI is learning from him/her and its contributions. The user can revisit the interaction events being observed by CoRgI and the annotations (s)he made. The user can also verify the insights and notifications triggered by CoRgI.

Inside the advisor, only a recent history will be shown along with a link to an external environment to explore the complete history. Inspired by previous research [19, 20], in this visualization, we plan to highlight which interaction events were related to the insights and notifications.

5.4 Insights

Insights are the advisor’s main feature and, as such, occupy a prominent space in the proposed UI (colored rectangles at the top of Figs. 3a and c). They appear according to the user’s interaction, contextualized to the actions being performed. They are, therefore, dynamic, changing constantly to accompany the user’s actions.

Each insight will have a feedback system, so the user can inform whether the insight was useful or not. The user’s feedback will be used to improve recommendations, personalize the insights for the user, and improve the underlying knowledge base.

Each insight may also provide additional interaction beyond only the insight’s conclusion. It may contain a navigational link, so the user can explore related information, even from other systems (e.g.: suggesting an academic paper to read from an external website). It may also contain an actionable invitation, inviting users to start a new interaction path related to the previous one (e.g.: asking if the user wants to notify co-authors when s/he finishes writing a paper).

Last, but not least, the insight may make its reasoning explicit, explaining to users which interaction events and information were used to reach that insight. Exposing the rationale behind the insight opens the path to a new degree of feedback, since the user may provide input on each computational step, not only the final conclusion. This would also allow the user to “tweak” and personalize the cogs.
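An insight carrying the affordances described in this section could be modeled as below. The `Insight` class, its fields, and the example data are illustrative assumptions, not CoRgI’s actual design.

```python
# Sketch of an insight object with a feedback system, a navigational link,
# an actionable invitation, and an explicit reasoning trace.

class Insight:
    def __init__(self, conclusion, link=None, invitation=None, reasoning=()):
        self.conclusion = conclusion
        self.link = link                  # navigational link to related info
        self.invitation = invitation      # actionable invitation text
        self.reasoning = list(reasoning)  # events/steps behind the insight
        self.feedback = []                # user feedback entries

    def rate(self, useful, step=None):
        """Record feedback on the conclusion or on one reasoning step."""
        self.feedback.append({"useful": useful, "step": step})


insight = Insight(
    conclusion="Recent HCII surveys use a comparative table",
    link="https://example.org/papers",   # hypothetical URL
    invitation="Compare the structure of the 10 most recent surveys?",
    reasoning=["user opened survey template",
               "5 of 10 accepted surveys share this layout"])
insight.rate(useful=True)                # rate the final conclusion
insight.rate(useful=False, step=1)       # "tweak" one computational step
print(len(insight.feedback))
```

Because feedback can target an individual reasoning step (`step=1`), the user’s input reaches the specific cog logic that produced that step, not just the final recommendation.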

5.5 Notifications

Notifications share the same UI space as insights (as in Fig. 3b) and may share interaction patterns (feedback system, navigational links, actionable invitations). A major difference, however, sets them apart: their temporal nature. While insights are dynamic, changing constantly as the user interacts with systems, notifications are static – once they appear, only the user may dismiss them.

They should be used to alert users, usually due to some long-term processing. For example, consider a web application in which the user may run scientific algorithms that take some time to complete. The application may use the CoRgI framework to notify the user when the algorithm is complete, instead of relying on page-based or browser-based notifications, which depend on the web page being open in the browser.
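The long-running-task use case above can be sketched as follows: the application pushes a static notification through the advisor, and it persists until dismissed. The classes and the job name are assumptions for illustration.

```python
# Sketch of the static nature of notifications: they persist until the user
# dismisses them, unlike dynamic insights that change with the interaction.

class Notification:
    """Static alert: stays visible until the user dismisses it."""

    def __init__(self, message):
        self.message = message
        self.dismissed = False

    def dismiss(self):
        self.dismissed = True


class Advisor:
    def __init__(self):
        self.notifications = []

    def notify(self, message):
        """Called by an application (via the brain) on task completion."""
        self.notifications.append(Notification(message))

    def active(self):
        return [n.message for n in self.notifications if not n.dismissed]


advisor = Advisor()
advisor.notify("Algorithm 'cluster-run-42' finished")  # hypothetical job name
print(advisor.active())
advisor.notifications[0].dismiss()
print(advisor.active())
```

The key contrast with insights is in `active()`: nothing the user does elsewhere in the system removes a notification; only an explicit `dismiss()` does.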

6 Initial Validation

We are currently conducting a series of studies to investigate the usability and effectiveness of the designed solution. The same project that provided the fieldwork that guided CoRgI’s development was an opportunity to validate our initial ideas. To help make the idea more concrete and better communicate with users and stakeholders, we sketched several user interfaces (see examples in Fig. 3). Based on these drawings, we created a storyboard illustrating the use of the envisioned system based on a real use case presented by users, using the advisor to empower a knowledge-intensive process. The storyboard was the foundation of a sketch-based video that connected different features of the cognitive system through a consistent storyline. The video was shared with potential users with the intention of collecting feedback and validating the main concept of the designed system. Discussions with the clients and also the development team were rich and insightful for the UX team. The video worked very well as a material to reflect on the knowledge-intensive tasks related to the client’s domain [18] and also to experiment with and discuss UX ideas for interface designs and interaction flows. The positive feedback encouraged us to continue the design and development of the tool and to plan further testing activities (e.g.: usability and communicability studies).

7 Discussion and Future Work

The fieldwork described in this paper allowed us to propose and discuss CoRgI. Evaluating it in a real context is essential for constructing a user-centered and context-aware cognitive solution, iterating the development considering the users’ feedback. We still have the opportunity to execute other fieldwork studies with the same users, which will allow us to evolve our investigation around CoRgI and the related UX design.

As future usability studies, we plan to apply the paper prototyping technique for rapidly testing design versions of the interface and simulating user interactions. Representative users will be asked to perform specific tasks using hand-drawn screens. One of us will facilitate the activity, manipulating the pages and playing the role of the computer, while the others observe and take notes. This low-fidelity prototype will be very useful for refining interaction issues and improving CoRgI’s user experience [21].

We also envision applying the Wizard of Oz methodology [2] to test and explore a more detailed user journey. In this technique, a human simulates the advisor’s intelligence, reproducing an experience very similar to the one we are developing. It will help us evaluate and analyze interaction issues and also gather possible inputs for the dialog interface, facilitating the implementation of a more focused and context-adapted natural language processing algorithm.