
1 Introduction

bRIGHT [1, 37]—a Human-Computer Interaction (HCI) research framework and workstation developed at SRI over the last 7 years—is the basis for the HCI innovations and concepts we discuss in this paper. The objective of this technology is to develop and experiment with visualization and interaction modalities that yield revolutionary insight into the demands placed on the end user’s cognitive load and contextual processing when solving complex problems in domains such as cyber security, intelligence analysis, disaster recovery, or Battle Management Command and Control (BM2C). Our research over the last few years has yielded insights into techniques for accurately modeling user context, predicting and tracking user interest, and identifying workflows and learning procedures by demonstration. The bRIGHT research framework was also developed to support distributed, real-time collaboration among geographically disparate teams. The system architecture and its back-end server were designed after Massively Multiplayer Online Role-Playing Games (MMORPGs) such as World of Warcraft and League of Legends to support problem solving in collaborative contexts. Because our framework will scale to large user bases, one can imagine a new scale of collaborative complex problem solving in the future.

Current research in HCI and Artificial Intelligence (AI) seldom achieves the prediction accuracy needed to proactively support the end user because researchers are not able to access the user’s context. Improved interface modalities, such as voice, gaze, or gesture, provide some enhancement to user input processes. However, they cannot address fundamental challenges in big data manipulation, such as mitigating cognitive overload caused by the speed and volume of new information or supporting human-human and human-machine collaboration. Achieving higher performance and productivity will require novel approaches. Three fundamental challenges face decision makers and operators handling large data sets; they must:

  (a) Enhance the cognitive performance of users by reducing the overwhelming influx of data. This involves accurately filtering information based on the users’ contexts, goals, and needs, and providing only highly relevant information at the right time and in the right amount and format;

  (b) Enable users to rapidly execute decision processes and actions through task automation and adaptive interfaces. This involves anticipating user needs and putting the necessary, context-specific execution controls at their fingertips (e.g., using dynamic interaction models such as proximity detection for context-specific placement of user interface [UI] controls); and

  (c) Encourage effective collaboration by sharing users’ contexts so that individual actions are informed by, and contribute to, collective knowledge across an entire team that may be geographically disparate (Fig. 1).

Fig. 1. A 3rd-generation bRIGHT device being used at SRI International’s Menlo Park Campus in defense against a cyber attack.

Based on our findings from the last 6 years of research, we believe future workstations used by knowledge workers should be supported by a wide array of design considerations and features (both hardware and software). This paper is organized as follows. In Sect. 2, we review related research and development work and describe important hardware- and software-related advances. Section 3 covers our objectives in developing bRIGHT, and the design considerations and features for an ideal end-to-end future workstation framework are discussed in Sect. 4. In Sect. 5, we present the new research avenues we propose and the future goals of the bRIGHT project, and in Sect. 6 we discuss the lessons we have learned and the broader applicability of the approach.

2 Related Work

We have long been aware of the significance of Cognitive Task Analysis (CTA) for gaining insight [44] into issues related to cognitive load, workflows, skills, skill acquisition, and task automation. Historically, CTA has been used with great success to design applications and systems for human interaction [34,35,36]. In a sense, bRIGHT is an extension of the typical CTA process because it enables us to capture and analyze, with improved accuracy, a broader range of factors that contribute to HCI bottlenecks and cognitive load. bRIGHT utilizes semantic interaction and visualization models to capture the end user’s context in a very rich fashion; see [1, 37] for a detailed discussion of how this is achieved.

CTA is also quite useful in helping us develop predictive models of human performance and in understanding application models [38,39,40]. In our research, we identified the need for such predictive modeling early on (due to client requirements). To successfully implement contextual filtering, predictive task automation, contextual auto-fill, and UI control pre-positioning, we needed to leverage prior work on developing such models [34, 41, 42]. Historically, such work has been carried out by two communities: human factors engineering and cognitive psychology research. We investigated the approaches taken by both communities to deepen our understanding of pertinent techniques and to help us develop the theoretically unified architecture that is the basis of the HCI testing framework for bRIGHT.

The cognitive demand of most jobs has increased rapidly during the information age [44]. bRIGHT’s ability to play a key role in identifying skills and skill development, and its usefulness in evaluating the effectiveness of training, depend on building an HCI framework that can leverage applied cognitive analysis in such situations. We also studied various cognitive engineering approaches [43] to understand the implications of human factors methods for the design and engineering of the system.

To gain insight into the user’s engagement with the system, we added gaze-tracking support to the bRIGHT systems [1, 3]. Gaze tracking (GT) is the process of detecting gaze coordinates on a display to indicate what a person is looking at. Recently, GT has proven useful in conventional HCI research [3,4,5] as well as in studies of human cognition [6,7,8,9,10,11]. Many explorations of nonintrusive [12,13,14,15,16,17,18,19,20,21] and intrusive GT techniques [21,22,23,24] have been made to aid development of accurate, efficient, and user-friendly systems. Nonintrusive methods are preferred because they have the potential to increase a user’s GT comfort level.

bRIGHT is an extensible multimodal input system. We developed a workstation design in which new input modalities can be added as needed, as determined by evaluating end-user context. The system’s contextual modeling and other features can dynamically adapt to such novel input modalities. Indeed, the user benefits of multimodal input include flexibility, reliability, and the ability to meet the needs of a variety of operators [25,26,27]. Such systems improve efficiency, allow alternate interaction techniques, and accommodate individual differences, such as those associated with permanent or temporary handicaps [28,29,30,31,32,33].

3 Objectives

bRIGHT was initially conceived as a platform or framework for HCI research. Our main objective at the time was to develop new interaction models based on emerging sensor technology so that we could increase the input bandwidth from end users to a computer system. Over the last 6 years, the R&D has evolved, and we are now building an experimental platform that enables us to test hypotheses regarding task automation and extrapolate them across large-scale collaborative teams.

We soon learned that capturing the user’s context with a high degree of accuracy was invaluable as we designed future research. As such, we enhanced traditional application modeling with interaction models and semantic visualization models (see below and [1]) to accurately capture users’ contexts. Even so, we came to understand that a great deal of basic science still needed to be unraveled. For example, we needed to identify and track the evolution of the user’s interest over time by examining the things that populated her contextual model. We addressed this challenge by developing a rule-based contextual model that highlights user interest and dynamically changes over time (a minimal sketch of this idea follows the list below). This contextual user model is now the central component of bRIGHT and is the basis for developing solutions to meet our other technical objectives. We propose to:

  • Develop a mechanism to identify and extract user workflows from the user’s contextual model and understand how they evolve over time.

  • Establish metrics based on the user context to identify opportunities to apply contextual filtering.

  • Increase the accuracy and robustness of the contextual auto-fill mechanism that aids manual input.

  • Experiment with large teams of end users to improve the contextual modeling approach with regard to collaborative user groups and develop a ‘hive-mind’ model to reflect the group’s overall context and its dynamics.

  • Investigate technical approaches that could conceivably be used to build a contextual desktop that accurately reflects the user’s current mental model.
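
To make the rule-based contextual model mentioned above more concrete, the following is a minimal sketch in Python of how user interest might be scored and decayed over time. The modality weights, decay constant, and class structure are illustrative assumptions, not the actual bRIGHT implementation.

```python
import time
from collections import defaultdict

class ContextualModel:
    """Toy rule-based model of dynamically changing user interest (sketch)."""

    def __init__(self, decay_per_second=0.01):
        self.decay_per_second = decay_per_second
        self.interest = defaultdict(float)   # entity id -> interest score
        self.last_update = time.time()

    def _decay(self):
        # Interest fades for entities that receive no further attention.
        elapsed = time.time() - self.last_update
        factor = max(0.0, 1.0 - self.decay_per_second * elapsed)
        for entity in self.interest:
            self.interest[entity] *= factor
        self.last_update = time.time()

    def observe(self, entity_id, modality, dwell_seconds=0.0):
        # Simple rule: weight each observation by its input modality and dwell time.
        self._decay()
        weights = {"gaze": 0.2, "touch": 1.0, "edit": 2.0}   # assumed weights
        self.interest[entity_id] += weights.get(modality, 0.5) * (1.0 + dwell_seconds)

    def top_interests(self, n=5):
        self._decay()
        return sorted(self.interest.items(), key=lambda kv: -kv[1])[:n]
```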

4 Methodology

The bRIGHT approach is based on the development of solutions for semantic interaction modeling and semantic markup tools (see below for details on the approach). We semantically enhance selected software that is used by our clients and develop an HCI experimentation platform by incorporating the semantically enhanced software within the bRIGHT framework. bRIGHT includes data and reporting capabilities that allow us to identify key areas in end users’ context models and understand how to improve their workflows. We then design and conduct HCI experiments to develop contextual and cognitive models of decision makers or system end users. These models form the basis for developing algorithms for task automation, contextual filtering, and context-based recommendation, and for learning by observation and user interface (UI) control pre-positioning using proximity detection, measures of effectiveness, and error reduction.

4.1 Semantic User Interaction Modeling

Semantic User Interaction Modeling involves two parts: (a) instrumenting software applications with semantically meaningful data to report how users interact with the application (i.e., what actions users perform) and (b) instrumenting the end users’ environments with touch, gaze, and proximity sensors to report what information users looked at. Correlating the information from the application instrumentation with that from the user’s environment allows us to create a semantically meaningful record of what the user did and perceived, which forms the basis for contextual and cognitive modeling. While the user’s action model is highly accurate, the semantic interaction model must account for uncertainty due to limits in gaze-tracking fidelity: it is not possible to model gaze interactions so precisely that we can say with absolute certainty that the user has read an onscreen construct.
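
As a minimal sketch of this correlation step, the following Python fragment pairs instrumented application events with nearby gaze samples and keeps the tracker confidence rather than asserting that the construct was read. The record format and field names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class AppEvent:
    timestamp: float    # seconds since epoch
    element_id: str     # semantic id reported by the instrumented application
    action: str         # e.g., "opened", "edited", "selected"

@dataclass
class GazeSample:
    timestamp: float
    element_id: str     # element under the estimated gaze point
    confidence: float   # 0..1, reflects gaze-tracking fidelity

def correlate(events, gaze_samples, window=0.5):
    """Build semantic interaction records: what the user did and (probably) perceived."""
    records = []
    for ev in events:
        seen = [g for g in gaze_samples
                if abs(g.timestamp - ev.timestamp) <= window
                and g.element_id == ev.element_id]
        perceived_confidence = max((g.confidence for g in seen), default=0.0)
        records.append({"did": ev, "perceived_confidence": perceived_confidence})
    return records
```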

4.2 Semantic Markup Tools

Instrumenting applications can be time-consuming depending on the complexity of the application. We have designed a methodology and protocols to instrument software when the source code is available and can be extended and recompiled. Since this is not often the case, we now propose to develop tools that will allow a person proficient with the application and the end user’s context to add semantic markup to applications through onscreen markup techniques that highlight essential elements of information or context relevant to the modeling. This will eliminate the need for recompilation of source code.
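
A minimal, hypothetical example of what such externally authored markup might look like is shown below in Python; the window titles, screen regions, and semantic types are invented for illustration and are not part of an existing bRIGHT toolchain.

```python
# Hypothetical markup authored by a domain expert on the running UI;
# the target application is never recompiled.
SEMANTIC_MARKUP = [
    {"window_title": "Alert Console",
     "region": (40, 120, 600, 480),        # screen-space bounding box (x0, y0, x1, y1)
     "semantic_type": "cyber:AlertList"},
    {"window_title": "Alert Console",
     "region": (620, 120, 1180, 480),
     "semantic_type": "cyber:AlertDetail"},
]

def classify_gaze(x, y, active_window):
    """Map a raw gaze coordinate to a semantic element via the markup."""
    for entry in SEMANTIC_MARKUP:
        x0, y0, x1, y1 = entry["region"]
        if entry["window_title"] == active_window and x0 <= x <= x1 and y0 <= y <= y1:
            return entry["semantic_type"]
    return None
```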

4.3 Highly Accurate Contextual Models

The user’s context needs to be modeled accurately enough to reflect the user’s reasoning and decision-making value system. Given the central role of decision making in any HCI-related research, it is critical to have a framework that allows for experimentation and iterative improvement of cognitive and contextual modeling so that the results of such experiments can be incorporated into future versions of the system. To accomplish this, we design experiments to estimate and predict user interest and intent; determine the effect of contextual filtering on cognitive load; detect skill acquisition and measures of proficiency and effectiveness; and identify collaboration during task performance and decision making using contextual indexing [1].

4.4 Active Authentication

Using multi-factor authentication, our bRIGHT system is designed to continuously authenticate the user and digitally stamp every atomic action he performs; presently, this can be done using face recognition or iris recognition. Active authentication has some very critical advantages (a minimal sketch of a stamped audit record follows the list below):

  • It significantly increases the security of the system.

  • Continuously auditing user action creates an electronic trail of each performed action. Such audit trails can be used to better understand causes for errors and improve end-user training. They can also help us design systems that prevent user fatigue (or other issues) that can cause users to claim they did not see an instruction.

  • If invoking certain execution logic requires active authentication, then network-based attacks may become less effective against such a system in the future.
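
The sketch below shows, in Python, what stamping a single atomic action might look like: the action record carries the user identity, the confidence of the most recent face or iris verification, and a keyed signature for the audit trail. The record fields and the use of an HMAC over a per-session key are assumptions for illustration.

```python
import hashlib
import hmac
import json
import time

SESSION_KEY = b"per-session secret established at login"   # assumed key management

def stamp_action(user_id, auth_confidence, action, payload):
    """Digitally stamp one atomic user action for the audit trail (sketch)."""
    record = {
        "user": user_id,
        "auth_confidence": auth_confidence,   # e.g., 0.98 from the latest face match
        "action": action,                     # e.g., "approve_firewall_rule"
        "payload": payload,
        "timestamp": time.time(),
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SESSION_KEY, body, hashlib.sha256).hexdigest()
    return record
```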

4.5 Platform Handler

Most users utilize a multitude of devices with varying form factors throughout their work day. These can range from large-screen workstations with support for multi-touch to smartphones. Depending on the type of hardware employed by the user, bRIGHT adapts the way information is rendered on screen and the types of interaction offered to the end user.

We will develop a platform handler that recognizes the capabilities of the hardware available to the user and, accordingly, determines how the contextual desktop is rendered and which interaction models are provided. The benefit is that end users will experience seamless management of their context by the system, irrespective of the hardware they use. The platform handler is enabled by programmable interaction models specifically tailored to different hardware platforms. For example, the conventional mouse and keyboard will constitute one such interaction model, though with significant limits on dynamism (e.g., the position, appearance, or size of physical keys cannot be adjusted to the user context as would be possible for digital versions of such interaction models). For workstations such as bRIGHT or smartphones/tablets, we can introduce programmable interaction models. This will enable us to offer several interaction models to users so they can select a suitable hardware platform. To effectively implement the platform handler, the hardware platforms will need to support declarative descriptions of their capabilities.
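
A minimal sketch of such declarative capability descriptions, and of the selection logic a platform handler might apply, is given below in Python; the device entries and interaction-model names are illustrative assumptions.

```python
# Hypothetical capability declarations; real devices would publish these themselves.
DEVICES = {
    "bright_workstation": {"screen_inches": 55, "multi_touch": True,
                           "gaze_tracking": True, "proximity": True},
    "smartphone":         {"screen_inches": 6.1, "multi_touch": True,
                           "gaze_tracking": False, "proximity": False},
    "legacy_desktop":     {"screen_inches": 24, "multi_touch": False,
                           "gaze_tracking": False, "proximity": False},
}

def select_interaction_models(device_id):
    """Pick the interaction models the platform handler can offer on this hardware."""
    caps = DEVICES[device_id]
    models = ["keyboard_mouse"]                      # always-available fallback
    if caps["multi_touch"]:
        models.append("programmable_touch_controls")
    if caps["multi_touch"] and caps["proximity"]:
        models.append("controls_to_your_hands")      # proximity-based pre-positioning
    if caps["gaze_tracking"]:
        models.append("gaze_based_locking")
    return models
```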

4.6 MMORPG-Based Backend Design

MMORPGs such as World of Warcraft have solved some very challenging engineering issues to manage extremely complex world state information that is simultaneously being changed by tens of thousands of players. We believe future workstations should be modeled after this engineering approach.

The main reason for the success of these games is a paradigm shift in system design to support many simultaneous users: instead of building a system to be used by a single person, we must build a system that can model the ‘world state’ of a collective of users and then manage how they all change that world’s state together. This will enable the design of an engineering foundation for truly collaborative systems.

Introducing the notion of a world state to the proposed system means the entire set of system users will effect change and also be affected by those changes. Each end user renders his world state based on interests, goals, role, etc. This is similar to how MMORPG systems tailor the game’s user interface to different classes of players depending on the player’s character class. Obviously, more than just the user’s role can be used to customize the subset of the world state being rendered. Such a system can also address major engineering challenges such as load balancing and synchronizing extremely high rates of updates to the world state.
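
The following Python sketch illustrates the world-state idea: every user’s action mutates one shared state, while each user renders only the slice relevant to her role and interests. The attribute names are assumptions, and the sketch ignores the load-balancing and synchronization machinery a real MMORPG-style backend would need.

```python
class WorldState:
    """Shared state changed by many users, rendered per user (sketch)."""

    def __init__(self):
        self.entities = {}   # entity id -> attributes (e.g., "topic", "visible_to")
        self.version = 0

    def apply_change(self, user_id, entity_id, updates):
        # Every user's action mutates the single shared world state.
        entity = self.entities.setdefault(entity_id, {})
        entity.update(updates)
        self.version += 1
        entity["version"] = self.version
        entity["last_changed_by"] = user_id

    def render_for(self, role, interests):
        # Each user sees only the subset relevant to their role and interests,
        # analogous to class-specific views in an MMORPG client.
        return {eid: e for eid, e in self.entities.items()
                if e.get("topic") in interests or role in e.get("visible_to", [])}
```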

4.7 UI Paradigm Shift

To truly benefit from the improvements to the backend and middleware, the top-layer interaction technology needs to evolve as well. We are proposing a paradigm shift in fundamental interaction modalities. For example, using proximity detection we can position the controls the user needs, given the current context, directly under his hands on the primary interaction surface (‘controls to your hands’). This can be further improved by ‘gaze-based locking’, in which the control construct under the hand is filtered by the location of the user’s gaze on screen. For example, if the user is looking at a link in a web browser, then wherever he touches the primary interaction surface we can interpret that touch as a ‘click’ on the link and pass that interaction to the browser to load the relevant content. It does not matter what kind of screen construct is under the user’s hand, because the user’s action (a touch on the surface) is interpreted contextually (i.e., determined by what the user is looking at).
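
A minimal sketch of gaze-based locking is given below in Python: a touch anywhere on the primary interaction surface activates the construct the user is fixating, and otherwise falls back to the control pre-positioned under the hand by proximity detection. The event and control structures are illustrative assumptions.

```python
def resolve_touch(touch, gaze_target, prepositioned_controls):
    """Interpret a touch on the primary interaction surface (sketch)."""
    # Gaze-based locking: the gazed-at construct wins, regardless of what
    # screen construct lies under the user's hand.
    if gaze_target is not None and gaze_target.get("actionable"):
        return {"action": "activate", "target": gaze_target["element_id"]}
    # Otherwise, fall back to 'controls to your hands' placement.
    for control in prepositioned_controls:
        x0, y0, x1, y1 = control["region"]
        if x0 <= touch["x"] <= x1 and y0 <= touch["y"] <= y1:
            return {"action": "invoke", "target": control["control_id"]}
    return {"action": "ignore", "target": None}
```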

The UI paradigm shift is inspired by graphics accelerators that have been used for several decades to achieve transformative output acceleration in rendering (and, more recently, for physics modeling and some AI reasoning). We flip this paradigm and imagine an input accelerator in which the entire focus is to achieve transformative gains in user input to the system. This could be done using contextual filtering that predicts with extreme accuracy the user’s next input needs (thus, minimizing the task of user input so the user can simply approve the top-predicted next input). Transformative gains could also be achieved through task automation that allows multiple parallel workflows to be executed proactively so that the user is presented with automation results and can pick a completed (or possibly partially completed) workflow.

5 Future Work

In this section, we describe key ideas based on the research discussed above and detail the work we plan to explore in the future.

5.1 Self-describing Systems

In the future, all of the system components—hardware and software—should describe their own functionality, requirements, and limitations. This will enable the system to develop an operational picture of its own abilities and reason about them. The description should include such things as: how the component should be rendered on screen, where it should be rendered in the operational picture, which controls are required to manipulate the construct, which interactions are associated with those controls, and what mandatory inputs are needed, along with their types and the possible origins of that information. Such capability is critical when determining what hardware or software components are needed to automate certain tasks. This information is also critical in developing a composite contextual desktop that reflects the user’s current mental model. With self-describing components, each system component can be intelligently arranged and managed by the system to fit into the current contextual ‘picture’ the user is manipulating to achieve the current goal.
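
The fragment below sketches, in Python, what such a self-description might contain and one simple way the system could reason over it; the component name, fields, and types are hypothetical.

```python
# Hypothetical self-description published by a software component.
COMPONENT_DESCRIPTION = {
    "component": "network_flow_viewer",
    "renders_as": "time_series_panel",
    "preferred_region": "center",          # where it belongs in the operational picture
    "controls": [
        {"name": "time_window", "interaction": "drag", "type": "interval[seconds]"},
        {"name": "filter_host", "interaction": "text_or_voice", "type": "ip_address"},
    ],
    "mandatory_inputs": [
        {"name": "flow_records", "type": "netflow_stream",
         "possible_origins": ["packet_capture_service", "uploaded_file"]},
    ],
    "limitations": ["requires_gpu_for_live_rendering"],
}

def missing_inputs(component, available_data_types):
    """Which mandatory inputs cannot currently be satisfied by the system?"""
    return [i["name"] for i in component["mandatory_inputs"]
            if i["type"] not in available_data_types]
```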

5.2 Self-aware Systems

Our approach to self-aware systems does not require sentient AI; what we propose is a significantly lower threshold of awareness. The system should be able to understand its role in relationship to the end user. That is, it should be able to identify the goals that are typically significant to the end user and ascertain the factors that relate to these goals. The system needs to identify behavior patterns of the user and how those relate to goals, and it needs to identify the user’s typical workflows and the triggers that can be associated with those workflows. Furthermore, the system should actively reason about its own role in relationship to the user as the user engages with the system. This is akin to a proactive human assistant who develops a deep understanding of the user over the course of a long observation period. The system should be able to learn, by observation, the goals, workflows, and triggers of each user. In addition, the system should be able to reason effectively in real time about how its role can serve the user’s typical goals in the current context.

5.3 Built-in OODA Loop

In complex problem-solving scenarios, such as defending against a cyber attack or conducting a large-scale battle management training exercise, the typical warfighter interaction loop is characterized as an Observe-Orient-Decide-Act (OODA) loop. OODA loops should be built into the future workstation in two different ways. (1) The system itself should have a built-in OODA cycle in which it observes the user, orients itself based on the reasoning conclusions from the observation stage, decides which actions to take to support the user, executes those actions, then observes the user’s reaction to those actions, and starts the loop again. This will allow us to design and model the system as another class of user, perhaps as an assistant that reflects the user’s goals. (2) The second version of the OODA loop should be based on the observation and learning abilities of the system. This OODA loop is purely a reflection of the user and her engagements with the system, allowing us to characterize such actions into the different states of the OODA loop and study the state transitions. This is useful in developing assistive technology closely tied to various state transitions in the OODA loop.
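
A minimal sketch of the first, system-side OODA cycle is shown below in Python; the subsystem objects (sensors, context model, automation planner, UI) are placeholders, since the loop structure rather than any specific API is the point.

```python
import time

def system_ooda_loop(sensors, context_model, automation, ui):
    """The system's own OODA cycle, modeled as an assistant-like user (sketch)."""
    while True:
        observation = sensors.read()              # Observe the user and environment
        context_model.update(observation)         # Orient: revise the contextual model
        actions = automation.plan(context_model)  # Decide which supportive actions to take
        for action in actions:                    # Act
            automation.execute(action)
        ui.present(actions)                       # Surface results, then observe the user's reaction
        time.sleep(0.1)                           # begin the next cycle
```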

5.4 Hierarchy of Contextual/Cognitive Models

Complex problems are not typically solved by a single individual working in isolation. In most real-life scenarios, groups of people from various levels of an organizational hierarchy (with differing roles) collaborate in solving critically important and complex problems. Capturing the nature of such collaboration and reflecting it as a hierarchical cluster of contextual models will assist in achieving transformative progress in collaborative groups. Such a system can effectively detect and track ‘emergent collaboration’, that is, the moment when end users naturally gravitate toward working together to solve a complex problem. Identifying such patterns allows us to reason beyond a single user’s goals. This will enable the system to reason about organizational goals at a strategic level. The system can then associate its knowledge of each person’s roles, skills, and goals with that higher-level knowledge. This in turn will enable the development of technologies that can accelerate group-level interactions and collaborative task performance. The hive-mind hierarchy will also allow the system to track goals and directives being passed from top to bottom, as well as results and feedback that flow in the reverse direction. This will allow the system to identify operational or decision-making bottlenecks so that tasking and role management can be improved.

5.5 Contextual Desktop

Our understanding of the nature and complexity of the contextual models has evolved over the last few years. We hypothesize it may be possible to transform such a contextual model into a ‘contextual desktop’ that is a visualization and a direct reflection of the user’s current mental model of the problem space she is addressing. To successfully achieve such a goal, it may be necessary to revamp the application modeling approaches used in today’s software engineering. Instead of writing individual pieces of software that separately (and in a monolithic fashion) solve independent problems, we suggest building software to harmoniously co-exist and yield a common operational picture.

The main idea is to create a single operational picture that accurately reflects the user’s mental model and to provide her with the tools to manipulate it effectively and efficiently to achieve her goals. Parts of the operational picture will be generated by different pieces of software, and the tools to manipulate it may be provided by the same (or other) software. All such software will be combined into a single cohesive contextual desktop in which the user can execute tasks that chain multiple different types of software without the need to switch from one monolithic application to another. This may also eliminate the need to copy and paste data from one application to another, since the system should be able to create the wiring needed for data to flow seamlessly from one application to another. This will eliminate the minutiae of data manipulation that today consumes a great deal of end-user time and cognitive capital.
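
A minimal sketch of such wiring is shown below in Python: each plug-in declares the data types it produces and consumes, and the contextual desktop connects matching pairs so data flows without manual copy and paste. The plug-in names and types are assumptions for illustration.

```python
# Hypothetical declarations by two plug-ins running on the contextual desktop.
PLUGINS = {
    "alert_triage":  {"produces": {"incident_report"}, "consumes": {"alert_stream"}},
    "ticket_system": {"produces": {"ticket_id"},       "consumes": {"incident_report"}},
}

def wire(plugins):
    """Return (producer, consumer, data_type) connections the desktop can create."""
    edges = []
    for src, src_decl in plugins.items():
        for dst, dst_decl in plugins.items():
            if src == dst:
                continue
            for data_type in src_decl["produces"] & dst_decl["consumes"]:
                edges.append((src, dst, data_type))
    return edges

# wire(PLUGINS) -> [("alert_triage", "ticket_system", "incident_report")]
```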

The main reason to develop a singular operational picture as the contextual desktop is to eliminate the sustained cognitive load that end users experience with conventional systems. In such systems, end users constantly struggle to semantically map the state and context of the system they are using onto their own mental models of the problems they are trying to solve. We believe our approach will radically reduce the sustained cognitive load of operators, manual input errors (by several orders of magnitude), and the tactical- and strategic-level reasoning errors users make under cognitive strain. It will also simplify workflows by largely eliminating manual data manipulation, and it will allow better collaboration because it will be easier to understand others’ contexts and workflows.

We face significant challenges in successfully achieving a singular operational picture as the contextual desktop. The application modeling paradigms will need to be revamped to support application integration into a singular contextual desktop. This means most software will depend on features provided by other parts of the system or afforded by other software. Although this is very different from today’s approach to application software, it is not an unprecedented method. Indeed, very complex software systems often feature a plug-in framework that enables independent development of new capabilities that can then be seamlessly added to the system. We propose to essentially elevate that practice to the bRIGHT framework’s layer so any software that runs on the system would be a plug-in to the bRIGHT framework that develops the singular, cohesive operational picture for the user.

Combining various independently developed software components and their rendered user interfaces into a meaningful and manageable singular construct will be particularly challenging. At present, each piece of software we execute generates its own monolithic user interface and control set. We must abandon that approach and create a declarative user interface and interaction model that can be intelligently merged into the contextual desktop. The UI and interaction model need to be declarative because the operating system (or some component that manages the contextual desktop) has to be able to quickly reason about how the software fits into the current context of the user and then position it accordingly.

Adopting large and complex software that has been developed to solve a problem—often with a uniform look and feel and a cohesive workflow—will be challenging. Such software exists as windowed user interfaces with a monolithic execution model, which will not integrate well with a common operational picture-based system. Retrofitting such software into the contextual desktop may prove difficult. On the other hand, if a system is developed from scratch as a collection of plug-ins, it can likely be integrated cohesively with the contextual desktop.

6 Discussion

Our research in client labs has demonstrated the impact our work can have in contexts such as Battle Management and Command and Control. Given our ability to generate highly accurate contextual models that can be used to identify users’ interests and skills, skill acquisition during training, and how those skills evolve under operational circumstances, we are starting to see exciting possibilities for this technology. The approach seems to be broadly applicable to domains in which the end user routinely handles a large influx of data and makes decisions at a rapid rate, and in which collaboration is necessary to achieve a solution. While the contextual model is the basis of the key features that increase a user’s throughput in interacting with the system, we also believe it can reveal important insights into the inter-relationships among cognitive load, error rates, decision making, the volume of data consumed by a user, and the evolution of skills under operational conditions. We are currently using our technology in clients’ labs to evaluate the cognitive load of operators solving complex problems in the BM2C domain, and we expect to gain more insight into the factors and parameters that affect end-user performance and skill acquisition in such scenarios. We are also improving the robustness of our systems so they can be used effectively with others, and we are developing tools that reduce the risk and engineering effort required to extend third-party applications to engage with a bRIGHT system. This will make the base technology accessible to a wider audience, hopefully beyond the military domain.