1 Introduction

1.1 Teleoperation and Virtual Reality

Imagine you are performing a choreography on a crowded dance floor. After grabbing a full drink from the bar, you need to get back to your initial position. You side-step, slide, and make a double turn, all the while avoiding both dynamic and static objects. You finish at the empty bar table you had seen earlier, where you place your drink after looking around in preparation for your next move. These steps are deliberated to solve the problem of "dancing from point A to point B without spilling your drink". Standard motor programs for 'grabbing' and 'moving around' flow seamlessly into one another, in congruence with cognitive monitoring of the action plan (Leisman, Moustafa and Shafir 2016). Now imagine performing a similar task without this motor automaticity, while you are not even physically performing the task yourself. You are not in the same room but are replaced by a robot you are controlling. The dance floor and drink have been replaced by a hazardous environment, such as a nuclear plant, and radioactive material. This is the perspective of a human operator teleoperating a robot (DeJong, Colgate and Peshkin 2004).

Teleoperation refers to any technical implementation that, via a communication medium, extends the human operator's capacity to manipulate objects to a manipulator positioned in a remote environment (Hokayem and Spong 2006). Adequate interpretation of, and accuracy in manipulating, the remote environment is essential, as the human operator (typically referred to as the 'master') holds the executive position over the manipulator (the 'slave').

Increased telepresence through properly implemented Virtual Reality (VR) technology is believed to improve the interpretability of and control over remote environments (Kot and Novak 2018; Freund and Rossmann 1999). During teleoperation, the human operator is believed to build an affordance-based mental model (i.e. a mental model of the functional opportunities of a given context) of the remote environment (DeJong, Colgate and Peshkin 2004). VR can diminish the number of translations of visual input necessary to adequately build such mental models, as it presents spatial information of the remote environment in higher dimensionality compared to two-dimensional displays. On the command end, current VR technology offers the possibility to send tracker-based commands ("About The Vive" 2018), increasing the ease of end-effector based control. At the same time, the application of VR technology for teleoperation poses perceptual and cognitive questions and limitations (Rubio-Tamayo, Barrio and Garcia Garcia 2017). Signal instability, noise and limited bandwidth can be countered by selecting and transforming environmental data before transmission (Turkoglu et al. 2018). Such technologically beneficial measures may degrade the immersive nature of a VR implementation (Bowman 2007). However, selecting the proper information as a processing step might prove beneficial in countering perceptual problems such as information overload and misinterpretation of signal and noise in VR-based environment representations.

Immersion and presence express the quality of the perceptual experience of the human operator in VR (Bowman and McMahan 2007; Slater et al. 1999). Coarsely defined, presence is the sensory identification with the Virtual Environment (VE), or being "there" (being part of the virtual or mediated environment) whilst physically being present in another environment (Nowak and Biocca 2003; Witmer and Singer 1998; Sheridan 1992a). Presence is often equated to the ecological validity of VR devices or the nature of their implementation (Mestre et al. 2006). Whilst immersion can be summarized as the technological sophistication of a particular VR system, presence can be seen as its perceptual counterpart. More immersive technologies typically elicit a greater sense of presence (Mestre et al. 2006). Implementation manipulations which make a VE more 'natural' (such as realistic lighting, photorealism and shadowing) can also increase presence dramatically (Slater et al. 2009; Yu, Mortensen, Khanna, Spanlang and Slater 2012). However, this increase in presence does not always equate to improved task performance (Lok et al. 2003; Slater et al. 1999). It has been suggested that effects on task performance relate to both the nature of the task and the sense of agency within a given VE (Kilteni, Groten and Slater 2012).

For the current research, human-robot interaction (HRI) is defined as follows: the process of a human and a robot working together to perform a specific task. Goodrich and Schultz (2008) make a distinction between 'remote' and 'proximate' HRI, where remote HRI may be more restricted due to indirect communication through technical interfaces and time lag. Remote interaction applied in the real world suffers from multiple factors which interfere with Human-Robot Team (HRT) operation. Whereas teleoperation can formally be considered remote interaction, VR as an interface can make the interaction during teleoperation akin to proximate interaction, for instance through an increased experience of proximity.

The aim of the current study is to better understand how information presentation influences HRT task performance and operator Situation Awareness (oSA) during VR-mediated teleoperation. This aim calls for two objectives: (i) construct a technical framework within which a robot with limited autonomy can be controlled through VR-mediated teleoperation by non-professional robot operators; (ii) create and perform an experiment in which the technical framework is applied and HRT task performance and oSA can be assessed.

1.2 Theoretical Framework

To investigate the influence of information presentation on HRI, a controlled experiment was performed, mimicking real-world teleoperation. Two informational contexts were provided within which the HRT performed a representative task: full information and preprocessed. The full information context shows all contextual and task-related information of the robot's environment to the operator. The preprocessed context depicts a minimized version of the environment, showing only task-related information (see also the Methods section). Reducing informational resolution, either by compressing the 3D video stream or by computationally preprocessing a scene, improves the technical efficiency of information transmission.

Within the framework as studied, the human operator holds the executive position; it is therefore important that oSA is safeguarded. The experience of oSA is often affected by a trade-off between attention and informational load: informational clutter can restrict the information attended to, whilst too little information can limit situational understanding (Salmon et al. 2009; Taylor 1990). Concerning attention, overly extensive automation and information processing have been known to limit oSA, whereas reducing situational information can extend the scope of information attended to (Salmon et al. 2009; Endsley 1995). One of the latent factors within the Situational Awareness Rating Technique (SART) as developed by Taylor (1990) is the attentional demand of a given situation. In the current framework we equate attentional demand to the amount of information that draws attention, as discussed by Endsley (1995).

Findings on immersion, presence and task performance are mixed. Whilst some studies show that immersion and presence hardly affect task performance (Slater et al. 1999), others have found strong effects (Bowman and McMahan 2007; Slater 1999). A distinction can be made based on whether the immersive features of a set-up are highly informative (e.g. depth cues during a spatial task) or less informative (e.g. depth cues during a maths challenge). Other research suggests that for spatial tasks, adding context can cause presence-related performance increases (Chamizo 2002).

In the current research, we hypothesize that HRT task performance is better within a full information context than within a preprocessed context, since both depth cues and contextual information are informative. We furthermore hypothesize that oSA is higher within a full information context than within a preprocessed context, because of higher levels of situational understanding and attentional supply. Last, we hypothesize that attentional demand is lower within a preprocessed context than within a full information context.

2 Methods

2.1 Technical Implementation

Interface. To emulate true teleoperation, a VR interface and a robotic arm were booted on two computers which were connected through the custom-made software RDA (Fig. 1). The experimental setup consisted of a simulated robotic arm (KUKA LBR IIWA 7, KUKA, Augsburg, Germany) and a generic table, which were dynamically simulated using the Gazebo platform (Gazebo, OSRF, San Jose, USA) on an Ubuntu PC. The arm was controlled using the ROS platform (Robot Operating System, OSRF, San Jose, USA). The operator viewed and interacted in VR with an HTC Vive™ HMD (HTC, New Taipei, Taiwan), programmed using Unity™ (Unity Technologies, San Francisco, USA), running on a Microsoft Windows 8.1 laptop PC with an NVIDIA GTX 980M graphics card and a 2.50 GHz x64 processor.

Fig. 1. The interface between the simulated robotic arm and the VR environment.
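RDA is custom-made, unpublished software; as a rough illustration of the kind of bridging it performs, the following minimal sketch relays desired end-effector poses from the VR PC to the robot PC as JSON over TCP. The wire format, function names and port number are assumptions for illustration, not the actual RDA protocol.

```python
# Hypothetical sketch of a VR-to-robot pose relay, illustrating the kind
# of bridging the custom RDA software performs between the two computers.
# The JSON-over-TCP wire format and all names are assumptions.
import json
import socket

def send_pose(sock, position, orientation):
    """Send a desired end-effector pose (xyz + quaternion) to the robot PC."""
    msg = json.dumps({"pos": list(position), "quat": list(orientation)})
    sock.sendall(msg.encode() + b"\n")

def serve_robot_side(host="0.0.0.0", port=5005):
    """Receive pose commands on the robot PC and hand them to the planner."""
    with socket.create_server((host, port)) as server:
        conn, _ = server.accept()
        with conn, conn.makefile() as stream:
            for line in stream:
                pose = json.loads(line)
                # Hand off to the motor planner, e.g. publish to a ROS topic.
                print("target:", pose["pos"], pose["quat"])
```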

Robot. The simulated robot was a KUKA LBR IIWA 7, henceforth referred to as 'the robot'. This is a 7-DOF robotic arm with 7 joints, 7 movable links and 1 base. It was controlled using ROS, version Kinetic 1.12.12. The motor planning of the robot was based on a Jacobian solver for Euclidean position control. The motor planner received the desired position and orientation from the human operator through RDA and continuously published to ROS to move the robot correspondingly. The positions of the individual robot links were continuously published to RDA to visualize the live state of the robot in VR. The robot was rated between levels 2 and 3 on Sheridan's scale of autonomy (Sheridan and Verplank 1978).
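The Jacobian solver itself is part of the ROS-side motor planner and is not listed in the paper; the sketch below illustrates the general technique it names, a damped least-squares Jacobian step toward a desired end-effector position. `forward_kinematics` and `jacobian` are hypothetical placeholders for the robot's kinematic model, and orientation handling is omitted for brevity.

```python
# Illustrative sketch of Jacobian-based Euclidean position control using
# damped least squares. Not the actual planner: `forward_kinematics` and
# `jacobian` stand in for the robot's kinematic model, and orientation
# control is left out for brevity.
import numpy as np

def ik_step(q, target_pos, forward_kinematics, jacobian,
            gain=1.0, damping=0.05):
    """One control step: move joint angles q toward target_pos.

    q          -- current joint angles, shape (7,)
    target_pos -- desired end-effector position, shape (3,)
    """
    current_pos = forward_kinematics(q)   # (3,) end-effector position
    error = target_pos - current_pos      # Cartesian position error
    J = jacobian(q)                       # (3, 7) position Jacobian
    # Damped least-squares pseudo-inverse: the damping term keeps the
    # step stable near singular configurations.
    JJt = J @ J.T + (damping ** 2) * np.eye(3)
    dq = J.T @ np.linalg.solve(JJt, gain * error)
    return q + dq
```

Iterating this step while continuously updating `target_pos` from the operator's commands yields the kind of live end-effector tracking described above.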

VR Hardware. The VR setup was developed using Unity™ (2017.3.1f1) and ran on the Microsoft Windows 8.1 laptop PC described above. SteamVR, a Unity package, enabled the use of first-generation HTC Vive™ VR gear, consisting of a head-mounted display (HMD), two controllers (of which one was used), and two lightboxes mounted on tripods. The HMD was the immersive device displaying the experiment scenes as run in Unity. The controller was used by the operator to control the robot arm through a combination of button presses and movement in three-dimensional Euclidean space.
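The exact button-to-motion mapping is not specified here; a common scheme for this kind of tracker-based control is a 'clutch', where the robot's target position follows the controller's displacement only while a button is held. The sketch below is a minimal, purely illustrative version of such a mapping (position only); all names are assumptions.

```python
# Hypothetical clutch-style teleoperation mapping: while the trigger is
# held, controller displacement is applied to the robot's target position.
# Purely illustrative; not the mapping actually used in the experiment.
import numpy as np

class ClutchTeleop:
    def __init__(self, start_target):
        self.target = np.asarray(start_target, dtype=float)
        self._anchor = None  # controller position when the trigger engaged

    def update(self, trigger_pressed, controller_pos):
        controller_pos = np.asarray(controller_pos, dtype=float)
        if trigger_pressed:
            if self._anchor is None:
                self._anchor = controller_pos            # engage clutch
            else:
                # Move the target by the controller's displacement.
                self.target += controller_pos - self._anchor
                self._anchor = controller_pos
        else:
            self._anchor = None                          # disengage clutch
        return self.target
```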

VR Scene. The VR scene consisted of a blue void, a table, a model of the robot, and the Vive controller. The robot model was composed of several links that mimicked the states of the corresponding links in Gazebo, as published through RDA.

In the 'practice' block (Sect. 2.4), the participant stood in front of a white table with the robot positioned on the other side. In the 'preprocessed' block, the table was replaced by a rectangular white sheet containing a black path. A translucent magenta box represented the start of the path and a translucent magenta sphere represented the end of the path. In the 'full information' block, the sheet was projected on a table. This block also included a three-dimensional mesh rendering of our laboratory, created with RTAB-Map Tango (version 0.11.14) running on a Lenovo Phab 2 Pro (Fig. 2).

Fig. 2. Two screenshots from the participant's perspective during both experimental contexts.

2.2 Path Following Task

Participants teleoperated a robot in a path following task. This task was designed as part of the i-Botics innovation project at TNO (Catoire et al. 2018). Participants were exposed to a practice block and two experimental blocks, with a 30-second break following each block. During the practice block, participants had the opportunity to develop an intuition for the control dynamics of the set-up; next, they had to command the robot through the table and guide it up and out. The practice block was followed by one of the two experimental blocks, depicting either the full information version or the preprocessed version of the VR scene; the second experimental block was performed in the remaining VR scene. Each experimental block consisted of 10 trials. Part of the ISO 9283 path was used in differing orientations during the experiment. The robot always started at the translucent magenta box (start box, Fig. 3) and finished at the translucent magenta sphere (finish sphere, Fig. 3). An audio signal indicated the start and successful completion of a trial.

Fig. 3. A depiction of the experimental blocks and a representation of a single trial.

2.3 Operator Situation Awareness Questionnaire

Following each experimental block, participants answered a questionnaire concerning oSA, the Situational Awareness Rating Technique (SART; see Taylor 1990 for further information).

2.4 Procedure

After signing an informed consent form, participants received a leaflet with information and instructions concerning robot control and the tasks they were about to perform. Next, the experimental supervisor explained the course of the experiment. Thereafter, the supervisor showed the HMD and controller, and measured and calibrated the participants' inter-pupillary distance (IPD). Participants were then positioned on a white cross on the floor and both the HMD and controller were fitted. Left-handed participants were positioned with their left foot just to the right of this white mark. The participant was positioned between the lightboxes, to have enough room to move around (Fig. 4).

Fig. 4. The picture on the left shows the room as seen when entered by the participant. The picture on the right shows the participant as positioned when the experiment commences.

After checking whether the participant was ready, the supervisor verbally prepared participants for the visuals to follow. The experiment followed the sequences as depicted in Fig. 3, booting each VR scene accordingly. After a minute of practice, participants were asked if they had gained enough experience. If anything was unclear, the supervisor provided clarification limited to the content of the information leaflet. After each break the supervisor again fitted the HMD and controller. Before each trial the participant had to position the head of the robot arm within the start box (Fig. 3). For each trial the supervisor verified that the participant was ready, after which the starting sound triggered the participant to perform the task. Between trials the supervisor switched the trial path as necessary. Following each experimental block, the supervisor helped to remove the HMD and controller, after which the participant answered the oSA questionnaire.

2.5 Participants

A total of 20 participants between the ages of 23 and 42 were recruited based on screening criteria. To ensure group homogeneity, participants had to be healthy, be between 20 and 42 years of age, be able to read and write English, and have some video gaming experience. Exclusion criteria were general contraindications for VR usage (such as epilepsy) and extensive experience with teleoperation.

The experimental group (N = 20) consisted of 17 males and 3 females. Two participants were excluded due to interruptions during the experiment, so statistical analyses concerning time and oSA were performed based on the results of 18 participants. Two further participants could not be included due to faulty measurements; statistical analyses on accuracy were therefore executed based on the results of the 16 remaining participants.

2.6 Data Analysis

The primary independent parameter was the experimental block type, either full information or preprocessed. The primary dependent parameters were oSA and HRT task performance during the path following task. An oSA score and an attentional demand score were calculated based on answers to the SART questionnaire (Taylor 1990) following each experimental block. HRT task performance was assessed based on time and accuracy. Time reflected the number of seconds needed to finish a trial. Accuracy reflected the mean distance between the performed trajectory and the ideal path, both in Euclidean space and in 2D. The reason to include both Euclidean and 2D accuracy was to ensure that accuracy in general space and in the horizontal plane, which was the orientation of the track, could be regarded separately. Time, accuracy, oSA and attentional demand values were all subjected to paired t-test analyses.
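The analysis pipeline itself is not given in the paper; the sketch below shows one way these accuracy measures could be computed, assuming each trial is logged as an array of end-effector positions. The names, the nearest-point distance choice, and the assumption that the horizontal plane is the x-y plane are all illustrative.

```python
# Illustrative sketch of the accuracy metrics, assuming trajectories are
# logged as (N, 3) arrays of x, y, z end-effector samples. Names and the
# nearest-point distance formulation are assumptions.
import numpy as np

def mean_path_deviation(trajectory, ideal_path, planar=False):
    """Mean distance from each trajectory sample to the ideal path.

    planar=False -- Euclidean (3D) accuracy
    planar=True  -- 2D accuracy in the horizontal plane of the track
    """
    traj = np.asarray(trajectory, dtype=float)
    path = np.asarray(ideal_path, dtype=float)
    if planar:
        traj, path = traj[:, :2], path[:, :2]  # drop the vertical axis
    # Distance from each sample to its nearest point on the ideal path.
    dists = np.linalg.norm(traj[:, None, :] - path[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Paired t-test across participants (one mean value per participant and
# context), e.g. with hypothetical per-participant arrays:
#   from scipy import stats
#   t, p = stats.ttest_rel(full_info_means, preprocessed_means)
```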

3 Results

3.1 HRT Task Performance

The paired t-test on time between Full information (M = 14.58, SD = 1.43) and Preprocessed (M = 15.16, SD = 1.49) showed a significant difference (t(17) = −2.19, p = .043) (Fig. 5).

Fig. 5. Effect of experiment type on average trial performance time in seconds. The figure on the left shows the average time for full information (F) and preprocessed (P) including standard-error bars. The figure on the right shows a boxplot of the paired difference F-P for time.

The paired t-test on two-dimensional accuracy between Full information (M = 0.033, SD = 0.01) and Preprocessed (M = 0.034, SD = 0.01) showed no significant difference (t(15) = 0.7223, p = 0.72). The paired t-test on Euclidean accuracy between Full information (M = 0.0324, SD = 0.012) and Preprocessed (M = 0.0336, SD = 0.009) showed no significant difference (t(15) = 0.445, p = 0.66).

3.2 oSA

The paired t-test on oSA between Full information (M = 18.78, SD = 6.85) and Preprocessed (M = 19.89, SD = 5.92) showed no significant difference (t(17) = −1.035, p = .32). The paired t-test on attentional demand between Full information (M = 9.28, SD = 4.00) and Preprocessed (M = 7.56, SD = 2.83) showed a significant difference (t(17) = 2.5139, p = .02) (Fig. 6).

Fig. 6. Effect of experiment type on subjective attentional demand. The figure on the left shows attentional demand for full information (F) and preprocessed (P) including standard-error bars. The figure on the right shows a boxplot of the paired difference F-P for attentional demand.

4 Discussion

In the current study we aimed to investigate the influence of information presentation on operator Situation Awareness (oSA) and Human-Robot Team (HRT) task performance during VR-mediated teleoperation. To this end, two tasks were carried out: first, a technical framework was designed and created within which oSA and HRT task performance could be tested in a teleoperation context; second, within this framework we examined how oSA and HRT task performance were affected. We have demonstrated a novel framework which extensively re-enacts a VR-mediated teleoperation setting.

The integral way in which VR-mediated teleoperation has been implemented and examined, particularly regarding the effects of information presentation on task performance and oSA, can be deemed both novel and effective. Previous research on VR implementation for teleoperation has often been performed in contexts of extremely high predictability, for instance factory contexts where robot system performance was highly stable, freedom of operation was limited and the task environment provided no dynamics (Burdea 1999). Although the task environment in the current framework was static as well, participants had extensive freedom in operating the system and were confronted with some robot control and robot environment dynamics. Research and development with a stronger focus on situation and system dynamics has primarily addressed systems with the highest level of human control (Kot and Novak 2018). Such research has often disregarded the effects of different sensory transferal modes from the robotic end to the operator end, a central notion for the current study. Typical research on high-control teleoperation also demands extremely high operator skill levels, for which training time is extensive (DeJong, Colgate and Peshkin 2004). The framework in our study drastically diminished operator training time to the reading of one page on robot control and one minute of practice. Though the current study was not a comparative one, this reduction is indicative of the power of combining VR and spatial end-effector based control, which here included the Jacobian solver on the robot end and the 'collaborative positioning' of the operator facing the robot.

Another strength of the experiment is the extent to which the framework adequately re-enacts the dynamics of actual teleoperation. The communication between the VR computer and the robot computer justifies the teleoperation claim in this experiment, as the implementation incorporated typical teleoperation dynamics such as communication time, static and delay (Munir and Book 2003). Future experiments could incorporate different levels of delay and static and investigate their effects, as has been done in the past for other teleoperation paradigms (Rosenberg 1993a, b; Kaber 2000; Sheridan 1992b). The template path fits typical industry standards as it is based on ISO norms (ISO 9283:1998); it includes straight parts, sharp and wide angles, and curved sections, substantiating its representative nature. The two experimental contexts, full information and preprocessed, were indicative of two ways in which environmental information can be communicated to an operator: full information represented an interface incorporating more of the typical surroundings in which tasks would be performed, whereas preprocessed represented a context in which only information directly related to the task was provided. Importantly, the surroundings within the full information context were constructed from 3D footage of an actual robot laboratory.

The experiment lacked extensive dynamics (e.g. falling objects, mission changes, obstructions, limited signal), reducing its ecological validity. This caveat makes it more difficult to relate performance and perceptual experience in this experiment, and others like it, to real-world teleoperation settings (Paljic 2017; Deniaud and Mestre 2015; Walsh, Sherlock, Ling and Carnahan 2012).

The acquired results indicate better HRT task performance during the full information context: while accuracy was equivalent between the full information and preprocessed contexts, task performance time was significantly lower for the full information context. Concerning oSA, the results suggest that overall oSA did not differ between informational contexts; attentional demand, however, was significantly higher during the full information context. If the speed-accuracy trade-off (SAT) is considered, performance can be viewed as a function of the speed and accuracy with which a task is performed (Heitz 2014; Bogacz et al. 2010; van Maanen 2016). From this perspective, performance was better during the full information context. The expectation of superior performance during full information contexts was ascribed to heightened levels of presence and immersion due to increased contextual information (Slater, Linakis, Usoh and Kooper 1996; Barfield et al. 1995), which also fits the implicit importance of context for task performance as propagated within radical embodied cognition (Kiverstein and Miller 2015) and ecological psychology (Heft 2001).

The results could not support the hypothesized higher oSA for the full information context. On the whole, the plethora of situation awareness measures is both extensively varied and broadly scrutinized (Wickens 2008; Salmon et al. 2006). It is important to note that the performance results portray 'objective' measurements during task execution. By contrast, the SART, a self-rating score, asks participants to reflect on their experience after task execution (Taylor 1990). The SART has furthermore been scrutinized for its limited sensitivity (Salmon et al. 2006), which may make strong differences hard to find. This can specifically be the case for experimental contexts which are highly similar in the nature and amount of situation dynamics, as the SART was largely developed to assess the cognitive derivatives of situation dynamics (Taylor 1990). A possible alternative would be to apply performance-related measures inspired by the SAGAT in future experiments (Endsley et al. 1998).

Attentional demand was significantly higher during the full information context. This corroborates the expectation that increased amounts of task non-specific information demand more attention, as there is more information to attend to (Taylor 1990; Endsley 1995). The attentional demand results also shed more light on the general SART scores and their equivalence across experimental contexts. As the SART score is calculated by adding situation understanding and attentional supply and subtracting attentional demand, the significantly differing demand results suggest that for the full information context, higher situation understanding and supply scores might balance the general SART scores. This holds particularly for situation understanding, which resembles the flow of situation understanding to levels 2 and 3 (comprehension and projection, respectively) in Endsley's SA model (for a full reading, see Endsley 1995), or general epistemic actions in active inference theory (see Friston et al. 2015). This may indicate relevance of the nature of oSA (i.e. the interaction or combination of understanding, supply, and demand) rather than the general oSA level alone.
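Expressed as a formula, the three-dimensional SART combines its summary dimensions into a single score (Taylor 1990):

$$\mathrm{SA} = U - (D - S) = U + S - D,$$

where $U$ denotes situation understanding, $D$ attentional demand and $S$ attentional supply. A higher $D$ thus lowers the overall score unless offset by higher $U$ or $S$, which is the balancing mechanism suggested above.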

The simultaneity of significantly better task performance and higher attentional demand for the full information context may seem confusing at first, as attention-demanding contexts, such as those with increased environmental complexity, have been shown to diminish performance (Horberry 2006; Graydon 1989). The lack of significant oSA differences between experimental contexts may help to explain this phenomenon. As discussed before, while the level of subjective oSA may be similar, the nature of oSA may differ between informational contexts. Combining this insight with the increased performance during the full information context suggests that not only the calculated level of oSA is indicative of performance; the nature of oSA may be so, too. Though limited research touches upon this specific explanation, it fits with both the SA model by Endsley (1995) and active inference theory (see Friston 2015). Both theories are founded upon a balance of cognitive resource division and complexity, environmental understanding or future state prediction, and error or surprise minimization (Engström 2018). The full information context provides more task non-specific information; however, in doing so it also provides a context within which a participant may expect to perform robot operation, which may already increase both the specificity and accuracy of perceptual expectations (Engström 2018).

With respect to VR presence and teleoperation performance, future research is advised to disentangle depth cues from context cues. In the current study, non-specific information may be deemed 'constructive' information; for instance, the additional walls and familiar objects within the room may increase depth perception (Hanna et al. 2002; Rosenberg 1993a, b). To disentangle the effects of expected context from those of improved depth information on performance, future research may investigate a stripped version of the full information context, providing the same depth cues without contextual information such as recognizable objects and extensive color features. In doing so, depth perception effects can be regarded separately from general presence effects.

4.1 Recommendations for Teleoperation Design

Based on the current study and previous research, suggestions concerning VR-mediated robot teleoperation design can be made, particularly related to informational context and relative operator positioning during end-effector based robot control.

Artificial realistic contextual information (such as a floor, walls and recognizable objects) may be provided to add presence and depth cues, so long as indispensable information from the robot scene is not replaced. Particularly for situations with limited bandwidth or unstable signal, such an artificial context layer may serve as both a contextual and a spatial anchor (Rosenberg 1993a, b). With respect to the relative positioning of the operator for end-effector based control, the collaborative position seems to bear fruit: all participants were non-professionals concerning teleoperation and were able to perform smoothly with limited training. Two mechanisms could explain this success. One is the diminished number of mental rotations an operator needs to make (DeJong, Colgate and Peshkin 2004); the other may lie in the collaborative perspective, facing the robot, extending the scope of perception of the task environment, as the robotic arm simply does not obstruct the operator's field of view (FoV). Additionally, operators may be inclined to move around more during collaborative positioning, increasing the possibility of improved depth perception and observation of task performance compared to traditional third-person robot control.

5 Conclusion

The current study has provided a novel and promising technical framework for VR-mediated teleoperation research and development. This framework was applied to assess the influence of the informational context of the operator on Human-Robot Team (HRT) task performance and operator Situation Awareness (oSA) during a path following task. Performance was better during the full information context, for which task performance time was significantly shorter than during the preprocessed context, while accuracy was equivalent in both informational contexts. Significant differences in general oSA levels were not found; however, attentional demand scores were significantly higher for the full information context. Based on these results we provided recommendations for design, most importantly the incorporation of either natural or artificial context characteristics in the VR presentation to the operator.