
1 Introduction

Social interactions and relationships involve far more than facial recognition and conversation; exchanges between humans draw upon many aspects of communication, including language form and content [1], interactional synchrony in gestures, postures, and tones [2,3,4], and social perceptions of trust [5]. Humans receive both conscious and unconscious cues during social interaction, and we are able to align automatically on many different levels and adapt to various external factors during these interactions [6]. Taken together, auditory and visual perceptions greatly influence human emotional responses in social settings. However, little is understood about the biological mechanisms that modulate the behaviors exhibited during an interaction. Increasingly, technology is playing a significant role in this largely unexplored space of mechanisms governing interpersonal connections. Technology is being integrated into nearly every aspect of daily life, with the capacity to watch, assess, and even learn from our actions, and these advancements continue at a rapid pace. According to the Nielsen Q1 2016 Total Audience Report, computers, cell phones, tablets, and touch-screen devices consume upwards of 11 h of the average person’s day, a full hour more than reported in 2015 [7]. While some science fiction writers offer a darker view of a future dominated by malevolent machines, others predict that humans and technology will one day live seamlessly in synchrony [8]. By examining the cognitive and emotional effects of human-technology interactions, researchers are finding ways to modify these advancements for more cohesive integration between humans and technology, specifically examining human relationships with computers and robots [9]. In bridging this gap, it is important to consider the impact of physical appearance, movement, and perceptions of social interaction, which can vary dramatically across cultures, generations, and genders [6]. As autonomous machinery nears integration in many fields, such as medicine, understanding the cortical responses it evokes is of utmost importance. Robots, as well as other technological agents, are currently limited in their capacity for autonomy. It has been proposed that dependence on autonomous machine agents, not only for health care but also for companionship and assistance with daily functioning, will grow within the next century, so it is important to anticipate and plan for adaptations to this new social environment.

Human-Robot Interaction (HRI), with the goal of seamlessly integrating robots to live in harmony with humans, is under active exploration, with robots designed to reflect human appearance, mannerisms, and motions [6]. The resulting technological challenges include improving our understanding of human-human social interactions as well as human-machine interactions. An examination of our own social interaction can serve as a calibration technique for understanding perception of, and objective responses to, technologies such as autonomous robots [10]. Advanced Brain Monitoring, Inc. (ABM), a neurophysiologic research company located in Carlsbad, California, has completed three phases of a human-robot interaction study in collaboration with Lowe’s Innovation Lab (LIL), Neurons Inc., and Fellow Robots. Our initial intent for the study design was to explore how humans may respond to different personalities in robot assistants, and to understand at what level humans experience an eerie or uncomfortable feeling when interacting with robots, known as the “Uncanny Valley” [11]. We concluded that it would be most beneficial to use somewhat extreme examples from the human-likeness spectrum: one personality that was not human-like, and one that exhibited human friendliness, humor, and empathy. A collaborative project between Fellow Robots, a robotics company based in Silicon Valley, CA, and LIL had previously created an assistive retail robot named OSHbot. Although OSHbot has no physical resemblance to a human, it was used in the current study to interact with human participants because of its programmability and ease of use. Many studies have shown consistent cortical response trends to non-visual components of interaction, which is why this approach was employed in the present study [12]. OSHbot is also equipped with two large touch-screen interfaces that a human can use to interact with it, providing one more familiar channel of communication.

In conjunction with ABM’s B-Alert® X10 wireless sensor headset for EEG and ECG recording, our study incorporated focal attention assessment using the Tobii mobile eye-tracker during several tasks performed by participants with human and robot assistance in an Orchard Supply Hardware store in San Jose, CA. Neurons Inc. is an applied neuroscience company based in Holbaek, Denmark, that focuses on neurological responses in consumer-based studies for marketing research. EEG and eye-tracking were integrated to characterize participants’ focal attention and assess the neural signatures associated with key events and interactions [13]. Data were analyzed by event to explore the neural responses to each of the multiple instances of recorded human-human and human-robot interactions.

In exploring neurophysiological correlates of social interactions, previous work has shown that slow theta (3–5 Hz) suppression has been linked to the “uncanny valley” response of humans towards androids and robots [14]. Furthermore, mu suppression (calculated from log ratios of power spectral densities (PSDs) across central sites C3, Cz, and C4 from 8 to 13 Hz bins of the participant’s experimental and baseline tasks) has been shown to be linked to the activation of the brain’s mirror neurons and empathy responses in human-to-human interactions [15,16,17]. Additionally, midline theta (calculated by summarizing PSDs from 3 to 7 Hz bins across sites Fz, Cz, Pz, and POz) activity has been shown to be associated with encoding into, and retrieval from, long term memory, visual stimuli matching, long term episodic memories, sustained or concentrated attention, visual working memory, positive emotions, and decreased levels of anxiety [17,18,19]. These neurophysiological and behavioral indices associated with human-robot interactions can uncover aspects of social experiences to help shape the design and function of future robots.
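As an illustration of these two indices, the sketch below computes a mu-suppression score over the central sites and a midline theta summary from per-channel PSDs, following the band and site definitions above. It is a minimal Python sketch, not ABM's analysis pipeline; the array layout, channel-index mapping, and function names are assumptions made for illustration.

```python
import numpy as np

def band_log_power(psd, freqs, channels, band, channel_index):
    """Mean log10 PSD over the given channels and frequency band.

    psd:           2D array (n_channels x n_freq_bins) of absolute power
    freqs:         1D array of bin center frequencies in Hz
    channels:      channel labels to average over
    band:          (low_hz, high_hz) inclusive band edges
    channel_index: dict mapping channel label -> row index in psd
    """
    rows = [channel_index[ch] for ch in channels]
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.log10(psd[rows][:, mask]).mean())

def mu_suppression(psd_task, psd_baseline, freqs, channel_index):
    """Log ratio of 8-13 Hz power over C3, Cz, C4: task minus baseline.

    Negative values indicate mu suppression during the task relative to
    the baseline condition.
    """
    central = ["C3", "Cz", "C4"]
    task = band_log_power(psd_task, freqs, central, (8, 13), channel_index)
    base = band_log_power(psd_baseline, freqs, central, (8, 13), channel_index)
    return task - base

def midline_theta(psd_task, freqs, channel_index):
    """Summed 3-7 Hz log power over midline sites Fz, Cz, Pz, and POz."""
    midline = ["Fz", "Cz", "Pz", "POz"]
    rows = [channel_index[ch] for ch in midline]
    mask = (freqs >= 3) & (freqs <= 7)
    return float(np.log10(psd_task[rows][:, mask]).sum())
```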

2 Methods

2.1 Participants

A total of N = 34 participants (47% male, ages 34–55) were tested. Participants were recruited by LIL through a partnered external firm database, whereby regular Orchard Supply Hardware (OSH) shoppers who met pre-screening criteria (available upon request) were asked to participate in a study to assess and better understand how they reacted to, and interacted with, products offered in the store. The robot interaction was not disclosed to the participants prior to the study to allow for unbiased demeanor towards the tasks and the robot (OSHbot; see Fig. 1A). Participants were compensated for their participation with a $100 gift certificate to OSH.

Fig. 1. Study equipment: (A) OSHbot; (B) EEG recording sites; (C) B-Alert® X10 headset

2.2 Equipment

Psychophysiology.

EEG and ECG were acquired simultaneously and in synchrony throughout the study tasks using the B-Alert® X10 wireless sensor headset (Advanced Brain Monitoring, Inc., Carlsbad, CA). This system has 9 referential EEG channels located according to the International 10–20 system at Fz, F3, F4, Cz, C3, C4, POz, P3, and P4, plus an auxiliary channel for ECG (Fig. 1B/C). Linked reference electrodes were located behind each ear on the mastoid bone. Impedances were confirmed to be below 40 kΩ for all sites before recording began. ECG electrodes were placed on the right and left clavicles. Data were sampled at 256 Hz with a 0.1 Hz high-pass filter and a fifth-order, 100 Hz low-pass filter, and digitized with A/D converters. Data were transmitted wirelessly via Bluetooth to the computer, where acquisition software stored the psychophysiological data. ABM’s proprietary acquisition software also included artifact decontamination algorithms for eye blinks, muscle movement, and environmental/electrical interference such as spikes and saturations.
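For readers replicating the acquisition settings offline, the following sketch approximates the reported band-limiting (0.1 Hz high-pass, fifth-order 100 Hz low-pass, 256 Hz sampling) in software using SciPy. This is not the B-Alert firmware or acquisition software, and the high-pass filter order is an assumption since the text does not specify it.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # sampling rate in Hz, per the B-Alert X10 acquisition settings

def bandlimit(eeg_uV, fs=FS):
    """Approximate the reported band-limiting for one referential channel.

    eeg_uV: 1D array of raw samples (microvolts) from a single channel.
    Returns the zero-phase filtered signal.
    """
    # 0.1 Hz high-pass to remove slow drift (second order assumed here)
    b_hp, a_hp = butter(2, 0.1, btype="highpass", fs=fs)
    # Fifth-order 100 Hz low-pass, below the 128 Hz Nyquist limit
    b_lp, a_lp = butter(5, 100.0, btype="lowpass", fs=fs)
    x = np.asarray(eeg_uV, dtype=float)
    return filtfilt(b_lp, a_lp, filtfilt(b_hp, a_hp, x))
```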

Tobii Pro Glasses 2 Eye Tracker.

Tobii’s Pro Glasses 2 are equipped with two cameras per eye that use a proprietary 3D eye model, and a full-HD scene camera that provides a wide field of view for accuracy and precision with minimal gaze data loss. Embedded accelerometer and gyroscope sensors allowed differentiation between head and eye movements, which eliminated the impact of head movements on the eye tracking data. Eyes were tracked using corneal reflection of dark pupils at a 50 Hz sampling rate. After the eye tracking glasses were fitted on the participant, they were calibrated using the eye tracking software, and further calibration was automated throughout the duration of use.

OSHbot.

Fellow Robots, in a previous collaboration with LIL, created an assistive robot capable of helping consumers in real-world OSH stores. It uses two LiDAR (Light Detection and Ranging) units (one 3D and one 2D) and two IR-based 3D depth sensors to maneuver around the store while simultaneously localizing and mapping its position (SLAM), so that it safely avoids obstacles and people. OSHbot is programmed with store-specific information so that its navigation and product information are accurate when it assists shoppers. Participants were able to interact with OSHbot via two touch screen interfaces, a microphone, and speakers (see Fig. 1A).
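OSHbot's actual navigation stack is proprietary, but the basic safety behavior described above, stopping when a scan return falls inside a protective radius, can be sketched as follows. The function name, cone width, and safety radius are illustrative assumptions, not Fellow Robots parameters.

```python
import numpy as np

def obstacle_too_close(ranges_m, angles_rad,
                       safety_radius_m=0.5, front_halfwidth_rad=np.pi / 4):
    """Return True if any valid 2D LiDAR return inside the forward cone
    is closer than the safety radius, signalling the robot to stop.

    ranges_m:   1D array of range readings in meters (one per beam)
    angles_rad: 1D array of beam angles, with 0 pointing straight ahead
    """
    ranges_m = np.asarray(ranges_m, dtype=float)
    angles_rad = np.asarray(angles_rad, dtype=float)
    in_front = np.abs(angles_rad) <= front_halfwidth_rad
    valid = np.isfinite(ranges_m) & (ranges_m > 0)
    return bool(np.any(ranges_m[in_front & valid] < safety_radius_m))
```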

2.3 Procedure

Benchmark Testing.

ABM’s B-Alert® X10 EEG headset (see Fig. 1C) was applied and participants completed ABM’s benchmark neurocognitive tasks: the 3-Choice Vigilance Task (3CVT), Verbal Psycho-Vigilance Task (VPVT), and Auditory Psycho-Vigilance Task (APVT), to individualize the model used to classify and quantify engagement and workload. The 3CVT is a 5-min active vigilance task that requires participants to discriminate one target (70% occurrence) from two non-target (30% occurrence) geometric shapes. Each stimulus was presented for 200 ms. Participants were instructed to respond as quickly as possible to each stimulus by pressing the left arrow for target stimuli and the right arrow for non-target stimuli. A training period was provided before the task began in order to minimize practice effects. The VPVT and APVT are passive vigilance tasks that lasted 5 min each. The VPVT repeatedly presented a 10-cm circular target image for 200 ms in the center of the computer monitor every 2 s, requiring the participant to respond to image onset by pressing the spacebar. The APVT consisted of an auditory tone played every 2 s, requiring the participant to respond to tone onset by pressing the spacebar. The Tobii mobile eye-tracker was then applied and calibrated for use in tandem with the EEG headset to assess focal responses to the robot interaction during the in-store tasks.
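As a concrete illustration of how benchmark performance might be summarized, the sketch below scores accuracy and mean reaction time for a 3CVT-style block. The trial dictionary format and function name are assumptions for illustration and do not represent ABM's benchmark software.

```python
import numpy as np

def score_3cvt(trials):
    """Summarize accuracy and mean reaction time for a 3CVT-style block.

    trials: list of dicts with keys
        "is_target" (bool)         - target (left arrow) vs. non-target (right arrow)
        "response"  (str or None)  - "left", "right", or None for a miss
        "rt_ms"     (float or None) - reaction time in milliseconds
    """
    correct, rts = [], []
    for t in trials:
        expected = "left" if t["is_target"] else "right"
        hit = t["response"] == expected
        correct.append(hit)
        if hit and t["rt_ms"] is not None:
            rts.append(t["rt_ms"])
    return {
        "accuracy": float(np.mean(correct)) if correct else float("nan"),
        "mean_rt_ms": float(np.mean(rts)) if rts else float("nan"),
    }
```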

In-Store Tasks.

Participants were randomly assigned to two groups: Group 1 interacted with an OSHbot that responded to the participants in a neutral, factual, robotic tone; Group 2 interacted with an OSHbot that exhibited some human characteristics: humor, sociability, and empathy (a script of these programmed speeches is available upon request). Participants were instructed to follow the printed instruction cards given to them once they reached the designated starting area, and were asked not to read the subsequent task card until the technician instructed them to do so. Each participant was given the same scenario and tasks for in-store shopping and, depending on the task, was to ask either a robot or a human for assistance. The scenario: participants were told that they were working on a kitchen remodel, specifically painting, and that they would be shopping for items to help them complete the remodel. The shopping list was as follows: (1) paint; (2) a sponge for textured paint application; (3) a screwdriver; and (4) vent covers. A game of bean bag toss was used as a distractor task between tasks 3 and 4 in order to obtain repeated measures of human-robot interaction in a short time frame, without asking the participant to return a second time [20]. The participant was led to a different area of the store to play the game. Throughout the tasks, OSHbot was programmed to speak from a script specific to the group to which participants were assigned. OSHbot was pre-programmed before each in-store data acquisition to follow the same script outline, but its presentation differed depending on the group and participant information.

Unbeknownst to all participants, OSHbot was programmed with details about the person that were intended to elicit an unexpected response. We intended for the participants to experience something outside their perception of what a robot could do or know; specifically, the robot knew somewhat personal things about the participant (e.g., how frequently they shop at OSH, whether they own or rent a house or apartment, their latest renovation project, future projects, and the number of people in their household). These participant-specific details were recorded prior to the in-store tasks during the pre-test questionnaire, or upon arrival.

Analysis and Statistics.

All analyses were conducted using the B-Alert® LabX software. Data quality checks and decontamination algorithms were used to identify and remove epochs contaminated by electromyography (EMG), signal excursions, or other environmental noise. Once the signals were decontaminated, the EEG data were converted from the time domain to the frequency domain. The absolute and relative power spectral densities (PSDs) were calculated for each 1 s epoch (1–60 Hz bins) for the standard EEG bandwidths (delta, theta, alpha, sigma, beta, gamma, and high gamma). PSDs were summarized over the frontal, central, parietal, anterior, temporal, and left/right (asymmetric) brain regions. Participant data were excluded for poor data quality caused by intrinsic EMG from walking during each task (N = 6), unlogged events within trials (N = 4), and inability to process the data (N = 2). The resulting analyses encompass the remaining N = 22 participants’ data.
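A minimal re-creation of the spectral step described above (1 s epochs, 1–60 Hz bins, absolute and relative power per band) might look like the following. The band edges beyond those stated elsewhere in the text, and the use of Welch's method, are assumptions, since the B-Alert LabX internals are proprietary.

```python
import numpy as np
from scipy.signal import welch

FS = 256  # samples per second

# Nominal band edges in Hz; labels follow the text, edges are assumed
BANDS = {
    "delta": (1, 3), "theta": (3, 7), "alpha": (8, 13), "sigma": (12, 15),
    "beta": (13, 30), "gamma": (30, 45), "high_gamma": (45, 60),
}

def epoch_band_powers(channel_uV, fs=FS):
    """Split one decontaminated channel into 1 s epochs and return
    absolute and relative band power per epoch (rows = epochs)."""
    channel_uV = np.asarray(channel_uV, dtype=float)
    n_epochs = len(channel_uV) // fs
    abs_power = np.zeros((n_epochs, len(BANDS)))
    for i in range(n_epochs):
        seg = channel_uV[i * fs:(i + 1) * fs]
        freqs, psd = welch(seg, fs=fs, nperseg=fs)  # 1 Hz bin resolution
        for j, (lo, hi) in enumerate(BANDS.values()):
            mask = (freqs >= lo) & (freqs <= hi)
            abs_power[i, j] = psd[mask].sum()
    rel_power = abs_power / abs_power.sum(axis=1, keepdims=True)
    return abs_power, rel_power
```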

3 Results

3.1 Interaction Type

To investigate how interaction type affects psychophysiological indices, several one-way ANOVAs were conducted with Tukey adjustments for multiple comparisons, with the goal of establishing how a human interacting with another human differs at a biological level from a human interacting with a robot agent. We found greater suppression of slow-wave (3–5 Hz) theta, calculated by subtracting the antilogged epoch-by-epoch PSD bandwidth value of the in-store task from the averaged antilogged PSD bandwidth value of the APVT task, in the frontal, midline, and parietal regions during human-human interactions: F(1, 21) = 9.80, p = 0.003; F(1, 21) = 9.49, p = 0.004; and F(1, 21) = 7.89, p = 0.008, respectively. Specifically, slow-wave theta activity was lower when a human interacted with another human than when a human interacted with a robot. This agrees with recent work showing that humans exhibit heightened frontal theta activity upon observation of a robot [14].
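To make the suppression metric and the group comparison concrete, the sketch below computes a per-participant slow-theta suppression score against the APVT baseline and compares the two interaction types with a one-way ANOVA. It is a simplified stand-in for the reported analysis (which was run in B-Alert LabX with Tukey adjustments), and the input shapes and function names are assumptions.

```python
import numpy as np
from scipy.stats import f_oneway

def slow_theta_suppression(task_power, baseline_power):
    """Suppression score for one participant and one region.

    task_power:     1D array of antilogged 3-5 Hz power, one value per 1 s
                    epoch of the in-store interaction
    baseline_power: 1D array of antilogged 3-5 Hz power from the APVT task
    Larger positive values indicate stronger suppression relative to baseline.
    """
    return float(np.mean(np.mean(baseline_power) - np.asarray(task_power)))

def compare_interaction_types(human_scores, robot_scores):
    """One-way ANOVA across participants' suppression scores for the
    human-human vs. human-robot conditions (with two groups this is
    equivalent to an independent-samples comparison)."""
    f_stat, p_value = f_oneway(human_scores, robot_scores)
    return f_stat, p_value
```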

This may indicate that theta activity reflects the work of bridging gaps in common semantics and visual recall when conversing with a robot rather than with a human. Another ANOVA indicated that mu bandwidth power (8–13 Hz EEG power recorded over the sensorimotor region) was higher for human-human interactions (M = 3.40, SD = 0.52) than for human-robot interactions (M = 3.09, SD = 0.31); F(1, 21) = 5.76, p = 0.02. These data are reported in Table 1 and graphically represented in Fig. 2. As mu suppression has not been shown to change much based on the type of agent observed [14], our results showing increased mu activity while meeting another human could suggest a sensory-motor mechanism reflecting preparation for engagement with a like being. Prior work has linked this EEG correlate to empathetic state change, active concentration, as well as motor imagery and visual activation [17, 21]. These findings prompted an exploration of how such metrics may vary as a function of age and gender.

Table 1. One-way ANOVA – interaction type
Fig. 2. ANOVA results of interaction type (∗p < 0.05; ∗∗p < 0.01). Error bars represent the standard error of the mean (SEM).

3.2 Gender Differences

Additional analyses were conducted to assess how neurophysiological metrics varied across gender identity (females, N = 12; males, N = 10) by interaction type. Several unbalanced two-way ANOVAs were conducted with Tukey adjustments for multiple comparisons. No PSD bandwidth or ECG differences between gender groups were found; however, further exploration of the metrics revealed significant differences in EEG wavelets. EEG PSD bandwidths are composed of frequency subbands which use statistical coefficients called wavelets [22, 23]. With this, we found statistically significant wavelet effects in the theta and gamma frequencies within the parietal region: F(1, 21) = 9.04, p = 0.019 and F(1, 21) = 4.70, p = 0.037, respectively. This could mean that mental task load increased throughout the tasks, especially when participants anticipated decision making from the other interacting agent, as found in other studies [24, 25]. Previous studies reported that increases in theta activity may indicate activation of the superior temporal sulcus (STS), which has been linked to observation of biological kinematics and social cognitive tasks [26, 27]. Our findings are consistent with other reports in that comparing brain activity between organic and inorganic motion or interaction results in stronger STS activation [26]. These data are shown in Table 2 and represented in Fig. 3.
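The unbalanced two-way design can be reproduced in outline with a Type II ANOVA from statsmodels, as sketched below. The long-format column names and the use of Type II sums of squares are assumptions for illustration, and the published analysis additionally applied Tukey adjustments.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def gender_by_interaction_anova(df):
    """Unbalanced two-way ANOVA for gender x interaction type.

    df: long-format DataFrame with one row per participant per condition and
        columns "power" (the EEG metric), "gender", and "interaction"
        ("human" or "robot").
    """
    model = ols("power ~ C(gender) * C(interaction)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)  # Type II sums of squares

# Example usage with hypothetical values:
# df = pd.DataFrame({
#     "power": [3.1, 3.4, 2.9, 3.2],
#     "gender": ["F", "F", "M", "M"],
#     "interaction": ["human", "robot", "human", "robot"],
# })
# print(gender_by_interaction_anova(df))
```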

Table 2. Two-way ANOVA – identity type × interaction
Fig. 3. ANOVA results of interaction type by gender (∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001). Error bars represent the standard error of the mean (SEM).

3.3 Age Differences

As with the gender analyses, further statistics were conducted to examine age-related differences in human-human and human-robot interaction using two brackets: below age 45 (n = 12) and over age 45 (n = 10). This analysis yielded significantly greater Cz and midline slow theta suppression within the older cohort between human-human and human-robot interaction: F(1, 21) = 4.57, p = 0.039 and F(1, 21) = 4.78, p = 0.035, respectively. This suggests that theta cortical responses may be age-mediated and that younger individuals may find it easier to discern visual and social context between varying interactions. This seems plausible in the sense that younger participants have likely been exposed to this high level of technological integration for longer, and at critical developmental periods, compared with those from older generations, making interaction with a robot a less cognitively demanding task. Furthermore, this analysis also exposed differences in Cz alpha, mu, and hemispheric (F3–F4) frontal alpha. Namely, we observed higher Cz alpha for the younger cohort, F(1, 21) = 4.97, p = 0.032, and higher mu and hemispheric frontal alpha for the older cohort, F(1, 21) = 6.30, p = 0.017 and F(1, 21) = 4.60, p = 0.039, respectively. These data are presented in Table 2 and represented in Fig. 4. General findings suggest that theta activity may indicate differentiation and recognition of movement and appearance between biological and non-biological agents, as well as semantic and memory related aspects thereof. Previous studies found greater theta activity during robot observation than during android and human observation [14]. We suspect that both cohorts experienced an increase in processing load during their interactions with OSHbot compared with human-human interactions because interacting with the robot was intrinsically more difficult than interacting with a biological agent. Yet, when comparing the younger to the older cohort, the latter showed more theta suppression, suggesting they may have found the interaction more challenging than those who most likely had more exposure to similar technologies. Recent HCI research has found that poor usability may increase reluctance among older adults to use and adapt to new technologies [28]. A natural decline in physical, perceptual, and cognitive ability could also affect an older cohort’s performance when interacting with technology and their interpretation of device use [28]. We suspect that the observed differences across these cohorts may reflect reluctance to interact with OSHbot due to a lack of familiarity with robots (Fig. 4).

Fig. 4. ANOVA results of interaction type by age (∗p < 0.05; ∗∗∗p < 0.001). Error bars represent the standard error of the mean (SEM).

4 Conclusions

These neurophysiological indices associated with human-human and human-robot interactions across gender and age may tap into aspects of social experience that could help shape the physical design and automation of future robots. We focused our analyses on findings of higher frontal slow theta and overall mu activity across age cohorts and genders because these significant findings mirror the results of previous studies of human-robot and human social interaction and could help substantiate current research aimed at improving autonomous technologies. We conclude from this dataset that social interactions between humans and robots do indeed result in different temporal changes in neural responses, and that these differences vary with the gender and age of the individual. Furthermore, we suggest these observed changes highlight important aspects of human emotion and cognition during social interactions with humans and robots in a real-world shopping experience.

Future analyses of these data are planned in hopes of revealing further unique aspects of robot-human social encounters. As gender and age appear to strongly affect neurophysiology during these interactions, we also propose an expansion of this study in which the appearance and vocal features of the robot are varied to more closely align with different gender and age groups. This future work has the potential to establish how individuals perceive a robot when the autonomous machine is programmed to elicit a greater sense of familiarity and comfort within an interaction. By analyzing EEG data in conjunction with eye-tracking data, we also hope to view specific metrics in real time, providing an opportunity for robot behaviors to adapt based on the neural responses of the human during an encounter. Assistive agents would then be better able to characterize features of social encounters, creating a more seamless relationship between humans and machines.

5 Funding

The work presented herein was supported by DARPA Contract No. W31P4Q-12-C-0200 issued by U.S. Army Contracting Command. The views expressed are those of the author and do not reflect the official policy or position of the DoD or the U.S. Government.