1 Introduction

Originally, robots were deployed in industrial settings to release human operators from dirty, dull, and dangerous tasks. In more recent years, the main aim of a new generation of robots has been to act as partners, assistants, or companions of humans and to share life with them [1, 2]. Service robots are now becoming part of working life in many sectors, including shopping, education, and companionship. One of the main barriers to introducing robots into public and private places is the unnatural and indifferent service that robots provide [3]. Research in HRI has identified a number of robot attributes, e.g. appearance, facial expression, voice, and gesture [4,5,6,7], that can be manipulated to enhance a robot’s service quality and improve its social acceptance. While a considerable amount of work in this area has explored a single robot variable, such as gesture [8], voice [9], or appearance [5], the way in which verbal and nonverbal cues are perceived simultaneously requires further research. In this paper, we explore how a humanoid robot’s voice styles and gestures affect the personality that people perceive in it. We also aim to find out which robot voice and gesture types users prefer, as such preferences may affect the social acceptance of robots.

2 Background

2.1 Anthropomorphizing Social Robot

Some years ago, robots were used to automate industrial tasks, such as assembly, packing, and manufacturing, that were considered too “dull” or “dangerous” [10]. In recent years, robots have increasingly been expected to engage in social interaction with humans, which has driven the development of social robots. Social robots are autonomous or semi-autonomous robots that interact with humans following human social norms [11,12,13]. Researchers have found that the roles of social robots are increasingly diverse [14]. For example, Hegel et al. [15] explored several applications for social robots, such as public assistants and personal assistants. A common goal in human-robot interaction research is to create natural and intuitive social manners. Social robots should be able to interact with people in their daily lives through multimodal and sometimes redundant communication channels (face, language, speech tone, gesture, sound, etc.) [16]. Fong et al. [17] provide a detailed account of the attributes of social robots, highlighting robot-related factors, such as voice and gesture, that can affect the interaction experience. Considerable effort in the design of social robots is therefore dedicated to improving the social cues that robots present.

However, blindly applying human characteristics to social robots is unlikely to elicit the expected user responses. One of the most famous examples is Mori’s Uncanny Valley [18]. To avoid this unpleasant interactive experience, most commercially available robots, such as Pepper and Nao, tend to have a humanoid appearance. In addition, various studies have demonstrated that humanoid robots are more likely to be accepted by people than other robot types [19,20,21].

2.2 Verbal Communication and Robot Voice

In the future, social robots are expected to communicate with their users in a human-like way. Human speech plays an important role in building good interpersonal relationships, and people can judge the attractiveness of an unseen speaker’s speech [22]. Vocal cues not only convey the intended information but also influence how people perceive the speaker’s age, gender, and personality [23]. Researchers have found that pitch, pitch range, volume, and speech rate are the four fundamental characteristics of the voice that indicate personality [24]. In the study conducted by Niculescu et al. [9], voice pitch was found to be useful for modeling the personality of a robot.

A related study by Walter et al. [25] evaluated the distance at which people feel comfortable approaching a mechanical-looking robot with different voice styles. The experiment used a robot with four voice conditions (male voice, female voice, synthesized voice, and no voice) to talk with participants. The results showed that approach distances were generally closer for the robot with the synthesized voice than for the other voice styles. The explanation offered was that the synthesized voice was more consistent with the robot’s mechanical appearance. People seem to judge robots very quickly at the start of an interaction, so the vocal cues presented by robots should be designed very carefully.

2.3 Nonverbal Communication and Robot Gesture

Nonverbal communication includes all forms of communication other than speech (e.g. facial expression, gesture, and eye gaze). It can be powerful in conveying speakers’ emotions and attitudes and in complementing verbal communication [26, 27]. Previous studies found that adding body language and facial expressions to verbal communication could significantly improve communication efficiency [28]. In human-human communication, gestures improve engagement between the two speakers and reduce the boredom of purely verbal communication.

In human-robot interaction, interest in the interpretation of robots’ nonverbal behaviors has been rising. Riek et al. [29] found that the speed of robot gestures can trigger different emotions and attitudes. Specifically, people cooperate with abrupt gestures more quickly than with smooth ones. A study conducted by Kim et al. [11] found that controlling the size, velocity, and frequency of gestures can convey different personalities. In addition, many researchers have attempted to classify the gestures that accompany speech. One well-recognized classification was proposed by Krauss [30], who divided gestures into four types: symbolic gestures, deictic gestures, motor gestures, and lexical gestures. Based on the classification proposed by Krauss, Nehaniv et al. [31] defined five gesture classes in order to infer the intent of a gesture. These five classes are irrelevant gestures, side effects of expressive behavior, symbolic gestures, interactional gestures, and referential gestures. In this study, Nehaniv’s classification was employed, because irrelevant gestures occurred in the observation conducted for this study.

2.4 Robot Perceived Personalities

Among various social traits, personality has been considered important to both interpersonal relationships and human-robot interaction [5, 13]. According to the CASA (‘Computers Are Social Actors’) paradigm, people treat computers, and consequently robots, as characters with personalities [32]. In other words, people tend to apply human social norms to computers and robots. Various studies have confirmed that CASA can be helpful in understanding the way people interact with robots [33, 34] or even automated machines, e.g. a robot vacuum cleaner [35]. During human-robot interaction, people pick up on a robot’s personality from its design characteristics [36], which makes it easier for them to predict the robot’s functions. A study conducted by Tay et al. [13] demonstrated that robots with a high-pitched voice were perceived as more extroverted and more feminine than robots with a low-pitched voice. Hence, manipulating the voice can trigger certain perceived robot personalities.

3 Methodology

3.1 Context and Voice Stimuli

Context and Conversation Script.

Previous research suggests that social robots are very promising in certain occupational roles, such as receptionists in shopping malls, instructors in schools, and companion robots at home. In this study, the role of a shopping receptionist working in a public environment was selected for the social robot.

To match the actual context, two shopping receptionists from electronics stores (both with more than 6 years of related work experience) were invited for expert interviews. During the interviews, the experts were asked to answer the following three questions based on their working experience.

  • How do you start a conversation with customers?

  • What kind of information should be provided when customers want to purchase a household appliance?

  • How do you attract customers’ attention during a conversation?

Based on the interview results, the conversation script for the experiment was created. The script is shown below (C: customer, R: robot).

  • C: Hi.

  • R: Hi, what can I do for you?

  • C: I want to buy a refrigerator.

  • R: I recommend this new refrigerator, the CX-D276.

  • C: Can you tell me more about it?

  • R: Sure. This refrigerator is suitable for families of three to five people. It has……

Voice Stimuli.

The three voices were generated with the TTS (text-to-speech) system of the open platform developed by IFLYTEK. This TTS system was chosen because it is one of the most mature and powerful tools for producing Chinese speech and has been used in previous academic research [37]. The male voice was produced by the virtual character XiaoFeng, and the female voice by the virtual character XiaoYan. The child voice was generated by the character FangFang and then adjusted in Adobe Audition.

Robot Gesture Types.

The robot gestures were designed using an observational method. To match the real context, 3 shopping receptionists from an electronics store and 10 university students were invited as subjects for the observation. They were asked to read the conversation script and act as a shopping receptionist, and were encouraged to express the content of the script freely through body language. A camera recorded the performance of each subject, and the gestures in these videos were then analyzed.

The results showed that when the words “three, four” appeared, 70% of the subjects tended to use symbolic gestures by holding up three or four fingers. Symbolic gestures also occurred when the words “a bigger vegetable room” appeared. This is consistent with Krauss’s [30] description of a symbolic gesture as a conventionalized signal in a communicative interaction. Therefore, in this study this kind of gesture is defined as a gesture that contains specific information and is named gesture type 2 (G2).

Two subjects exhibited irrelevant gestures, which are neither communicative nor socially interactive. Specifically, these two subjects moved their hands repeatedly and regularly. This kind of gesture is complementary to G2. Hence, gesture type 3 (G3) was defined as G2 plus irrelevant gestures. The gesture types are summarized in Table 1.

Table 1. Summarized gesture types in this study

Robot in the Experiment.

A humanoid robot, Pepper, developed by Softbank Robotics, was employed in this study. Pepper is a commercially available robot that is also frequently used in social robot studies [38, 39]. However, Pepper is not able to produce Mandarin speech. Therefore, a three-dimensional (3D) model of Pepper was created on a computer, and with this model 9 video clips were made with different voices and gesture types for the experiment. This follows video-based HRI (VHRI), a mainstream experimental paradigm in social robot research [40] in which participants watch a recorded animated robot on a computer [20, 41]. The Wizard of Oz experimental method was also employed, meaning that the robot in the experiment was not fully autonomous and relied partially on manual control by the research staff.

3.2 Questionnaire

In this study, the Kansei evaluation method was employed because of its flexibility and its suitability for studying customers’ affective and emotional responses to the robot’s personality. Kansei engineering, which originated in Japan, is a customer-oriented product development method [42]. Following the Kansei engineering method, the semantic differential method was used to construct the questionnaires. The items were selected from a literature review of perceived robot personality. For example, Hwang et al. [5] adopted 13 adjective pairs to explore the effects of a robot’s overall shape on user-perceived personality, and Hendriks et al. [35] used 30 personality characteristics to study the personality of a robot vacuum cleaner. Although these studies were conducted in different contexts, their questionnaire items may still be useful here, because most of the adjective pairs were adopted from Big Five theory, a representative model in human personality research. In total, 67 adjective pairs were collected from previous studies. Then, 27 participants rated the relevance of each adjective pair to the personality of a good shopping receptionist on a 5-point Likert scale. The results are shown in Table 2. The adjective pairs that scored above 4.0 were used in this study.

Table 2. Results of the relevance of the adjective pairs and the personality of a shopping receptionist
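
The selection rule described above (keep the adjective pairs whose relevance score is above 4.0) can be sketched as follows. This is a minimal illustration rather than the authors’ actual procedure: the ratings are made up, the pairs `Talkative-Quiet` and `Serious-Humorous` are invented for the example, the names `ratings`, `THRESHOLD`, and `select_pairs` are hypothetical, and the score is assumed to be the mean rating with a strict cut-off.

```python
from statistics import mean

# Hypothetical 5-point relevance ratings (the study used 27 raters and 67 pairs).
ratings = {
    "Extroverted-Introverted": [5, 4, 5, 4, 5],
    "Passionate-Apathetic": [5, 5, 4, 4, 4],
    "Talkative-Quiet": [4, 4, 4, 4, 4],      # invented pair, for illustration only
    "Serious-Humorous": [3, 4, 3, 2, 4],     # invented pair, for illustration only
}

THRESHOLD = 4.0  # assumption: a strict cut-off on the mean rating


def select_pairs(ratings, threshold=THRESHOLD):
    """Return the adjective pairs whose mean rating is strictly above the threshold."""
    return [pair for pair, scores in ratings.items() if mean(scores) > threshold]


selected = select_pairs(ratings)
print(selected)
```

With a strict cut-off, a pair whose mean is exactly 4.0 is dropped; whether the original selection treated that boundary the same way is not stated in the text.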

3.3 Experimental Design

A 3 × 3 mixed design was employed, with robot voice type (3 levels: male, female, child) and gesture type (3 levels: no gesture = G1, gestures with specific information = G2, irrelevant gestures + gestures with specific information = G3) as within-subject variables. The dependent variable was users’ evaluation of the perceived robot personality, measured on 5-point Likert scales.

Participants and Procedures.

The experiment was conducted in a laboratory at Tatung University. A total of 15 university students aged 18–25 (9 females and 6 males) participated in this study. Most participants were students from the Design Science department of Tatung University. They were recruited through email advertisements.

Before the experiment, each participant was randomly assigned to a group of three and given an appointment to visit the laboratory. Upon arrival, all participants received an information sheet with general information about the study and its ethics approval. Participants then completed a questionnaire collecting basic demographic information, such as age, gender, and experience with robots.

During the experiment, participants watched the 9 video clips in random order. An experimenter controlled the robot’s speech in a “Wizard-of-Oz” setting, a technique commonly used in HRI experiments in which a system that appears autonomous is in fact operated by a human. In each video, the robot talked with the experimenter. Following the conversation script, the experimenter first said “Hi” to the robot, and the robot asked “What can I do for you?” The experimenter said he would like to buy a refrigerator. The robot then recommended the new refrigerator and provided detailed information with a given voice and gesture type. The robot ended the conversation at the end of each communication task, which lasted around 3 min. After the experiment, the participants were given enough time to complete the questionnaires evaluating the perceived personality of the robot. In total, 9 questionnaire sheets were collected from each participant.
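
The 9 clips and their randomized presentation can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the random order was produced, so an independent shuffle per participant is assumed, and the names `VOICES`, `GESTURES`, `CONDITIONS`, and `presentation_orders` are hypothetical.

```python
import itertools
import random

VOICES = ["male", "female", "child"]
GESTURES = ["G1", "G2", "G3"]  # no gesture, informative, informative + irrelevant

# Each of the 9 video clips corresponds to one (voice, gesture) combination.
CONDITIONS = list(itertools.product(VOICES, GESTURES))


def presentation_orders(n_participants, seed=0):
    """Return an independently shuffled clip order for each participant."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    orders = []
    for _ in range(n_participants):
        order = CONDITIONS[:]  # copy so the master list stays untouched
        rng.shuffle(order)
        orders.append(order)
    return orders


orders = presentation_orders(15)  # 15 participants, as reported above
print(orders[0])
```

A counterbalanced scheme such as a Latin square would be an alternative to plain shuffling; the sketch does not claim either was used.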

4 Results

4.1 Factor Analysis of Robot Vocally Perceived Personalities

A principal component factor analysis with VARIMAX rotation on the 15 adjective pairs revealed three underlying factors: a social factor, a competence factor, and an interpersonal status factor. The first factor includes Passionate-Apathetic, Extroverted-Introverted, Flexible-Stubborn, etc. The second factor was related to rational judgement, with the adjective pairs Diligent-Lazy and Decisive-Indecisive. The third factor includes two adjective pairs, Modest-Arrogant and Safe-Dangerous. The first factor is more related to the evaluation of social ability, while the second factor emphasizes rational judgement of competence. The third factor clearly describes interpersonal status. The results are shown in Table 3.

Table 3. Factor analysis of robot vocally perceived personalities
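
The rotation step of such an analysis can be sketched as follows. This is a minimal illustration, not the software or settings used in the study: it implements raw varimax (without Kaiser normalization) through pairwise planar rotations, it omits the principal component extraction entirely, and the loading matrix `unrotated` is made up. The function names `varimax` and `varimax_criterion` are hypothetical.

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum((s - mean_sq) ** 2 for s in sq) / p
    return total


def varimax(loadings, tol=1e-8, max_sweeps=100):
    """Rotate a loading matrix (list of rows) by pairwise raw varimax rotations."""
    L = [list(row) for row in loadings]  # work on a copy
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2.0 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui ** 2 - vi ** 2 for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = math.atan2(D - 2.0 * A * B / p, C - (A ** 2 - B ** 2) / p) / 4.0
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    x, y = row[a], row[b]
                    row[a] = x * c + y * s
                    row[b] = -x * s + y * c
        if largest_angle < tol:  # no pair needed a meaningful rotation
            break
    return L


# Made-up unrotated loadings: 7 adjective pairs (rows) on 3 components (columns).
unrotated = [
    [0.70, 0.30, 0.10],
    [0.65, 0.35, -0.05],
    [0.60, 0.25, 0.15],
    [0.55, -0.60, 0.05],
    [0.50, -0.55, -0.10],
    [0.45, 0.05, 0.70],
    [0.40, -0.05, 0.65],
]
rotated = varimax(unrotated)
for row in rotated:
    print([round(x, 2) for x in row])
```

Because each step is an orthogonal rotation, the communality of every item (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes, which is what makes the rotated factors easier to name.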

4.2 Effects of Robot Voice and Gesture Types

A multivariate analysis of variance (MANOVA) was conducted with robot gesture and voice types as the independent variables and the perceived robot personalities (three factors) as the dependent variables. Table 4 shows the MANOVA results, including the main effects of robot voice and gesture types. Both voice type and gesture type have significant effects on Factor 1. Specifically, the child voice was perceived as more extroverted than the female and male voices, F (2, 45) = 12.47, P < 0.00, and there was no significant difference between the male and female voices (see the ‘Main effect (V)’ column in Table 4). Regarding gesture types, the robot was perceived as more introverted when it used no gestures than with the other two gesture types, F (2, 45) = 10.30, P < 0.00. When the robot used gestures that contain specific information, it was perceived as significantly more extroverted. Using irrelevant gestures as a complement in conversation seems to be unnecessary, because there was no significant difference between gesture type 2 (gestures that contain specific information) and gesture type 3 (gestures that contain specific information plus complementary irrelevant gestures).

Table 4. MANOVA results for perceived personalities.

The interaction effects between robot voice and gesture types were also investigated. However, the last two columns in Table 4 show no significant interaction effect between robot voice and gesture types.
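
To make the reported F values easier to read, the sketch below computes an F statistic as the ratio of between-group to within-group mean squares. It is a deliberately simplified, univariate one-way version and not the MANOVA reported above: it ignores the repeated-measures and multivariate structure of the data, the scores are toy values, and the names `one_way_f` and `factor1_scores` are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy Factor 1 scores for each voice type (not data from the study).
factor1_scores = {
    "male": [1.0, 2.0, 3.0],
    "female": [2.0, 3.0, 4.0],
    "child": [3.0, 4.0, 5.0],
}
f_value, df_between, df_within = one_way_f(list(factor1_scores.values()))
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With three groups the first degree of freedom is 2, as in the reported F (2, 45); the second depends on the number of observations and on the error term of the actual model, which this sketch does not reproduce.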

5 Discussion and Conclusion

In this study, the effects of robot voice and gesture types on perceived personality in a shopping scenario were investigated experimentally. The implications of the findings and the resulting guidelines are discussed below.

First, three main evaluation factors were found in this application field: a social factor, a competence factor, and an interpersonal status factor. Interestingly, these three factors were not classified entirely according to Big Five theory. For example, the adjective pairs in factor 1 belong to three personality traits: extroversion, openness, and agreeableness. The adjective pairs in factor 2 are more related to the anti-neurotic and conscientious personalities. Additionally, factor 3 describes dominance in interpersonal relationships. These results suggest that people expect to feel dominant over a social robot during interaction in a shopping service scenario, and that they also prefer the robot to be extroverted, passionate, and decisive. In addition, the robot with a male or female voice that exhibited no gestures during the conversation was perceived to have an introverted, passive, and stubborn personality.

According to the MANOVA results, the robot with the child voice was more likely to be perceived as having an extroverted, passionate, and relaxed personality. The male and female voices were considered significantly more introverted and apathetic. Additionally, the robot with the child voice was significantly more accepted by users. One possible explanation for these findings is Morton’s animal behavior theory [43], namely motivation-structural rules. Morton found that pure tones, similar to high-frequency voices, indicate that a speaker is obedient and calm, and that these types of voices are considered agreeable. A child’s voice is pure and has a higher frequency, which makes it sound obedient and harmless. The child voice was therefore considered more relaxed and extroverted than the other two voices (i.e., the male and female voices). By contrast, speakers with relatively low-frequency and harsh voices are considered dominant and aggressive. Using a child’s voice on a robot thus makes people feel that the robot is safe and harmless, which means that the robot is more easily accepted.

Regarding gesture types, the robot using gestures that contain specific information was perceived as more extroverted and was more easily accepted by users. Although Kim et al. [11] found that the size, velocity, and frequency of a gesture could be used to control the robot’s perceived personality, there was no significant difference between G2 and G3. This result indicates that G2 (gestures that contain specific information) is essential for the robot during a conversation. One possible explanation is that gestures containing specific information can convey information that is neglected in the robot’s voice. Users would find it unacceptable if such gestures were missing during the service. Nevertheless, adding more gestures to the robot did not contribute significantly to any perceived personality.

In general, these findings can be used to establish guidelines for the affective design of social robots. For example, users prefer a shopping receptionist robot with a child voice that uses gestures containing specific information. This study is a meaningful first step and gives some directions for designing a sociable humanoid robot.