1 Introduction

Digital games are an extremely varied set of applications that offer players a rich range of experiences. This diversity makes it difficult to devise a single approach to their conceptualization and measurement. Terms such as fun, flow and gameplay are widely used to explain the user experience in game design [4]. However, there is an open discussion about including other relevant factors. Fluency with the gamepad and the game controls [27] as well as emotion [14] are often cited as key elements of the user experience. Emotions in digital games act as a motivator for the cognitive decisions players make during gameplay and they drive user experience in digital games [19]. The success of the gaming experience is determined by the positive aspects of the gameplay experience and by the quality of the input method used to control it. Game controls are not just a matter of the hardware used to play: they include learning how to manipulate the game, move the avatar and memorize the mappings of in-game actions to the gamepad. While there are many works on evaluating the experience provided by the game, the literature on how the game controller can interfere with the experience and performance of a user is limited [15, 20, 28, 30].

We propose a method to evaluate the game in an exploratory user experience and usability study that investigated a commercial joystick as input device compared to a novel adaptive touch-based controller (the Smart controller [22, 26, 27]). Our goal is to advance the theoretical understanding of how game controllers can affect the user experience, focusing on the measurement of user experience, usability, and physiology. One limitation of current psychophysiological studies is that they cannot precisely classify UX in games since many aspects of the experience lack standardized quantitative measurements [16, 18]. Hence, another purpose of this study is to determine if there is a correlation between subjective player experience (with AttrakDiff questionnaire) and objective physiological data (collected with electroencephalography (EEG) sensors), in an attempt to determine which measures are more adequate to evaluate game controllers. While not the focus of this research, we will report any interesting finding relative to evaluating the game itself, to stimulate further research about the topic.

2 How to Evaluate UX in Games

The current ISO definition of user experience focuses on a person’s perception and the responses resulting from the use or anticipated use of a product, system, or service. User experience includes all the users’ emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviors and accomplishments that occur before, during and after use [9]. In the literature, there are many works that perform user experience evaluations in games [4]; however, very few focus on the evaluation of the players’ experiences with game controllers.

2.1 User Experience Questionnaires

Some questionnaires use a broader approach, looking to evaluate all aspects of gaming experience [7, 12] while others try to determine more specific aspects, such as immersion [13] or motivation [24]. Brown et al. [6] explored the experience, functionality, and usability through different controllers (gamepads, keyboard, racing wheel). Subjective Mental Effort Questionnaire (SMEQ) [3] and Consumer Product Questionnaire (CPQ) [17] were used to respectively measure effectiveness and satisfaction.

Attractiveness has been used to measure UX in games [8, 14], being described as a set of four dimensions: (1) Pragmatic Quality (PQ), (2) Hedonic Quality - Stimulation (HQS), (3) Hedonic Quality - Identity (HQI) and (4) Attractiveness (ATT) [11, 25]. The first one (PQ) focuses on task-related design aspects and indicates if the users reached their goals on an interaction. The Hedonic Quality dimensions (HQS and HQI) describe quality aspects, like originality and beauty. The Attractiveness (ATT) represents a global value of the product, based on quality perception. AttrakDiff is a usability questionnaire that analyzes all these dimensions. It has 28 questions (7 per dimension) with a semantic differential scale [25] (integer values from −3 to 3).
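
As a minimal sketch of how the 28 answers described above could be reduced to one score per dimension, the Python snippet below averages seven answers per dimension. The item-to-dimension mapping (four contiguous blocks of seven) and the function name `attrakdiff_scores` are assumptions made for illustration. The actual questionnaire may interleave items and reverse the polarity of some word pairs, so the answers are assumed to be already oriented so that higher means better.

```python
from statistics import mean

# Assumption: answers are stored in four contiguous blocks of 7 items,
# in this order, and already oriented so that +3 is the positive pole.
DIMENSIONS = ("PQ", "HQI", "HQS", "ATT")
ITEMS_PER_DIMENSION = 7


def attrakdiff_scores(answers):
    """Return the mean rating per dimension for one participant."""
    if len(answers) != len(DIMENSIONS) * ITEMS_PER_DIMENSION:
        raise ValueError("expected 28 answers")
    if any(a not in range(-3, 4) for a in answers):
        raise ValueError("answers must be integers from -3 to 3")
    scores = {}
    for i, name in enumerate(DIMENSIONS):
        start = i * ITEMS_PER_DIMENSION
        block = answers[start:start + ITEMS_PER_DIMENSION]
        scores[name] = mean(block)
    return scores
```

Averaging within a dimension keeps every score on the same −3 to 3 scale as the individual items, which makes the four dimensions directly comparable.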

Lankees et al. [14] applied the AttrakDiff questionnaire to understand how emotional stimuli (facial expressions by Embodied Conversational Agents and emotion-eliciting situations) in interactive systems affect the user experience. Christou [8] applied the AttrakDiff questionnaire to explore the connection between the players’ perceptions of usability and the appeal of massively multiplayer role-playing games, using World of Warcraft. The findings indicated that the relationship between usability and the general quality of the experience also holds in the realm of video and computer games. Different cases have been successfully evaluated with AttrakDiff: iTV [4], a user interface for a business management system [25] and games [8, 14].

2.2 Psychophysiological Measures and Their Correlation with Subjective User Data

Objective data, such as physiological measures (e.g. galvanic skin response, muscle contraction, respiratory and cardiovascular signals), are widely employed in the literature to evaluate UX and user engagement in digital games [16, 18, 19]. Essentially, these methods seek to obtain objective data to measure factors that are normally subjective, like the emotional state.

Some works have focused on studying the correlation between subjective user response and objective physiological data while measuring the player experience. Mandryk et al. [16] collected galvanic skin response (GSR), electrocardiography (EKG), electromyography of the jaw (EMG) and respiration. In their first experiment, they found many inconsistent correlations across participants. They observed that participants were responding more to the experimental situation than to the experimental manipulations. In a second experiment, they decided to maximize the user experience by adding a competition factor, with users testing two situations: playing against another player and against the computer. This time, they could correlate the data, showing that increases in subjective ratings corresponded to increases in the physiological data.

3 Method

3.1 Research Goals and Hypotheses

Recently, Torok et al. [26, 27] and Pelegrino [22] proposed a novel adaptive controller for digital games called Smart controller. Smart controller is a mobile app available for Android and iOS that allows the player to use a mobile phone as a touch-based gamepad for PC games. This is not a regular gamepad: the game can configure which buttons will be shown, their position, size, and icons, and can change them at any time during the gameplay session. New UI elements, not present on traditional controllers, can also be used, creating a totally new experience. As a touch-based interface, it lacks any kind of tactile perception. To partially counter this issue, it dynamically changes the size and position of its buttons according to the user behavior, optimizing its ergonomics on-the-fly to improve the experience. A preliminary evaluation of that controller [26] demonstrated that players had better performance results with the intelligent adaptations turned on. Subsequent tests [22] showed that it provided a significantly different user experience to gamers than what a regular gamepad could offer.

The goal of our research is to propose a method to evaluate the user experience and usability aspects of game controllers. We decided to use two radically different interfaces, the Smart controller, and a traditional gamepad, to test our method, seeking to propose a way to compare any game controllers in different aspects, like user experience (e.g. attractiveness and emotions), usability factors, physiological and performance data. In more detail, we intend to address the following research question:

RQ1: How are the user experience and usability (measured with both objective and subjective data) perceived with the different controllers in digital games?

We are also looking for potential correlations between AttrakDiff data and EEG data. For instance, here we are interested in answering questions such as:

RQ2: Which dimensions of user experience (measured with AttrakDiff questionnaire) correlate with user emotions (measured with physiological EEG data) for the different controllers?

3.2 Evaluation Criteria

Our approach is based on the capture and analysis of subjective and objective user data. We applied two methods: the AttrakDiff questionnaire [1, 11] and collection of EEG data [2]. User satisfaction is captured with the System Usability Scale (SUS) [5] questionnaire and player performance is analyzed with a game log (text file). The SUS questionnaire was designed to evaluate user satisfaction during a software system interaction. Its score is calculated from all answers and a mean score below 68 indicates usability problems. The in-game performance when using both controllers was measured with 2 metrics: time to complete a stage in seconds and number of deaths in each stage.
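
The text above does not spell out how the SUS score is calculated from the answers. The sketch below follows the standard SUS scoring rule as commonly described (ten items rated 1 to 5, odd-numbered items worded positively, even-numbered items worded negatively, result scaled to 0 to 100). The function name `sus_score` is an illustrative assumption, not something taken from the study materials.

```python
def sus_score(responses):
    """Score one SUS questionnaire (10 items rated 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - rating.
            total += 5 - rating
    # The raw sum ranges from 0 to 40; multiplying by 2.5 maps it to 0-100.
    return total * 2.5
```

A mean of these per-participant scores across a group is what would then be compared against the threshold of 68 mentioned above.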

Raw EEG signals were recorded with the Emotiv EPOC+ device [2], using the Xavier software platform. The device has 16 electrodes (14 for data capture, 2 for reference and positioning). The Xavier software processes the EEG signal and exports metrics for different emotional states (Engagement, Excitement, Interest, Relaxation, Stress and Focus) [23]. During a test session, Xavier collects all emotions as real values between 0 and 1. After the test, the software generates an average value for each emotion, which is used in our analysis.

Fig. 1. Input devices used in the user’s sessions: Smart controller (a) and a Microsoft Xbox 360 controller (b).

4 Experiment

4.1 Participants and Procedure

Data were recorded from 10 volunteers, students invited on the university campus, with varying levels of experience with video games. No financial compensation was given. Their ages ranged from 18 to 30 years. Six participants were male and none had previous experience with the test game. 70% of the volunteers played frequently on their smartphones. 40% played games on a console at least twice a week, 20% up to six days a week and 40% did not play on game consoles. Furthermore, we also collected their game genre preferences, resulting in (70%) Adventure, (50%) Strategy, (40%) Casuals, (40%) Simulation, (30%) Fight. Other genres were cited with percentages lower than 20%. We performed three pilot tests in order to improve the experiment.

After signing a standard agreement form, each user filled in the profile questionnaire used to obtain the previously mentioned statistics. The user would then, with our help, put on the Emotiv EPOC+ headset (as shown in Fig. 1a). Before the experiment, the participant was asked to close their eyes and concentrate on relaxing music for two minutes, so we could measure a neutral mental state to calibrate the EEG device. After that, the participant received brief instructions about the game and the controllers. The user would then play for 10 min with each controller. At the end, the participants filled in the SUS and AttrakDiff questionnaires for each controller experience. To avoid any bias due to learning the game mechanics, half the users started with the touch controller while the other half started with the traditional gamepad.
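
The procedure above states that half of the users started with each controller, but not how the two groups were formed. One possible way to build such a counterbalanced assignment is sketched below. The random split, the seed, the labels `"smart"` and `"gamepad"`, and the function name `assign_orders` are all assumptions for illustration.

```python
import random


def assign_orders(participant_ids, seed=0):
    """Split participants into two equally sized groups with opposite controller order."""
    ids = list(participant_ids)
    # Shuffle with a fixed seed so the assignment can be reproduced later.
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for position, pid in enumerate(ids):
        if position < half:
            orders[pid] = ("smart", "gamepad")   # touch controller first
        else:
            orders[pid] = ("gamepad", "smart")   # traditional gamepad first
    return orders
```

With an even number of participants, each order is used equally often, so any learning effect from the first session is spread evenly across both controllers.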

Fig. 2. Game screens for both stages. Stage 1 has two gameplay modes: platforming as robot (a) and dual-stick shooter with the spaceship (b). After finishing this level, the player enters a minigame (c) that precedes the second stage (d).

Fig. 3. The layout displayed by the Smart controller during the different stages of the game.

4.2 The Game

Pelegrino [22] developed a game, called Guardians of Eternity, to test the Smart controller. Figure 2 shows the different levels of the game. We decided to use the same game in our evaluation since it explored the new functions of the novel controller well while also being compatible with an Xbox 360 gamepad. Adding support for the Smart controller involves altering the source code of a game to add the correct API calls to configure the controller and gather input, and the game must be developed in Unity. Performing significant changes to other games (which would have to be open source and made in Unity) is out of our scope, so we decided to use Guardians of Eternity. Keeping the game choice consistent also allows future works to compare our findings with the previous study by Pelegrino. As the source code is available, we could also slightly alter it to add routines to log the performance data. The game has two simple stages. The first one is mainly a platforming game, with the player controlling a robot and traversing different platforms while defeating or avoiding enemies. The main character can also transform into a spaceship, at which point the gameplay becomes more similar to a dual-stick shooter. With the adaptive controller, the interface changes as the player alternates forms (see Fig. 3). With the 360 controller, only the key mappings are altered. At the end of the stage, the robot is severely damaged and the player must disable several subsystems: the Smart controller presents a screen similar to a control panel, while the regular gamepad maps these actions to its buttons. As the robot has lost its transforming power as well as its weapons, level 2 is a maze that must be traversed to escape the planet. The player must avoid colliding with walls or asteroids. The layout in the touch controller is unusual: one hand controls the vertical movement while the other controls the horizontal movement (on the 360 gamepad, each analog stick controls one axis).

5 Results

For numeric results, we extensively used the Wilcoxon signed-rank test [29], a non-parametric statistical test, to compare the data for both controllers and determine if the difference was statistically significant. The significance level was 0.05 and we performed a paired test since the samples are dependent (same users). If the resulting p-value for any test is lower than 0.05, the difference is significant. The Wilcoxon test was used because it does not depend on the distribution of the data (while a t-test demands a normal distribution, for instance) and has a high precision [10]. For correlation results, we applied the Pearson correlation [21]. The Pearson correlation returns a p-value (that also must be lower than 0.05) and a correlation score between −1 and 1. Values closer to 0 indicate that the data is not correlated or that the correlation is weak. Values closer to −1 or 1 show a strong correlation. While positive values are a direct correlation (i.e. if A increases, B also increases), negative values are the opposite (if A increases, B decreases).
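
To make the paired comparison concrete, the sketch below implements an exact two-sided Wilcoxon signed-rank test using only the Python standard library. It is an illustration under stated assumptions, not the analysis code used in the study: zero differences are dropped, tied absolute differences receive average ranks, and the p-value is obtained by enumerating every sign pattern, which is only practical for small samples such as the 10 participants here. The function names are hypothetical. In practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(first, second):
    """Return (W, p) for an exact two-sided test on paired samples."""
    if len(first) != len(second):
        raise ValueError("samples must be paired")
    # Assumption: pairs with no difference carry no information and are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so all 2^n sign patterns have equal probability.
    extreme = 0
    patterns = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        patterns += 1
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / patterns
```

A call such as `wilcoxon_signed_rank(gamepad_values, smart_values)`, where both lists hold one value per participant in the same order, returns the test statistic and a p-value to compare against the 0.05 significance level.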

5.1 Attractiveness and Satisfaction

Table 1 shows the results for the AttrakDiff questionnaire. While the means for all qualities were slightly higher with the physical gamepad, none of these differences were statistically significant. As the two interfaces are radically different, this could mean that the AttrakDiff evaluation was more affected by the game itself than by the controllers.

The SUS results seemed to indicate a lower satisfaction with the Smart controller (64.8, below the desirable mean of 68) than with the traditional controller (74.0). However, the p-value was 0.853 (not significant). The SUS scores were apparently too subjective, with widely varying answers that did not support any definitive conclusion.

Table 1. Comparison of the scores for the different qualities of AttrakDiff. SD = Standard Deviation.

5.2 Performance

The stage duration for stage 1 was slightly lower with the traditional controller (M = 145.4 s) than with the adaptive one (M = 158.7 s). In stage 2, the opposite happened (the traditional controller averaged 268.3 s while the adaptive one averaged 243.9 s). The p-values for the two stages were 0.375 and 0.444, respectively, neither of which is statistically significant. The number of player deaths in stage 1 was higher with the adaptive controller (M = 3.7) than with the traditional one (M = 1.8). Stage 2 resulted in an apparent draw (M = 1.6 with the 360 gamepad and M = 1.5 with the Smart controller). With p-values of 0.188 and 0.930, respectively, we found no significant difference in game performance between the controllers.

5.3 Emotion Analysis

Our findings show that all emotions reached higher levels with the Smart controller (see Fig. 4). The differences were significant for Excitement (p-value = 0.0141) and Focus (p-value = 0.0245). This is interesting evidence that the Smart controller increased excitement and allowed players to focus more intensively on the gameplay experience.

Fig. 4. Emotions in Smart controller and traditional controller - mean and standard deviation.

Table 2. Emotions versus deaths in stage 1. Only statistically significant (p < 0.05) results are reported.
Table 3. Emotions versus stage duration in seconds. Only statistically significant (p < 0.05) results are reported.

5.4 Insights About Interrelations Between Emotions, Performance and Usability

We also evaluated how emotions changed according to in-game performance. While stage 2 did not show any correlation between emotions and deaths, stage 1 showed a strong positive correlation between excitement and in-game deaths (0.7 ≤ r < 0.9, as seen in Table 2) for both controllers. All results are significant. The interest emotion was moderately correlated with deaths only with the traditional controller (0.5 ≤ r < 0.7); with the adaptive one, the p-value was higher than 0.05 (not significant). This difference could indicate that the new experience of the Smart controller had an impact on the interest of users, but more in-depth evaluations would be necessary to determine if it was a significant factor and if it was positive or negative.

This seems to indicate that users who considered the game more difficult were more interested in the experience and were also more excited. The test game tended to be easy and short when compared to most commercial games (according to user feedback after our tests and consistent with user interviews in Pelegrino [22]), so the most skilled players probably considered it boring because it did not offer a reasonable challenge to their abilities. Future evaluations could try to determine whether an excessively hard game results in the opposite correlation, frustrating users and resulting in a lack of interest. This result also suggests that EEG data could be useful for developers to balance games and maximize interest.

When we compare the stage duration with the emotions, stage 1 did not result in any correlation. Stage 2, on the other hand, had moderate or strong correlations between stage duration and both excitement and interest, for both controllers (see Table 3). This level differs from the first one since it lacks enemies, so it is harder to die. The challenge lies in beating the maze, and users who took longer to complete the task had more difficulty, which seems to indicate that, once more, a higher challenge was considered more exciting and interesting.

Our first research question (RQ1) was intended to uncover the differences in usability and user experience between the two controllers. There was no significant difference in game performance. Attractiveness was also similar. The Smart controller had a lower than ideal SUS score, which indicates usability problems. However, the difference in satisfaction between the two interfaces was not significant. The user experience evaluation shows a different scenario, with the Smart controller significantly increasing both excitement and focus, which suggests that innovative interfaces, even with a slight loss in performance, can provide a better experience to users.

In order to answer RQ2, we looked for correlations between emotions and attractiveness, comparing the results from the AttrakDiff dimensions and the EEG data. The correlation between the Pragmatic Quality dimension and the Interest emotion was strong (r = 0.7894, p-value = 0.0066), showing that one of the dimensions was indeed directly linked to physiological data, which answers the aforementioned question. This seems to indicate that the users’ interest is higher when they consider that a game experience has good usability and learnability, showing a direct correlation between subjective and objective data. On the other hand, the SUS data was correlated with neither the emotion results nor the AttrakDiff dimensions.
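
The correlation scores and strength labels used in this section can be illustrated with a short sketch. The snippet computes the Pearson coefficient with the standard formula and labels its magnitude using the two bands quoted in the text (moderate for 0.5 ≤ |r| < 0.7, strong for 0.7 ≤ |r| < 0.9). The labels outside those bands ("weak" and "very strong") and the function names are assumptions added for completeness. The p-value computation is deliberately left out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(spread_x * spread_y)


def correlation_strength(r):
    """Label the magnitude of r; the sign only gives the direction."""
    size = abs(r)
    if size >= 0.9:
        return "very strong"  # assumption: band not named in the text
    if size >= 0.7:
        return "strong"       # 0.7 <= |r| < 0.9, as in the text
    if size >= 0.5:
        return "moderate"     # 0.5 <= |r| < 0.7, as in the text
    return "weak"             # assumption: band not named in the text
```

For instance, `correlation_strength(0.7894)` falls in the strong band, which matches how the Pragmatic Quality and Interest correlation is described above.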

6 Conclusion and Future Works

The evaluation of game controllers is a relatively unexplored area. With this work, we intended to verify which approaches to measuring the user experience of gamepads were effective by comparing two different game controllers. The AttrakDiff and SUS results were similar for both interfaces, casting doubt on their effectiveness for comparing game controllers. The users seemed to have difficulty separating the usability aspects that corresponded to the game itself from those related to the controllers, a finding consistent with the work of Pelegrino [22]. It is not surprising that objective data, which does not rely on the user’s capacity to separate different interaction facets, was a better fit to evaluate game controllers. The EEG data showed that the new interface, the Smart controller, resulted in a significantly more exciting experience and increased the focus on the game. When evaluating game controllers, it seems that objective data, like our EEG results, is much more reliable than subjective data provided by user feedback.

While not directly related to game controllers, we also observed that users who died more in stage 1 or needed more time to complete stage 2 were more interested in the game and were generally more excited. As the test game tended to be excessively easy, the absence of challenge may have bored the most skilled players. As a side result, we believe that EEG data can be a promising method to determine if a game is correctly balanced and provides an adequate challenge to its players. An interesting finding was the correlation between the Pragmatic Quality dimension and the “Interest” emotion. This may indicate that the interest measured by the Epoc rises when the quality and usability of a game improve, showing that developers could benefit from monitoring this emotion with EEG during usability tests.

These results are a first step toward fully understanding user experience with game controllers. As future developments, it would be relevant to perform in-depth tests to determine if the non-significant correlation between interest and in-game deaths for the Smart controller was a result of the impact that the interface itself had on that emotion, especially because the traditional controller did not seem to interfere at all with the level of interest. Another possibility is to repeat the tests with a game that is significantly harder, since we found a correlation between difficulty and interest/excitement. This could confirm whether an increased difficulty, even to the point of frustration, could reverse this correlation. In this case, EEG may be a viable option for balancing the difficulty of games, seeking to maximize positive emotions like interest and excitement.