Keywords

1 Introduction

User experience is a user’s perceptions and responses resulting from the use of an interactive system, including emotions, beliefs, preferences, physiological responses and much more [16]. To measure these responses, most UX research relies on explicit methods such as questionnaires and interviews. For example, emotions, or users’ feelings regarding a system, have previously been measured using a self-report scale developed by Hassenzahl et al. [22]. However, it is difficult for users to report precisely on their own experience. Prior research shows that there is an important difference between what users felt during the experience and how they recalled it afterwards [11, 18]. Recent findings point to the influence of multiple biases. One is the peak effect, where users tend to remember the most intense moment better. Another is the peak-end rule, where users’ impressions of the experience tend to be influenced by its final moment [10]. Furthermore, it has been shown that the intensity of the emotions felt plays an important part in the recall process [3] and that negative memories tend to be better remembered than positive ones [5].

Considering the lack of proper methods in the UX literature to accurately identify implicit pain points, we propose a systematic method that uses physiological data to identify pain points in a user’s online journey. Pain points can be explicit, implicit, or both. An explicit pain point, usually derived from qualitative data, is defined as a negative emotion consciously felt by the participant during a particular moment of the task and mentioned by the participant during or after the task. This notion is commonly used in marketing research [47]. An implicit pain point, by contrast, is defined here as a moment, in reaction to an event during the interaction, during which the user experiences an automatic physiological activation characterized by a high level of emotional arousal and a negative emotional valence. Building upon previous research on peak loads, which identifies the exact moments when users approach or exceed their cognitive capacities [36], we use psychophysiological measures of emotional valence and arousal to build a metric that identifies pain points in the online user journey. We then illustrate the results using a journey map representation. This representation helps explain the reasons behind those pain points and makes it easier to compare different tasks or systems.

Gaining a deeper understanding of the reasons behind pain points contributes to HCI literature and practice by providing insights into peak emotional moments in users’ experiences. It also allows UX designers to improve their designs substantially by knowing precisely where the pain points are located, without interrupting users’ authentic interactions with the website.

2 Literature Review

2.1 Current Methods to Assess Customer Experience

Customer experience contributes to the success of e-commerce websites and thus to a company’s viability. Indeed, understanding customers and meeting their needs have been shown to be keys to success [23]. There is therefore a vast literature analyzing customer experience with a variety of methods, such as personas, experience maps, blueprints, and walk-through audits [27]. However, these methods usually focus on a portion of the customer experience and fail to give an overall picture. It has therefore been suggested that combining complementary methods offers a deeper understanding of user experience. Adding implicit measures, such as physiological tools, in turn allows a more precise measure of the participant’s emotional journey [26].

A first method, Customer Experience Modeling, was developed in the service sector to better synthesize the whole customer journey and the sequence of the different touchpoints by using customer-centric soft goals [46]. Soft goals are part of a goal-oriented analysis that detects problems in interactions. This analysis takes into account the subjective nature of the experience in customers’ evaluations of their different levels of satisfaction [39]. This makes it possible to discover pain points that emerge from interactions. Methods such as Customer Experience Modeling derive pain points from qualitative data, for example from the analysis of common words and sentences used while completing a task [47].

Another method, Customer Job Mapping, also known as the customer-centered innovation map, consists of breaking down, step by step, every task customers face in order to find new ways to innovate. Certain tasks or parts of tasks can create difficulties for customers and are thus classified as pain points. The main difference between this method and Customer Experience Modeling is that it focuses on what customers are trying to achieve at every step, instead of what they are doing [6].

The Customer Journey Map, a more recent method, is a diagram illustrating every touchpoint a consumer has with the company, at every step of the way and through every channel the company uses [44]. An example of a Customer Journey Map for grocery shopping can be found in Fig. 1. It is used both in service design, to help design the experience, and in user experience, to better understand the customer experience [37]. Customer Journey Maps allow companies to focus on the entire customer experience rather than on individual interactions [42]. However, recent research suggests that this method is still far from flawless, as it assumes all touchpoints are equally important to every customer, which is not the case [45]. To identify the most important touchpoints, Customer Journey Maps should be linked with consumer research using explicit measures such as self-administered questionnaires and interviews [45]. Another problem with Customer Journey Maps is that no clear process for designing them has been established, although they are now used in various industries. This makes it extremely difficult to compare across websites or interfaces and leads to inconsistent and non-generalizable results [37].

Fig. 1. Example of a customer journey map

As the above-mentioned methods show, customer experience measurement has mainly been approached from a qualitative angle, using focus groups or observations. Surveys are the exception, as they can include both qualitative and quantitative data [37, 40, 42]. To understand the consumer’s complete experience, data-driven, quantitative analysis must be combined with qualitative, judgment-driven evaluations [42]. There is currently a lack of methods that combine both approaches [42], as well as a lack of quantitative methods that would make experiences comparable across websites [35]. To this day, there is no agreement on a method that can evaluate all aspects of customer experience while reflecting reality, particularly when the user is completing complex tasks [32]. A recent study highlights the importance of using implicit measures to validate the data obtained from participants’ perceived emotions, to make sure all users’ emotions and reactions are considered [2]. Another recent study used electroencephalography (EEG) and eye tracking to explore customer experience in order to develop new visualization methods [2]. The authors quantified user experience with data such as attention levels, eye blinks, and pupil size. This study differs from the previous ones in that its data come from implicit, quantitative measures. They can therefore be less biased indicators of customer experience than explicit or qualitative methods [2]. This leads to the next section, which explains why consumers’ responses are sometimes biased.

2.2 Biases in Consumer Responses

A user’s perception of a system is commonly measured through self-reported measures. However, it is difficult for users to report precisely on their own experience, as they may be influenced by multiple biases, often without realizing it. Prior research shows that there is an important discrepancy between what users feel during the experience and how they recall it afterwards [11, 18]. Research suggests that retrospective evaluations are often biased and that human memory is influenced by peak moments [28, 29, 43]. According to Fredrickson and Kahneman [28] (p. 46), “[…] most moments of an episode are assigned zero weight in the evaluation and a few select snapshots receive larger weights”. In other words, those snapshots are usually the only things remembered from a past episode. Two such snapshots are described by the peak effect and the peak-end rule. The peak effect is the tendency of users to better remember the most intense moment of the experience. The peak-end rule is the tendency of users’ impressions of the experience to be influenced by its final moment [10]. Therefore, when asked to recall a precise moment, users can lack confidence because of both the process of remembering and the act of evaluation [29].

Remembering can be difficult because the ability to recall certain details of the context fades over time. The time elapsed between the experience and the moment of recall can also affect the biases involved in remembering [18, 28]. Furthermore, it has been shown that the intensity of the emotions felt plays an important part in the recall process and that negative memories tend to be better remembered than positive ones [3, 5]. As for the act of evaluation, it has been shown that hedonic and utilitarian moments are remembered differently, but that both are influenced by effects that could bias their retrospective evaluation [33]. Therefore, current methods used to measure and map user experience may be subject to multiple biases, as they are based on human retrospective evaluation. Using implicit physiological measures is a potential way to get around those biases.

2.3 Advantages and Disadvantages of Using Psychophysiological Measures

Over the years, many physiological and psychophysiological measures have been developed to evaluate users’ responses, such as electrocardiography (ECG), respiration rate, skin-based measures (EDA), blood pressure, ocular measures and brain measures (EEG) [8]. With the increasing popularity of e-commerce, it has become necessary to take into account users’ emotions when they interact with an interface, as users’ decisions are often based on hedonic rather than utilitarian motivations [7]. However, research shows that interrupting users during a task negatively affects their affective states, thereby biasing results [4]. Hence, physiological measures can be extremely useful for improving human-computer interaction in e-commerce without interfering with the interaction [17]. In domains such as entertainment technologies, physiological measures are far more robust than current subjective methods in finding differences between participants and tasks [34]. Another advantage is that data are collected in real time, which makes it possible to identify peaks precisely without relying on users’ memory. For example, a study of mental workload in air traffic control operations showed that real-time eye movement data yielded deeper insights that subjective ratings might not have revealed. This allowed designers to detect problems earlier in the design process [1]. Moreover, data capture can often interfere with the validity of the results, as users can be obstructed or distracted by the settings or methods used. Using unobtrusive tools to capture psychophysiological data lets users interact with a given technology in a realistic way, giving more reliable insights. Such data can also be used to complement explicit measures, reducing their biases and lending more validity to the results [15].

Furthermore, implicit psychophysiological measures make it possible to test factors that users cannot accurately report at any given moment. Many of these measures capture constructs related to user experience, such as valence, arousal, and cognitive load [14]. For example, a study on equipment installation found that success was negatively affected by the user’s level of arousal [26]. This result could not have been obtained with the same accuracy without psychophysiological implicit measures. In another recent study, users were asked to review their previous interaction with a website retrospectively, moment by moment. Their evaluations of their previous emotions were extremely inaccurate and often completely incorrect [24], demonstrating the value of more accurate measures.

Although physiological measures open new ways for researchers to understand user behavior, they also come with some disadvantages. Since this is a relatively new area of application, definitions and ways of measuring physiological constructs such as workload or arousal often vary between studies. This makes it difficult to compare results across studies and to replicate and validate findings [8]. Physiological measurement tools can also be sensitive to extraneous noise, further complicating comparisons between studies. For example, EDA, a skin-based measure of changes in electrodermal activity, varies with temperature, humidity, time of day and season, all of which are difficult to control [30]. Furthermore, in some cases, participants do not react the same way in a laboratory setting as in a real-life setting. For example, a study measuring mental workload in airplane pilots showed that measures taken during a real flight task were completely different from those taken during the same task in a laboratory setting [48]. Another study using cardiovascular responses also showed a weak correlation between laboratory and real-world contexts [27]. Moreover, recent research has shown that a single measure is not sufficient to satisfy validity requirements and that triangulation is therefore necessary to obtain valid results [8]. Triangulation is also necessary because the same physiological reaction can be elicited for different reasons, depending on the context and the user’s previous experiences [8]. In short, triangulation allows for better data interpretation and therefore more useful insights.

3 Method

We collected data in order to identify psychophysiological pain points during an e-commerce interaction. The goal was to develop a method that would accurately identify pain points and combine them with eye-tracking data in order to gain key insights into the user journey. A second goal was to compare the pain points identified from psychophysiological data with those that users identified qualitatively in retrospect. To improve the generalizability of our results, we used three different websites in order to obtain different sources of pain points. This allowed us to compare pain points both across websites and between participants using the same website.

3.1 Context

We used online grocery shopping as the study context. This context has numerous advantages. First, it involves complex arithmetic tasks for multiple items, as users need to figure out how much of each product they need [13]. Because customers need many items, they must accomplish multiple tasks in a single visit and choose from a vast product assortment. This makes a session longer than a traditional e-commerce session, even if it is more convenient than a trip to the grocery store [38]. Second, online grocery shopping also generates risk. Users need to trust the website regarding both the freshness and quality of products and the handling of confidential data such as credit card and phone information [9]. This lack of trust can cause pain points. Users may already be reluctant to buy fresh products online or to fill out their personal information, which makes them more sensitive to potential problems. Third, online grocery shopping is an uncommon transaction for users: in 2016, only 21% of consumers globally had bought fresh groceries online [41]. Finally, in this specific context, consumers were more involved in the task because they were buying their own groceries rather than pursuing a simulated goal, unlike other studies where the task is artificial [25].

3.2 Design, Sample and Procedure

Twenty-one students and young professionals (mean age: 23) were recruited via our institution’s panel and divided into three equal groups of seven participants. Each group shopped on a different online grocery website. Using three different websites allowed us to illustrate possible comparisons between websites as well as to determine whether the results were generalizable. Participants had one task: they were asked to do their grocery shopping online, buying items they really needed and paying with their own credit card to maximize ecological validity. The task was the same for all three groups. It lasted between forty-five minutes and an hour, excluding the baseline measures. Participants had to spend at least $50 and were asked to select the store where they would pick up their order in the following days. They had to buy at least one fruit, one vegetable and one piece of meat to ensure they would navigate sufficiently on the website. Participants had to fill out a questionnaire before the task, right after the task, and after picking up their order from the store. An experienced moderator also conducted an interview right after the task to learn qualitatively how the user felt about it. In that interview, the user was specifically asked about the positive and negative aspects of their online grocery shopping experience. Every participant received $60 in cash to reimburse their groceries. Each participant completed a consent form beforehand, and this project was approved by the Institutional Review Board (IRB) of our institution.

3.3 Measures

During the interaction with the assigned website, non-intrusive tools were used to capture users’ reactions in real time. A Tobii X-60 eye tracker (Stockholm, Sweden) sampling at 60 Hz, as suggested by Laeng et al. [31], was used to capture eye-tracking data, and Tobii Studio was used to record the experience. The eye-tracking data allowed us to identify precisely where the participant was looking at every second. The recording allowed us to review the session afterwards without interfering with the interaction. Arousal was measured using electrodermal activity (EDA) with the AcqKnowledge software (BIOPAC, Goleta, USA). EDA is a precise indicator of physiological arousal and its variation over time [21]. Emotional valence was measured using facial emotion recognition with the FaceReader™ software (Noldus, Wageningen, Netherlands). FaceReader™ was used to observe facial movements in order to calculate emotional valence, from negative to positive [12]. The Observer XT software (Noldus, Wageningen, Netherlands) was also used to synchronize the apparatus and event markers.
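
The streams above run at different rates: the eye tracker samples at 60 Hz, and the EDA and FaceReader™ export rates are not stated here. The journey maps, however, use one sample per second. The text does not say how the streams were reduced to 1 Hz beyond synchronization in The Observer XT. The following is therefore only a minimal sketch. It assumes each stream has already been exported as hypothetical `(timestamp_in_seconds, value)` pairs on a common clock, and that averaging within each whole second is acceptable. The names `to_one_hz` and `align` are illustrative and are not part of any of the tools named above.

```python
from collections import defaultdict
from math import floor

def to_one_hz(samples):
    """Average (timestamp_s, value) samples into whole-second bins (assumed export format)."""
    bins = defaultdict(list)
    for t, value in samples:
        bins[floor(t)].append(value)
    return {second: sum(vals) / len(vals) for second, vals in bins.items()}

def align(eda_samples, valence_samples):
    """Return (second, mean_eda, mean_valence) rows for seconds present in both streams."""
    eda = to_one_hz(eda_samples)
    valence = to_one_hz(valence_samples)
    shared = sorted(eda.keys() & valence.keys())  # seconds covered by both streams
    return [(s, eda[s], valence[s]) for s in shared]

# Illustrative values only, not data from the study.
eda_stream = [(0.0, 2.0), (0.5, 4.0), (1.2, 3.0)]
valence_stream = [(0.1, -0.2), (1.0, 0.4), (2.0, 0.1)]
print(align(eda_stream, valence_stream))  # [(0, 3.0, -0.2), (1, 3.0, 0.4)]
```

This sketch drops a second that lacks a sample in one of the streams (for instance, during a hypothetical gap in face tracking) rather than interpolating it. That is a design choice of the sketch, not a documented step of the study.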

At the end of the experiment, a qualitative interview was conducted with each participant. Users were asked explicitly about the positive and negative aspects of the task, in order to verify which pain points they had noticed. Qualitative data were analysed using Reframer from Optimal Workshop to find trends across participants. This was done in order to compare the added value of the implicit and explicit measures in the construction of the journey map.

Pain points were calculated against a specific threshold using the statistical software SAS 9.4, and the results were then illustrated as a journey map using Tableau®. In this particular context, a data point had to meet two conditions to qualify as a pain point. It had to fall in the top decile of EDA (i.e., at or above the ninetieth percentile: high arousal). It also had to fall in the bottom decile of valence (i.e., at or below the tenth percentile: strongly negative valence). Each pain point was validated manually using the time code of the recording in Tobii Studio. The recording was also used to place markers at the beginning and end of each subtask, in order to color-code the subtasks in Tableau®.
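
The decile rule above can be sketched in a few lines. The study performed this step in SAS 9.4 and does not give its code, so what follows is only an illustrative Python version under stated assumptions. First, thresholds are computed per participant over that participant's own per-second samples; the text does not say whether percentiles were computed per participant or pooled. Second, percentiles use linear interpolation between order statistics, the same convention as NumPy's default. Third, both boundaries are inclusive. The `(second, eda, valence)` row format and the names `percentile` and `pain_points` are hypothetical.

```python
from math import floor

def percentile(values, p):
    """Percentile p (0-100) with linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    k = (len(xs) - 1) * p / 100
    lo = floor(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def pain_points(series, arousal_pct=90, valence_pct=10):
    """Return seconds whose EDA is at or above the arousal percentile
    and whose valence is at or below the valence percentile.

    series: list of (second, eda, valence) rows for ONE participant (assumption).
    """
    eda_cut = percentile([eda for _, eda, _ in series], arousal_pct)
    valence_cut = percentile([val for _, _, val in series], valence_pct)
    return [s for s, eda, val in series if eda >= eda_cut and val <= valence_cut]

# Illustrative 11-second series (not study data): EDA rises steadily, valence dips late.
valences = [0.5, 0.3, 0.2, 0.1, 0.0, 0.1, 0.2, -0.6, 0.3, -0.8, -0.7]
series = [(s, float(s), v) for s, v in enumerate(valences)]
print(pain_points(series))  # [9, 10]
```

SAS offers several percentile definitions (the PCTLDEF= option), so cut-offs computed this way may differ slightly near the boundary from those in the study. Second 7 in the example shows why both conditions matter. Its valence is strongly negative, but its arousal is not high enough, so it is not flagged.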

These tools allowed us to identify and label the emotional peaks. In sum, the visualization method allowed us to identify the psychophysiological pain points accurately and precisely using non-intrusive tools. It also let us verify that our insights represented what users really felt, by comparing the quantitative results (implicit pain points) with the qualitative data (explicit pain points).

4 Results

Our results show that the temporal occurrence of psychophysiological pain points can be accurately identified. The journey map represents the evolution of valence (y axis) and arousal (dot size) over time (x axis), sampled every second (see Fig. 2). We color-coded each subtask, i.e., shopping, account creation, payment, time selection, and store selection. This makes it easier to see their order as well as the number of times the participant came back to each subtask. As an optimized journey is expected to be linear (i.e., without returns to a previous subtask), this also shows where potential problems may lie. For example, in Fig. 2, we can see that the participant started with shopping, then switched to time and store selection, before returning to the shopping task. He then returned to time selection, before proceeding with account creation and payment. As the journey is relatively linear, there are not many pain points along the way. Pain points are drawn with a different shape and colour from the other dots to distinguish them. They are illustrated by red squares (Fig. 2) and were calculated using a specific threshold. In Fig. 2, most pain points occur toward the end of the interaction, in the payment and account creation subtasks, except for one located in the shopping subtask. Some pain points are also successive, occurring one second after the other. We call these pain periods, as they usually have the same source. For example, there is a pain period labeled “Enters his last name”. This means that this specific task was painful for several consecutive seconds, which makes improving it more important than addressing isolated pain points. Finally, the visualization method allows labels to be added to the online user journey to show visually the reason behind each pain point. At a glance, one can then understand what went wrong for a specific participant.
For the participant shown in Fig. 2, the experience was relatively painless until the end, where s/he experienced many pain points during the payment and account creation subtasks. These occurred mainly when entering personal information such as first name, last name, postal code, and credit card information.
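
Two of the operations described above are simple enough to sketch: merging successive pain seconds into pain periods, and checking whether a journey is linear. The snippet below is an illustrative Python sketch, not the authors' Tableau® workflow. It assumes that pain points are given as integer seconds and that the session is described by a sequence of subtask labels. It also assumes that a subtask counts as revisited whenever it is entered again after a different subtask. The function names are hypothetical, and the pain-point seconds in the example are made up.

```python
def pain_periods(seconds):
    """Merge pain-point seconds into (start, end) runs of consecutive seconds.

    A run of length one is a single pain point; a longer run is a pain period.
    """
    periods = []
    for s in sorted(set(seconds)):
        if periods and s == periods[-1][1] + 1:
            periods[-1] = (periods[-1][0], s)  # extend the current run
        else:
            periods.append((s, s))  # start a new run
    return periods

def subtask_visits(labels):
    """Count how many separate times each subtask is entered in a label sequence."""
    visits = {}
    previous = None
    for label in labels:
        if label != previous:
            visits[label] = visits.get(label, 0) + 1
            previous = label
    return visits

def is_linear(labels):
    """A journey is linear if no subtask is entered more than once."""
    return all(count == 1 for count in subtask_visits(labels).values())

# Subtask order described for Fig. 2 (one label per segment here, for brevity).
journey = ["shopping", "time selection", "store selection", "shopping",
           "time selection", "account creation", "payment"]
print(pain_periods([40, 95, 96, 97, 120]))  # [(40, 40), (95, 97), (120, 120)]
print(subtask_visits(journey))
# {'shopping': 2, 'time selection': 2, 'store selection': 1, 'account creation': 1, 'payment': 1}
print(is_linear(journey))  # False
```

Counting each run returned by `pain_periods` once appears to match how the results tally "pain points or pain periods". The same per-second labels would also work as input to `subtask_visits`, because repeated consecutive labels do not count as new visits.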

Fig. 2. Visualization of the online user journey for one participant

Furthermore, the visual representation of the user journey allows for an easier comparison between participants. It makes it possible to compare the duration of consumer journeys, the order and duration of the different subtasks, and the location of the pain points. In the example in Fig. 3, the 6th participant took more than twice as long as the 1st to complete the same task. All participants started with the shopping subtask, probably because it is the most intuitive way to start. The 2nd participant made his store selection early in the process without experiencing any pain points. By contrast, three other participants did the same subtask later and experienced pain points doing it. A possible explanation is that choosing a store at the beginning restricts the displayed food items to those available at that store. If the store is chosen later, some of the products already in the cart may become unavailable. This causes pain points because participants must either find a substitute or delete the item from their cart. This method can also be used to compare journeys between different companies. For example, comparing how many pain points were related to shopping or payment for different competitors is a good way to benchmark how well a company is performing in different areas.
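
The competitor benchmark suggested above amounts to tallying pain points (or pain periods) by subtask for each website. A minimal sketch could look like the following. The website names and events are hypothetical and are not the study's data. Normalizing the tallies to shares is an added assumption, intended to make websites with different total numbers of pain points easier to compare.

```python
from collections import Counter

def pain_by_subtask(events):
    """events: (website, subtask) pairs, one per pain point or pain period.

    Returns {website: Counter({subtask: count})}.
    """
    table = {}
    for website, subtask in events:
        table.setdefault(website, Counter())[subtask] += 1
    return table

def shares(counts):
    """Convert a Counter of pain points into proportions of that website's total."""
    total = sum(counts.values())
    return {subtask: n / total for subtask, n in counts.items()}

# Hypothetical events for two fictitious grocers.
events = [("Grocer A", "payment"), ("Grocer A", "payment"), ("Grocer A", "shopping"),
          ("Grocer B", "store selection"), ("Grocer B", "payment")]
table = pain_by_subtask(events)
for website, counts in sorted(table.items()):
    print(website, dict(counts), shares(counts))
```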

Fig. 3. Comparison of different participants. Legend: numbers indicate the pain point number.

Labelling these pain points also allowed us to compare the experience truly felt by the participants with what they reported afterwards. This identified pain points that participants did not mention afterwards, which gave additional insights. Most importantly, it revealed specific moments where a participant clearly stated that a subtask went well, while the identified pain points clearly showed otherwise. For example, Fig. 4 shows that the participant reported having no problems filling out his credit card information. However, his physiological reactions showed otherwise.

Fig. 4. Comparison of qualitative and quantitative data for one participant: pain points not identified

Our results showed that participants afterwards identified qualitatively less than 25% of the pain points. Of the 65 pain points or pain periods identified for the 21 participants, only 16 were mentioned as a negative point afterwards (24.6%). Most surprisingly, participants explicitly described 5 of those 65 pain points as positive, while the physiological data clearly showed otherwise, as illustrated in Fig. 4. Results are surprisingly similar across grocers and are shown in Table 1.
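
For reference, the proportions behind these figures follow directly from the counts reported above (65 pain points or pain periods in total):

```latex
\frac{16}{65} \approx 24.6\%, \qquad
\frac{65 - 16}{65} = \frac{49}{65} \approx 75.4\%, \qquad
\frac{5}{65} \approx 7.7\%
```

In other words, roughly three out of four implicit pain points were not reported as negative afterwards.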

Table 1. Details of pain points per grocer

5 Discussion and Concluding Comments

Our results show that the temporal occurrence of implicit psychophysiological pain points can be accurately identified and that the visual representation of the user journey allows for an easier comparison between participants. Moreover, participants afterwards identified qualitatively less than 25% of the pain points. They also explicitly described some pain points as positive, while the physiological data clearly showed otherwise.

This study contributes to the existing user experience literature by proposing a reliable method to visualise peak emotional reactions experienced by users while performing a task. This method provides more precision and reliability in identifying pain points than relying on pain points mentioned by users after the task (Fang et al. [19]). It also introduces the notion of implicit psychophysiological pain points. Compared to the explicit pain points previously used in the literature, these make it possible to identify more pain points. They also give more reliable insights by potentially reducing the biases of explicit measures [15, 25].

The results also have managerial implications. First, prior work by Georges et al. [20] explained the importance of several factors when developing new UX evaluation tools using physiological measures, such as the ability to locate issues, ease of use, and reduced analysis time. This new method allows both practitioners and researchers to identify psychophysiological pain points easily, and the visualization lets them interpret and analyze the results more efficiently. This study contributes to user experience evaluation tools by using physiological data to assess how users truly felt during an online task. It thereby identifies pain points with more precision and accuracy than relying on pain points mentioned by users after the task. Therefore, if practitioners are interested in identifying pain points in order to improve interfaces, implicit pain points provide a more comprehensive list. However, if practitioners are interested in what users remember or think of their interface (e.g., attitudes), explicit pain points should be used. Second, this study shows that without users’ implicit emotional measures, it would have been extremely difficult to identify these pain points, which underlines the relevance of this approach. Moreover, in an online grocery shopping context, pain points need to be identified much more precisely. The new visualization method presented in this study addresses this need, so companies can identify not only the “painful” steps but also the exact moment each pain point occurred. Finally, this new method is useful for benchmarking user experience across interfaces, for example in prototype comparisons or between competing interfaces.

Some limitations must also be acknowledged. First, this visualization has so far only been applied to an online grocery shopping context and has not been tested in a hedonic context or a context with large arousal variations. Second, the experiment lasted about forty-five minutes to an hour, excluding the baseline measures. This can be a limitation, as participants could have become tired. The pain points found in the final parts could therefore be related to participant fatigue rather than actual problems with the interaction. Finally, as there were only 7 participants per grocery website, this was not a large-scale study, mostly due to the high cost of obtaining the data. Hence, additional studies in different contexts, of different durations, and with a greater number of participants could help generalize these results.

In conclusion, this new visualization method makes it possible to identify implicit psychophysiological pain points in the user’s experience. It targets moments where the user had both a high level of arousal and a negative valence relative to their baseline state, indicating an intense negative emotion. Identifying those pain points and combining them with eye-tracking data gives key insights into the online user journey and helps identify negative moments shared between users. It also provides a deeper understanding of the pain points that participants failed to identify during the post-task interview. Finally, it allows comparison of the experience felt by participants, either between tasks or between companies.