Keywords

1 Introduction

Researchers have investigated the value of non-traditional approaches to problem solving. Worked examples [1,2,3] and erroneous examples [4,5,6] have been of particular interest. Worked examples demonstrate a procedure for arriving at a correct solution and may prompt students to explain the correct steps of a solution, while erroneous examples require students to identify and fix errors in incorrect solutions. The learning benefits of these approaches have been attributed to their role in freeing up cognitive resources that can then be used to learn new knowledge [7]. Factors not specific to a particular approach may also interact with learning. Of these, affect and behavior have garnered the most attention [8,9,10,11]. In particular, states of confusion, concentration, and boredom have been shown to persist across computer-based learning environments (dialog tutors, problem-solving games, problem-solving intelligent tutors) [12].

In a recent study, we found that students who were assigned erroneous examples implemented in an intelligent tutor [13] experienced higher levels of confrustion [14], a mix of confusion and frustration, than those who were asked to answer typical problem-solving questions. However, we found that confrustion was negatively correlated with both immediate and delayed learning, albeit less so for students who worked with erroneous examples.

This study, which replicates our recent work in a digital learning game rather than an intelligent tutoring system (ITS), was motivated by two observations. First, to determine whether the relationship between confrustion and learning is robust, it is important to explore whether our recent findings persist in other digital learning environments, because levels of affective states such as frustration and behaviors such as gaming the system have been shown to vary across learning environments and user interfaces [12, 15].

Second, research has shown that students who engage in gaming the system also experience frustration [10], though frustration does not always precede gaming [12]. It is therefore worth exploring whether this association persists when erroneous examples are implemented in a digital learning game context.

Participants were divided into four groups: two groups worked with only Erroneous Examples (ErrEx) or only Problem Solving (PS) questions, and the other two worked with a mix of questions, either ErrEx followed by PS or PS followed by ErrEx. We expected that students in all four groups would improve from pretest to posttest. We then tested the following hypotheses:

  • H1: Confrustion and gaming will be negatively related to performance, even when controlling for prior knowledge.

  • H2: Students in any of the conditions that include erroneous examples will experience higher levels of confrustion and gaming the system.

  • H3: Students in any of the conditions that include erroneous examples will perform better than their PS-only counterparts on the posttest.

2 Methods

The data used in this study was collected in the spring of 2015. Participants were recruited from four teachers’ classes at two middle schools, and participated over four to five class sessions. Both schools are located in the metropolitan area of a city in the United States. The analysis for this study included the data of 191 students, divided into four conditions within the game context.

Materials consisted of the digital learning game, Decimal Point [16], and three isomorphic versions of a test administered as a pretest and posttest. The Decimal Point game is laid out on an amusement park map with 24 mini-games, each of which students play for two rounds. All tests and the game used the Cognitive Tutor Authoring Tool (CTAT) [17] as a tutoring backend. The game was designed with a focus on common misconceptions that middle school students have about decimals [18].

We used gameplay data to build machine learning models that detect confrustion and gaming the system. We applied text replay coding [19, 20] to student logs to label 1,560 clips (inter-rater reliability κ = .74). To predict confrustion and gaming, the detectors used 23 features of the students’ interaction with the decimal tutor, involving the number of attempts, the amount of time spent, and restart behavior.
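The inter-rater reliability statistic reported above is Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The following is a minimal sketch of that computation, not the authors' coding pipeline; the function name and the ten example clip codes are hypothetical and used only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed proportion of clips on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for ten clips (1 = behavior present, 0 = absent).
coder_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on 8 of 10 clips, but because chance agreement is already .52, kappa is noticeably lower than the raw agreement rate.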

After evaluating the performance of several classification algorithms in terms of Area Under the Receiver Operating Characteristic Curve (AUC ROC) and Cohen’s Kappa (κ), we built the confrustion detector using the Extreme Gradient Boosting (XGBoost) ensemble tree-based classifier [21] (AUC ROC = .97, κ = .81) and the gaming detector using the J-Rip classifier [22] (AUC ROC = .85, κ = .62).
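AUC ROC, one of the two metrics used to compare classifiers above, can be read as the probability that a randomly chosen positive clip receives a higher detector score than a randomly chosen negative clip. The sketch below computes it directly from that definition. It is an illustration under stated assumptions: the labels and scores are hypothetical, the pairwise loop is quadratic and meant for clarity only, and it is not the evaluation code used in the study.

```python
def auc_roc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical clip labels (1 = confrustion) and detector scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
auc = auc_roc(y_true, scores)
print(f"AUC ROC = {auc:.4f}")
```

A value of .5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a sense of scale for the detector values reported above.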

3 Results

Confrustion was significantly, negatively correlated with performance on the pretest (r = −.62, p < .001) and posttest (r = −.68, p < .001). A multiple regression model using confrustion to predict posttest performance while controlling for pretest was also significant, F(2, 188) = 181.14, p < .001. Within the model, both pretest (β = .57, p < .001) and confrustion (β = −.32, p < .001) were significant; confrustion was a significant, negative predictor of posttest performance even after controlling for pretest.

Gaming was significantly, negatively correlated with performance on the pretest (r = −.58, p < .001) and posttest (r = −.66, p < .001). A multiple regression model using gaming to predict posttest performance while controlling for pretest was also significant, F(2, 188) = 181.14, p < .001. Within the model, both pretest (β = .59, p < .001) and gaming (β = −.31, p < .001) were significant, indicating that gaming was also a significant, negative predictor of posttest performance even after controlling for pretest.
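For readers who want to see how standardized coefficients of this kind arise, the sketch below computes the two β weights of a two-predictor regression from the pairwise Pearson correlations, and recovers the R² implied by an overall F statistic. It is an illustration only: the function names are hypothetical, the pretest-posttest correlation needed to reproduce the reported β values is not given in the text, and the implied R² (about .66) is derived here from the reported F(2, 188) = 181.14 rather than stated by the authors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(outcome, pred_1, pred_2):
    """Standardized weights and R^2 for outcome ~ pred_1 + pred_2."""
    r_y1 = pearson_r(outcome, pred_1)
    r_y2 = pearson_r(outcome, pred_2)
    r_12 = pearson_r(pred_1, pred_2)
    denom = 1.0 - r_12 ** 2  # zero if the predictors are perfectly collinear
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


def r_squared_from_f(f_value, df_model, df_resid):
    """Invert F = (R^2 / df_model) / ((1 - R^2) / df_resid) to recover R^2."""
    return (f_value * df_model) / (f_value * df_model + df_resid)


# R^2 implied by the overall model test reported in the text.
implied_r2 = r_squared_from_f(181.14, 2, 188)
print(f"implied R^2 = {implied_r2:.3f}")
```

Calling `standardized_betas(posttest, pretest, confrustion)` on per-student score lists would return the two weights and R² in one step; those variable names are placeholders, since the raw data are not part of this paper.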

Mean levels of confrustion and gaming for each condition are reported in Table 1. One-way analyses of variance (ANOVAs) comparing gaming and confrustion levels across conditions indicated a significant effect of condition on confrustion, F(3, 187) = 14.01, p < .001, and on gaming, F(3, 187) = 10.07, p < .001. Post hoc (Tukey) tests indicated that students in the PS-only condition experienced significantly lower levels of confrustion than students in each of the other conditions (ps < .001), while there were no differences among the other conditions (ps > .97). Similarly, post hoc (Tukey) tests indicated that students in the PS-only condition experienced significantly lower levels of gaming (ps < .001), while there were no differences among the other conditions (ps > .91).
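The one-way ANOVA used above compares between-condition variance to within-condition variance. The sketch below shows that computation on hypothetical data; the function name and the toy groups are assumptions for illustration, and it does not compute p values or the Tukey follow-up tests. With four conditions and 191 students, the degrees of freedom work out to (4 − 1, 191 − 4) = (3, 187), which matches the values reported in the text.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of condition means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own condition mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three small groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova(toy_groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")

# Degrees of freedom for four conditions and 191 students, as in the study.
study_df = (4 - 1, 191 - 4)
```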

Table 1. Gaming, confrustion, and test performance by condition.

Finally, a repeated-measures analysis of variance (ANOVA) indicated that students across all conditions improved significantly from pretest to posttest, F(3, 187) = 167.04, p < .001. See Table 1 for means and standard deviations across conditions. A series of ANOVAs indicated no significant differences across conditions on pretest, F(3, 187) = 1.63, p = .18, or posttest, F(3, 187) = 1.65, p = .18.

4 Discussion

In this study, we implemented erroneous examples in a digital learning game context and found that students who played the versions of the game that included erroneous examples experienced higher levels of confrustion. There was also a significant correlation between gaming the system and confrustion. Future research might further explore the relationship between frustration and gaming, as previous research using affect detectors has found that frustration did not tend to precede gaming the system [12].

A previous study using a web-based intelligent tutor showed that students working with erroneous examples performed better than their problem-solving counterparts [6]. This study, however, did not replicate that finding.

While it is not possible to make a direct comparison between confrustion levels in the game and intelligent tutor versions of the ErrEx condition, it is worth noting that students who played the game experienced higher levels of confrustion (M = 0.46, SD = 0.26) than those who used the intelligent tutor (M = 0.34, SD = 0.16) [13]. Since confrustion has been shown to be significantly, negatively correlated with learning, these higher levels of confrustion may explain why we did not see better learning effects of erroneous examples in the game context.

Alternatively, integrating the game interface with a feature where students watch a game character play the game for them may have negatively impacted both the game experience and the intended benefit of erroneous examples.

In an upcoming study, we will explore mechanisms intended to reduce the negative impact of confrustion and gaming on learning with erroneous examples in a digital learning game.