1 Introduction

Mastery learning is an instructional technique popularized by Benjamin Bloom [7], but which dates back at least to the progressive educational movements of the early twentieth century [17, 24]. The idea is to give students just the right amount of instruction or practice they need in order to master a particular topic before moving them on to the next topic. Today, mastery learning underlies many adaptive learning technologies, including Khan Academy, Duolingo, ASSISTments, ALEKS, and cognitive tutors like MATHia. Each of these platforms is used by thousands to millions of students yearly, and as such, the way they assess mastery can have real consequences for students. Considerable work has been invested in developing statistical techniques to infer models of student learning that predict whether a student has learned a skill, which could in turn be used in mastery learning [5, 9, 12, 20, 25]. However, in practice, many state-of-the-art adaptive learning systems assess mastery in simple ways, which are seemingly not very “intelligent” [4]. For example, some platforms use simple heuristics to assess whether students have reached mastery, such as having students receive practice on a skill until they answer questions correctly three times in a row. Even platforms that do use models of student learning often have the model parameters set manually by system designers or domain experts [10, 22], rather than leveraging data-driven techniques.

One advantage of using simple heuristics is that they are interpretable and easy to convey to teachers, students, and other stakeholders. Moreover, they may seem intuitively reasonable. However, it is not clear whether these intuitions align in principle with (a) our understanding of how students learn or (b) the inferences we can make about student mastery from data.

In this paper, we present a means of better understanding mastery learning heuristics by re-interpreting them as model-based algorithms. In particular, we show that the N-Consecutive Correct in a Row (N-CCR) heuristic used by ASSISTments and a simplified version of the mastery learning heuristic used by ALEKS are both optimal mastery learning policies for variants of the Bayesian knowledge tracing (BKT) model. By placing mastery learning heuristics on the same playing field as model-based mastery learning algorithms, we hope to better understand the theoretical assumptions about learning that mastery learning heuristics make, and in doing so, help guide designers of adaptive learning systems to make more intentional decisions about which heuristics to use.

2 Background

In what follows, a mastery learning policy is any instructional policy that considers topics, that is, skills, concepts, or knowledge components (KCs), one at a time, and decides how many practice opportunities to give on the current topic before moving on to the next. In all of the mastery learning policies we consider, the decision is made purely based on whether previous answers were correct or incorrect on the student's first attempt at each question for the same KC. An optimal mastery learning policy under a model is one that gives the least amount of practice subject to some accuracy threshold (e.g., 95% confidence that the student has mastered the skill).

A popular approach to mastery learning, which underlies cognitive tutors such as MATHia, is to use the Bayesian knowledge tracing (BKT) model. The standard BKT model for a single KC is a two-state hidden Markov model that assumes that at each practice opportunity t, the student is in one of two knowledge states: the learned state, where they know the KC (\(K_t = 1\)), or the unlearned state, where they do not know the KC (\(K_t = 0\)) [12]. When the student begins using the adaptive learning system, BKT assumes they start in the learned state with probability \(P(L_0) = P(K_1 = 1)\). If a student is in the unlearned state, every time they attempt a practice opportunity and receive feedback, they have some fixed probability of learning the KC, \(P(T) = P(K_{t+1} = 1 \mid K_t = 0)\). Once the student is in the learned state, they are assumed to stay there forever (i.e., there is no forgetting). Every time the student is given a practice opportunity, we observe whether they answered the question correctly (\(C_t = 1\)) or incorrectly (\(C_t = 0\)). If the student is in the unlearned state, they answer correctly with some guessing probability P(G) and incorrectly otherwise. If the student is in the learned state, they answer correctly unless they slip, with probability P(S). The BKT model for a single skill is thus fully described by four parameters: \(P(L_0)\), P(T), P(G), and P(S). When using the BKT model, one can continuously update the probability that the student has learned the KC given their responses so far, \(P(K_t = 1 \mid C_1, C_2, \dots, C_{t-1})\). The optimal mastery learning policy for the BKT model continues to give practice opportunities to the student until this probability exceeds some threshold, typically 0.95 [11, 12].
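To make this concrete, the following is a minimal sketch (in Python) of the BKT belief update and the resulting mastery learning policy. The parameter values are illustrative placeholders rather than fitted estimates, and the function names are ours.

    def bkt_update(p_learned, correct, p_guess, p_slip, p_transit):
        """One Bayesian update of P(K = 1) after observing a response,
        followed by the chance of learning on this opportunity."""
        if correct:
            evidence_learned = p_learned * (1 - p_slip)
            evidence_unlearned = (1 - p_learned) * p_guess
        else:
            evidence_learned = p_learned * p_slip
            evidence_unlearned = (1 - p_learned) * (1 - p_guess)
        posterior = evidence_learned / (evidence_learned + evidence_unlearned)
        return posterior + (1 - posterior) * p_transit

    def bkt_mastery_policy(responses, p_l0=0.2, p_guess=0.2, p_slip=0.1,
                           p_transit=0.15, threshold=0.95):
        """Return the number of practice opportunities given before mastery
        is declared, or None if the threshold is never crossed."""
        p_learned = p_l0
        for t, correct in enumerate(responses, start=1):
            p_learned = bkt_update(p_learned, correct, p_guess, p_slip, p_transit)
            if p_learned >= threshold:
                return t
        return None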

3 N-CCR as a Model-Based Algorithm

The N-Consecutive Correct in a Row (N-CCR) heuristic keeps giving students practice problems on a given topic or skill until the student answers the problems correctly N times in a row. This heuristic is used in ASSISTments’ Skill Builders exercises [6, 15] and was previously used by Khan Academy [13]. Recently, Khan Academy switched to a more gamified way of implementing mastery learning, in which students progress through a series of Mastery Levels; to reach the Proficient Level, students still have to answer a certain number of problems correctly in a row [16].
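For reference, a minimal sketch of the heuristic itself (the function name and return convention are ours):

    def nccr_practice_count(responses, n=3):
        """Number of questions asked before N-CCR declares mastery,
        or None if a streak of n correct answers never occurs."""
        streak = 0
        for t, correct in enumerate(responses, start=1):
            streak = streak + 1 if correct else 0  # an error resets the streak
            if streak >= n:
                return t
        return None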

We now show that N-CCR can be viewed as the optimal mastery learning policy for certain BKT models. Note that when using the N-CCR heuristic, if a student gives any number of consecutive correct answers less than N followed by an incorrect answer, they are back in the same “state” as though they had not given any correct answers. Now suppose that the true model of learning is a BKT model. Then this must mean that once we see \(C_{t-1} = 0\), the student is identified as having been in state \(K_{t-1} = 0\), i.e., \(P(K_{t-1} = 0 \mid C_{t-1} = 0) = 1\). By Bayes’ rule, this posterior can equal one only if a learned student can never answer incorrectly, which implies that the probability of slipping P(S) must be zero. By setting the values of P(G), P(T), and \(P(L_0)\) appropriately, it can be seen that N-CCR is the optimal mastery learning policy for some BKT model with \(P(S) = 0\) and a given accuracy threshold (e.g., 95%). We demonstrate this precisely in the online supplementary material. It is worth noting that the BKT model with \(P(S) = 0\) was actually a well-studied model in the 1960s mathematical psychology community, known as the one-element model [8, 14].
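The following minimal sketch (illustrative parameter values, our function names) checks the key step of the argument numerically: with \(P(S) = 0\), the BKT posterior after an incorrect answer is the same no matter what came before, so the optimal policy only needs to track the current run of consecutive correct answers.

    def posterior_after(responses, p_l0=0.1, p_guess=0.3, p_transit=0.2):
        """P(learned) after a response history, in a BKT model with P(S) = 0."""
        p = p_l0
        for correct in responses:
            if correct:
                p = p / (p + (1 - p) * p_guess)  # no slips: corrects only raise p
            else:
                p = 0.0  # with P(S) = 0, an error proves the student was unlearned
            p = p + (1 - p) * p_transit          # chance of learning afterwards
        return p

    # Every history ending in an error yields the same posterior, so mastery
    # is declared after a fixed number N of correct answers in a row.
    assert posterior_after([0]) == posterior_after([1, 0]) == posterior_after([1, 1, 0])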

This re-formulation of N-CCR can help explain why it may seem to perform well in practice. For example, Pelánek and Řihák [19] showed in simulation that, even if students learn according to BKT models, the N-CCR heuristic (with the optimal value of N) often performs almost as well as the optimal BKT mastery learning policy. This may seem surprising, but our findings indicate that if P(S) is small, then the best N-CCR heuristic corresponds to the optimal mastery learning policy for a nearby BKT model (one with \(P(S) = 0\)), and hence may be close to optimal for the true model.

4 TOW as a Model-Based Algorithm

The Tug-of-War (TOW) heuristic is our name for a mastery learning heuristic that gives points to students for answering questions correctly, removes points for answering questions incorrectly (while keeping the minimum number of points at zero), and keeps giving practice until the student accumulates a certain number of points. We will use TOW\(_{+i,-j,N}\) to designate the specific TOW heuristic where the student gains i points for a correct answer, loses j points for an incorrect answer, and needs N points for mastery. ALEKS, a prominent adaptive learning system, implements mastery learning using the TOW\(_{+1,-1,5}\) heuristic (or TOW\(_{+1,-1,3}\) in some cases), with a few differences [2]. Previously, ALEKS used the TOW\(_{+2,-1,5}\) and TOW\(_{+2,-1,3}\) heuristics [23].
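A minimal sketch of the TOW heuristic as just described (the function and parameter names are ours):

    def tow_practice_count(responses, plus=1, minus=1, target=5):
        """Number of questions asked before TOW_{+plus,-minus,target} declares
        mastery, or None if the point target is never reached."""
        points = 0
        for t, correct in enumerate(responses, start=1):
            if correct:
                points += plus
            else:
                points = max(0, points - minus)  # points are floored at zero
            if points >= target:
                return t
        return None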

To see how some TOW heuristics can be interpreted as model-based algorithms, consider a variant of the BKT model where we make no assumptions as to how or when the student learns a KC. That is, P(T) need not be a fixed probability (e.g., it can increase over time), it need not be the same for all students, and it need not even be probabilistic. Since we make no assumptions about how students learn, to implement mastery learning we simply want to detect when there appears to have been a sudden increase in the probability of answering correctly (from P(G) to \(1 - P(S)\)). This is known as change-point detection, a well-studied problem in statistics and in fields like quality control [1, 3, 18]. Specifically, we can use the Bernoulli CUSUM chart algorithm [21]. We describe this method and how it can be applied to mastery learning in the online supplementary material. Given that we make no assumptions about the probability of learning, we cannot make any statement about how confident we are that the student has learned the skill. Instead, we can set the parameters such that 95% of students who have not learned the skill would receive at least a certain number of practice opportunities (which we can choose) before mastery would mistakenly be declared. It can be shown that a variety of TOW heuristics, including all of those mentioned above, can be implemented using the CUSUM algorithm with an appropriate choice of parameters.
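Below is a minimal sketch of the Bernoulli CUSUM detector applied in this setting, under the stated assumptions: correctness is Bernoulli with success probability P(G) before the change point and \(1 - P(S)\) after it. The parameter values are illustrative. Note that when P(G) = P(S), the up and down increments have equal magnitude, so a threshold of five increments reproduces TOW\(_{+1,-1,5}\).

    import math

    def cusum_practice_count(responses, p_guess=0.25, p_slip=0.25, n_points=5):
        """Declare mastery when the CUSUM statistic crosses its threshold;
        return the number of questions asked, or None if it never does."""
        up = math.log((1 - p_slip) / p_guess)    # log-likelihood ratio of a correct
        down = math.log(p_slip / (1 - p_guess))  # log-likelihood ratio of an error
        threshold = n_points * up - 1e-9         # epsilon guards float round-off
        s = 0.0
        for t, correct in enumerate(responses, start=1):
            s = max(0.0, s + (up if correct else down))  # floor at zero, like TOW
            if s >= threshold:
                return t
        return None

With the default (symmetric) parameters, this detector asks exactly the same number of questions as TOW\(_{+1,-1,5}\) on every response sequence; asymmetric choices of P(G) and P(S) yield the unequal increments of heuristics like TOW\(_{+2,-1,5}\).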

5 Conclusion

Given that these heuristics may be more principled than they appear at first sight, perhaps their use in adaptive learning systems is warranted, especially since they are much easier to communicate to students than complex model-based policies. However, other considerations also need to be taken into account. For example, the N-CCR heuristic might be demotivating for students, given that a single “slip” punishes the student by resetting their progress. On the other hand, if we believe students can slip, then perhaps the N-CCR heuristic should not be used at all! All in all, a more comprehensive understanding of mastery learning heuristics and their hidden models can hopefully help us ensure that adaptive learning systems perform mastery learning in productive ways.