
1 Introduction

Education systems rely on constant evaluation in the form of written tests, and solving them is a challenge in any country. For foreign students, however, understanding a series of questions written in a different language demands intellectual work on top of the effort required to solve the problem itself, putting them at a disadvantage. A student's language proficiency might be taken into account when an exam has time restrictions, but until now there has been insufficient evidence to support this argument. Moreover, online tutoring systems and tests have become a very common resource for teachers and professors around the globe. This creates the need for systems that measure the state of the student during online sessions, in order to understand the limits and reach of these tools regardless of language limitations, and to help design more efficient systems for an education in which all students have the same opportunities.

Considerable progress has been made over the last decade in integrating models generated from behaviors such as keyboard clicks and interaction latencies with real-time sensors indicating users' affective states. These models have produced significant advances in Intelligent Tutoring Systems (ITS) research, leading to more adaptive systems that should ultimately be able to intervene to optimize learning for individuals. However, in a typical classroom situation, the feasibility of data collection becomes an important constraint: although more information about the user's state clearly yields more accurate student models, there is a limit to the instrumentation that can be applied to the user, at least outside the laboratory.

In this study, we analyzed the brain signals of 16 students, 10 whose first language was English and 6 who were English learners, to find differences in their cerebral activity and thus understand how they process the written text before solving a math problem while using an online tutoring system.

2 Prior Work

Researchers in the ITS community have investigated several approaches to acquiring information about the state of the learner. One approach is to use behaviors such as the time between clicks and rapid activation of instructional scaffolding (e.g., repeatedly clicking on the “help” button) to estimate whether the student is actually engaged in trying to solve a problem or is avoiding effort, perhaps by “gaming the system”, deliberately entering wrong answers in order to move on quickly [3]. Beal, Mitra and Cohen [4] used a Hidden Markov Model (HMM) to infer a student's level of engagement and predict behavior on the next problem. Their results showed that students had distinct trajectories of engagement, and that the HMM estimates were strongly related to independent estimates of individual students' mathematics motivation (based on student self-reports and teacher reports) and mathematics proficiency (grades, test scores). Johns and Woolf [12] likewise reported good predictions of students' motivation while they solved a series of problems in a math tutoring system: a Dynamic Mixture Model based on Item Response Theory predicted a student's correct response 72% of the time (versus a 62% baseline). Arroyo, Mehranian and Woolf [2] tested 600 students on an ITS for mathematics, estimating in real time the effort the student invested in each problem and using the estimate to choose the next problem: one of the same or greater difficulty if the student was engaged, and an easier one otherwise. Students using the experimental effort-based selector scored significantly better on post-tests than those who received problems from a random problem selector (57% vs. 42% accuracy).

Technology for capturing electroencephalography (EEG) signals has progressed considerably, to the point where the user can wear a lightweight recording unit that transmits data for analysis. The recording unit is sufficiently non-intrusive that it is being used in a variety of tasks that require sustained attention and cognitive effort, including long-haul truck driving, missile tracking and submarine systems control. Berka et al. [6] found that officers tracking missiles in a simulated Aegis Combat System environment experienced high or extreme cognitive workload 25-30% of the time and achieved a detection efficiency of almost 100%. Education researchers have begun to use this type of device to track students' cognitive activity during problem solving.

In an effort to understand and improve the reading capabilities of English learners, Peregoy and Boyle [15] suggest that these students must have different motivations for reading, since the effort they make is substantially higher than that of their classmates. Other studies focus on the differences between the experiences of English learners and those of their schoolmates, describing the problem from a social perspective and mentioning limitations in access to resources and inequality issues [11], or emphasizing these students' needs [10].

In previous work [7], we found that English Language Learners using an Intelligent Tutoring System were less likely to answer correctly, had more incorrect answer attempts, took longer on each problem, and were more likely to use multimedia help features.

In this work, we explore the patterns in the brain signals of English Language Learners and English Primary students in order to recognize differences between the two groups. We compared the performance of two supervised machine learning algorithms (AdaBoost and the C4.5 classification tree) at recognizing whether a brain signal came from an English Primary or an English Language Learner student.

3 Methodology

In this study, we invited 16 students to solve a series of easy and difficult math problems in an Intelligent Tutoring System (ITS). We asked each of them for their primary language, obtaining a total of 10 English Primary (EP) speakers and 6 English Language Learners (ELL).

The verbal description of each problem selected for the experiment had a different level of complexity, since the wording of each question varies in length and some questions are more elaborate than others. It was necessary to differentiate these levels in order to test whether the wording of a problem affected the ELL students' performance. We obtained the readability level of each question with the method of Collins-Thompson and Callan [9], which automatically models and predicts the reading difficulty of texts (see Table 1).
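We did not reimplement this readability model here; to make the idea concrete, the following is a minimal sketch of grade-level prediction with smoothed unigram language models, in the spirit of [9]. The function names and corpus layout are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of readability prediction with Laplace-smoothed unigram
# language models, in the spirit of Collins-Thompson and Callan [9].
# The corpus layout and names are assumptions for illustration only.
import math
from collections import Counter

def train_grade_model(texts):
    """Build a Laplace-smoothed unigram model from texts of one grade level."""
    counts = Counter(w for text in texts for w in text.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    return lambda w: (counts[w] + 1) / (total + vocab)

def predict_grade(question, models):
    """Return the grade whose language model best explains the question."""
    words = question.lower().split()
    return max(models, key=lambda g: sum(math.log(models[g](w)) for w in words))

# models = {grade: train_grade_model(texts) for grade, texts in corpus.items()}
# level = predict_grade("A bus traveling at constant speed ...", models)
```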

Table 1. List of problems selected for the study, with their difficulty and their reading difficulty level according to Collins-Thompson and Callan [9].

The EEG data was recorded as students solved a series of easy and difficult math problems. There were 16 participants in the study (8 males, 8 females). Participants were college students who were at least 18 years old and gave active written consent for participation. Each person participated in a 90-minute session, which included informed consent procedures, fitting the EEG headset, completing a 15-minute baseline calibration task, and solving eight multiple-choice math problems presented at the computer while wearing the EEG headset. Math problems were taken from a set of released SAT items; there were four easy problems and four hard problems, with difficulty level determined by information from the College Board. Each problem had four answer options. The items were presented to students within an online tutoring system that recorded the time on the problem (initial presentation on the screen to first answer selection) as well as the outcome (correct, incorrect answer chosen). Problems were presented in one of two sequences (easy, easy, hard, hard, easy, easy, hard, hard or hard, hard, easy, easy, hard, hard, easy, easy) across subjects.

3.1 EEG Data Acquisition

The electroencephalogram (EEG) data were recorded from nine sensors integrated into a mesh cap covering the upper half of the head, along with two reference signals attached to the mastoid bones (behind the ears) and two sensors attached to the right clavicle and to the lowest left rib to record the heart rate (although the heart rate data were not used in the study). The location of each sensor was determined by the International 10–20 System [14] to ensure standardized reproduction of tests. The cap was equipped with a small wireless transmission unit, and a small USB dongle received the wireless transmissions on a PC running a 32-bit Windows (XP/Vista/7) operating system.

Each second, 256 EEG samples were transmitted and decomposed into Theta, Alpha, Beta and Sigma wave signals (ranging from 3 Hz to 40 Hz). These signals were processed by Advanced Brain Monitoring's proprietary B-Alert software to produce classifications of mental states, i.e., the probability that the participant was in a particular state during each one-second epoch. The states were Engagement, Distraction, Drowsiness and Cognitive Workload [5]. Engagement reflects cognitive activities such as information gathering, visual scanning and sustained attention, while Workload is a measure of effortful cognitive activity [16]. The Engagement and Workload data were selected for our analyses because levels of Drowsiness were almost non-existent in the present study, and Distraction is essentially the inverse of Engagement.
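The B-Alert state classifiers themselves are proprietary, but the first step, band decomposition, can be illustrated generically. The sketch below computes per-band power for one second of EEG sampled at 256 Hz; the band edges are common conventions, not ABM's exact definitions.

```python
# Generic illustration of band decomposition for one 1-s EEG epoch at 256 Hz.
# Band edges are approximate conventions, not B-Alert's exact definitions.
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (3, 7), "alpha": (8, 12), "sigma": (12, 15), "beta": (15, 30)}

def band_powers(epoch, fs=256):
    """Integrate the power spectral density of a one-second epoch over each band."""
    freqs, psd = welch(epoch, fs=fs, nperseg=len(epoch))
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])
    return powers
```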

3.2 Data Processing and Classification

Of the 16 participants, 15 completed all eight problems and one completed seven of the eight items. The data set thus consisted of 127 completed math problems with their corresponding Engagement and Workload data. We refer to these as the Engagement signal and the Workload signal.

The signals were processed by assigning each one-second value to one of three equal-probability bins, with boundaries at the 0.333 and 0.666 quantiles of each participant's Cumulative Distribution Function. Values below the 0.333 quantile were labeled the Low state, values between 0.333 and 0.666 the Medium state, and those between 0.666 and 1.0 the High state. Because the bins are computed per participant, this step normalizes the signals across all participants.
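A minimal sketch of this binning step (variable names are ours, for illustration):

```python
# Per-participant three-state binning: each one-second value is labeled
# Low, Medium or High according to the 0.333 and 0.666 quantiles of that
# participant's own signal.
import numpy as np

def to_states(signal):
    """Map a 1-D signal to 0 (Low), 1 (Medium) or 2 (High)."""
    lo, hi = np.quantile(signal, [0.333, 0.666])
    return np.digitize(signal, [lo, hi])  # <lo -> 0, [lo, hi) -> 1, >=hi -> 2

# workload_states = to_states(workload_signal)  # e.g. array([2, 2, 1, 0, ...])
```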

As Table 2 shows, when measuring the Workload data the participants solved problems in a High state around 65% of the time for both types of problems, easy and hard, in the Medium state around 24% of the time, and in the Low state 11% of the time. The Engagement data, in contrast, show that the predominant states were High and Low (each around 40% of the time) for both easy and hard problems.

Table 2. Percentage of time in three levels (Low, Medium and High) for the Engagement and Workload states while solving math problems in the ITS. Student Workload leans toward the High state for both Easy and Hard problems, while Engagement splits between the Low and High states.

Using the language proficiency of the students, a table of average states was generated comparing English Primary (EP) students with English Language Learner (ELL) students. As Table 3 shows, ELL students spent more time than EP students in a High state of Engagement (visual acquisition) (Easy: 51% vs. 44%; Hard: 46% vs. 40%), whereas EP students spent far more time in a High state of Workload (Easy: 74% vs. 48% for ELL; Hard: 77% vs. 50%).

Table 3. Percentage of time in three levels (Low, Medium and High) for the Engagement and Workload states while solving math problems in the ITS, for English Language Learners (ELL) and English Primary (EP) speakers. The Workload signal is higher for EP speakers on both Easy and Hard problems.

In previous work [8], we used a table of transition probabilities among the Low, Medium and High states of both signals for each problem solved, producing vectors of 9 features plus a class tag (the outcome of the problem). The results showed that it was possible to predict whether the student would choose a right or a wrong answer with 83% accuracy on easy problems. In this study we use a similar approach, but instead of predicting the outcome we try to predict whether the student was an English Language Learner. We used language proficiency (ELL or EP) as the class to construct 8 files (one per problem) that serve as training sets for two different machine learning algorithms (AdaBoost and the C4.5 classification tree).
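For reference, a sketch of the 9-feature transition representation used in [8], computed from a per-second state sequence such as the one produced above (names are illustrative):

```python
# 9-feature transition-probability representation: counts of
# Low/Medium/High -> Low/Medium/High transitions, row-normalized
# and flattened into a vector.
import numpy as np

def transition_features(states, n_states=3):
    """Flatten the row-normalized 3x3 transition matrix of a state sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts),
                      where=row_sums > 0)
    return probs.ravel()  # 9 features per problem attempt
```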

Because the Engagement and Workload signals have a different length for each user and problem, they must be processed into a more manageable, fixed-length representation.

We reduced the Workload signals to vectors of equal size with Piecewise Aggregate Approximation (PAA) [13] and used these vectors as features for a classifier. Each signal is divided into N segments, and the mean of each segment is stored in its bin, thus describing the signal with a reduced representation. A file is generated for each problem, with one user per record (16 users in total), the language proficiency (ELL or EP) as the class, and the segment means as the features. A plot of the Workload signal for the problem ‘Triangle’ can be seen in Fig. 1.
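A minimal sketch of the PAA reduction (function and variable names are ours):

```python
# Piecewise Aggregate Approximation [13]: split the signal into N contiguous
# segments and replace each segment by its mean, yielding a fixed-length
# feature vector regardless of the original signal length.
import numpy as np

def paa(signal, n_bins=8):
    """Reduce a 1-D signal to n_bins segment means."""
    signal = np.asarray(signal, dtype=float)
    edges = np.linspace(0, len(signal), n_bins + 1).astype(int)
    return np.array([signal[edges[i]:edges[i + 1]].mean()
                     for i in range(n_bins)])

# features = paa(workload_signal, n_bins=8)  # one row per user, as in Fig. 1
```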

Fig. 1. Workload signal for problem ‘Triangle’ reduced to 8 bins using PAA. English Primary speakers (*) appear mostly at the top of the chart.

The processed signals for each problem (8 per participant, 16 participants) are used as input to an AdaBoost classifier and a classification tree (C4.5) with random sampling, setting aside 90% of the records for training and 10% for testing. The average over 20 train/test sessions is reported.
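A sketch of this evaluation protocol with scikit-learn follows. Note that scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, so this approximates rather than reproduces the original setup; the data layout is assumed.

```python
# Average test accuracy over repeated random 90/10 train/test splits.
# sklearn's DecisionTreeClassifier is CART, not C4.5 (an approximation here);
# the `algorithm` argument may be unnecessary in newer sklearn versions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def mean_accuracy(X, y, model_factory, n_runs=20, test_size=0.1):
    """Average test accuracy of model_factory() over n_runs random splits."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        scores.append(model_factory().fit(X_tr, y_tr).score(X_te, y_te))
    return np.mean(scores)

# acc_tree = mean_accuracy(X, y, DecisionTreeClassifier)
# acc_ada = mean_accuracy(X, y, lambda: AdaBoostClassifier(
#     algorithm="SAMME", n_estimators=50))
```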

4 Results

The readability level of the word math problems appears to have no impact on the correct classification of the type of student, and neither does the average response time (see Table 4). The only factor that seems to affect classification (using the classification tree) is the difficulty type: problems marked ‘Hard’ were classified more accurately than ‘Easy’ ones. On ‘Bus’, students were accurately classified 94% of the time, and ‘Towns’ and ‘Triangle’ show 72% classification accuracy. Using AdaBoost with the SAMME algorithm and 50 estimators does not produce better results: only ‘Bus’ and ‘Triangle’ show a classification accuracy above 75%, and ‘Summation’ and ‘Village’, both ‘Easy’ problems, classify no better than chance.

Table 4. The classification accuracy for problems of type ‘Hard’ is better than chance in 3 of the 4 cases with the classification tree algorithm.

The results show that neither the response time nor the type of problem alone distinguishes language proficiency for this set of users: while some problems yield accuracy as high as 94%, on others the prediction is no better than chance. However, when the complexity of the problem was high and the response time was long, the accuracy was mostly acceptable. Several aspects deserve deeper analysis, such as how long the volunteers had lived in the United States, the age at which they arrived, and how much they use the language on a daily basis. This first approach has given our research some new directions.

5 Conclusions

The average Engagement and Workload states of English Primary speakers and English Language Learners show that a difference between the two groups exists and can be detected under certain circumstances, namely when the complexity of the problem demands a higher workload for a longer period of time.

In this study, the English learners were born in different countries, and all of them had the opportunity to study at a university in the United States. How long they had been in the country before the experiment was conducted is unknown, and that missing piece of information was probably relevant to the interpretation of the results. In a future experiment, we will test only foreign students who, even though they understand the language, are not used to speaking English on a daily basis. We also plan to incorporate more participants and include more challenging math problems.