Why do we, under some circumstances, rely on costly, effortful cognitive processing, while at other times we turn to relatively effortless, cognitively ‘inexpensive’ forms of processing? A spate of recent research has examined the situational and individual factors that govern the deployment of cognitive effort in service of task goals (Kool & Botvinick, 2018; Shenhav et al., 2017; Westbrook & Braver, 2015). Of particular interest in this burgeoning cognitive effort literature are tasks requiring cognitive control, broadly defined as the capacity to flexibly adapt one’s behavior and appropriately direct cognitive processing in accordance with internally maintained goals. Cognitive control is readily measured in the lab using, for example, interference tasks such as the Stroop or flanker tasks (Botvinick, Braver, Barch, Carter, & Cohen, 2001). In these tasks, successful performance is thought to reflect not only an individual’s cognitive capacity (i.e., executive function ability) but also the individual’s decision to invest cognitive effort at that particular moment.

According to one influential account, this decision to engage in (or withhold) cognitively effortful processing is governed by the inherent trade-off between the costs of exerting effort and the benefits (i.e., rewards) potentially conferred by effort exertion (Shenhav, Botvinick, & Cohen, 2013). Consistent with this view, previous work has found that monetary incentives improve task performance by offsetting the costs of cognitive resource allocation, reflecting the mobilization of effort (Capa, Bouquet, Dreher, & Dufour, 2013; Chiew & Braver, 2014; Hübner & Schlösser, 2010; Otto & Daw, 2019; Padmala & Pessoa, 2011; Sandra & Otto, 2018). Furthermore, it has been shown that people consistently avoid expending cognitive effort when rewards are held constant (Inzlicht, Schmeichel, & Macrae, 2014; Westbrook & Braver, 2015), and this avoidance appears more prevalent in individuals with limited cognitive capacity, who are in turn presumed to have higher effort costs (Kool, McGuire, Rosen, & Botvinick, 2010).

A key challenge in developing an account of the regulation of effortful behavior (or ‘metacontrol’) is specifying a trial-by-trial measure of an individual’s momentary cognitive effort outlay, that is, quantifying the amount of effort an individual exerts in accordance with costs and benefits. One promising online measure of effort is pupil diameter. Indeed, a considerable body of psychophysiological work suggests that task-evoked pupillary responses (TEPRs) might serve as a viable index of cognitive effort exertion (Beatty, 1982): across a diverse range of task domains, increasing the effort required to produce a correct response evokes larger TEPRs (van der Wel & van Steenbergen, 2018). Specifically, TEPRs appear to track increases in working memory load (Heitz, Schrock, Payne, & Engle, 2008; Hopstaken, Van Der Linden, Bakker, & Kompier, 2015; Kahneman & Beatty, 1966), response inhibition requirements (Laeng, Ørbo, Holmlund, & Miozzo, 2011; Rondeel, Van Steenbergen, Holland, & van Knippenberg, 2015; van Steenbergen & Band, 2013), changes in task set (Rondeel et al., 2015), the syntactic complexity of written sentences (Just & Carpenter, 1993), and the difficulty of arithmetic (Ahern & Beatty, 1979; Steinhauer, Siegle, Condray, & Pless, 2004) and geometric analogy problems (Van Der Meer et al., 2010).

However, it is unclear whether pupil diameter actually indexes effort exertion or merely reflects task demands, as the two constructs are, by nature, tightly intertwined in many cognitive tasks (van der Wel & van Steenbergen, 2018). Put another way, when task demand increases, successful performance often requires participants to exert more effort to meet it. To demonstrate that pupil diameter might serve as a viable measure of cognitive effort outlay, separate from task demand, the present study examines whether changes in TEPRs reflect levels of effort investment, varying both intrinsically, as a function of individual differences, and extrinsically, in response to changes in reward incentives, while task demands are held constant.

Indeed, disambiguating the effort and demand accounts of TEPRs is important because the extant pupillometry work, taken as a whole, finds inconsistent relationships between individual differences in cognitive task performance and TEPRs (van der Wel & van Steenbergen, 2018). For example, lending support to an effort account of TEPRs, larger TEPRs have been associated with improved N-back performance (Rondeel et al., 2015) and fewer errors on mental arithmetic problems (Ahern & Beatty, 1979). Other work has found that within-individual increases in TEPRs track improvements in performance on flanker-type tasks (Diede & Bugg, 2017). Within a cost-benefit framework, individuals with larger effort costs presumably invest less effort than individuals with smaller effort costs (Kool & Botvinick, 2018); taking task performance as a proxy for effort investment, differences in effort investment would explain why better performance in these tasks is associated with larger TEPRs. In further support of this effort account, previous work has demonstrated that individuals high in fluid intelligence (i.e., with low effort costs) exhibit better performance (i.e., more effort investment) and larger TEPRs on difficult geometric analogy problems (Van Der Meer et al., 2010).

At the same time, and consistent with a demand view, larger TEPR differences between trial types in a Stroop task (i.e., congruent vs. incongruent trials) were found to correlate with larger Stroop RT interference costs (i.e., worse performance; Laeng et al., 2011; Rondeel et al., 2015). Because those with the worst performance also showed the largest dilations, this relationship might suggest that pupillary responses reflect the current level of task demand (i.e., the costs of cognitive control) rather than the effort actually exerted. Further buttressing this view, a recent study observed a dissociation between physiological and performance measures: TEPRs reflected task conflict levels in a Stroop task (congruent vs. neutral trials) in the absence of task conflict effects on performance (Hershman & Henik, 2019). That is, the observation that increased task conflict can drive larger TEPRs without a change in performance supports the demand hypothesis, which predicts that TEPRs should be sensitive to demand levels but not to invested effort. However, from a cost-benefit view of effort investment, interindividual differences in task performance could reflect variation in abilities (i.e., effort costs) and/or motivation (i.e., reward incentives), which might explain the variability in the reported relationships between task performance and physiology across these studies.

Furthermore, while there is suggestive evidence that pupillary responses might index individual differences in effort outlay, it remains unclear whether TEPRs also track within-individual, reward-induced improvements in task performance that result from a decision to expend more effort to obtain rewards. Indeed, examining intraindividual differences is thought to be key to understanding TEPRs, as it can potentially circumvent issues associated with interindividual comparisons (see van der Wel & van Steenbergen, 2018, for extended discussion). Taking a cost-benefit view of effort expenditure, here we seek to disentangle the effort and demand accounts of pupil diameter by (1) modulating available rewards and (2) leveraging the inherent variability across individuals in both cognitive capacity and intrinsic motivation to expend effort.

In line with the cost-benefit view of effort, a large body of work demonstrates that reward incentives mobilize cognitive effort (Botvinick & Braver, 2015). As a consequence, task performance improves when large monetary rewards hinge on the successful deployment of cognitive control, compared with smaller reward incentives (Aarts et al., 2014; Bijleveld, Custers, & Aarts, 2009) or no reward incentives at all (Hübner & Schlösser, 2010; Locke & Braver, 2008; Padmala & Pessoa, 2011). For example, in task-switching paradigms, where task switch costs are thought to reflect, in part, the reconfiguration needed to shift between task sets (Monsell, 2003), larger performance-contingent monetary rewards reduce task switch costs, and this reduction is interpreted as a marker of increased effort investment (Capa et al., 2013; Fröber & Dreisbach, 2016; Kleinsorge & Rinkenauer, 2012; Otto & Vassena, 2020). If pupil diameter reflects effort investment, reward-induced changes in task performance should also be reflected in TEPRs. Indeed, previous pupillometry work finds that reward incentive levels increase TEPRs on difficult trials in a working memory task (Bijleveld et al., 2009). Similarly, other work has found reward-induced increases in both transient (i.e., trial-by-trial) and sustained pupil diameter, suggesting a distinct role for pupil diameter in tracking changes in motivational state (Chiew & Braver, 2013, 2014). However, while these studies find that reward manipulations effectively modulate TEPRs, they did not examine whether these TEPR modulations are related to reward-induced changes in task performance, a relationship that would lend strong support to an effort view of TEPRs (van der Wel & van Steenbergen, 2018).
Thus, manipulating reward incentives offers an opportunity to study intraindividual modulations of effort exertion (i.e., reward-induced changes in task performance) and their effects on pupil diameter, while holding task demands (i.e., difficulty) constant.

In addition, individual differences in cognitive capacity (i.e., effort costs) might bear upon the relationship between TEPRs and behavioral markers of effort exertion, as effort avoidance is more prevalent in individuals with limited cognitive ability (Kool et al., 2010), and more recent work finds that individuals with lower executive function (EF) capacity benefit the most from monetary incentives during task-switching (Sandra & Otto, 2018). Thus, we also assessed how differences in more general EF capacity, as measured by Stroop interference costs, which are thought to tap EF abilities (Kane & Engle, 2003), moderate the relationship between rewards and effort allocation. While the Stroop task and task-switching rely in part on shared EF capacities (Miyake et al., 2000), they also impose unique requirements on inhibition and set-shifting processes, respectively. Using qualitatively different EF-dependent tasks to assess individual differences speaks to the generalizability of the relationship between effort costs and effort expenditure, as evidenced behaviorally and in TEPRs, while also mitigating the circularity that could arise from using a task-switch-based measure to understand the relationship between task switch costs and TEPRs. Beyond cognitive capacity, other work has highlighted variability in people’s aversion to exerting effort, suggesting that some individuals value effortful thought more than others (Inzlicht et al., 2018), over and above differences in cognitive ability. Indeed, differences in intrinsic effort valuation predict the amount of money a person is willing to accept to exert effort (Westbrook, Kester, & Braver, 2013) and the extent of reward-induced improvements in task performance (Sandra & Otto, 2018).
Therefore, we further assess how interindividual differences in effort avoidance, operationalized by the Need for Cognition scale (NFC; Cacioppo, Petty, & Feng Kao, 1984), predict reward-induced effort recruitment, both behaviorally and physiologically.

Finally, we examine how tonic (i.e., non-stimulus-evoked) changes in pupil diameter relate to task engagement and arousal. Tonic pupil dilations have previously been shown to reflect changes in control state (i.e., task engagement; Gilzenrat, Nieuwenhuis, Jepma, & Cohen, 2010), reward-induced changes in arousal (Chiew & Braver, 2013, 2014), and individual differences in cognitive ability (Heitz et al., 2008; Van Der Meer et al., 2010). Accordingly, we examine the extent to which reward incentives increase tonic pupil diameter, and whether those high in EF capacity (as indexed by Stroop interference costs) also have larger tonic pupil diameters, as previously reported for those high in fluid intelligence (Van Der Meer et al., 2010) and working memory capacity (Heitz et al., 2008). Lastly, we test whether individual differences in motivation to deploy effort (indexed by NFC) relate to tonic pupil diameter, and whether the effect of reward on tonic pupil diameter depends on these individual differences.

Method

Overall experimental procedure

We first assessed individual differences in motivation to exert effort (NFC) and EF abilities (Stroop interference costs). Participants then completed a ‘baseline’ task-switching paradigm in the absence of reward incentives, before completing the same paradigm under two levels of reward incentive, termed low-reward and high-reward blocks. We recorded pupil diameter during all task-switching blocks.

Participants

Eighty English-speaking participants (55 females; mean age = 22.08 years, SD = 3.03 years) were recruited from the McGill University community for a base remuneration of $20 CAN and a performance-contingent cash bonus of up to $13.20. All participants had corrected-to-normal vision and reported no color blindness and no diagnosis of psychiatric or neurological conditions. Prior to the experiment, participants provided informed consent in accordance with the McGill University Research Ethics Board.

We excluded the data of five participants who were missing more than 20 trials, one participant who failed to perform the task with at least 80% overall accuracy, and four participants for whom no reliable pupil data could be collected due to technical issues with the eye tracker. For analyses involving the NFC, we excluded two additional participants who were missing NFC questionnaire responses. Finally, for analyses involving the Stroop task, we excluded an additional six participants whose overall Stroop task accuracy was below 80%.

Materials and procedure

Before completing the computerized tasks, participants completed the Need for Cognition (NFC) questionnaire to assess individual differences in their tendency to engage in effortful thinking (Cacioppo et al., 1984). The questionnaire consists of 18 statements, such as “I find satisfaction in deliberating hard and for long hours” and “I only think as hard as I have to,” each rated on how characteristic it is of the participant (1 = extremely uncharacteristic to 5 = extremely characteristic). Participants also completed the behavioral inhibition/approach scales (BIS/BAS; Carver & White, 1994), which were not examined in the present analysis.
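The section does not spell out how item ratings were combined into a single NFC score. A minimal sketch, assuming the common approach of reverse-scoring items worded in the low-NFC direction (such as “I only think as hard as I have to”) and summing all 18 ratings, could look like the following. The function name `score_nfc` and the `reverse_keyed` parameter are hypothetical; the set of reverse-keyed item numbers should be taken from the published scale rather than from this example.

```python
def score_nfc(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Return a summed NFC score (hypothetical helper, see lead-in).

    responses: 18 integer ratings in questionnaire order (1-5 scale).
    reverse_keyed: 1-based item numbers worded in the low-NFC direction.
    """
    if len(responses) != 18:
        raise ValueError("expected 18 item ratings")
    total = 0
    for item, rating in enumerate(responses, start=1):
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"item {item}: rating {rating} is out of range")
        if item in reverse_keyed:
            # Reverse-score on the 1-5 scale: 1 <-> 5, 2 <-> 4, 3 stays 3.
            rating = scale_min + scale_max - rating
        total += rating
    return total
```

Under this scoring, totals range from 18 to 90, and a respondent who answers 3 on every item scores 54 regardless of which items are reverse-keyed.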

In the computerized portion of the experiment, participants were seated comfortably in a dimly lit room in front of a 24-inch monitor set to a resolution of 1,280 × 1,024 pixels. Participants were instructed to keep their heads still, resting on a mount positioned 60 cm from the screen. During both the Stroop and task-switching tasks, participants’ right pupil diameter was measured using an EyeLink 1000 eye tracker (SR Research, Osgoode, ON) at a sampling rate of 250 Hz. Stimuli were presented using PsychoPy (Version 1.85.3), synchronized with the eye tracker. Prior to each task block, participants underwent a standard 9-point calibration procedure.

Stroop interference task

Participants completed a computerized version of the Stroop interference paradigm (Kerns et al., 2004; Otto, Skatova, Madlon-Kay, & Daw, 2015) to measure individual differences in executive function. Participants were instructed to identify, as quickly and accurately as possible, the font color (green, red, or blue) in which a word (‘GREEN’, ‘RED’, or ‘BLUE’) was presented, using three corresponding keys (‘j’, ‘k’, or ‘l’), with the stimulus–response mapping counterbalanced between participants. Participants completed 120 trials in pseudorandomized order: on 90 trials the word and font color matched (congruent trials), and on the remaining 30 they did not (incongruent trials). Before starting the task, participants completed 10 practice trials to become accustomed to the timing and response procedure, receiving trial-by-trial feedback (600 ms) on whether their response was correct or incorrect. During the task, a yellow fixation cross was shown for 1.5 s before the target appeared, and participants had 1.5 s to respond; no feedback was given.
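The trial composition described above (120 trials, 90 congruent and 30 incongruent, three colors) can be sketched as a trial-list generator. This is an illustration only: the constraints behind the original pseudorandomization are not reported, so the sketch substitutes a plain seeded shuffle, and balancing font colors across trials is an added assumption. The function and constant names are hypothetical.

```python
import random

COLORS = ("green", "red", "blue")

def make_stroop_trials(n_congruent=90, n_incongruent=30, seed=None):
    """Return a shuffled list of (word, font_color, is_congruent) tuples."""
    rng = random.Random(seed)
    trials = []
    for i in range(n_congruent):
        color = COLORS[i % len(COLORS)]  # cycle colors so each appears equally often
        trials.append((color.upper(), color, True))
    for i in range(n_incongruent):
        color = COLORS[i % len(COLORS)]
        # The word names one of the two colors that differ from the font color.
        word = rng.choice([c for c in COLORS if c != color])
        trials.append((word.upper(), color, False))
    rng.shuffle(trials)  # stand-in for the unreported pseudorandomization scheme
    return trials
```

Passing a fixed `seed` makes the list reproducible, which is convenient when the same order must be regenerated for analysis.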

Task-switching paradigm

After completing the Stroop task, participants completed a number magnitude–parity task-switching paradigm (Kool et al., 2010). On each trial, participants were presented with a single digit (9, 8, 7, 6, 4, 3, 2, or 1) and asked to judge either its magnitude (larger or smaller than 5) or its parity (even or odd), depending on whether a bar appeared above or below the digit, with position–task mappings counterbalanced between participants. This bar cue was chosen to reduce luminance-driven changes in pupillary responses. Importantly, on approximately half of the trials participants repeated the task from the previous trial, and on the other half they switched to the other task; repeat and switch trials were presented in pseudorandomized order. Participants first completed 10 practice trials with accuracy feedback to adjust to the timing and response procedure.
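The digit rules above, together with the trial labels used later in the analyses (switch vs. repeat, and response congruence, i.e., whether both tasks call for the same key), can be illustrated with a short sketch. The left/right assignment in `KEY_MAP` is hypothetical, since the actual response mapping is not reported; only the judgment rules (magnitude relative to 5, parity) come from the text, and the helper names are illustrative.

```python
# Hypothetical response keys; the study's actual key assignment is not reported.
KEY_MAP = {"larger": "left", "smaller": "right", "even": "left", "odd": "right"}

def correct_category(digit, task):
    """Correct judgment for a digit (1-4, 6-9) under the cued task."""
    if task == "magnitude":
        return "larger" if digit > 5 else "smaller"
    if task == "parity":
        return "even" if digit % 2 == 0 else "odd"
    raise ValueError(f"unknown task: {task!r}")

def label_trials(digits, tasks):
    """Label each trial's type, correct key, and response congruence."""
    labels = []
    for i, (digit, task) in enumerate(zip(digits, tasks)):
        # The first trial has no predecessor, so it is neither switch nor repeat.
        if i == 0:
            trial_type = None
        else:
            trial_type = "repeat" if task == tasks[i - 1] else "switch"
        key_magnitude = KEY_MAP[correct_category(digit, "magnitude")]
        key_parity = KEY_MAP[correct_category(digit, "parity")]
        labels.append({
            "digit": digit,
            "task": task,
            "trial_type": trial_type,
            "correct_key": KEY_MAP[correct_category(digit, task)],
            "response_congruent": key_magnitude == key_parity,
        })
    return labels
```

With this particular mapping, 8 (larger, even) and 3 (smaller, odd) are response-congruent, whereas 7 (larger, odd) is incongruent; a different counterbalanced mapping would change which digits fall in each group.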

On each trial, a yellow fixation cross was presented for 2 s before the target digit appeared, and participants were given 2.5 s to respond; accuracy feedback, as in the practice trials, followed immediately after each response (Chiew & Braver, 2013; Heitz et al., 2008; Hershman & Henik, 2019; Rondeel et al., 2015). The task was divided into six blocks of 60 trials each. In the first two blocks, participants did not receive any reward incentives for correct responses. For the subsequent four reward blocks (see Fig. 1), participants were informed that they would receive either 10 cents (high reward) or 1 cent (low reward) per correct response. The reward manipulation was further made salient by changing the feedback from baseline to signal the amount of money earned on each trial (“+10 cents” or “+1 cent” for correct responses and “+0 cents” for incorrect responses). The order of reward blocks was counterbalanced between participants, such that Blocks 3–4 were associated with high reward and Blocks 5–6 with low reward, or vice versa. Experimental blocks were separated by participant-paced breaks to minimize fatigue (see Fig. 1).
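The maximum bonus of $13.20 reported under Participants follows from this block structure: two high-reward and two low-reward blocks of 60 trials each, assuming every trial in those four blocks (and no practice or baseline trial) was paid. A quick check, working in integer cents to avoid floating-point rounding (the helper names are illustrative):

```python
TRIALS_PER_BLOCK = 60
BLOCKS_PER_REWARD_LEVEL = 2               # Blocks 3-4 or 5-6, depending on order
CENTS_PER_CORRECT = {"high": 10, "low": 1}

def bonus_cents(n_correct_high, n_correct_low):
    """Bonus in cents for a given number of correct rewarded responses."""
    return (n_correct_high * CENTS_PER_CORRECT["high"]
            + n_correct_low * CENTS_PER_CORRECT["low"])

def max_bonus_cents():
    """Bonus if every trial in all four reward blocks were answered correctly."""
    n = TRIALS_PER_BLOCK * BLOCKS_PER_REWARD_LEVEL
    return bonus_cents(n, n)

total = max_bonus_cents()
print(f"maximum bonus: ${total // 100}.{total % 100:02d}")  # $13.20
```

The high-reward blocks contribute $12.00 of this maximum and the low-reward blocks only $1.20, which conveys how strongly the two incentive levels differ.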

Fig. 1

a Schematic of the phases of the experiment. b Illustration of the timeline of the task-switching paradigm, in which the subtask (magnitude or parity judgment) was cued by a bar presented either above or below the digit (with task–cue pairings counterbalanced between participants)

Behavioral data analysis

We analyzed participants’ responses on both the Stroop and task-switching tasks using linear mixed-effects regressions implemented with the lme4 package (Version 1.1.14; Bates & Maechler, 2019) for the R programming language. For both tasks, we removed the first 10 trials of the experiment to mitigate the influence of task novelty and/or early learning upon TEPRs, as well as trials in which participants failed to respond within the response window (2% of total trials for the Stroop task and <1% for task-switching). Both Stroop interference effects and task switch costs were calculated using RTs from correct trials only, which were log-transformed to reduce skew (Ratcliff, 1993). We also removed unexpectedly fast or slow trials, with RTs more than three standard deviations above or below the participant’s mean (Jiang, Beck, Heller, & Egner, 2015; Laeng et al., 2011; Padmala & Pessoa, 2011; Qiao, Zhang, Chen, & Egner, 2017), which removed less than 1% of trials of each trial type. Each participant’s Stroop interference effect was calculated as the estimated per-subject regression coefficient for trial incongruence. All RT regressions included a linear predictor of trial block to account for practice effects, as well as categorical nuisance variables coding the previous trial type (i.e., incongruent/congruent or switch/repeat), previous errors, and key repetitions with respect to the previous trial. Finally, the task-switching regressions included a response congruence predictor specifying whether the correct response for a given stimulus mapped to the same key or to different keys under the two tasks.
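A minimal NumPy sketch of the RT cleaning steps above, for a single participant: drop the first 10 trials, missed responses, and errors, log-transform the remaining RTs, and trim trials beyond ±3 SD of the participant's mean. Two details are assumptions rather than reported choices: trimming is applied on the log scale, and the per-subject interference coefficient is approximated by an ordinary least-squares slope fit to each participant alone. That slope is a no-pooling stand-in for the per-subject coefficient from the lme4 mixed model and omits the nuisance regressors. The function names are hypothetical.

```python
import numpy as np

def clean_log_rts(rts_ms, correct, n_drop_initial=10, sd_cutoff=3.0):
    """Return (log_rt, keep) for one participant.

    rts_ms: RTs in ms, with np.nan marking missed responses.
    correct: booleans, True for correct responses.
    """
    rts_ms = np.asarray(rts_ms, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    keep = np.ones(rts_ms.shape, dtype=bool)
    keep[:n_drop_initial] = False             # early trials
    keep &= ~np.isnan(rts_ms) & correct       # missed responses and errors
    log_rt = np.full(rts_ms.shape, np.nan)
    log_rt[keep] = np.log(rts_ms[keep])
    mean, sd = log_rt[keep].mean(), log_rt[keep].std(ddof=1)
    # Trim trials more than sd_cutoff SDs from the participant's mean log RT.
    within = np.zeros_like(keep)
    within[keep] = np.abs(log_rt[keep] - mean) <= sd_cutoff * sd
    return log_rt, within

def interference_slope(log_rt, incongruent, keep):
    """OLS slope of log RT on an incongruence dummy, using kept trials."""
    x = np.asarray(incongruent, dtype=float)[keep]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, log_rt[keep], rcond=None)
    return beta[1]
```

With a single binary predictor, this slope equals the difference between mean log RTs on incongruent and congruent trials, so it reads as a log-ratio of RTs.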

Pupillary data analysis

Pupillary data were preprocessed in MATLAB (Version 2017b) before calculating a trial-by-trial task-evoked pupillary response (TEPR). First, eye blinks were detected and corrected using linear interpolation, and pupil diameter measurements were passed through a high-pass Butterworth filter to remove slow drift below 0.012 Hz (following Knapen et al., 2016). After this preprocessing step, pupil diameter was z-scored within block to make pupil units comparable between blocks (de Gee, Knapen, & Donner, 2014; Nassar et al., 2012; Urai, Braun, & Donner, 2017) and then baseline-corrected on a trial-by-trial basis by subtracting the mean diameter during a 200-ms baseline period prior to stimulus presentation, following previous work (Eckstein, Starr, & Bunge, 2019; Hershman & Henik, 2019). TEPRs were then calculated as the maximum baseline-corrected pupil diameter (Gilzenrat et al., 2010) observed between 1,000 ms and 3,000 ms after stimulus onset, a time window previously shown to contain the pupillary response of interest in similar tasks (Laeng et al., 2011; Rondeel et al., 2015; see Fig. 3). Critically, because this window was fixed relative to stimulus onset, the calculation of TEPRs did not depend on participants’ response latencies, which typically differ between switch and repeat trials. Trial-by-trial tonic pupil diameter was calculated as the average unfiltered pupil diameter during the 200-ms baseline period before stimulus onset, following Chiew and Braver (2013). We also used mixed-effects regressions to examine task-switching effects upon TEPRs, predicting trial-by-trial TEPRs on the basis of trial type (repeat vs. switch) and subtask (magnitude vs. parity).
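The TEPR and tonic measures above can be sketched in NumPy for one block, under several stated assumptions: blinks have already been detected and marked as NaN, the 0.012 Hz high-pass Butterworth step is omitted (it could be added with a signal-processing library), and the tonic measure is taken from the blink-interpolated but otherwise raw trace. At 250 Hz, the 200-ms baseline spans 50 samples and the 1,000 to 3,000 ms window spans samples 250 to 749 after onset. `trial_measures` and `interpolate_blinks` are hypothetical names, not the authors' MATLAB code.

```python
import numpy as np

FS_HZ = 250  # EyeLink sampling rate reported in the Method

def interpolate_blinks(trace):
    """Linearly interpolate across samples flagged as blinks (np.nan)."""
    trace = np.asarray(trace, dtype=float).copy()
    bad = np.isnan(trace)
    if bad.any() and not bad.all():
        idx = np.arange(trace.size)
        trace[bad] = np.interp(idx[bad], idx[~bad], trace[~bad])
    return trace

def trial_measures(block_trace, onsets, fs=FS_HZ, base_ms=200, win_ms=(1000, 3000)):
    """Return (teprs, tonic) arrays for stimulus onsets (sample indices) in one block."""
    raw = interpolate_blinks(block_trace)
    # The 0.012 Hz high-pass Butterworth step is omitted in this sketch.
    z = (raw - raw.mean()) / raw.std()          # z-score within block
    n_base = int(base_ms * fs / 1000)           # 200 ms -> 50 samples at 250 Hz
    start, stop = (int(ms * fs / 1000) for ms in win_ms)
    teprs, tonic = [], []
    for t0 in onsets:
        baseline = z[t0 - n_base:t0].mean()
        # Peak of the baseline-corrected trace in the fixed post-onset window.
        teprs.append(z[t0 + start:t0 + stop].max() - baseline)
        # Tonic diameter: mean of the unstandardized trace over the same baseline.
        tonic.append(raw[t0 - n_base:t0].mean())
    return np.array(teprs), np.array(tonic)
```

Because the window is indexed from stimulus onset only, the same samples are used on switch and repeat trials regardless of when the response occurred.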

To examine how TEPRs relate to individual differences in task performance in the baseline blocks, we calculated switch costs (switch-trial RTs minus repeat-trial RTs) for each 30-trial ‘mini-block’ (yielding two switch costs per experimental block). We then estimated a mixed-effects regression predicting these RT switch cost estimates as a function of mean mini-block TEPRs on switch trials, mini-block number (to account for learning effects), and each participant’s Stroop interference cost and NFC score (z-scored across participants), as well as their interactions (see Table 4), with random effects over intercepts and mini-block. To test whether individual differences in reward-induced TEPR changes tracked switch cost reductions, we computed, for each participant, the reward-induced change in RT switch costs (mean switch cost across both reward conditions minus baseline switch cost) and the analogous change in switch-trial TEPRs (reward − baseline). We then estimated a linear regression predicting the change in RT switch costs from the reward-induced TEPR change, Stroop costs, NFC scores (z-scored across participants), and reward presentation order.
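The per-mini-block switch costs and the reward-minus-baseline difference scores described above can be sketched as follows. Two assumptions are worth flagging: switch costs are computed from mean RTs (the central-tendency statistic is not stated), and the reward value is the unweighted mean of the low- and high-reward conditions. Function names are hypothetical.

```python
import numpy as np

def mini_block_switch_costs(rt, is_switch, mini_block):
    """Switch cost (mean switch RT minus mean repeat RT) per mini-block."""
    rt = np.asarray(rt, dtype=float)
    is_switch = np.asarray(is_switch, dtype=bool)
    mini_block = np.asarray(mini_block)
    costs = {}
    for mb in np.unique(mini_block):
        sel = mini_block == mb
        costs[int(mb)] = rt[sel & is_switch].mean() - rt[sel & ~is_switch].mean()
    return costs

def reward_delta(baseline_value, low_reward_value, high_reward_value):
    """Reward-induced change: mean of both reward conditions minus baseline."""
    return (low_reward_value + high_reward_value) / 2.0 - baseline_value
```

The same `reward_delta` helper applies both to switch costs and to switch-trial TEPRs, so a negative delta switch cost paired with a positive delta TEPR is the pattern the effort account predicts.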

Finally, we analyzed the effects of reward and interindividual differences on tonic pupil diameter with a linear mixed-effects regression on the average raw (i.e., unfiltered and not standardized within block) pupil diameter during the 200-ms baseline period. Specifically, this model predicted tonic pupil diameter as a function of reward (coded linearly: 0 = baseline, 1 = low reward, 2 = high reward) and its interactions with TEPRs, NFC scores, and Stroop costs (the latter two z-scored across participants). As with the previous regressions, we included a linear regressor of trial number (from 1 to 360) to account for fatigue-induced task disengagement.
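To make the tonic-diameter model concrete, the sketch below builds its fixed-effects design matrix with the linear reward coding (0 = baseline, 1 = low, 2 = high) and the reward interactions listed above. It covers only the fixed-effects columns; the random-effects structure is not specified in this section and is not modeled here, and using the sample SD (ddof=1) for z-scoring is an assumption. All names are hypothetical.

```python
import numpy as np

REWARD_CODE = {"baseline": 0, "low": 1, "high": 2}

def zscore(values):
    """z-score participant-level scores (sample SD)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

def tonic_design(condition, tepr, nfc_z, stroop_z, trial_number):
    """Return (column_names, X) for the tonic pupil model's fixed effects."""
    reward = np.array([REWARD_CODE[c] for c in condition], dtype=float)
    tepr, nfc_z, stroop_z, trial = (
        np.asarray(v, dtype=float) for v in (tepr, nfc_z, stroop_z, trial_number)
    )
    columns = {
        "intercept": np.ones_like(reward),
        "reward": reward,
        "tepr": tepr,
        "nfc": nfc_z,
        "stroop": stroop_z,
        "reward:tepr": reward * tepr,
        "reward:nfc": reward * nfc_z,
        "reward:stroop": reward * stroop_z,
        "trial": trial,  # 1-360, for fatigue-related disengagement
    }
    return list(columns), np.column_stack(list(columns.values()))
```

Linear reward coding spends a single parameter on reward, which presumes that moving from baseline to low reward has the same effect as moving from low to high reward; a categorical coding would relax that.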

Results

Task performance

As typically observed in the Stroop task, participants were slower (β = 0.2578, SE = 0.0115, p < .0001; see Fig. 2b and Supplemental Table S1) and less accurate (β = −1.7928, SE = 0.1803, p < .0001; see Supplemental Table S2) on incongruent than on congruent trials. From these RTs, we calculated Stroop interference costs as the per-participant estimate of the incongruence effect, yielding our individual difference measure of executive function (EF). In the baseline task-switching blocks (without rewards), we observed the typical task switch costs (Monsell, 2003): participants were both slower (β = 0.1431, SE = 0.0087, p < .001; see Table 2) and less accurate (β = −0.4508, SE = 0.0937, p < .001; see Supplemental Table S5) on task switches than on task repetitions (see Table 1).
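Because the RT models were fit to log-transformed RTs, the RT coefficients above are on a log scale. Assuming natural logarithms (the base is not stated), a coefficient β corresponds to a multiplicative effect of exp(β) on RT, holding the other predictors constant. The conversion for the two RT effects reported here:

```python
import math

def log_rt_effect_to_percent(beta):
    """Percent change in RT implied by a coefficient on natural-log RT."""
    return (math.exp(beta) - 1.0) * 100.0

stroop_pct = log_rt_effect_to_percent(0.2578)  # incongruent vs. congruent
switch_pct = log_rt_effect_to_percent(0.1431)  # switch vs. repeat, baseline blocks
print(f"Stroop: {stroop_pct:.1f}% slower; switch: {switch_pct:.1f}% slower")
```

Under these assumptions, incongruent trials were roughly 29% slower than congruent trials, and switch trials roughly 15% slower than repeat trials.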

Fig. 2

a Average switch costs, calculated as the difference between median RTs on switch and repeat trials, for the baseline, low-reward, and high-reward blocks. b Mean RTs for correct congruent and incongruent trials in the Stroop interference task. Error bars represent bootstrapped 95% confidence intervals. Individual dots represent participant-level data

Table 1 Average median RTs, TEPRs, and accuracy for congruent versus incongruent trials in the Stroop task and in repeat and switch trials across the three blocks of the task-switching paradigm
Table 2 Mixed-effects regression coefficients indicating the influence of trial type (task switch versus task repeat), reward level (reward vs. baseline), and the interaction between reward and trial type on RTs in the task-switching paradigm

We also compared task-switching performance between the high-reward and low-reward conditions (see Fig. 2). Mirroring past findings (Sandra & Otto, 2018), we did not observe a significant effect of reward level on switch costs (Task Switch × Reward interaction; β = −0.0087, SE = 0.0083, p = .29; see Supplemental Table S3), suggesting a weak effect of reward, large heterogeneity in individuals’ responses to reward, or both. Similarly, reward level did not significantly affect switch costs expressed in terms of accuracy (Switch × Reward interaction; β = −0.1277, SE = 0.1503, p = .39; see Supplemental Table S4), although participants were more accurate overall on high-reward than on low-reward trials (β = 0.4613, SE = 0.1191, p < .001; see Supplemental Table S4). Collapsing across reward levels and comparing rewarded with baseline blocks, reward reduced both overall RTs (β = −0.0252, SE = 0.0107, p = .01; see Table 2) and RT switch costs (β = −0.0237, SE = 0.0077, p = .002; see Table 2). Rewarded responses were also more accurate (β = 0.2863, SE = 0.0937, p = .03; see Supplemental Table S5), but reward did not reduce switch costs expressed in terms of accuracy (Switch × Reward interaction; β = 0.0558, SE = 0.1194, p = .64; see Supplemental Table S5).

Task-evoked pupillary responses (TEPRs)

Examining TEPRs on correct trials, we observed a significant difference between switch and repeat trials in the baseline block (see Fig. 3a): task switches engendered larger TEPRs than repetitions (β = 0.0881, SE = 0.0163, p < .001; see Table 3), consistent with the demand account. As seen in Fig. 3a, TEPRs peaked between 1 and 2 s after stimulus onset, the window in which these switch-versus-repeat differences were observed. Comparing reward conditions with baseline, we found neither a main effect of reward on TEPRs (β = −0.0389, SE = 0.0333, p = .24; see Table 3) nor an interaction between reward condition and trial type (Reward × Switch; β = 0.0173, SE = 0.0187, p = .35; see Table 3 and Fig. 3b), suggesting that reward did not increase TEPRs on average.

Fig. 3

a Time series of average pupil diameter over the course of trials, time-locked to stimulus onset (dashed line). Median response times for each trial type are depicted as solid vertical lines. The shaded area shows the time period used to calculate TEPRs. b Bar graph depicting the average TEPR by trial type (switch vs. repeat) and reward condition. Error bars represent bootstrapped 95% confidence intervals. Individual dots represent participant-level data

Table 3 Mixed-effects regression coefficients indicating the influence of trial type (task switch versus task repeat), reward level (reward vs. baseline), and the interaction between reward and trial type on TEPRs in the task-switching paradigm

Relationship between TEPRs and task switch costs at baseline

To arbitrate between the candidate effort and demand accounts of pupillary responses, we first tested whether larger TEPRs during task-switching predict greater effort exertion, as indexed by smaller task switch costs, a result uniquely predicted by the effort account. As seen in Fig. 4, switch-trial TEPRs significantly predicted RT switch costs during the baseline blocks: larger pupil dilations on switch trials predicted smaller switch costs (β = −25.6165, SE = 11.8921, p = .03; see Table 4).

Fig. 4

Scatter plot depicting the relationship between TEPRs on switch trials (horizontal axis) and switch costs during the baseline block (vertical axis)

Table 4 Mixed-effects regression coefficients indicating the influence of TEPRs, NFC scores, Stroop costs, and their interactions on mini block switch costs in the baseline block of the task-switching paradigm

We further probed whether individual differences in Stroop RT costs and NFC might bear upon the observed relationship between TEPRs and task switch costs at baseline. As seen in Fig. 5, both Stroop RT costs and NFC scores appeared to modulate the strength of the relationship between switch costs and TEPRs. Statistically, we found a significant interaction between TEPR and Stroop RT costs (β = −28.6091, SE = 11.3658, p = .012) while the interaction between TEPRs and NFC only reached marginal significance (β = 26.37, SE = 13.8212, p = .056; see Table 4). In other words, while TEPRs significantly predicted individual task switch costs at baseline across the entire sample, this relationship was stronger for individuals lower in EF ability as operationalized by Stroop RT costs, and marginally stronger for low-NFC individuals.
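One way to read these interactions is through simple slopes: the TEPR coefficient implied at a given value of each moderator. Because Stroop costs and NFC were z-scored, the sketch below evaluates the slope at ±1 SD, one moderator at a time with the other at its mean. It assumes the Table 4 model is linear with only these two-way TEPR interactions, so that the TEPR main effect is the slope at the mean of both moderators; the constant and function names are illustrative.

```python
B_TEPR = -25.6165            # TEPR effect on baseline switch costs (Table 4)
B_TEPR_X_STROOP = -28.6091   # TEPR x Stroop RT cost interaction
B_TEPR_X_NFC = 26.37         # TEPR x NFC interaction

def simple_slope(b_main, b_interaction, moderator_z):
    """TEPR slope at a z-scored moderator value (other moderator at its mean)."""
    return b_main + b_interaction * moderator_z

for name, b_int in (("Stroop cost", B_TEPR_X_STROOP), ("NFC", B_TEPR_X_NFC)):
    for z in (-1.0, 1.0):
        print(f"{name} at {z:+.0f} SD: slope = {simple_slope(B_TEPR, b_int, z):.2f}")
```

This gives a slope of about −54 at +1 SD of Stroop cost versus about +3 at −1 SD, and about −52 at −1 SD of NFC versus about +1 at +1 SD, matching the verbal summary: the negative TEPR–switch-cost relationship is concentrated among participants with higher Stroop costs (lower EF) and lower NFC.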

Fig. 5

Scatter plots depicting the relationship between TEPRs on switch trials (horizontal axes) and switch costs during the baseline block (vertical axes) as a function of individual differences. The first row (a and b) is a median split of participants based on Need for Cognition (NFC) scores, and the second row (c and d) groups participants based on a median split performed upon Stroop RT costs

Importantly, EF ability and NFC did not predict task switch costs at baseline: we found no significant main effect of either Stroop RT costs (β = 8.6686, SE = 11.7237, p = .46; see Table 4) or NFC scores (β = 9.3651, SE = 12.0031, p = .43; see Table 4). The absence of such relationships suggests that the moderating effect of these individual difference measures on the relationship between TEPRs and task performance is not driven by overall differences in performance. Furthermore, EF ability and NFC were not significantly correlated (r = −.02, p = .88), suggesting that the two measures tap dissociable constructs. Finally, to control for the possibility that these individual differences in TEPR–switch-cost relationships were attributable to age differences (MacLachlan & Howland, 2002), we entered participant age as a covariate into the regression and obtained nearly identical results, suggesting that our results were not driven by differences in participant age (see Supplemental Table S6). Of note, covarying out age revealed a significant interaction between NFC scores and TEPRs on baseline switch costs, such that TEPRs were a better predictor of baseline switch costs for those low in NFC (β = 29.1472, SE = 14.0607, p = .03; see Supplemental Table S6).

Reward-induced changes in pupil diameter and task performance

To further probe the effort account, we sought to test whether individual differences in reward-induced switch cost reductions—interpreted as increased effort investment in accordance with incentives—could be predicted by pupil diameter changes. Because we did not observe significant changes in RT switch costs between the low-reward and high-reward conditions, we compared TEPRs between rewarded blocks (collapsed across low- and high-reward blocks) and the baseline block. For each participant, we calculated (1) the difference in switch costs between rewarded and nonrewarded blocks and (2) the difference in switch-trial TEPRs between rewarded and nonrewarded blocks ("delta TEPRs"). Plotting these scores against each other in Fig. 6 shows that the majority of the switch cost difference scores are negative, indicating reward-induced switch cost reductions, and that these differences are related to changes in switch-trial TEPRs. Statistically, we observed a significant predictive relationship between reward-induced changes in switch-trial TEPRs and reward-induced changes in switch costs, as indicated by a main effect of delta TEPRs on delta switch costs (β = −31.0843, SE = 13.1172, p = .02; see Table 5). This result provides further support for the effort account, as it suggests that intraindividual, reward-induced modulations of effort are tracked by TEPR changes even though, critically, task demand remained constant.
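The delta scores described above can be computed for each participant as rewarded-minus-baseline differences. This sketch makes three assumptions. Trial records carry the hypothetical keys `block`, `is_switch`, `rt`, and `tepr`. Switch cost is defined as mean switch-trial RT minus mean repeat-trial RT. Low- and high-reward trials are pooled into a single rewarded set. Averaging the two blocks' switch costs instead would give a slightly different value when trial counts differ, and the text does not specify which approach was used.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One dict per trial; all keys are hypothetical:
#   "block": "baseline", "low", or "high"
#   "is_switch": True for task-switch trials, False for task repeats
#   "rt": response time in ms; "tepr": task-evoked pupil response
Trial = Dict[str, object]


def switch_cost(trials: List[Trial]) -> float:
    """Mean RT on switch trials minus mean RT on repeat trials."""
    switch_rts = [t["rt"] for t in trials if t["is_switch"]]
    repeat_rts = [t["rt"] for t in trials if not t["is_switch"]]
    return mean(switch_rts) - mean(repeat_rts)


def switch_tepr(trials: List[Trial]) -> float:
    """Mean TEPR on switch trials only."""
    return mean(t["tepr"] for t in trials if t["is_switch"])


def delta_scores(trials: List[Trial]) -> Tuple[float, float]:
    """Rewarded-minus-baseline (delta switch cost, delta TEPR) for one participant.

    Rewarded trials pool the low- and high-reward blocks. A negative delta
    switch cost indicates a reward-induced reduction in switch costs.
    """
    baseline = [t for t in trials if t["block"] == "baseline"]
    rewarded = [t for t in trials if t["block"] in ("low", "high")]
    delta_cost = switch_cost(rewarded) - switch_cost(baseline)
    delta_tepr = switch_tepr(rewarded) - switch_tepr(baseline)
    return delta_cost, delta_tepr


# Hypothetical trials for a single participant.
trials = [
    {"block": "baseline", "is_switch": True, "rt": 900, "tepr": 0.10},
    {"block": "baseline", "is_switch": True, "rt": 1000, "tepr": 0.20},
    {"block": "baseline", "is_switch": False, "rt": 700, "tepr": 0.05},
    {"block": "baseline", "is_switch": False, "rt": 800, "tepr": 0.05},
    {"block": "low", "is_switch": True, "rt": 820, "tepr": 0.25},
    {"block": "low", "is_switch": False, "rt": 700, "tepr": 0.05},
    {"block": "high", "is_switch": True, "rt": 780, "tepr": 0.35},
    {"block": "high", "is_switch": False, "rt": 660, "tepr": 0.05},
]
print(delta_scores(trials))
```

For this hypothetical participant, the rewarded switch cost (120 ms) is 80 ms below the baseline cost (200 ms), and switch-trial TEPRs rise by about 0.15 units. Across participants, pairs like this would be the inputs to a regression such as the one summarized in Table 5.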

Fig. 6 Scatter plot depicting the relationship between reward-induced changes in TEPRs on task switch trials, computed as the difference between rewarded and baseline blocks (horizontal axis), and reward-induced change in RT switch costs, computed as the difference between rewarded and baseline blocks (vertical axis)

Table 5 Linear regression coefficients indicating the influence of change in TEPRs, NFC scores, and Stroop costs on reward-induced changes in switch costs in the task-switching paradigm

Tonic pupil diameter

Finally, we sought to test whether reward-induced changes in arousal would manifest in tonic pupil diameter, operationalized here as the average raw pupil diameter during the baseline period of each trial. Indeed, as depicted in Fig. 7, we found that tonic pupil diameter increased linearly with reward incentive level (β = 251.1978, SE = 9.3582, p < .001; see Table 6), corroborating previous observations of reward-induced tonic pupil diameter changes (Chiew & Braver, 2013; Heitz et al., 2008). As above, we also tested whether tonic pupil diameter, measured during baseline and reward blocks, could be predicted on the basis of individual differences in Stroop RT costs or NFC, as previous work has shown that tonic pupil diameter bears some relationship with both working memory ability (Heitz et al., 2008) and fluid intelligence (Van Der Meer et al., 2010). We failed to find a significant relationship between Stroop RT costs and tonic pupil diameter on baseline blocks (β = 3.3240, SE = 74.9657, p = .95), whereas higher NFC marginally predicted smaller baseline tonic pupil diameter (β = −143.5890, SE = 74.9709, p = .05).
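Here, tonic pupil diameter is defined as the average raw diameter during each trial's pre-stimulus baseline period. The sketch below applies that definition to a single trial trace and then averages the result over a block. It also returns a phasic value, computed as the post-onset mean minus that baseline. This subtractive TEPR definition is an assumption made for illustration, as are every window length and function name in the code.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def tonic_and_phasic(samples: Sequence[float], onset: int,
                     baseline_len: int, response_len: int) -> Tuple[float, float]:
    """Split one trial's pupil trace into tonic and phasic components.

    tonic  = mean raw diameter over the `baseline_len` samples before onset
             (the definition given in the text).
    phasic = mean diameter over `response_len` samples after onset minus tonic
             (an ASSUMED subtractive TEPR definition, for illustration only).
    """
    if onset < baseline_len or onset + response_len > len(samples):
        raise ValueError("analysis windows fall outside the recorded trace")
    tonic = mean(samples[onset - baseline_len:onset])
    phasic = mean(samples[onset:onset + response_len]) - tonic
    return tonic, phasic


def block_tonic(trials: List[Sequence[float]], onset: int,
                baseline_len: int, response_len: int) -> float:
    """Block-level tonic diameter: the average of per-trial baseline means."""
    return mean(tonic_and_phasic(s, onset, baseline_len, response_len)[0]
                for s in trials)


# Hypothetical trace in arbitrary eye-tracker units; stimulus onset at index 4.
trace = [3000, 3010, 2990, 3000, 3100, 3150, 3200]
print(tonic_and_phasic(trace, onset=4, baseline_len=4, response_len=3))
```

Window lengths are given in samples. Converting them from milliseconds depends on the eye tracker's sampling rate, which this section does not report.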

Fig. 7 Average tonic pupil diameter during the three task-switching blocks. Error bars represent bootstrapped 95% confidence intervals. Individual dots represent participant-level data

Table 6 Mixed-effects regression coefficients indicating the influence of reward (0 = baseline, 1 = low reward, 2 = high reward), TEPRs, NFC scores, Stroop costs, and their interactions on tonic pupil diameter

Examining the reward blocks separately (see Fig. 8), we found reward-induced increases in tonic pupil diameter to be strongest in low-NFC individuals (Reward × NFC interaction, β = −39.7326, SE = 8.3446, p < .001), but these increases did not depend on executive functioning ability (Reward × Stroop RT cost interaction, β = −2.8445, SE = 6.6682, p = .670). Finally, we tested whether phasic pupillary responses (i.e., TEPRs) related to tonic pupil diameter at the level of individual participants, observing a significant negative relationship (β = −248.6662, SE = 8.8030, p < .001). In other words, phasic changes in pupil diameter appeared largest for individuals whose tonic pupil diameter was smallest, mirroring previous findings examining this tonic–phasic relationship (Gilzenrat et al., 2010). Further, this tonic–phasic relationship was moderated by reward incentives, such that higher available reward led to a stronger relationship between phasic and tonic pupillary responses (TEPR × Reward interaction, β = −39.7326, SE = 6.6740, p < .001). Again, to ensure these observed relationships were not driven by differences in age (MacLachlan & Howland, 2002), we added participant age as a covariate to this regression and found similar results, suggesting that the observed interaction between individual differences in intrinsic motivation and reward incentives was not attributable to differences in age (see Supplemental Table S7).

Fig. 8 a Average tonic pupil diameter by block type, with participants grouped by a median split of Stroop costs. b Average tonic pupil diameter by block type, with participants grouped by a median split of Need for Cognition (NFC) scores. Error bars represent bootstrapped 95% confidence intervals. Individual dots represent participant-level data
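Figs. 7 and 8 report bootstrapped 95% confidence intervals, and Fig. 8 also uses median-split groups. The captions do not say which bootstrap variant was used. The sketch below therefore assumes a percentile bootstrap over participant-level means, with participants tied with the median assigned to the low group. All names and data are hypothetical.

```python
import random
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_boot: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of participant-level values."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    # Resample participants with replacement and record each resample's mean.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def median_split(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split participant IDs at the median score.

    Participants tied with the median go to the low group (an assumption).
    """
    cut = median(scores.values())
    low = [pid for pid, s in scores.items() if s <= cut]
    high = [pid for pid, s in scores.items() if s > cut]
    return low, high


# Hypothetical participant-level tonic diameters and NFC scores.
tonic = {"p1": 3050, "p2": 3120, "p3": 2980, "p4": 3200, "p5": 3010, "p6": 3090}
nfc = {"p1": 55, "p2": 72, "p3": 61, "p4": 80, "p5": 48, "p6": 66}
low_nfc, high_nfc = median_split(nfc)
for label, group in (("low NFC", low_nfc), ("high NFC", high_nfc)):
    vals = [tonic[p] for p in group]
    print(label, mean(vals), bootstrap_ci(vals))
```

With only three participants per group, the intervals are wide and coarse. The example is meant to show the mechanics, not plausible interval widths.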

Discussion

While a considerable body of results has pointed toward using task-evoked pupillary responses (TEPRs) as a potential index of cognitive effort (Laeng et al., 2011; Rondeel et al., 2015; Van Der Meer et al., 2010), other work suggests that pupil diameter reflects task demand level (Beatty, 1982; Hershman & Henik, 2019; Kahneman & Beatty, 1966). Here, we sought to arbitrate between the effort and demand accounts of pupil dilations by measuring TEPRs while holding task demand constant and examining how individual differences in task switch costs—a behavioral marker of effort investment—relate to task-evoked pupillary responses both at baseline and in response to changes in reward incentives.

First, upon examining the interrelationship between individual differences in task performance and pupillary responses at baseline—in the absence of reward incentives—we found that larger TEPRs on switch trials predicted smaller task switch costs. In other words, holding task demand constant, larger pupillary responses predicted better task-switching performance across individuals. This result provides compelling support for the effort account and complements previous work that has similarly found improved task performance to be associated with larger phasic pupil diameter (Rondeel et al., 2015; Van Der Meer et al., 2010). At the same time, we found evidence in support of the demand account, as TEPRs were larger on more demanding task switch trials, mirroring previous findings that highlight the positive relationship between TEPRs and task demand (Katidioti, Borst, & Taatgen, 2014; Rondeel et al., 2015). Taken together, this pattern of results suggests that TEPRs can potentially provide unique information about an individual’s effort outlay, over and above task demand level.

Second, we observed that the relationship between TEPRs and task switch costs at baseline was strongest for those low in EF capacity (as measured by Stroop interference effects). In other words, individual effort costs—stemming from either cognitive processing limitations, intrinsic motivation to expend effort, or both (Inzlicht et al., 2018)—appeared to moderate the observed relationship between this putative physiological measure of effort (TEPR) and the behavioral consequences of effort (task switch costs), highlighting the usefulness of examining individual differences. Again, these results are difficult to explain with a pure task demand interpretation of TEPRs, as neither of these trait measures predicted task switch costs at baseline (see Table 4). Instead, this pattern of results could suggest that the observed variability in task performance reflects heterogeneous levels of effort investment across the sample. For example, for those with the lowest EF capacity, variability in task performance may arise from increased processing of task-relevant information, while for those high in ability, variability in task performance may be harder to account for. This observation dovetails with past work finding that individuals low in working memory capacity also had larger phasic pupillary responses while completing a demanding working memory task (Heitz et al., 2008). Similarly, with respect to intrinsic motivation to exert effort—as measured by the NFC scale—we found suggestive, but statistically marginal, evidence that TEPRs more strongly predicted task-switching performance for lower-NFC individuals.

Third, we tested whether these observed individual differences in effort exertion in response to increasing performance-contingent rewards (i.e., reward vs. baseline) were related to reward-induced changes in TEPRs. In accordance with the notion of a cost-benefit trade-off guiding effort investment (Shenhav et al., 2017), we found that reward-induced decreases in task switch costs—interpreted as increased effort investment in accordance with incentives—were also predicted by individual differences in reward-driven TEPR modulations. This observation provides particularly compelling evidence for the effort account, as comparing TEPRs within participants mitigates concerns about potential confounds that may arise when comparing between individuals or groups (e.g., ambient lighting, age; van der Wel & van Steenbergen, 2018).

It is worth noting that while reward incentives have previously been shown to increase pupil diameter on demanding working memory and cognitive control tasks (Bijleveld et al., 2009; Chiew & Braver, 2013, 2014), the current study builds on these findings by demonstrating that reward-induced changes in pupil diameter relate to behavioral changes, providing further evidence that pupil diameter reflects increased effort investment. These findings extend our previous work, which revealed how EF capacity and NFC differentially predict reward-induced cognitive effort modulations, measured behaviorally with task switch costs (Sandra & Otto, 2018). Here, we found that individual differences in presumed effort costs (i.e., EF capacity) also bear upon the strength of the relationship between behavioral and pupillary measures of effort exertion. In doing so, these results suggest that TEPRs might, in principle, provide a window into cost-benefit effort computations that may not be observable with behavioral measures alone.

Our results are difficult to reconcile with a demand account of task-evoked pupillary responses, as they suggest that intraindividual modulations in performance can be tracked by pupillary responses. The observed lack of an effect between high-reward and low-reward conditions may be attributable to the small difference in reward values (i.e., 1 cent vs. 10 cents per correct response) used here, or to the use of a blocked design rather than trial-by-trial variation in rewards (cf. Fröber & Dreisbach, 2016; Kleinsorge & Rinkenauer, 2012; Shen & Chun, 2011). This is consistent with past work also finding equivocal evidence for the ability of reward incentives alone to reduce switch costs (e.g., Aarts et al., 2014). Here, as in our previous work, the increase in potential rewards in high-reward versus low-reward blocks may not have been sufficient on its own to increase effort outlay, but it was large enough to elicit differences between individuals in reward-induced effort expenditure (Sandra & Otto, 2018). Relatedly, in the specific reward incentive manipulation used here, task-switching performance at baseline (i.e., without incentives) was measured prior to performance in rewarded blocks, following designs employed in previous investigations of motivated cognitive control (Chiew & Braver, 2013, 2014; Fröber & Dreisbach, 2016). Although having all participants perform the baseline block first could produce a practice effect, after controlling for linear effects of trial number and block order in our regression models we observed no significant effects of mini-block order on switch costs (see Table 4). This suggests that performance did not merely improve as a result of practice over successive trial blocks, perhaps owing to the practice participants underwent prior to the baseline blocks.
Similarly, we found that while participants’ RTs generally decreased over the course of the experiment, TEPRs remained stable (see Table 3), suggesting that the observed reward-induced TEPR changes were not driven by practice effects. In terms of accuracy, while participants showed slight improvements over the course of the entire experiment (see Supplemental Table S5), these improvements were not significant when comparing the reward blocks (see Supplemental Table S4). Future work investigating reward-guided effort allocation should employ designs that carefully control order effects to firmly rule out the possibility that apparent reward-induced changes in behavior and physiological responses arise from practice.

It is also worth noting that task switch costs are thought to reflect two constituent processes: a task set reconfiguration cost accompanying task switches, which can be reduced by increasing preparatory or proactive control, and a residual switch cost, thought to arise from reactive control processes stemming from task set interference (Kiesel et al., 2010). While the task-switching paradigm employed in the present experiment was not designed to disentangle the specific form of effortful control—proactive versus reactive—presumably reflected by TEPRs, we speculate that effort-linked TEPRs observed here might uniquely reflect a proactive component, on the basis of a body of previous work linking TEPRs to proactive control adjustments in continuous performance tasks (Chiew & Braver, 2013, 2014). Of course, future work leveraging more specialized task-switching paradigms that can adjudicate between reconfiguration and residual switch costs is necessary to resolve which specific form(s) of effortful control—proactive and/or reactive—TEPRs index. Relatedly, the present study did not employ a task precue. Providing task cues in advance of the stimulus permits individuals to prepare for upcoming task switches, which has the effect of reducing task switch costs (Kiesel et al., 2010; Monsell & Mizon, 2006). Accordingly, because the present design speaks to effort investment at the time of stimulus presentation and cannot conclusively link TEPRs to (effortful) preparatory processes that occur in advance of stimuli, future research should probe (1) the relationships between switch costs and TEPRs under baseline and reward conditions in a task-switching paradigm employing precues, and (2) how parametrically manipulating the cue-stimulus interval might alter these observed relationships between switch costs and TEPRs.

Finally, we also sought to test whether changes in arousal state or task engagement would manifest in tonic pupil diameter (Unsworth & Robison, 2018). Following previous findings (Chiew & Braver, 2013, 2014; Hopstaken et al., 2015), we hypothesized that increasing reward would upregulate arousal, resulting in larger tonic pupil diameter. Confirming this hypothesis, we found that reward incentives increased tonic pupil diameter, suggesting that this measure correlates with one’s overall state of arousal and is perhaps indicative of the use of more proactive (i.e., sustained) rather than reactive (i.e., transient) control processes (Braver, 2012; Chiew & Braver, 2013) in task switching.

Given these results indicating that tonic pupil diameter could index one’s attentional state, we also tested whether individual differences in executive functioning and intrinsic motivation for exerting effort were reflected in tonic pupil diameter. While we did not observe robust relationships between tonic pupil diameter and EF capacity, individual differences in intrinsic motivation (measured with the NFC scale) modulated the effect of reward on tonic pupil size. Specifically, reward-induced changes in tonic pupil diameter were strongest for those low in intrinsic motivation to exert effort. This suggests that reward incentives offset their aversion to exerting cognitive effort and led them to increase their general task engagement, as reflected in heightened arousal. By contrast, EF capacity did not significantly moderate reward-induced increases in tonic pupil diameter.

Of note, while our phasic pupil diameter analyses found that individual differences in EF capacity and NFC moderated the relationship between performance and TEPRs, we failed to find predictive effects of tonic pupil diameter upon performance. This pattern of results suggests that phasic and tonic measures might index separable psychological constructs (i.e., momentary effort investment vs. a more sustained arousal state), as previously suggested (Chiew & Braver, 2013, 2014; Unsworth & Robison, 2018). More generally, while our results speak to the importance of measuring individual differences in both EF and intrinsic motivation, it should be noted that our Stroop-based measure of EF is not domain general but rather is specific to the inhibition component of EF (see Miyake et al., 2000). Thus, future work should examine the extent to which the observed relationships between EF, task performance, and reward responsiveness generalize to other components of EF (e.g., updating and set-shifting) or are specific to the facet of EF indexed by Stroop interference (i.e., inhibition).

Recently, it has been theorized that the relationship between limited working memory capacity and performance on executive control tasks is mediated by dysregulation of the locus coeruleus-norepinephrine system, which in turn is thought to lead to greater default-mode network activity and lapses in attention (Unsworth & Robison, 2017). At the same time, pupil diameter has been linked to locus coeruleus-norepinephrine functioning (Joshi, Li, Kalwani, & Gold, 2016), which, in turn, is thought to be modulated in response to increasing arousal (e.g., via increasing task demands; Aston-Jones & Cohen, 2005). Pupil diameter is therefore thought to index momentary shifts in neuronal gain driven by modulations in norepinephrine functioning (Aston-Jones & Cohen, 2005; Nieuwenhuis, De Geus, & Aston-Jones, 2011), and it has also been shown to decrease with off-task thoughts (i.e., mind-wandering, distraction, inattention; Unsworth & Robison, 2016). These phasic pupil-linked changes in norepinephrine-mediated attentional state also lend support to the effort account of pupil diameter, as the trials in which participants report the greatest task engagement are also the trials with the largest TEPRs (Unsworth & Robison, 2016). Finally, we observed a negative relationship between tonic pupil diameter and phasic pupillary responses, which was further modulated by reward. These observations buttress the putative norepinephrine-dependent trade-off between control states (i.e., task engagement vs. disengagement; Gilzenrat et al., 2010) and suggest that monetary incentives may alter task performance through locus coeruleus functioning.

Overall, our results weigh in favor of an effort account of TEPRs, suggesting that pupil diameter, under controlled circumstances, can serve as a viable index of cognitive effort investment in cognitive control tasks, and, in turn, that pupil measurements can inform models of the regulation of effortful cognitive processing. Given the theorized neural basis for non-luminance-mediated pupil diameter changes, our results further suggest potential neural correlates of metacontrol. As previously discussed, changes in pupil diameter are thought to reflect locus coeruleus-norepinephrine-mediated changes in arousal state. These changes in norepinephrine are thought to be driven by the anterior cingulate cortex (ACC; Aston-Jones & Cohen, 2005), which has been previously implicated in signaling the need for increased cognitive control (Botvinick, Cohen, & Carter, 2004; Braver, Barch, Gray, Molfese, & Snyder, 2001; Carter & Van Veen, 2007). Interestingly, it has also been shown that, to some degree, pupil dilations in nonhuman primates correlate with spontaneous ACC firing, which in some cases precedes pupil-linked modulations of locus coeruleus neuronal activity (Joshi et al., 2016). More recent theories of ACC function posit that the ACC allocates cognitive control by weighing the costs of exerting control against the benefits (i.e., rewards) potentially conferred by successfully completing one’s goal (Shenhav et al., 2013, 2016). Mirroring this view, our results indicate that offsetting the costs of control by increasing reward incentives not only improved task performance but was also tracked by increases in pupil diameter. Together, these results suggest that task performance reflects momentary decisions to exert cognitive control based on the relative costs and benefits, which are reflected in modulations of phasic pupil diameter.
Future work should directly investigate the interrelationship between TEPRs, ACC activity, and both interindividual and intraindividual variation in EF capacity, intrinsic motivation, and performance on cognitive control tasks.