Recent research has shown that lack of sleep impairs moral awareness (i.e., the awareness that there is a moral component to a given decision or action), a prerequisite for engaging in moral judgment (Barnes et al., 2015). The proposed mechanism relies heavily on the strength model of self-regulation, according to which the capacity for self-control is limited and rest is necessary to restore it (Baumeister et al., 1998; Muraven & Baumeister, 2000). Thus, a lack of sleep makes it harder for people to direct and maintain their cognitive attention (a state also referred to as ego depletion; see Baumeister et al., 1998; Gailliot et al., 2007; Harrison & Horne, 2000) and to maintain effective executive functioning (Nilsson et al., 2005). As a result, sleepy individuals may struggle more with cognitively costly moral and ethical decision-making (Barnes et al., 2011). The literature on the topic supports this possibility. For instance, experimentally imposing partial sleep deprivation on navy and army officers impaired the moral reasoning of those who initially displayed high levels of mature moral reasoning (Olsen et al., 2010). In parallel, cross-sectional explorations of natural sleep have shown that low levels of sleep and low perceived quality of sleep are associated with unethical behaviors at work and on an online trivia test. As theoretically predicted, a putative mediator is cognitive fatigue (Barnes et al., 2011). Consistent with this account, lack of sleep is associated with low moral awareness the next day (Barnes et al., 2015).

To date, this area of research has largely overlooked the relationship between sleep quality/quantity and moral utilitarianism. This may seem surprising, since the study of moral utilitarianism has gained tremendous popularity in recent decades and has been instrumental in the establishment of models of moral reasoning (e.g., Greene et al., 2001). Moral utilitarianism is commonly explored via the flagship sacrificial dilemmas paradigm (for a review, see Christensen & Gomila, 2012), in which people are typically asked to choose (or decide what is morally acceptable) between killing one person to save a greater number and doing nothing (e.g., the infamous Trolley problem; Foot, 1978; Thomson, 1985). These dilemmas are designed so that agreeing with killing the single person is consistent with utilitarian principles, according to which an action is morally acceptable as long as it maximizes aggregate well-being. Refusing to kill the single person, by contrast, is consistent with deontological principles, according to which it is never morally acceptable to kill another person, regardless of the number of people that would be saved otherwise.

Importantly, although the claim is disputed, some literature suggests that reaching utilitarian-like responses requires people to engage in more deliberative thinking, and that people who are more able or willing to deliberate give more utilitarian-like responses. Individual differences in reasoning are associated with different moral judgments, with participants with greater cognitive reflection more often endorsing the utilitarian options in sacrificial dilemmas (e.g., Byrd & Conway, 2019; Patil et al., 2020; Conway & Gawronski, 2013; Paxton et al., 2012; Wiech et al., 2013). Experimental manipulations have also shown that placing participants under time constraints (e.g., Cummins & Cummins, 2012; Suter & Hertwig, 2011) or cognitive load (e.g., Białek & De Neys, 2017; Conway & Gawronski, 2013; Gawronski et al., 2017; Trémolière et al., 2012; see also Timmons & Byrne, 2019, for the effect of cognitive fatigue on moral utilitarianism) decreases the frequency of utilitarian judgments (or increases utilitarian response latencies; e.g., Greene et al., 2008). The claim that deliberative reasoning is needed to reach utilitarian conclusions is disputed, however, and some attempts to replicate the effects reported above have failed (see Baron & Gürçay, 2017; Gürçay & Baron, 2017; Rosas & Aguilar-Pardo, 2019; Tinghög et al., 2016; Trémolière & Bonnefon, 2014, for inconsistent conclusions, or else Bago & De Neys, 2019; Białek & De Neys, 2016, 2017; De Neys & Białek, 2017, for evidence of utilitarian intuitions).

For our current purpose, we allow and explore the possibility that utilitarian judgments require “something more” than deontological ones, cognitively speaking. Should the positive relationship between cognitive deliberation and moral utilitarianism hold, people who are prevented from deliberating or who have diminished self-control, as is the case with sleepy persons (e.g., Meldrum et al., 2015; Salfi et al., 2020), may experience more difficulty giving utilitarian-like responses.

Among the rare studies investigating the sleep/moral utilitarianism relationship, it was found that preventing participants from getting enough sleep increases response times when addressing personal dilemmas (i.e., emotionally evocative scenarios requiring directly harming someone to save a greater number of people) but not impersonal dilemmas (i.e., less emotionally evocative scenarios requiring deflecting an existing threat to save a greater number of people; Killgore et al., 2007). Yet, a very different result was obtained by Tempesta et al. (2011), who observed that preventing participants from getting enough sleep increases response times when addressing impersonal dilemmas but not personal dilemmas (neither of the two studies showed a difference in the propensity for moral utilitarianism). Taken together, these experimental results are mixed, partly due to the heterogeneity in the protocols (e.g., the Killgore et al., 2007, study involved active-duty military personnel while the Tempesta et al., 2011, study involved students), and hardly conclusive as a result. Beyond these results, it is surprising that no observational cross-sectional study, to our knowledge, has explored how natural sleep fluctuation is associated with moral utilitarianism. We believe that observational evidence is an important first step before setting up well-designed (but more costly) experimental studies to investigate whether sleep is a causal factor in moral utilitarianism.

In the present research, we propose the first thorough investigation of the association between natural sleep and moral utilitarianism. To this end, we vary both the sleep measures (using existing as well as new measures of sleep quality and quantity) and the utilitarianism measures (using classic moral dilemmas, ecological utilitarian rules, and scenarios featuring the much-debated topic of the morality of autonomous vehicles). We also vary the population studied by including USA, UK, or French participants.

General methods

Research strategy

We report on six cross-sectional online studies designed on Qualtrics software. Each study was preregistered on the OSF platform prior to data collection (https://osf.io/9tzyd/registrations). Details on preregistration, full material, data, scripts, and additional analyses are publicly available from this link and the following GitHub repository: https://github.com/CorentinJGosling/PBR_supplementary_2021; https://corentinjgosling.github.io/PBR_supplementary_2021/.

Participants

Because our study is intrinsically exploratory (no previous study has targeted the association of natural fluctuations of sleep with our measures of moral utilitarianism), we were unable to rely on established, robust effect sizes to determine the appropriate sample size for our studies. We designed our studies to have more than 80% power to detect a Pearson’s r of about .30 at an alpha of .0125. This effect size was chosen because it is in the range of previously reported effect sizes for the association between natural fluctuations of sleep quantity and quality and unethical behaviors (e.g., cheating; Barnes et al., 2011) and because it allows the detection of medium effect sizes (i.e., explaining about 10% of the variance). An alpha of .0125 was selected to adjust for multiple testing due to the presence of four independent variables (sleep quality and quantity, measured at both the chronic and acute levels). In all but one study (Study 3), the planned sample size was roughly achieved.
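To make the sample-size reasoning above concrete, the following is a minimal sketch of an approximate power calculation for a correlation. It assumes a two-sided test and the Fisher z approximation; neither is stated in the text, so the number it prints is only indicative and is not the planned sample size of any study. The function name `sample_size_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r: float, alpha: float, power: float) -> int:
    """Approximate n needed to detect a correlation r with a two-sided test.

    Assumption: Fisher z approximation, i.e., atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    effect = math.atanh(abs(r))                    # Fisher z of the target r
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Alpha of .05 split across the four sleep predictors, as described above.
bonferroni_alpha = 0.05 / 4
n_required = sample_size_for_correlation(r=0.30, alpha=bonferroni_alpha, power=0.80)
print(n_required)
```

Under these assumptions, tightening alpha from .05 to .0125 raises the required sample size noticeably, which is the cost of testing four predictors.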

To take part in our studies, participants were expected to be between 18 and 50 years old. Regarding exclusion criteria, we kept in the final analyses only participants whose chronic sleep quantity per night was at least 2 hours and less than 16 hours. For acute sleep quantity, the only exclusion criterion was reporting a negative sleep quantity score. For the sleep efficiency ratio (chronic and acute), we included in the analyses only the participants whose ratio fell between 0 and 100. Moreover, attentional checks were included in Studies 1, 2, and 6 (in which one item, presented as an attentional check, asked participants to choose a specific response option) and in Study 4 (in which a situation required participants to decide between letting an autonomous car keep to its original trajectory, killing three men as a result, and deviating the car from its original trajectory, killing no one). Participants who did not choose the required option (Studies 1, 2, 6), or who decided to let the car keep to its original trajectory and kill the three men (Study 4), were excluded from the main analyses (Table 1).

Table 1 Details about Studies 1–6 and demographic details for participants

These exclusion criteria led to the exclusion of 35 participants (i.e., 5.7%) out of the initial total of 617 participants (five participants were excluded due to their sleep efficiency ratio, three due to their atypical sleep quantity, and 27 because they failed the attention check). Note that a sensitivity analysis including all participants, regardless of these exclusion criteria, was also performed. This analysis yielded results very similar to those of the primary analysis (see Supplemental Results S8.6).
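The exclusion rules described above can be sketched as a simple filter. This is an illustration only: the field names are hypothetical, the efficiency bounds are assumed to be inclusive, and the records are invented rather than taken from the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # All field names are hypothetical, chosen for this illustration.
    chronic_sleep_hours: float
    acute_sleep_hours: float
    chronic_efficiency: float  # sleep efficiency ratio, in percent
    acute_efficiency: float    # sleep efficiency ratio, in percent
    passed_attention_check: Optional[bool] = None  # None: study had no check


def is_included(p: Participant) -> bool:
    """Apply the sleep-quantity, efficiency, and attention-check rules."""
    if not (2 <= p.chronic_sleep_hours < 16):
        return False  # atypical chronic sleep quantity
    if p.acute_sleep_hours < 0:
        return False  # negative acute sleep quantity
    for ratio in (p.chronic_efficiency, p.acute_efficiency):
        if not (0 <= ratio <= 100):  # assumption: bounds are inclusive
            return False
    if p.passed_attention_check is False:
        return False  # failed the attention check
    return True


# Invented records, one per type of outcome.
sample = [
    Participant(7.5, 6.0, 92.0, 88.0, True),    # kept
    Participant(1.5, 6.0, 92.0, 88.0, True),    # chronic quantity too low
    Participant(7.0, 6.5, 120.0, 90.0, True),   # impossible efficiency ratio
    Participant(8.0, 7.0, 85.0, 80.0, False),   # failed attention check
    Participant(6.0, 5.0, 75.0, 70.0),          # study without a check, kept
]
kept = [p for p in sample if is_included(p)]
print(len(kept), "of", len(sample), "participants kept")
```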

Measures

The following questionnaires were included in the studies (for further details and full presentation of the material, see Supplementary Materials):

Sleep measures

  • Sleep Quality Scale (SQS): The SQS is a one-item questionnaire designed by Snyder et al. (2018), which assesses sleep quality (the full verbatim is available in Snyder et al., 2018).

  • Pittsburgh Sleep Quality Index (PSQI): The PSQI is a 19-item scale designed by Buysse et al. (1989), which captures seven dimensions (sleep duration; sleep disturbance; sleep latency; daytime dysfunction due to sleepiness; sleep efficiency; overall sleep quality; sleep medication use). The scores were recoded, and a composite sum score of sleep was computed, with higher scores indicating better chronic sleep quality (the full scale is available in Buysse et al., 1989).

  • Modified Pittsburgh Sleep Quality Index (mPSQI): We designed a modified version of the PSQI (see Buysse et al., 1989, for the original version). These modifications were made to include some dimensions recommended by the National Sleep Foundation to assess sleep quality (e.g., number of awakenings longer than 5 minutes; wake after sleep onset; naps; Ohayon et al., 2017; see Supplementary Material for a display of the modifications). These modifications made it possible to compute and analyze seven dimensions at both the acute and chronic levels: sleep quantity, sleep latency, number of awakenings per night, wake after sleep onset, sleep efficiency, naps, and subjective sleep quality. Summing the scores obtained on these seven dimensions yielded an overall sleep quality index. The subscale regarding the quantity of sleep was used as an indicator of sleep quantity.

Morality measures

  • Moral dilemmas: A total of 11 moral dilemmas were used, drawn from Bartels (2008; see also Bartels & Pizarro, 2011). Each dilemma features a fictitious situation in which the participant can decide to actively kill someone to save a greater number of people. A composite score of utilitarian responses was computed, with higher scores indicative of higher utilitarian propensity.

  • Utilitarian scale: We used the utilitarian morality scale designed by Baron et al. (2017). The measure consists of 13 sentences capturing ecological utilitarian rules (e.g., “When a moral rule leads to avoidable harms, we should break the rule”). Participants answer on a 4-point Likert scale ranging from 1 (Never) to 4 (Always).

  • Morality of autonomous vehicles: We designed 15 specific scenarios on the moral machine data collection platform (see Awad et al., 2018). Each of the dilemmas features a fictitious situation in which the participants must decide how a running empty autonomous car should behave in a context where some individuals are bound to die (e.g., if a car keeps going straight on its lane, then it will kill three pedestrians; if it changes lane, then it will kill one pedestrian). A composite score of utilitarian responses was computed, with higher scores indicative of higher utilitarian propensity.

Additional variables

In addition to our main investigation, we also considered several possible confounding factors:

  • Cognitive Reflection Test (7-item CRT; Frederick, 2005; Thomson & Oppenheimer, 2016), which captures people’s ability to override an appealing but incorrect intuitive response.

  • Numeracy (3-item; Schwartz et al., 1997), which captures people’s ability to manipulate numerical information and which is used as a proxy of working memory capacity.

  • Actively Open-Minded Thinking scale (8-item AOT; Baron et al., 2015), which captures people’s willingness to accept new information that conflicts with their initial thoughts, or to change their initial opinion.

  • Short Dark Triad (SD3; Jones & Paulhus, 2014), which captures three aversive personality traits, that is narcissism, Machiavellianism, and psychopathy. Literature has consistently shown that people high in trait psychopathy (Pletti et al., 2017) or the three Dark Triad traits (Djeriouat & Trémolière, 2014) are more likely to sacrifice someone to save a greater number, as compared with people low in these traits.

Because we preregistered that the effect of these moderators would be analyzed if, and only if, the main effect of sleep on moral utilitarianism was significant in individual studies, which was never the case, we did not consider these confounding variables any further (a further detailed description of each of these additional measures is presented as Supplementary Material).

Statistical analyses

Each of our six studies included four predictors assessing sleep (sleep quantity and quality measured at the acute and chronic levels) and one to three outcomes assessing utilitarian inclinations (measured on standard sacrificial dilemmas, a scale assessing utilitarian reasoning, and/or dilemmas involving autonomous cars). In each study, the association between sleep and moral reasoning was systematically assessed using correlations. When repeated-measures tasks were used (i.e., standard sacrificial dilemmas and autonomous car dilemmas), an aggregation approach was used to obtain one observation per participant (e.g., Koricheva et al., 2013).
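As an illustration of the aggregation approach followed by a correlation, the sketch below collapses each participant's repeated dilemma responses into one score and correlates it with a sleep score. Averaging is an assumption made here (the text only states that a composite score was computed), and all data and variable names are invented.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data: one list of 0/1 dilemma responses per participant
# (1 = utilitarian choice), and one sleep-quality score per participant.
responses = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
sleep_quality = [6, 5, 8, 4, 7]

# Aggregation step: one composite score per participant (assumed: the mean).
utilitarian_score = [mean(trials) for trials in responses]

r = pearson_r(sleep_quality, utilitarian_score)
print(round(r, 3))
```

Aggregating first keeps the analysis at one observation per participant, at the price of ignoring trial-level variability.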

Because our studies included varying numbers of outcomes, they produced different numbers of effect sizes to be reported. Studies 1, 2, and 3 each produced four effect sizes (one for each sleep indicator). Studies 5 and 6 each produced eight effect sizes (two for each sleep indicator). Study 3 produced 12 effect sizes (three for each sleep indicator). A two-stage random-effects multivariate meta-analysis was performed (using the ‘metafor’ package in R; Viechtbauer, 2010) to quantify the overall relationship between each of the four sleep indicators and utilitarian reasoning while taking into account the dependency between effect sizes produced by the same study. A random-effects model was used because our materials differed across studies. A block-diagonal variance-covariance matrix, required to fit this model, was obtained for each study using Steiger’s (1980) equations (via the ‘rmat’ function in R). A restricted maximum likelihood estimator was used. The use of an ‘unstructured’ variance structure was initially planned, but this specification prevented the model from converging. Therefore, a heteroscedastic compound symmetric structure was preferred (this structure assumes different variance components for each combination of sleep indicator and outcome but, unlike the ‘unstructured’ variance structure, it assumes a single correlation coefficient between these values). The p values of the four pooled effect sizes were systematically reported with and without Bonferroni correction for multiple testing.
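The Bonferroni step mentioned above is simple enough to sketch directly. The p values below are invented for illustration and are not results from the meta-analysis; the function name `bonferroni` is hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented p values for four pooled effect sizes (one per sleep indicator).
raw_p = [0.40, 0.75, 0.20, 0.02]
adjusted_p = bonferroni(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.3f}  adjusted p = {adj:.3f}")
```

In this invented example, the last p value is below .05 before adjustment but not after, which is the kind of pattern that reporting both versions makes visible.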

To assess the robustness of the findings obtained in this primary analysis, we conducted multiple sensitivity analyses, reassessing our hypothesis using different approaches. Three of them retested the hypothesis using alternative statistical analyses: (i) refitting our primary model with a ‘compound symmetric’ variance structure (i.e., equivalent to a ‘three-level’ meta-analysis), (ii) fitting four separate meta-analyses (one for each sleep indicator) with an ‘unstructured’ variance structure, and (iii) fitting a one-stage meta-analysis using mixed models. Then, four sensitivity analyses retested our hypotheses using the primary model, but with (iv) the exclusion of participants identified as influential observations using Cook’s distance, (v) the exclusion of participants with an atypical sleep indicator (high leverage) identified using hat values, (vi) the exclusion of participants with a very short task duration, and (vii) the inclusion of all participants regardless of the exclusion criteria.

Results

Descriptive and inferential analyses of each individual study are available in Supplementary Materials. Briefly, our variables had sufficient variability to allow meaningful analyses. Importantly, inferential analyses of these individual studies (which strictly followed the preregistered plan) revealed no significant association between any sleep indicator and moral reasoning. Individual studies were powered to detect medium effect sizes. Indeed, the targeted size of our samples (roughly achieved in five out of the six studies) was determined to detect correlations of about Pearson’s r = .30. As a result, we cannot rule out that small associations were missed in these individual studies.

A meta-analysis of the individual studies was thus performed to obtain greater statistical power (Jackson & Turner, 2017). Our meta-analysis, based on a two-stage random-effects multivariate model, included 40 effect sizes (10 per predictor), drawn from 582 participants nested in six independent studies. This analysis revealed that sleep quantity (at both chronic and acute levels) as well as chronic sleep quality were unrelated to moral reasoning (see Table 2). By contrast, the association between acute sleep quality and moral reasoning was statistically significant (but became only marginally significant when adjusting for multiple testing). Further, it is important to note that, in all but one of the sensitivity analyses, this association was no longer significant after correcting for multiple comparisons.

Table 2 Results from the primary meta-analyses (N = 582)

As shown in Fig. 1 and detailed more thoroughly in Supplementary Results, the global heterogeneity is low (Q statistic = 24.200, p > .90). When decomposing heterogeneity for each sleep indicator and outcome, the highest heterogeneity is found for the association between chronic sleep quality and the scores on moral dilemmas (tau-squared = 0.015, I-squared = 76%). However, the Pearson’s correlations found for this association in the individual studies remain systematically low, ranging from r = −.19 to r = .02.

Fig. 1

Forest plot of the effect sizes (Pearson’s r) along with their 95% CI between each sleep indicator and moral utilitarianism (note that 95% confidence intervals are adjusted with a Bonferroni correction). Dot shapes represent the outcome used (circle = standard sacrificial dilemmas, triangle = moral scale, and diamond = autonomous vehicles dilemmas).

Critically, equivalence tests allowed us to conclude that, regardless of statistical significance, the strength of the association between each sleep indicator and moral reasoning was systematically low (i.e., below Pearson’s r = .20; Lakens, 2017). This weak association between sleep and moral reasoning was also observed in all the subsequent sensitivity analyses, further confirming the robustness of this finding.
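The logic of an equivalence test against bounds of r = ±.20 can be sketched with two one-sided tests (TOST) on a single correlation. This is a simplified illustration using the Fisher z approximation, not the procedure applied to the pooled meta-analytic estimates; the input values and the function name `tost_correlation` are invented.

```python
import math
from statistics import NormalDist


def tost_correlation(r: float, n: int, bound: float) -> float:
    """TOST p value for the claim that |rho| is smaller than `bound`.

    Assumption: Fisher z approximation, where atanh(r) has standard
    error 1 / sqrt(n - 3). Returns the larger of the two one-sided
    p values.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    norm = NormalDist()
    # H0: rho <= -bound; rejected when r lies well above the lower bound.
    p_lower = 1 - norm.cdf((z - math.atanh(-bound)) / se)
    # H0: rho >= +bound; rejected when r lies well below the upper bound.
    p_upper = norm.cdf((z - math.atanh(bound)) / se)
    return max(p_lower, p_upper)


# Invented inputs: a small observed correlation in a fairly large sample.
p_equiv = tost_correlation(r=0.05, n=400, bound=0.20)
print(f"TOST p = {p_equiv:.4f}")
```

A small TOST p value supports the claim that the true correlation lies within the bounds, which is a stronger statement than a nonsignificant correlation alone.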

Post hoc exploratory analyses (not preregistered)

A difficulty when discussing null effects is ensuring that the data are not too noisy to reveal any phenomenon. Although it was not preregistered, we conducted additional bivariate correlation analyses between our variables for each study in order to ensure that the null effects were not due to the collection of noisy data (see Supplementary Results S.10). Consistent patterns, predicted by the literature, were observed within (and between) the studies. Although we refrain from going into unplanned levels of detail, we observed positive relationships between our different measures of sleep, as well as between our cognitive measures and between our personality measures. These results, together with the low number of participants who did not pass the attentional checks (<6% across all studies), support the good quality of the data collected.

Discussion

Our study consisted of a thorough examination of the relationship between natural sleep fluctuations and moral utilitarianism. To this end, we conducted a series of six studies which varied in the measures used (both for sleep and moral utilitarianism) and the population studied. Measures of sleep quality and quantity, at both the acute (past night) and chronic (past month) levels, were used. Overall, our results suggest small to no association between sleep and moral utilitarianism. Only the association of acute sleep quality with moral utilitarianism barely reached statistical significance. Critically, the strength of this association was very small (r = 0.09).

An important result concerns the heterogeneity of the effect sizes. Overall, we found no evidence for heterogeneity across sleep indicators and outcomes. To illustrate this low heterogeneity, our 40 effect sizes for the association between each sleep indicator and moral outcome in the six studies ranged from −0.186 to 0.140. In other words, they were limited to the low range. Therefore, we are confident that our small pooled effect sizes did not result from the combination of effect sizes in opposite directions.

Previous literature on this topic shows somewhat different results. Our results are consistent with some failed experimental attempts to observe an association between sleep deprivation and moral utilitarianism in sacrificial dilemmas (Killgore et al., 2007; Tempesta et al., 2011). Our preregistered, well-powered studies are the first observational ones to support this result. In comparison with the results from Barnes and colleagues, which emphasized an association between natural sleep fluctuations and unethical behaviors, our data revealed considerably smaller effect sizes. For example, the equivalence tests allowed us to reject correlations greater than 0.14 for the influence of chronic sleep quality, while Barnes et al. (2011) reported correlations as high as 0.32. Similarly, we may reject correlations greater than 0.12 for the influence of acute sleep quantity, while Barnes et al. (2011) reported correlations as high as 0.30.

A possibility, although speculative, is that this difference arises from the different moral judgment tasks assessed: we focused on moral utilitarianism while Barnes et al. (2011) focused on unethical behaviors. Unethical behaviors, as they are assessed in the studies reported, clearly have a “morally” correct response (not doing the action; e.g., not cheating; not abusing others) and a morally incorrect one (doing the action; e.g., cheating; abusing others). The study of moral utilitarianism is somewhat different, since none of the responses (whatever the type of scale) are positively or negatively valenced per se, nor do they echo a shared view of what is morally correct or incorrect (since a normally immoral action can be thought of as moral). That is, one can only speculate that not doing an action in the study of unethical behaviors is supported by the same psychological mechanisms (by means of self-control) as giving utilitarian judgments (for which the classic dual-process model points to the necessity to recruit cognitive resources). To allow a broader picture of the role of sleep in moral judgment, future studies could thus benefit from assessing several dimensions of moral judgment.

In addition, an important limitation of our findings lies in the observational design used. However, we believe that our thorough investigation limits some of the flaws of this design. We reduced selection bias by using diversified populations: some of our participants were financially compensated whereas others were not, some were drawn from the general population whereas others were students, and our studies involved participants from different countries. Information bias was also addressed by varying our measures: we explored different measures of moral utilitarianism (classical ones, i.e., sacrificial dilemmas; more ecological ones, i.e., utilitarian rules; and exploratory ones, i.e., scenarios involving autonomous vehicles) and different measures of sleep. For instance, not only did we use the standard version of the PSQI to assess sleep quality, but we also designed a novel version of the scale, including some dimensions recommended by the National Sleep Foundation (among others, the novel version includes questions about naps, which are not assessed in the classic version of the scale). In the same way, sleep quantity was assessed not only by directly asking participants their sleeping time, but also by asking them the time at which they fell asleep, the time at which they woke up, the time spent awake during the night, and the time spent napping. Finally, we also planned to limit confounding bias by including additional factors assumed to be associated with moral utilitarianism (although the absence of main effects prevented us from exploring these factors further, in accordance with our preregistered plan).

Another limitation of our set of studies is that they were all conducted online, giving us less control over the participants’ activity while they completed the surveys. On a positive note, however, the error rate at attentional checks was low (<6% across studies) and our findings were highly robust throughout the multiple sensitivity analyses we performed. Considering the online nature of the studies conducted, we also refrained from measuring response times, which require greater control over the participants’ activity and are better suited to lab studies.

Finally, our data do not point to a robust association between sleep and moral utilitarianism. As previously discussed, this result is in line with some previous studies, but not with others. One observation is that the field of moral judgment, like other research domains in psychology, is filled with studies which sometimes report conflicting results. The examples are numerous, from the study of framing effects in moral judgment (see Gosling & Trémolière, 2021) to that of the role of reasoning in our moral judgment activities (see Baron et al., 2015). The large number of diverging results has raised replication concerns regarding the role of reasoning variables in moral judgment activities and has recently led some researchers to call for a revision of the classic dual-process theory of moral judgment (e.g., see Bago & De Neys, 2019; Białek & De Neys, 2016, 2017; or else Gürçay & Baron, 2017). Undoubtedly, research efforts are needed in the coming years to keep exploring the potential moderators explaining the disparity of results obtained to date. Regarding our investigation, the two main studies exploring the effect of sleep deprivation on moral utilitarianism have found opposite results: Killgore et al. (2007) observed that sleep deprivation increases response time only for personal dilemmas, while Tempesta et al. (2011) found that sleep deprivation increases response time only for impersonal dilemmas. In this perspective, we contribute to this effort, trying to facilitate replication in the field, by offering a set of multiple studies varying in the measures and the population used, and by preregistering our methods and sharing our materials, data, and scripts.

We see the present line of research as important for at least two reasons. First, it provides further insights into our understanding of morality and its potential moderators. Second, and more importantly, moral judgments are regularly made in high-stakes situations (e.g., during a trial). Not only may these situations be critical, but moral judgments sometimes have to be made in contexts which do not favor moral judgment activities, such as when sleep may be lacking (e.g., as may be the case during military operations). Although our results are only cross-sectional, they suggest, on the positive side, that natural sleep is not strongly related to moral utilitarianism and is unlikely to greatly shape people’s judgments in the context of moral utilitarianism.

In conclusion, consistent with the small but precise effects obtained in the present set of studies, we can only recommend that researchers conduct new observational studies assessing the role of sleep in moral judgment before moving on to much more time-consuming and costly randomized controlled trials. Indeed, these trials may come at a price for the participants, as is undoubtedly the case with any experimental study involving sleep deprivation.