Human societies are heavily based on social interactions, which allow individuals to exchange information about both themselves and the environment around them. These communicative exchanges occur through many different channels, such as spoken language and bodily signals. Among bodily signals, humans seem particularly sensitive to eye-gaze direction, likely because it generally provides a rapidly extracted and reliable index of others’ focus of attention in space (see Capozzi & Ristic, 2018; Emery, 2000; Frischen, Bayliss, & Tipper, 2007). A large body of evidence corroborates the crucial relevance of eye-gaze direction for humans. First, humans are the only primate species with a white sclera. The high chromatic contrast between this white area and the darker iris is thought to have evolved because it facilitates a fast evaluation of others’ eye-gaze direction (e.g., Kobayashi & Kohshima, 2001; but see Perea-García, Kret, Monteiro, & Hobaiter, 2019). Second, several studies have confirmed that the eye region is the most attended area during face-scanning tasks (Yarbus, 1967; see also, e.g., Birmingham, Bischof, & Kingstone, 2008; Tatler, Wade, Kwan, Findlay, & Velichkovsky, 2010). Inferring the focus of attention of our conspecifics from their eye-gaze direction is essential not only for navigating the social and natural contexts around us (e.g., Capozzi & Ristic, 2018) but also for neurocognitive development (e.g., Nummenmaa & Calder, 2009). Moreover, evidence is rapidly accumulating that eye-gaze stimuli can have a deep impact on different mechanisms of human cognition (e.g., Burra, Mares, & Senju, 2019; Conty, George, & Hietanen, 2016; Hamilton, 2016; Senju & Johnson, 2009).
As for visual attention, a large body of experimental evidence has shown that eye-gaze stimuli produce remarkable effects that can be classified into three distinct phenomena, namely (a) attention holding, (b) attention capture, and (c) attention shifting (see Fig. 1a).

Fig. 1

Eye-gaze stimuli and their impact on attentional mechanisms. a Examples of tasks that can be employed to uncover attention holding, attention capture, and attention shifting (i.e., gaze cueing of attention) in response to eye-gaze stimuli. Typically, after a fixation point, one or more faces (depending on the task) are presented. Then, a target (here, a red “T”) appears, and participants are asked to respond to it. In the top panel, a central direct-gaze face is depicted, and the target is placed on the right. In the middle panel, a peripheral direct-gaze face (upper stimulus) is depicted among three other peripheral averted-gaze faces, and the target is placed on the direct-gaze face. In the bottom panel, a central face with averted gaze is depicted, and the target is placed on the right, that is, where the face is looking (i.e., a spatially congruent trial). b Typical results of the gaze-cueing task: manual response latencies are significantly shorter on spatially congruent trials, in which the target appears at the location gazed at by the face, than on spatially incongruent trials, in which target location and eye-gaze direction do not match. All facial stimuli are taken from the NimStim database (Tottenham et al., 2009)

Both attention holding and attention capture are typically reported in the presence of direct-gaze stimuli, which are powerful social signals generally associated with approach behaviours (see Emery, 2000). More specifically, attention holding refers to the greater difficulty in disengaging attention from direct-gaze faces than from averted-gaze or closed-eye faces. This phenomenon was first reported by Senju and Hasegawa (2005; see also Hietanen, Myllyneva, Helminen, & Lyyra, 2016; Syrjämäki & Hietanen, 2018), who asked participants to respond manually to peripheral targets while fixating a central, task-irrelevant face with or without direct gaze. Evidence for this attention-holding effect was subsequently obtained also with oculomotor measures (Dalmaso, Castelli, & Galfano, 2017a; Ueda, Takahashi, & Watanabe, 2014). Similarly, attention capture refers to the tendency of direct-gaze stimuli, as compared with averted-gaze stimuli, to grab the attentional focus when presented in the periphery, that is, while the participant is looking elsewhere (typically, at fixation). Several studies have supported this phenomenon, employing both manual tasks (e.g., Böckler, van der Wel, & Welsh, 2014; von Grünau & Anston, 1995) and oculomotor tasks (e.g., Dalmaso, Castelli, Scatturin, & Galfano, 2017b; Mares, Smith, Johnson, & Senju, 2016). For instance, in Böckler et al. (2014), participants responded faster to a peripheral target appearing at the location occupied by a face with direct rather than averted gaze, suggesting that direct gaze does indeed grab attention.

Finally, attention shifting refers to the tendency to shift attention towards the spatial location indicated by a task-irrelevant face with averted gaze presented at fixation, a phenomenon known as the “gaze-cueing effect” (GCE), which has mainly been investigated with manual response tasks relying on covert orienting. So far, attempts to understand the role of social factors in modulating the effects of eye-gaze stimuli on human visual attention have been carried out almost exclusively in the domain of attention shifting. The main goal of the present review is to provide a critical summary of the modulations exerted by social factors on the GCE. Indeed, the current literature on this topic is substantial and fairly scattered, which points to the need for a systematization effort and for a general theoretical framework. In the next section, we discuss the GCE in further detail and examine how the potential impact of social variables was initially used to address a specific question: the extent to which the GCE can be considered a strongly automatic effect.

The gaze-cueing effect

The GCE was first reported at the end of the 20th century by independent research groups (Driver et al., 1999; Friesen & Kingstone, 1998; Hietanen, 1999; Langton & Bruce, 1999), who employed different variants of the classic spatial cueing task (e.g., Posner, 1980). In a typical gaze-cueing task, after a central fixation point, the participant is presented with a direct-gaze face. Then, the same face is presented with gaze averted either rightwards or leftwards. After a variable temporal interval (stimulus-onset asynchrony [SOA]), usually shorter than 1 second, a target appears either rightwards or leftwards, and the participant is required to provide a manual response, such as a key press. On spatially congruent trials, the target appears at the spatial location indicated by the gaze cue. On spatially incongruent trials, the target appears at the opposite location. The difference in performance between congruent and incongruent trials provides an estimate of the magnitude of the GCE. Typically, reaction times (RTs) are shorter on congruent than on incongruent trials (see Fig. 1b), even when participants are informed that congruent and incongruent trials occur with the same frequency (i.e., the gaze cue is not informative about the target location). This result, along with the observation that the GCE can typically be observed also at short SOAs (e.g., less than 300 ms), has been taken as evidence that gaze cues elicit reflexive attention shifts. Importantly, in the classic version of the gaze-cueing task, participants are generally asked to keep their eyes on fixation for the entire trial duration, an approach that allows the study of covert orienting. However, the impact of eye gaze has also been established for overt orienting.
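To make the computation of the GCE concrete, the sketch below estimates it separately for each SOA as the mean RT on incongruent trials minus the mean RT on congruent trials, so that a positive value indicates faster responses at the gazed-at location. It is a minimal illustration, not the analysis pipeline of any study reviewed here: the trial format (keys `soa`, `gaze`, `target`, `rt`), the function name `gce_by_soa`, and the choice to drop error trials before averaging are all assumptions made for this example, and the RTs in the usage lines are invented.

```python
from collections import defaultdict
from statistics import mean


def gce_by_soa(trials):
    """Return {soa: GCE in ms}, where GCE = mean incongruent RT - mean congruent RT.

    Hypothetical trial format (an assumption for this sketch):
      'soa'    -- cue-target interval in ms
      'gaze'   -- direction of the averted gaze, 'left' or 'right'
      'target' -- side on which the target appeared, 'left' or 'right'
      'rt'     -- manual response time in ms, or None for error trials
    Raises statistics.StatisticsError if an SOA lacks congruent or incongruent RTs.
    """
    rts = defaultdict(lambda: {"congruent": [], "incongruent": []})
    for trial in trials:
        if trial["rt"] is None:
            continue  # error trials are excluded from the RT analysis
        # A trial is congruent when the target appears where the face is looking.
        condition = "congruent" if trial["gaze"] == trial["target"] else "incongruent"
        rts[trial["soa"]][condition].append(trial["rt"])
    # A positive value means faster responses at the gazed-at location.
    return {
        soa: mean(cond["incongruent"]) - mean(cond["congruent"])
        for soa, cond in sorted(rts.items())
    }


# Invented example data for two SOAs; the last trial is an error and is skipped.
trials = [
    {"soa": 200, "gaze": "left", "target": "left", "rt": 300},
    {"soa": 200, "gaze": "left", "target": "right", "rt": 330},
    {"soa": 700, "gaze": "right", "target": "right", "rt": 310},
    {"soa": 700, "gaze": "right", "target": "left", "rt": 318},
    {"soa": 700, "gaze": "left", "target": "right", "rt": None},
]
print(gce_by_soa(trials))  # {200: 30, 700: 8}
```

Computing the effect per SOA, rather than pooling across SOAs, mirrors the logic used in the literature to ask whether a modulation is already present at short intervals.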
Ricciardelli, Bricolo, Aglioti, and Chelazzi (2002) proposed an instructed saccade paradigm in which a task-irrelevant face with gaze averted either rightwards or leftwards is presented at fixation, and participants are asked to make a saccade either rightwards or leftwards in response to a central direction cue. The SOA in this paradigm is computed as the time between the onset of the task-irrelevant face with averted gaze and the onset of the central direction cue. Typically, saccadic latencies are shorter and accuracy is greater when the gaze and the direction cue indicate the same spatial location than when they indicate different locations (see also Kuhn & Benson, 2007). This suggests a spontaneous, overt gaze-following behaviour, even when eye-gaze stimuli are task-irrelevant. Although overt and covert orienting are likely to engage oculomotor control to different extents, they largely rely on similar brain networks (e.g., Corbetta et al., 1998). Therefore, studies addressing both covert and overt orienting will be examined in the present review, under the assumption that they tap at least partially overlapping attentional mechanisms.
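The two dependent measures of the instructed saccade paradigm can be sketched in a similar way. In this hypothetical example, a trial counts as congruent when the task-irrelevant gaze and the central direction cue point the same way, latencies are averaged over correct saccades only, and an error is a saccade launched away from the instructed direction. The trial format, the function name `instructed_saccade_effects`, and these scoring rules are assumptions for illustration; the reviewed studies may have scored latencies and errors differently.

```python
from statistics import mean


def instructed_saccade_effects(trials):
    """Congruency effects in a hypothetical instructed-saccade data set.

    Each trial is a dict with (assumed) keys:
      'gaze'       -- direction of the task-irrelevant gaze
      'instructed' -- direction signalled by the central cue
      'saccade'    -- direction of the saccade actually made
      'latency'    -- saccadic latency in ms
    Returns (latency effect in ms on correct trials,
             error-rate effect in percentage points), both incongruent minus congruent.
    """
    groups = {True: [], False: []}  # True = congruent, False = incongruent
    for t in trials:
        groups[t["gaze"] == t["instructed"]].append(t)

    def correct_latency(ts):
        # Mean latency of saccades that went in the instructed direction.
        return mean(t["latency"] for t in ts if t["saccade"] == t["instructed"])

    def error_rate(ts):
        # Percentage of saccades that did not go in the instructed direction.
        return 100 * sum(t["saccade"] != t["instructed"] for t in ts) / len(ts)

    latency_effect = correct_latency(groups[False]) - correct_latency(groups[True])
    error_effect = error_rate(groups[False]) - error_rate(groups[True])
    return latency_effect, error_effect


# Invented data: the last trial is an error in which the saccade followed the gaze.
trials = [
    {"gaze": "left", "instructed": "left", "saccade": "left", "latency": 200},
    {"gaze": "right", "instructed": "right", "saccade": "right", "latency": 220},
    {"gaze": "left", "instructed": "right", "saccade": "right", "latency": 250},
    {"gaze": "right", "instructed": "left", "saccade": "right", "latency": 180},
]
print(instructed_saccade_effects(trials))  # (40, 50.0)
```

Positive values on both measures correspond to the pattern described above: slower and less accurate saccades when the distracting gaze points away from the instructed direction.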

After the introduction of the gaze-cueing task, a number of studies aimed to define the peculiarities of this form of social attention by comparing gaze cues with arrow cues presented at fixation, namely nonsocial stimuli that are known to elicit reliable orienting of attention (e.g., Tipples, 2002). The comparison between gaze and arrow cues has been extensively explored to address whether the two types of cues can be considered qualitatively different (e.g., Bonmassar, Pavani, & van Zoest, 2019; Ciardo, Ricciardelli, & Iani, 2018; Friesen, Ristic, & Kingstone, 2004; Galfano et al., 2012; Guzzon, Brignani, Miniussi, & Marzi, 2010; Hayward & Ristic, 2015; Hermens & Walker, 2010; Kuhn & Kingstone, 2009; Marotta, Lupiáñez, & Casagrande, 2012a; Marotta, Lupiáñez, Martella, & Casagrande, 2012b; Marotta, Román-Caballero, & Lupiáñez, 2018b; Nummenmaa & Hietanen, 2006; Ristic, Friesen, & Kingstone, 2002; Ristic, Wright, & Kingstone, 2007; Zeligman & Zivotofsky, 2018; Zhao, Uono, Yoshimura, & Toichi, 2014). This line of research started from the assumption that gaze and arrows might be associated with different forms of automaticity. In particular, whereas “automatic” processing of arrows might be the consequence of overlearning the meaning of symbolic cues (e.g., Hommel, Pratt, Colzato, & Godijn, 2001), “automatic” processing of gaze stimuli would be more strongly hardwired (e.g., Farroni, Massaccesi, Pividori, & Johnson, 2004). However, when gaze cues are embedded in schematic face stimuli, their behavioural effects on orienting are almost indistinguishable from those produced by arrows, at least in healthy individuals.
This has also been confirmed by assessing the effects of different manipulations aimed at testing the so-called intentionality criterion of automaticity in attention shifting (see Jonides, 1981), such as the use of counterpredictive cues (Tipples, 2008) or even informing participants in advance about the upcoming target location (Galfano et al., 2012). More recently, a different avenue has been pursued to test the unconditional automaticity of the GCE, using more ecological stimuli. On the one hand, the idea that eye gaze is a special stimulus, and hence is processed in a strongly automatic manner, stems from its unique social relevance. On the other hand, this very same argument could also lead one to predict some selectivity in the processing of gaze cues as a function of how relevant such social cues actually are in a given situation. If this holds true, gaze processing as indexed by the GCE would not meet the criterion for strong, unconditional automaticity, in that its occurrence would critically depend on a host of social factors.

In the next sections, we specifically address the impact of social variables on the magnitude of the attention-shifting response mediated by eye gaze. The data we summarize provide a very different picture regarding the reflexivity of both covert and overt gaze-mediated orienting. Indeed, studies addressing social variables have shown that we do not shift our attention to the same extent in response to every averted gaze we encounter.

The social side of the gaze-cueing effect

The nature of eye-gaze stimuli is twofold. On the one hand, at a perceptual level, eye-gaze stimuli, just like arrows, can indicate a certain spatial location in the environment around us. Hence, it is not surprising that when schematic faces devoid of any social features are employed in a cueing task, the behavioural effects can be very similar to those observed in response to arrow stimuli (e.g., Galfano et al., 2012; Kuhn & Benson, 2007; Tipples, 2008). On the other hand, at a social level, eye-gaze stimuli, unlike arrows, are spatial cues provided by a conspecific who can be characterized by several social features and intentions. Moreover, in everyday life, we are constantly exposed to a variety of gaze signals from many different individuals around us. Therefore, it seems reasonable to hypothesize modulatory processes that regulate gaze-mediated orienting in accordance with the social variables characterizing both the individual depicted in the cueing face and, in a complementary fashion, the observer. In line with this view, several studies in recent years have reported that the GCE in healthy humans can indeed be shaped by many different social variables. Thus, whereas early studies manipulating social variables were mostly aimed at addressing the automaticity of the GCE, more recent research has focused more specifically on the modulatory effects of different social variables per se. In the next paragraphs, we critically review the current evidence concerning the modulations exerted by social factors on gaze-mediated orienting of attention. Adopting a social perspective implies considering different potential sources of “social meaning.” First, the characteristics of the observer need to be considered, in that different individuals might value and prioritize different social features.
Second, the characteristics of the cueing faces need to be taken into account, in that they may change the overall social informativeness of the gaze cue. Third, a social analysis requires integrating the relationship between these two factors, in that the social informativeness of a specific gaze cue is likely to vary as a function of the goals and characteristics of the observer. In line with this analysis, we organize the presentation of the relevant research findings in healthy adults into three major sections focusing on (a) characteristics of the observer, (b) characteristics of cueing faces, and (c) their relationship. This organization stems from the need to simplify a complex literature and to identify the likely basic components underlying social modulations. It also has a heuristic value that will allow us not only to discuss the current state of the art more critically but also to propose a framework to guide further developments in this research field. The main results of each study included in the present review are summarized in Table 1.

Table 1 Major social modulations on the gaze-cueing effect (GCE), typically assessed with manual responses and covert orienting paradigms, and gaze-following (GF) behaviour, typically assessed with oculomotor responses and overt orienting paradigms, in healthy adults

The assignment of a specific study to one of the three aforementioned categories has often been made for pragmatic reasons, without neglecting that the very same study could also provide valuable information concerning the other two categories. The major interest of the present review concerns covert orienting, which is the focus of most of the GCE literature. However, when available, we also discuss oculomotor evidence. This further focus is important because it provides insight into the generalizability of social modulations across different task settings. Throughout the text, we also examine the temporal parameters (e.g., SOA) used in the different studies, with the goal of addressing the extent to which the reported social modulations can reflect the involvement of early-rising, reflexive processes.

Characteristics of the observer

In this section, we focus on studies addressing the potential role of the characteristics of the observer (i.e., gender, age, personality, and internal states). Because the rationale behind these studies is that individual differences may have an impact on the orienting of attention, the vast majority of them share the logic of comparing social and nonsocial spatial cues, with the aim of uncovering whether potential differences in attention shifting occur irrespective of cue type or only in response to gaze stimuli. Our presentation of these studies therefore follows this twofold approach.

Gender

It is well established that males and females can show a variety of differences in many cognitive domains (e.g., Halpern, 2013). As for social cognition, females have been shown to be more sensitive to social stimuli than males (e.g., Geary, 2010). In this regard, Baron-Cohen (2002) proposed the “extreme male brain” theory of autism, according to which males in the general population would tend to display more autistic-like traits than females. This implies that social abilities would be reduced in males as compared with females, which could also be reflected in the GCE. This idea was first tested by Bayliss, di Pellegrino, and Tipper (2005), who presented male and female participants with male and female cueing faces while manipulating SOA (i.e., 100, 300, and 700 ms). The gender of the cueing face had no significant effect, whereas the GCE was overall stronger in female participants, irrespective of SOA, although this difference seemed slightly larger at the longest SOA. Interestingly, Bayliss et al. (2005) also reported a negative relationship between the magnitude of the GCE and the Autism-Spectrum Quotient (AQ; Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001; see also Bayliss & Tipper, 2005; Hayward & Ristic, 2017). In subsequent experiments, Bayliss et al. (2005) observed a comparable spatial cueing effect in the two genders in response to peripheral abrupt-onset cues, whereas an increased spatial cueing effect again emerged in female participants when central arrow cues were employed (see also Merritt et al., 2007; Mitsuda, Otani, & Sugimoto, 2019). This latter result suggests that males might be less sensitive to central cues in general. The main results for gaze cues observed by Bayliss et al. (2005) have subsequently been replicated by other independent research groups (Alwall, Johansson, & Hansen, 2010; Cooney, Brady, & Ryan, 2017; Feng et al., 2011; Hayward & Ristic, 2017; McCrackin & Itier, 2019), who showed an increased GCE in females as compared with males. Overall, these studies highlight the importance of considering gender in tasks aimed at investigating attention-orienting abilities. Whether social stimuli (e.g., eye gaze) lead to more pronounced gender differences than nonsocial cues (e.g., arrows) is still, to a large extent, an open question.

Age

Many aspects of social perception tend to decline with age, such as the ability to infer intentions and beliefs (e.g., Sullivan & Ruffman, 2004) and the ability to process signals coming from others’ faces, like emotional expressions (e.g., Ruffman, Henry, Livingstone, & Phillips, 2008). Hence, it is not surprising that this decline has also been documented for the GCE. The first study on this topic was conducted by Slessor, Phillips, and Bull (2008). In that study, in which a single SOA was used (i.e., 180 ms), younger (mean age about 20 years) and older (mean age about 70 years) individuals were presented with young adult faces with averted gaze displaying different emotional expressions (i.e., joy, sadness, fear, anger, and neutral), as well as with arrow cues. For facial cues, orienting was not modulated by emotional expression, but the GCE was overall greater in younger individuals. Slessor et al. (2008) reported that arrow-mediated orienting was also stronger in younger participants, suggesting that older adults might have a generalized hyposensitivity to central cues, although this age difference was not replicated by Slessor et al. (2016), who found no age-related differences in a nonsocial arrow-cueing task. In a further study, Slessor, Laird, Phillips, Bull, and Filippou (2010) presented younger (mean age about 20 years) and older (mean age about 70 years) participants with younger (age range: 18–25 years) and older (age range: 60–88 years) adult cueing faces. A single 500-ms SOA was used. Consistent with Slessor et al. (2008), the GCE was overall greater in younger participants, and this was particularly evident in response to younger facial cues. In contrast, older participants showed a similar GCE irrespective of the age of the cueing face.

Turning to studies addressing overt orienting, Kuhn, Pagano, Maani, and Bunce (2015) asked younger (mean age about 20 years) and older (mean age about 70 years) individuals to search for a peripheral target in the presence of a central distractor avatar face with averted gaze. Older individuals were overall less influenced by the distractor gaze, corroborating the notion that gaze-following abilities decline with age. More relevant to understanding the unique role of social variables, Ciardo, Marino, Actis-Grosso, Rossetti, and Ricciardelli (2014) monitored overt orienting responses in an instructed saccade paradigm and reported an own-age bias in younger participants akin to the effect reported by Slessor et al. (2010) with a covert orienting paradigm.

Overall, the available studies suggest that gaze-mediated orienting abilities decline with age, although it is not yet clear whether the impact of other central cues (e.g., arrows) is also reduced among older individuals. However, the own-age bias in younger participants reported by Slessor et al. (2010) and Ciardo et al. (2014) appears more consistent with the uniqueness of gaze as a social modulator of attention shifting.

Personality and internal states

There is evidence that even subtle dimensions related to our own personality can shape our social attention abilities. For instance, personality traits have been shown to be associated with oculomotor behaviour during the scanning of pictures displaying social interactions (Wu, Bischof, Anderson, Jakobsen, & Kingstone, 2014). As for the GCE, Wilkowski, Robinson, and Friesen (2009) reported, in a first experiment, an increased effect in individuals with low trait self-esteem (assessed through the self-esteem scale of Rosenberg, 1965), likely reflecting their need to reconnect with others, which is a core need of human beings (e.g., Baumeister & Leary, 1995). This modulation was not further qualified by SOA (50 vs. 600 ms). In another experiment, Wilkowski et al. (2009) reported a similar pattern in participants who had undergone a manipulation aimed at activating rejection-related thoughts. Specifically, prior to the gaze-cueing task, participants were asked to write for 5 minutes about times in their life when they had felt either socially accepted or rejected. This latter experiment is important in that it suggests that not only stable individual differences but also temporarily induced internal states can modulate the GCE. This conclusion is also supported by Cui, Zhang, and Geng (2014), who primed participants with high or low power (a social dimension closely related to dominance and social control; see Keltner, Gruenfeld, & Anderson, 2003). More specifically, participants first completed a priming task in which they were asked to recall or imagine experiences in which they controlled others (high-power priming) or were controlled by others (low-power priming). Then, a gaze-cueing task was administered. The GCE was greater in participants who received low-power priming, and this was particularly evident in female participants.
In a more recent study, Capellini, Riva, Ricciardelli, and Sacchi (2019) further explored the impact of a temporarily induced state of belongingness on the GCE, using a gaze-cueing task with a fixed 200-ms SOA. Unlike Wilkowski et al. (2009), Capellini et al. (2019) manipulated social exclusion through the Cyberball task (Williams & Blair, 2006). Their results suggest that the manipulation affected the GCE, although in the opposite direction to that reported by Wilkowski et al. (2009): participants who had experienced rejection displayed a reduced GCE. Importantly, the Cyberball manipulation had no effect when gaze cues were replaced by arrows (Capellini et al., 2019, Experiment 2), confirming the social nature of the observed modulation of the GCE. In sum, the findings reported by both Wilkowski et al. (2009) and Capellini et al. (2019) seem to support the idea that affiliative states can affect the GCE, although they are inconsistent with respect to the direction of the effect. Because the two studies differ in several methodological respects, further research is needed to identify the additional factors that might lead to one kind of effect or its opposite. One possibility is that the effect of social variables is not always linear but crucially depends on the strength of the induced internal states, so that relatively weak versus strong experiences of rejection lead to divergent findings.

Political temperament provides another proxy for investigating the impact of personality differences on attentional responses. Indeed, differences between liberals and conservatives go beyond their different worldviews: evidence shows that liberals and conservatives can also differ in many aspects of cognition (e.g., Jost, Nam, Amodio, & Van Bavel, 2014). Intriguingly, there is also evidence that the GCE is reduced in self-defined conservatives (Dodd, Hibbing, & Smith, 2011). More specifically, Dodd et al. (2011) used schematic faces and three different SOAs (i.e., 100, 500, and 800 ms) and found that the GCE was completely abolished in conservatives, whereas it emerged among liberals, irrespective of SOA. According to Dodd et al. (2011), this might be because conservatives tend to be individualistic and therefore less inclined than liberals to be influenced by others. Similar results were also observed by Carraro, Dalmaso, Castelli, and Galfano (2015), who used two SOAs (200 and 700 ms) and tested the attentional response elicited by both gaze and arrow cues. Carraro et al. (2015) reported that arrow-mediated cueing of attention was comparable between conservatives and liberals, regardless of SOA. In contrast, the reduced GCE for conservatives was particularly evident at the longer SOA. In sum, the findings of Dodd et al. (2011) and Carraro et al. (2015) consistently suggest that political temperament has an impact on attention shifting when the cue has a social value (i.e., eye gaze). Critically, this modulatory role did not emerge when nonsocial stimuli were used (Carraro et al., 2015).

Characteristics of the cueing faces

In this section, we include studies focusing on social influences elicited by the analysis of the perceptual features conveyed by the faces providing the gaze cue (i.e., physical dominance, physiognomic traits suggesting trustworthiness, emotional expressions). This section also covers studies in which social factors were extracted through higher-level processing related, for instance, to the retrieval of previous knowledge about the characteristics and behaviours of the individual providing the gaze cue (e.g., knowledge about social status, trustworthiness, and previous gazing behaviour in multiagent contexts). Both inferences from perceptual features and the retrieval of episodic knowledge can occur very quickly (e.g., Castelli, Zogmaister, Smith, & Arcuri, 2004; Todorov, Pakrashi, & Oosterhof, 2009; Willis & Todorov, 2006), which opens the possibility that social modulations of gaze-mediated orienting of attention could also emerge early in processing.

Physical dominance

The concept of dominance is strongly associated with that of hierarchy, and dominance can be defined as the use of force and intimidation to influence the behaviour of other individuals (e.g., Henrich & Gil-White, 2001). In everyday life, individuals tend to associate dominance with masculinity. Indeed, masculine faces are generally judged as more dominant than feminine faces (e.g., Perrett et al., 1998). Inspired by this evidence, Jones et al. (2010) presented participants with faces of real individuals that were either masculinized (i.e., made more dominant) or feminized (i.e., made less dominant) through a morphing technique. Their paradigm included three SOAs (200, 400, and 800 ms). A greater GCE emerged in response to masculinized than to feminized faces at the shortest SOA. The observation of a significant modulation only at the 200-ms SOA was interpreted as evidence of reflexive processing. More recently, Ohlsen, van Zoest, and van Vugt (2013) presented participants with both a male face judged as dominant looking and a female face judged as nondominant looking in a gaze-cueing task including two SOAs (200 vs. 800 ms). Critically, the gaze-cueing task was preceded by the presentation of threatening (e.g., an accident) versus nonthreatening pictures (e.g., a smiling baby) to induce a sense of an unsafe versus a safe context. In line with Jones et al. (2010; see also Jones, Main, Little, & DeBruine, 2011), the dominant-looking male face elicited an overall stronger GCE than the nondominant female face. In addition, the nondominant female face induced a reliable GCE only when participants were primed with nonthreatening pictures. According to Ohlsen et al. (2013), the lack of a reliable GCE for the female face in threatening contexts may reflect the idea that physically weaker individuals are less likely to offer safety and protection in dangerous situations.

Social status

Like dominance, social status also contributes to creating hierarchies in human groups. However, whereas dominance mainly arises from physical strength, social status most often arises from intellectual abilities and can be defined as the amount of respect and admiration accorded to an individual by others (e.g., Gould, 2002). Empirically, the effects of social status on the GCE are comparable with those reported for dominance. In a first study, Dalmaso, Pavan, Castelli, and Galfano (2012) asked participants to read fictive curricula vitae describing individuals of either high or low social status. Then, the faces that had been associated with these curricula were employed in a gaze-cueing task with a single 200-ms SOA. The results showed a greater GCE in response to faces that had been associated with higher social status. The same pattern of results emerged in a subsequent study by Dalmaso, Galfano, Coricelli, and Castelli (2014), who also showed that the modulatory effects of social status on the GCE tend to decay over time. Indeed, with a 200-ms SOA, a significant GCE only emerged for high-status faces, whereas with a 1,000-ms SOA, the magnitude of the GCE was not affected by the social status associated with the face providing the gaze cue. Moreover, Dalmaso et al. (2014) provided additional evidence for the reflexive nature of this social modulation by manipulating the duration of the direct-gaze face frame preceding the averted-gaze face. Indeed, although the SOA is the critical parameter for sampling the location of covert attention, and thus for tracking the time course of the GCE, the processing of social features conveyed by the face can start before the onset of the gaze cue. More specifically, the extraction of social features can begin as soon as the direct-gaze face frame appears.
Hence, to assess the reflexive nature of the social modulation, Dalmaso et al. (2014) performed a further experiment with a fixed 200-ms SOA in which the direct gaze face frame lasted either 50 or 900 ms. Intriguingly, the magnified GCE for high-status faces emerged even when only a very short interval separated the onset of the direct gaze face frame from the appearance of the target, suggesting that this social modulation is early rising and, thus, reflexive.

Overall, the available evidence discussed here and in the previous section (i.e., dominance) indicates that individuals higher in dominance and social status can elicit a greater orienting response, likely reflecting the greater social relevance that is generally associated with people holding higher positions within social hierarchies (see also Koski, Xie, & Olson, 2015).

Trustworthiness

Trustworthiness is a crucial social dimension that strongly guides our tendency to approach or avoid another individual. Interestingly, Süßenbach and Schönbrodt (2014) recently reported a greater GCE in response to a cueing face described as belonging to a trustworthy individual, whereas other studies reported no modulation of the GCE as a function of trustworthiness (see Bayliss & Tipper, 2006; King, Rowe, & Leonards, 2011; Strachan, Kirkham, Manssuer, Over, & Tipper, 2017). Because these studies differ in many methodological aspects, it is hard to identify the source of this inconsistency, and further studies are needed to shed light on the interplay between trustworthiness and the GCE. One possibility is that additional factors related to participants' characteristics can influence this modulation (see Petrican et al., 2013). Moreover, the fixed SOAs used by Süßenbach and Schönbrodt (2014; SOA = 450 ms) and by the other three studies (Bayliss & Tipper, 2006; King et al., 2011; Strachan et al., 2017; SOA = 500 ms) were not designed to address the reflexivity of social modulations, if any. Therefore, the current evidence does not allow any firm conclusion about the potential modulatory role of trustworthiness.

Emotional expressions

Facial expressions of emotion are basic and often reliable signals that individuals use to communicate and infer intentions and feelings. Moreover, when combined with gaze direction, emotional expressions can also become a powerful tool for communicating the presence of relevant objects in the environment to others. For instance, a fearful face with averted gaze could indicate the presence of a threat, such as a snake on a mountain trail. Hence, emotional expressions are highly adaptive stimuli with an important role in social interactions. Despite the bulk of evidence highlighting the relevance of emotional information for attention (e.g., Yiend, 2010), in healthy adults the study of the GCE in response to facial expressions has led to mixed results, in particular when static expressions were used. On the one hand, a pioneering study comprising six experiments, with SOAs ranging from 14 to 600 ms, reported no modulation of the GCE as a function of emotional expression (i.e., neutral, happy, angry, fearful; Hietanen & Leppänen, 2003), and this lack of modulation was subsequently reported by other research groups as well, at least at the behavioural level (e.g., Bayliss, Frischen, Fenske, & Tipper, 2007; Borjon, Shepherd, Todorov, & Ghazanfar, 2010; Galfano et al., 2011; Holmes, Mogg, Garcia, & Bradley, 2010; Prasad, Marmolejo-Ramos, & Mishra, 2015; Slessor et al., 2008). On the other hand, evidence in favour of an overall larger GCE in response to emotional faces per se is scant (see Carlson, 2016), whereas the vast majority of studies reporting modulatory effects of emotional expressions documented them only under specific circumstances. In this regard, Hori et al. (2005), who used a single 150-ms SOA, reported a greater GCE for happy faces than for both neutral and angry faces, but only in response to stimuli depicting female actors.
A related result was reported by Hudson, Nijboer, and Jellema (2012), who observed a greater GCE in low-AQ individuals in response to faces that had smiled at them during a learning phase. This pattern emerged with a 300-ms SOA, but not with an 800-ms SOA. Moreover, Pecchinenda and Petrucci (2016), who used a single 250-ms SOA, reported a greater GCE for angry faces than for both happy and neutral faces, but only when participants were engaged in a concurrent high-cognitive-load task (i.e., seven-step backwards counting). Interestingly, in a paradigm with a fixed 200-ms SOA, a greater GCE for both happy and fearful faces has been observed after the administration of oxytocin, a hormone known to enhance emotion recognition (Tollenaar, Chatzimanoli, van der Wee, & Putman, 2013). Furthermore, there is evidence that individual differences may contribute to shaping the GCE in response to emotional expressions. In this regard, Mathews, Fox, Yiend, and Calder (2003) found, irrespective of SOA (300 vs. 700 ms), a greater GCE in response to fearful than to neutral faces, but only in high-anxious individuals (for similar results, see also Fox, Mathews, Calder, & Yiend, 2007; Holmes, Richards, & Green, 2006; Putman, Hermans, & van Honk, 2006). However, this pattern has not always emerged, even with comparable SOAs (see, e.g., Galfano et al., 2011; Holmes et al., 2010; McCrackin & Itier, 2019). An enhanced GCE for fearful faces regardless of SOA (300 vs. 700 ms) has also been documented by Tipples (2006), but only in high-fearful individuals. Pletti, Dalmaso, Sarlo, and Galfano (2015) reported a magnified GCE for faces expressing disgust, fear, and anger in individuals with snake phobia, and this pattern was not further modulated by SOA (200 vs. 500 ms). Lassalle and Itier (2015a; see also McCrackin & Itier, 2019) observed a larger GCE for fearful than for happy faces using a fixed 500-ms SOA, but only in high-AQ individuals.
A further study by Ponari, Trojano, Grossi, and Conson (2013) focused on introversion/extroversion, finding that introverted participants showed a reliable GCE in response to both neutral and happy faces, but not in response to angry faces, whereas the opposite pattern emerged in extroverted participants. No effect of SOA (300 vs. 700 ms) was reported.

Other studies employed more ecological cueing stimuli, namely, faces that morphed dynamically from a neutral into an affective state. Under these circumstances, a greater GCE has been observed in response to fearful or angry faces in studies using SOAs in the 160–700 ms range (Bayless, Glover, Taylor, & Itier, 2011; Graham, Friesen, Fichtenholtz, & LaBar, 2010; Lassalle & Itier, 2013, 2015b; Liu, Shi, Whitaker, Tian, & Hu, 2019; McCrackin & Itier, 2018; Neath, Nilsen, Gittsovich, & Itier, 2013; see also Putman et al., 2006; Tipples, 2006; Uono, Sato, & Toichi, 2009; but see Fichtenholtz, Hopfinger, Graham, Detwiler, & LaBar, 2007, 2009), and even in response to surprised or happy faces (e.g., Bayless et al., 2011; Lassalle & Itier, 2013, 2015b; McCrackin & Itier, 2018; Neath et al., 2013). Modulations of the GCE as a function of emotional expression have also been reported in studies that manipulated the salience of affective information by presenting participants with emotionally valenced targets rather than neutral targets such as simple shapes or letters. In this regard, Pecchinenda, Pes, Ferlazzo, and Zoccolotti (2008), who used a single 250-ms SOA, found a greater GCE for fearful and disgusted faces, but only when participants were asked to discriminate the affective valence (positive vs. negative) of target words. By contrast, the GCE was similar across emotional expressions when the task was to discriminate the letter case (upper vs. lower) of the same words. Using a similar approach, Bayliss, Schuch, and Tipper (2010) reported a greater GCE for happy than for disgusted faces with a fixed 500-ms SOA, but only when participants were asked to localize pleasant rather than neutral targets.
Moreover, Friesen, Halvorson, and Graham (2011) observed a greater GCE for fearful than for happy faces, but only when participants responded to emotionally valenced rather than neutral targets, and only with a medium-to-long SOA (i.e., 525 ms). Finally, there is evidence that manipulating the frequency with which participants are exposed to emotional expressions can also shape the GCE (Kuhn, Pickering, & Cole, 2016a). In more detail, Kuhn et al. (2016a), who used a single 150-ms SOA, presented participants with blocks in which fearful faces were rare among happy faces (or vice versa), an approach aimed at overcoming the potential confound of habituation. The results showed a greater GCE in response to fearful faces, but only when these were rare.

As concerns studies using overt attention paradigms, the available research is scarce and mixed. Bonifacci, Ricciardelli, Lugli, and Pellicano (2008) used an instructed saccade paradigm and found no modulation as a function of emotion. In another study using a different paradigm, Matsunaka and Hiraki (2019) provided evidence of a possible modulation of overt gaze-following behaviour by the emotion displayed by the face stimulus. On the whole, no clear conclusions can be drawn, although subtle factors may play a key role in overt orienting as well. In this regard, the important role of the salience of affective information for gaze-mediated orienting is also supported by an eye-tracking study in which eye gaze exerted a greater influence in response to a dynamic fearful face, but only when participants were asked to search for a threatening rather than a pleasant target (Kuhn & Tipples, 2011).

To sum up, both individual differences (e.g., gender, levels of anxiety) and methodological aspects (e.g., dynamic morphing, salience of affective information, stimulus frequency) seem to contribute to shaping gaze-mediated orienting in response to emotional expressions in healthy adults, confirming that this form of social orienting is a complex and multifaceted phenomenon. In addition, the role of temporal parameters is far from straightforward. According to Graham et al. (2010), significant modulatory effects of emotional expression on the GCE should emerge only at relatively long SOAs (i.e., longer than 300 ms), which would be required for gaze and emotion to be fully integrated. Such a view, however, is not in line with the results of several studies reporting modulations of the GCE as a function of emotion at rather brief SOAs (e.g., McCrackin & Itier, 2018; Putman et al., 2006; Tipples, 2006). These results, in turn, are more consistent with the view that such modulations stem from more reflexive processing.

Nevertheless, the lack of a robust and consistent pattern of results invites caution and points to the need for further studies systematically addressing the role of both individual and contextual factors, as well as the role of eye movements in covert attention paradigms, which may, at least partially, reduce the likelihood of detecting significant modulatory effects of emotional expressions on the GCE (McCrackin, Soomal, Patel, & Itier, 2019).

Multiagent contexts

In the previous sections, the relevant characteristics of the cueing faces were the specific features of the single individuals providing the cue. In this section, by contrast, the relevant characteristics of the cueing stimuli are the attentional behaviours of the individuals presented (i.e., where multiple faces are gazing). Indeed, the GCE has mostly been investigated by presenting participants with just one cueing face at a time. However, in everyday life we often meet more than one person at the same time, and each of these individuals may look at a different spatial location. Investigating the GCE in multiagent contexts is therefore of great interest, as it may reveal important insights into how human social attention operates in real contexts. In this regard, Böckler, Knoblich, and Sebanz (2011) employed two cueing faces. In their study, participants first observed the two faces either looking at each other (i.e., establishing mutual gaze) or not, and then the two faces jointly looked towards the same spatial location, thus creating a joint attention episode. The target, presented after 500, 600, or 700 ms, was equally likely to appear at the gazed-at or the nongazed-at location. Strikingly, a reliable GCE emerged only when the two faces had previously established mutual gaze, a result suggesting a link between joint attention episodes and social orienting (see also Dalmaso, Edwards, & Bayliss, 2016b). In a similar vein, this link has also been explored by Edwards, Stephenson, Dalmaso, and Bayliss (2015). In more detail, participants made an eye movement towards an object flanked by two faces, one looking at the object (thus establishing joint attention with the participant) and the other looking at the opposite location. After either 100 or 400 ms, a target appeared on one of the two faces.
Overall, latencies were shorter when the target appeared on the “joint-attention face” than on the other face, suggesting the presence of an attentional mechanism that promotes the processing of faces that have previously established joint attention with us. In a further study addressing the interplay between multiple cueing faces and emotion, Becker (2010) presented participants with two faces (one above and one below fixation) displaying different facial expressions and gazing at different locations. Participants preferentially attended to fearful faces, as shown by a greater GCE in response to a fearful cueing face when this was presented together with a neutral cueing face. This modulation was observed regardless of both SOA (250 vs. 500 ms) and the specific set of face stimuli used (avatars vs. real faces; see also Carlson & Aday, 2018). Overall, the data reported by Becker (2010) and Carlson and Aday (2018) suggest that, in multiagent contexts, faces bearing an emotional expression can capture attention and, in turn, become more likely to elicit a GCE.

Other studies explored socioattentional dynamics in response to small groups and crowds. In this regard, Capozzi, Becchio, Willemse, and Bayliss (2016) exposed participants to an initial learning phase in which three faces were presented together. In one condition, one of these faces turned either rightwards or leftwards and was then imitated by the other two faces, which turned in the same direction (i.e., “leader” condition). In the other condition, two faces turned either rightwards or leftwards simultaneously and were then imitated by the remaining face (i.e., “follower” condition). In a second phase, the faces of both “leaders” and “followers” were employed in a gaze-cueing task with either a 200-ms or a 1,000-ms SOA. The results showed a greater GCE in response to the faces associated with leading behaviour, regardless of SOA, thus confirming the important role of leadership in modulating the GCE (see also Dalmaso et al., 2012, 2014; Jones et al., 2010). Moreover, Sun, Yu, Zhou, and Shen (2017) presented participants with a relatively large group of 10 avatars that simultaneously turned their heads, and manipulated whether all the avatars turned in the same direction (SOA = 300 ms). The magnitude of the GCE was greatest when all the avatars looked at the same spatial location and decreased linearly as the inconsistency in their gaze direction increased, supporting the notion that, when exposed to multiple cueing faces, individuals tend to follow a kind of “majority rule.” Interestingly, in relatively small groups (i.e., three to five individuals), Capozzi, Ristic, and Bayliss (2018) also observed a “quorum-like rule,” according to which the minimal proportion of consistent facial cues needed to elicit an attentional response increased with group size (for an additional study employing only two cueing faces, see also Wang, Xu, Zhang, Luo, & Geng, 2019).

As for overt attention paradigms, some experimental evidence suggests that the cultural background of the respondents might further shape the modulatory role of multiagent contexts (Cohen, Sasaki, German, & Kim, 2017), with stronger effects of multiple gaze cues in individuals from interdependent cultures (East Asia) than in individuals from more individualistic cultures (United States). Intriguingly, overt gaze following in response to multiple cueing stimuli has also been investigated outside the laboratory, in crowds that form spontaneously in everyday life. In a pioneering study, Milgram, Bickman, and Berkowitz (1969) asked a variable number of confederates (i.e., from one to 15) to stop walking on busy city streets and look up at a window. At the same time, the number of naïve individuals on the same street who imitated the confederates’ looking behaviour was counted. As in Sun et al. (2017), the results showed that the larger the confederate group, the higher the number of naïve individuals who looked at the window, a result that was also replicated in a recent study (Gallup, Hale, et al., 2012b; see also Gallup, Chong, & Couzin, 2012a). Finally, Gallup, Chong, Kacelnik, Krebs, and Couzin (2014) filmed naïve pedestrians during an interaction with a confederate who had been instructed to display averted gaze along with one of four expressions (neutral, happy, fearful, and “suspicious”). In this case, stronger gaze-following behaviour emerged in response to both fearful and “suspicious” faces, but only when the pedestrian was embedded in a small group of two to six individuals rather than alone.

Overall, these studies confirm that social aggregation is also a key factor for the mechanisms underlying both the GCE and overt orienting of attention. They also highlight the relevance of studying social attention in groups and, more generally, during real social interactions (see also Kuhn, Teszka, Tenaw, & Kingstone, 2016b; Laidlaw, Rothwell, & Kingstone, 2016).

Relationship between the observer and the cueing face

The studies included in this section cover several different processes through which social information can be extracted. Similar to what was discussed in relation to the characteristics of the cueing faces, both perceptual features (e.g., shared skin colour) and exemplar-based representations in memory (e.g., familiarity) can be involved. However, other processes can be at work while extracting the social information that can then trigger modulations of gaze-mediated orienting of attention. One set of factors has to do with motivational states influencing our appraisal of the individuals depicted in the cueing face as a function of their relevance in relation to our personal goals. Another possible set of factors is related to high-level processing involved in the active construction of an episodic representation of the scene we are exposed to, such as when we have to determine the mental states of a target person.

Familiarity

Familiarity is an important social dimension that profoundly shapes our social-cognitive processes and, by definition, exists in the eye of the beholder. As for the GCE, Deaner, Shepherd, and Platt (2007) observed that familiar faces (i.e., faces of people who worked in the same department as the participants) elicited a greater GCE than unfamiliar faces (i.e., faces of people who worked in a different department). However, this difference emerged only in female participants, likely reflecting their greater sensitivity to eye-gaze cues (e.g., Bayliss et al., 2005), and it was more evident at the shortest SOA (i.e., 200 ms). In a similar vein, Hungr and Hunt (2012) morphed participants’ faces with those of unknown individuals, obtaining facial stimuli that were 0%, 30%, 50%, or 100% similar to the participant’s own face. The results, obtained with a fixed 100-ms SOA, showed that the GCE increased with the similarity of the cueing face to the participant’s own face. The same pattern emerged in overt gaze-following behaviour when an oculomotor task was used (Hungr & Hunt, 2012; see also Porciello et al., 2014, for related evidence with an instructed saccade task).

Familiarity may also affect other facets of attention, such as attention holding (see Chauhan, di Oleggio Castello, Soltani, & Gobbini, 2017). Intriguingly, the effects of familiarity also appear to be long-lasting. For instance, Frischen and Tipper (2006) demonstrated that the face of a famous individual (e.g., a film actor), initially presented with averted gaze, can bias attention in the previously cued direction even 3 minutes later, when the same face is presented with direct gaze along with the target. This suggests that the familiar face with direct gaze acts as a retrieval cue for an earlier event (the same face with averted gaze that previously elicited a shift of attention), indicating that the knowledge rapidly retrieved about a known exemplar may further shape social attention.

Overall, the available evidence supports the idea that familiarity is indeed a relevant dimension modulating social attention, as observed in both the GCE and in overt orienting tasks.

Racial group membership

The impact of racial group membership on cognition is one of the central topics in social cognition research, and increasing evidence shows that racial group membership can also affect attentional mechanisms (e.g., Trawalter, Todd, Baird, & Richeson, 2008). As for the GCE, Pavan, Dalmaso, Galfano, and Castelli (2011) presented White and Black individuals living in a Western country (i.e., Italy) with White and Black cueing faces, using a fixed 200-ms SOA. Two main results emerged: On the one hand, Black participants showed a similar GCE in response to both types of face; on the other hand, White participants showed a reliable GCE only in response to White faces. The reduced influence of gaze cues provided by Black faces on White individuals was subsequently replicated in a different cultural context (i.e., the United States) by Weisbuch, Pauker, Adams, Lamer, and Ambady (2017), who used a gaze-cueing paradigm including a 100-ms and a 300-ms SOA. Importantly, Weisbuch et al. (2017) also extended Pavan et al.’s (2011) results by showing that, when White participants were primed with either high or low power, a GCE for Black faces emerged only in those who received the low-power prime. Hence, according to both Pavan et al. (2011) and Weisbuch et al. (2017), this race-based modulation of the GCE could reflect differences in social status/power (typically higher for White than for Black individuals), although the previous history of intergroup relationships can also play a role (Chen & Zhao, 2015; Chen, Zhao, Song, Guan, & Wu, 2017). In a further study, Strachan et al. (2017) addressed possible modulations of the GCE as a function of racial group membership by testing British White participants with White and Asian faces, using a single 500-ms SOA. Their results suggested no difference in the GCE between White and Asian cueing faces. The discrepancy between these findings and those reported by Pavan et al. (2011) and Weisbuch et al. (2017) might be accounted for by methodological differences, the most relevant being that Strachan et al. (2017) had participants complete a task at the beginning of the experiment aimed at familiarizing them with the face stimuli later used in the gaze-cueing task. Indeed, because the authors were mainly interested in investigating the effects of trustworthiness, they reasoned that greater familiarity with the faces could increase trust. However, as discussed in the previous section, changes in familiarity can also influence the GCE and thus possibly mask a potential role of racial group membership. In addition, it should always be stressed that social modulations are intrinsically related to the specific cultural context in which the studies are performed, and the key driving factor is the relationship between the groups involved rather than group membership per se.

The impact of racial group membership has also been investigated with oculomotor measures. Using an instructed saccade task with both a 0-ms and a 900-ms SOA, Dalmaso, Galfano, and Castelli (2015b) found reduced gaze-following behaviour in response to Black faces in White individuals, but only at the shortest SOA. Moreover, this modulation was detected even when the interval between the onset of the direct gaze face frame and the simultaneous appearance of both the instruction cue and the averted gaze frame was very short (i.e., 50 ms), suggesting that this social modulation is both short-lasting and early rising, consistent with reflexive processing. In sum, evidence from studies addressing both the GCE and overt orienting confirms that racial group membership is a significant factor affecting social attention, and that its effects appear to be mainly driven by the different social status associated with the various social groups.

Shared political affiliation

Political leaders can also shape gaze-mediated orienting in their voters. This emerged in a set of studies on overt orienting that employed an instructed saccade task with a fixed 75-ms SOA and used the faces of real politicians as stimuli. In more detail, the results indicated that conservatives, but not liberals, are more influenced by eye-gaze stimuli provided by in-group than by out-group political leaders (Liuzza et al., 2011; see also Cazzato, Liuzza, Macaluso, Caprara, & Aglioti, 2015), and that this influence tends to decrease when the in-group leader’s popularity decreases (Porciello, Liuzza, Minio-Paluello, Caprara, & Aglioti, 2016). Intriguingly, there is evidence that the magnitude of this overt gaze-following behaviour can also be associated with future voting intentions (Liuzza et al., 2013), suggesting that the gaze-mediated orienting response might be a useful index for unveiling complex social dynamics.

Personal goals and values

A few studies showed that even more subtle and complex relational facets, such as competitiveness and perceived morality, can affect the GCE. In more detail, Ciardo, Ricciardelli, Lugli, Rubichi, and Iani (2015) first exposed participants to a task in which some actors’ faces displayed either cooperative or competitive behaviour. The same faces were then employed in a standard gaze-cueing task with a fixed 200-ms SOA. The results showed that participants with higher levels of competitiveness displayed a reliable GCE in response to both types of cueing faces, whereas participants with lower levels of competitiveness showed a greater GCE only in response to faces associated with competitive behaviour. According to the authors, competitive contexts would result in a generalized tendency to monitor potentially relevant cues provided by social actors, whereas cooperative contexts would elicit a more selective attentional focus on competitive individuals, who might interfere with the achievement of one’s goals. As concerns morality, Carraro et al. (2017) presented participants with two sets of faces, one described as flatmates who engaged in positive, socially accepted behaviours, and the other described as flatmates who regularly broke relevant social norms. The same faces were then used as stimuli in a gaze-cueing task with two SOAs (200 vs. 700 ms). Overall, regardless of SOA, a greater GCE emerged for faces associated with antisocial/immoral rather than prosocial/moral behaviours, and, importantly, this difference was much more evident in participants who evaluated antisocial behaviours more negatively. In sum, these two studies indicate that the way we respond to social cues can vary according to current goals and value orientations.

Mental state attribution

The attribution of mental states to facial cues has been considered a further factor potentially shaping the attentional response to social cues. This has been investigated both with the GCE and with orienting of attention mediated by head turns. In the first study on this topic (Nuku & Bekkering, 2008), participants were presented with cueing faces with either open or closed eyes, or with the eye region covered by either an occluder or sunglasses. Remarkably, reliable orienting of attention emerged only when the cueing face depicted an individual who was able to see the peripheral target (i.e., open eyes or sunglasses conditions). Conceptually similar results, indicating stronger orienting of attention in response to faces of individuals who could actually see the targets, have subsequently been reported with different tasks and stimuli (e.g., Kawai, 2011; Morgan, Freeth, & Smith, 2018; Schulz, Velichkovsky, & Helmert, 2014; Teufel, Alexis, Clayton, & Davis, 2010; Wiese, Wykowska, Zwickel, & Müller, 2012). Nonetheless, another stream of studies found no influence of attributed mental states on social orienting. The first evidence supporting this notion was reported by Quadflieg, Mason, and Macrae (2004), who observed a significant GCE even when eye gaze was embedded in inanimate objects, such as apples and gloves. In addition, Cole, Smith, and Atkinson (2015) reported a robust attentional response even when the cueing face could not see the target because of an occluding barrier. Remarkably, this emerged even when the cue was provided by a real person and the target appeared behind a real barrier. Interestingly, although many of the studies discussed above included a wide range of SOAs, this factor did not reliably modulate the impact of mental state attribution.
More recently, Kingstone, Kachkovski, Vasilyev, Kuk, and Welsh (2019) presented participants with the picture of an actor wearing two identical masks, one covering the actor’s face and one covering the back of the head. When the actor turned his head leftwards or rightwards, the two masks therefore provided opposite spatial cues (one pointing left, the other right) for the upcoming peripheral target. In this manner, participants were presented with two identical cues, but only the mask covering the actor’s face could be associated with the mental attribution that “a person is looking at the target.” Strikingly, whereas a single mask elicited significant orienting of attention in a control condition, when the two masks were presented together there was no evidence that the mask covering the actor’s face led to a greater cueing effect than the other mask, suggesting no role of attributed mental states in social orienting of attention.

Finally, the possible link between mental states and attentional responses to social cues has also been explored through oculomotor measures (Kuhn, Vacaityte, D’Souza, Millett, & Cole, 2018). As in Cole et al. (2015), participants were presented with everyday scenes in which an actor and an object (e.g., a drink can) could be separated by a physical barrier (e.g., a pizza box). When participants were allowed 5 seconds to freely explore these scenes, object-directed saccades were faster when the actor could see the object. However, when participants were asked to rapidly discriminate a target inside the scene, no such influence of the actor’s mental states emerged, which suggests that temporal parameters play a major role in this phenomenon. To conclude, the potential impact of mental states on social orienting is a hotly debated, challenging, and still unresolved issue in human social attention. Mental state attribution is likely a much more complex process than those underlying the factors discussed above, for which the relevant inferences are often extremely rapid. Further research is therefore needed before strong conclusions can be drawn.

General discussion and future directions

The major goal of the present review has been to critically summarize the documented modulations exerted by social factors on social attention. The ability to infer others’ focus of attention and to orient towards the same spatial location plays a pivotal role in allowing individuals to build meaningful and pervasive relationships within their social environment. Although, in humans, these socioattentional shifts can be elicited by several social cues, such as head and body turns (e.g., Langton & Bruce, 2000) or pointing/reaching gestures (e.g., Atkinson, Simpson, & Cole, 2018; Dalmaso et al., 2016a), most of the research has focused on eye-gaze direction (e.g., Capozzi & Ristic, 2018; Emery, 2000; Frischen et al., 2007). The number of studies investigating gaze-mediated orienting has increased exponentially since the introduction of the gaze-cueing task (e.g., Friesen & Kingstone, 1998). One of the major reasons for this popularity is the task’s flexibility and its potential to provide insightful answers in many different fields within psychology. In this sense, the gaze-cueing task and its associated phenomenon (i.e., the GCE) are among the most striking examples of how different disciplines can interact and mutually contribute to the understanding of fundamental underlying constructs. Indeed, in addition to what we have reviewed so far in relation to healthy adults, much is now also known about clinical populations (e.g., Akiyama et al., 2008; Dalmaso et al., 2015a; Dalmaso, Galfano, Tarqui, Forti, & Castelli, 2013; Heimler et al., 2015; Kuhn et al., 2010; Langdon, Corner, McLaren, Coltheart, & Ward, 2006; Marotta et al., 2014; Marotta et al., 2018a), nonhuman species (see Shepherd, 2010, for a review), and human–robot interaction (see Chevalier, Kompatsiari, Ciardo, & Wykowska, 2020, for a review).
This research effort has also been pursued from a developmental perspective (e.g., Farroni et al., 2004; Pickron, Fava, & Scott, 2017) and with the aim of uncovering its neural underpinnings (e.g., Tipples, Johnston, & Mayes, 2013).

In recent years, increasing evidence has shown that the GCE can be shaped by different social variables. Initially, social variables were mainly manipulated to test whether the GCE could be considered strongly automatic, under the assumption that modulations of the GCE as a function of social factors would reflect top-down processing. More recent research, however, has focused on the impact of different social variables per se. The underlying idea is that, because we live in complex social environments populated by other individuals, each characterized by a variety of social variables, and because our attentional resources are limited, our social attention system is likely to have evolved the ability to respond more promptly to some cueing faces than to others.

In the present paper, we have reviewed the available studies by organizing them into three main sections: characteristics of the observer, characteristics of the cueing faces, and the relationship between the two. As concerns the characteristics of the observer, the current evidence speaks in favour of a magnified GCE in females compared with males (e.g., Bayliss et al., 2005), as well as in young adults compared with older adults (e.g., Slessor et al., 2008). The specific reason behind these results is yet to be determined, given that there is evidence that similar patterns can also emerge when nonsocial stimuli are used as attentional cues. However, at least in the case of age, the presence of own-age attention biases in young adults (e.g., Slessor et al., 2010) suggests that the effects are, at least partially, rooted in social processes. The investigation of the impact of internal states (e.g., experiences of ostracism) has produced less consistent findings, whereas the few available studies on political temperament have shown that the GCE is significantly larger among liberals than among conservatives (e.g., Dodd et al., 2011). As concerns the characteristics of the cueing faces, there appears to be reliable evidence for the role of physical dominance (e.g., Jones et al., 2010) and social status (e.g., Dalmaso et al., 2012), whereas findings related to trustworthiness are mixed. Overall, emotional expressions do not seem to play a straightforward role in themselves; rather, they are more likely to affect both covert and overt social attention in combination with other factors, such as the salience of affective information (e.g., Pecchinenda et al., 2008) or their relatively rare occurrence in the experimental setting (Kuhn et al., 2016a).
The investigation of multiagent contexts probing the role of social aggregation has provided converging evidence that both covert and overt orienting seem to be sensitive to the presence of multiple individuals providing cueing faces and to the relationships among them (e.g., Böckler et al., 2011; Gallup, Hale, et al., 2012b). As concerns the relationship between the observer and the cueing face, the available evidence is in line with the observation that more familiar faces trigger a stronger GCE (e.g., Deaner et al., 2007). Moreover, consistent evidence has shown that features shared by the observer and the cueing face can play a major role in modulating both covert and overt orienting of attention, as in the case of shared racial membership (e.g., Pavan et al., 2011; Weisbuch et al., 2017), shared political affiliation (e.g., Liuzza et al., 2011), and shared personal goals and values (e.g., Carraro et al., 2017). Finally, the impact of mental state attribution is more debated, and the mixed evidence possibly reflects not only the complexity of the issue but also the adoption of very different methods to manipulate the type of attributions the perceiver is likely to make (e.g., Kingstone et al., 2019; Kuhn et al., 2018).

An interesting point concerning the effects of social variables relates to their time course. Although there is considerable variability across studies, the available evidence suggests different temporal trajectories depending on the specific social features under investigation. On the one hand, there is evidence that the extraction of social information based on perceptual features (e.g., skin colour) and the retrieval of exemplar representations from memory (e.g., familiarity) tend to take place very rapidly, and can influence orienting of social attention even when participants have limited time to process the face providing the gaze cue and very short SOAs are used. This pattern is consistent with the view that such modulations occur in a reflexive manner (e.g., Müller & Rabbitt, 1989). On the other hand, in the case of mental state attribution, the effects, when present, appear to emerge relatively late (i.e., with SOAs longer than 300 ms), in line with the assumption that inferences about the perspective of the individual providing the cueing face are generated on the spot rather than retrieved from representations stored in long-term memory. Such inferences involve a more complex process that is more likely to rely on top-down control, which may partially account for the mixed evidence on the impact of mental state attribution.

An alternative, intriguing strategy for systematizing the relevant studies in the literature is to place them along a continuum based on their level of ecological validity. In this regard, an open question concerns how social orienting of attention takes place in real social interactions. This is a thorny issue, in that ecological validity is almost invariably inversely related to internal validity. As discussed earlier, some efforts to increase the ecological validity of the gaze-cueing task have already been made. For instance, the use of images of real rather than schematic faces (e.g., Driver et al., 1999) and of dynamic facial stimuli, as in some studies addressing the role of emotional expressions (e.g., Putman et al., 2006), can be considered important steps towards that goal (see also Fig. 2). A more recent set of studies has even employed a live confederate to investigate gaze-mediated orienting of attention in a face-to-face context (e.g., Hayward, Voorhies, Morris, Capozzi, & Ristic, 2017; Kuhn, Teszka, et al., 2016b; Lachat, Conty, Hugueville, & George, 2012). Moreover, substantial advances in understanding social attention in natural settings can be achieved through eye-tracking methodologies, which provide a more direct measure of attention allocation (Pfeiffer, Vogeley, & Schilbach, 2013). In recent years, this approach has been successfully employed to enrich knowledge of social attention mechanisms (e.g., Edwards et al., 2015; Nummenmaa, Hyönä, & Hietanen, 2009), but in all these cases only one naïve participant (i.e., the observer) at a time was presented with pictorial cueing faces. In other studies, the cueing face belonged to a real confederate (e.g., Laidlaw, Foulsham, Kuhn, & Kingstone, 2011; Macdonald & Tatler, 2013, 2015).
However, in the future, it will be interesting to employ two eye trackers and two participants simultaneously (for a similar approach, see Cole, Skarratt, & Kuhn, 2016; Gobel, Tufft, & Richardson, 2018; Ho, Foulsham, & Kingstone, 2015; Macdonald & Tatler, 2018; Rogers, Speelman, Guidetti, & Longmuir, 2018), allowing each individual to act alternately as the observer or as the cueing “face.” This would move the field from a classical “one-way” gaze-cueing task to a “two-way” gaze-cueing task in which participants’ eyes serve both to perceive and to communicate intentions, namely, the “dual function” of eye gaze (see Gobel, Kim, & Richardson, 2015; Risko, Richardson, & Kingstone, 2016). This approach would increase ecological validity and would represent one of the most promising avenues for manipulating social variables and exploring their impact on social attention processes, both inside and outside the lab (e.g., Gallup et al., 2014).

Fig. 2

The circular arrows show the main variables that can shape covert (i.e., GCE) and overt attention shifting mediated by eye gaze, and that can be associated with (a) the observer, (b) the cueing face, and (c) their relationship. b Different types of cueing stimuli. On the one hand, schematic faces allow for better control over the variables characterizing the stimulus, but they are associated with low ecological validity. On the other hand, studying social attention in everyday interactions allows for higher ecological validity, but entails less control over relevant variables. Other stimuli (i.e., avatars, photographs, and real individuals) fall in between, with intermediate levels of both internal and ecological validity.

Addressing the issue of ecological validity implies using paradigms with real persons, but it also points to the need to consider the broader context in which these persons are embedded. The different factors examined in the present review can affect social attention processes as a function of their context-dependent salience. For example, the effects of perceived power can be expected to be magnified if participants are placed in a context in which they depend on others to achieve their goals. Empirically, this approach has been taken by Cui et al. (2014), who manipulated power by asking participants to recall or imagine situations in which they controlled others (high-power priming) or were controlled by others (low-power priming). This simple manipulation affected the GCE: low-power females showed an enhanced GCE compared with both males and high-power females, a result complementary to those reported in the literature (e.g., Dalmaso et al., 2012; Jones et al., 2011). This approach has been little explored, and we believe that substantial work in this direction should be carried out with the aim of increasing ecological validity.

How social variables shape attention shifting mediated by eye gaze: Introducing the eyeTUNE theoretical framework

Social stimuli are extremely rich and complex, and this richness and complexity is also reflected in the studies discussed in this review. Indeed, the magnitude of gaze-mediated orienting of attention can be shaped by several social variables characterizing the cueing face, the observer, and their relationship, indicating that this phenomenon is highly malleable. As a first attempt to explain how social variables can shape social attention, we propose a conceptual framework, hereafter called “eyeTUNE” (see Fig. 3). According to this framework, eye-gaze cues would first be detected and processed by visual mechanisms that likely evolved to quickly segregate and prioritize the spatially relevant information conveyed by the dark iris against the white sclera (e.g., Kobayashi & Kohshima, 2001; Ricciardelli, Baylis, & Driver, 2000). This visual information would then be further processed along three main dimensions, which represent the main sources for tuning the magnitude of the GCE and of gaze-mediated attentional responses in general. It is worth noting that these three dimensions do not map straightforwardly onto the three sections (i.e., characteristics of the observer, characteristics of the cueing face, and their relationship) used to summarize the available literature in the present review. Indeed, the framework is more generally aimed at characterizing the conditions under which the very same social variable can or cannot lead to a modulatory effect. The first dimension (“situational gain”) would be involved in evaluating the cueing face from a functional perspective, namely, as a function of whether orienting in response to the perceived gaze cue could lead to any personal benefit.
For instance, an averted-gaze face of a similar individual (e.g., one sharing the observer’s young age; see Slessor et al., 2010) could indicate the presence of a stimulus of common interest in the environment, and an enhanced attention-shifting response could therefore help localize this item. This reasoning applies to situations in which cueing faces with different characteristics are presented; in such situations, the strongest attentional orienting is more likely to be displayed in response to the face that maximizes the situational gain. When no such comparative setting is present, however, gaze-mediated orienting might represent the default option, which would be consistent with the observation that the GCE is also elicited by schematic faces. The second dimension (“individual constraints”) would be more closely linked to the observer’s characteristics. In other words, this dimension would constrain the GCE mainly on the basis of both biological and psychological individual differences. For instance, the GCE tends to be smaller among males than among females (e.g., Bayliss et al., 2005). The third dimension (“contextual factors”) would include less stable environmental variables, such as the presence of affectively valenced targets, priming conditions, or the frequency of a given cueing face. For instance, fearful cueing faces can elicit a stronger GCE, particularly when they are rare (Kuhn et al., 2016a). The interaction among these three dimensions would then contribute to determining the final magnitude of the GCE and, more generally, of gaze-mediated attentional orienting.

Fig. 3

The eyeTUNE conceptual framework for social attention. From left to right, the gaze cue is detected, and the information is further processed along three different dimensions. The interaction among the three dimensions would then determine the final magnitude of the GCE. The same reasoning can also be applied to overt gaze-mediated attentional responses.

Overall, the pattern of data emerging from the reviewed literature seems to suggest that although eye gaze is a special stimulus, it does not bias attention shifting in a ballistic fashion (i.e., we do not inevitably shift our covert or overt attention in response to every averted gaze we encounter). Rather, we tend to be selective, depending on the actual relevance of such a social spatial cue. Similarly, the available evidence indicates that social modulations, although emerging early (consistent with reflexive processing), are not ubiquitous. This, in turn, suggests that they are more properly interpreted as reflecting conditionally automatic processes (e.g., Bargh, 1992). More specifically, social modulations of the GCE are more likely to be found when specific preconditions are met. For instance, Pavan et al. (2011, Experiment 3) showed that the GCE can be modulated by group membership, but only when group membership was made salient in the experimental setting by presenting faces belonging to different social groups in an intermixed manner. In other words, social modulations emerged when the experimental setting induced participants to activate categorization processes through social comparison (intermixed condition), but not when these processes were less likely to be elicited because participants were exposed to stimuli belonging to a single category (blocked condition). In sum, the modulatory effects illustrated in this review can be considered as stemming mainly from conditionally automatic processes (for oculomotor evidence related to different manipulations, see Dalmaso, Alessi, Castelli, & Galfano, 2020).

Although the analysis of which specific social features may modulate gaze-mediated orienting is theoretically important in itself, social beings do not process information in a vacuum, and the situational relevance of each specific feature varies greatly. For instance, the social status of an individual can be highly relevant if she or he is assessing us in a job interview, and far less relevant while we play soccer with the same individual. This implies that the observer’s current goals, the communicative context, and the way the current relationship between the observer and the person providing the gaze cue is framed can all automatically contribute to making specific social features (e.g., status, gender, trustworthiness) contextually salient, because they maximize the potential gain for the observer in that context (see Smith & Semin, 2007). An avenue for future research is thus to integrate more thoroughly a sociocognitive perspective with the idea that we may mentally represent the very same person in different ways depending on situational demands. Adopting this perspective may allow us to explore social attention by considering the term “social” as reflecting not only the fact that social variables are implicated but also the fact that people live in socially construed environments, which largely shape which social dimensions are more likely to be activated in perceivers’ minds, in a context-dependent way, while they appraise the people around them. Following this rationale, the eyeTUNE framework could help in both generating and testing novel directional hypotheses concerning the impact of social variables on the GCE and, more broadly, on social orienting abilities.

Conclusions

In recent years, evidence has accumulated showing that many different social variables associated with both the cueing face and the observer can shape, by themselves and in combination, gaze-mediated orienting of attention. In this work, we have reviewed the existing empirical evidence and attempted to provide a framework aimed at integrating the main findings emerging from this broad literature. Future studies are needed to further explore the social side of this fascinating form of orienting, which is an essential ability for successfully navigating social contexts and the environment. Laboratory-based experiments will increasingly need to be integrated with studies based on real social interactions that take contextual influences into account. This, in turn, will enable more faithful reproductions of what actually happens in everyday-life activities and will therefore broaden our knowledge of human social orienting abilities.