As social animals, humans are experts at observing and understanding the behaviors and signals of conspecifics in a crowd (Hare, Call, Agnetta, & Tomasello, 2000; Frith & Frith, 2012; Sumpter, 2006). This ability not only facilitates interaction but can also provide information about the surrounding world, such as immediate predation risks or food sources (Galef & Giraldeau, 2001; Griffin, 2004). An effective means by which individuals obtain information from others in a group is to observe the attentional focus of these others through their gaze (or head) orientation. This orientation reveals conspecifics’ interests, which may provide important information about the surrounding physical and social environment (Shepherd, 2010).

This tendency to analyze others’ gaze exists virtually from birth in humans (e.g., Beier & Spelke, 2012; Vecera & Johnson, 1995). Remarkably, humans exhibit rapid shifts in gaze and attention toward the locations on which others’ gazes are fixated. Indeed, there is a great deal of research focusing on the properties of this “gaze following” response (e.g., Friesen & Kingstone, 1998; Langton & Bruce, 1999; Zhang, Tang, Zhang, & Zhang, 2015) and its underlying neural mechanisms (Hietanen, Nummenmaa, Nyman, Parkkola, & Hämäläinen, 2006; Joseph, Fricker, & Keehn, 2015), with a particular emphasis on its key role in human interaction. Furthermore, gaze following exists in a diverse range of animal species (Carpenter & Call, 2013; Emery, 2000; Schloegl, Kotrschal, & Bugnyar, 2007), suggesting that it has a broad adaptive significance.

Intriguingly, people exhibit a similarly rapid attentional shift when observing a group of people with a consistent gaze orientation. Compared with individual gaze cues, gaze information from crowds further increases the proportion of individuals who engage in gaze following (e.g., Gallup et al., 2012; Milgram, Bickman, & Berkowitz, 1969). The earliest study of gaze following in crowds dates back to 1969. On a crowded street in New York, Milgram et al. (1969) had a stimulus group of individuals stop and stare up together at a building window. They then measured the probability that passers-by would follow this behavior and found that it increased with the size of the stimulus group. In other words, a larger number of consistently directed gazes produced greater cueing strength, which in turn amplified the gaze following effect (i.e., a higher probability of stopping and following). More recently, psychological researchers have revived this field in several studies investigating the extent, influence, and contextual dependence of social visual attention transmission (Bayliss et al., 2013; Gallup, Chong, Kacelnik, Krebs, & Couzin, 2014; Gallup et al., 2012), gaze perception (Florey, Clifford, Dakin, & Mareschal, 2016; Sweeny & Whitney, 2014), and even virtual experience (Narang, Best, Randhavane, Shapiro, & Manocha, 2016).

In many situations, however, people’s gaze orientations or preferences within crowds diverge (Conradt, 1998; Couzin, Krause, Franks, & Levin, 2005; Ruckstuhl & Neuhaus, 2000, 2002). It is common to see multiple gazes of varying orientations at the same moment, especially in emergencies or frantic situations such as panic or rioting (Helbing, Farkas, & Vicsek, 2000; Shiwakoti & Sarvi, 2013). How is conflicting gaze information processed in such situations? Here, the gaze following response must be reconsidered. To our knowledge, few studies have systematically examined the gaze following effect when gaze cues conflict, and it remains unclear how gaze cues guide individuals’ attention when they face a crowd with diverging gaze orientations. This study therefore aimed to clarify this point.

Compared with scenes involving a single gaze or consistently directed gazes, conflicting multigaze scenes require a more sophisticated processing mechanism. Consider, for example, two subsets of individuals with opposite gaze orientations (left and right), a relatively simple and common conflicting situation. Based on previous studies (e.g., Gallup et al., 2012; Milgram et al., 1969), the larger group sharing a consistent gaze orientation (i.e., the majority) might have greater cueing strength than the other group (i.e., the minority), so we assumed that the majority would be prioritized and followed during processing. In an evolutionary sense, information from the majority is considered relatively reliable (Galef & Giraldeau, 2001; Griffin, 2004); thus, following the majority could help a group resolve conflicts effectively, make collective decisions, and ultimately maintain an adaptive advantage (Conradt & Roper, 2009; Sumpter & Pratt, 2009).

Moreover, in most situations involving conflicting gaze orientations, the subgroups differ in size (i.e., there is a majority and a minority). How is the distribution of attention affected by the extent of this difference? This is our second concern. One possibility is that attention follows an “all or none” principle, in which all attentional resources are allocated to the majority (and hence to its gaze orientation). If so, attentional shifts should be equally fast regardless of the size difference between the majority and the minority. Alternatively, the allocation of attention to the cued side might increase with the gap between the majority and the minority; although this relation need not be linear, it should be monotonic. In other words, each subgroup would capture a portion of an individual’s attentional resources.

Against this background, we investigated how social attention information derived from conflicting gazes in human crowds influences individuals’ allocation of attention. We modified the gaze cueing paradigm (e.g., Friesen & Kingstone, 1998) to employ a human crowd. Specifically, a group of human avatars was presented at the center of a screen, and participants performed a target task (localization in Experiment 1, identification in Experiment 2) in which a target appeared randomly on the left or right side of this group. Because the avatars’ eyes would be too small to be recognized at the viewing distance, we used head orientation as the index of social attention. Head orientation is easier to recognize than gaze orientation (Perrett, Hietanen, Oram, Benson, & Rolls, 1992) and has been shown to trigger attentional shifts in a similar way to eye gaze (e.g., Hietanen, 1999; Nuku & Bekkering, 2008; Sato, Okada, & Toichi, 2007); hence, we adopted head-gaze orientation (in which the head and gaze always share the same orientation) instead of gaze orientation alone. To address our second concern, namely the relation between the degree of difference in subgroup size and the distribution of attention, we manipulated the sizes of the two gaze-orientation subgroups across experimental conditions.

Experiment 1

Method

Participants

Sixteen graduate and undergraduate students (seven females; mean age 21.2 ± 2.1 years) were paid to participate in this experiment. None had a history of neurological problems, and all had normal or corrected-to-normal vision. The participants provided their written informed consent before the experiment, and all experimental procedures were approved by the Research Ethics Board of Zhejiang University.

The sample size was determined via a power analysis using G*Power 3 (Faul, Erdfelder, Buchner, & Lang, 2009). Assuming a large effect size (f = .40; partial η² = .14), a power of .80, and an alpha level of .05, the power analysis yielded an estimated sample size of 14. Because some participants might need to be excluded, we decided to stop data collection at N = 16.
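G*Power expresses ANOVA effect sizes as Cohen’s f, and the two values quoted above are related by the standard identity ηp² = f²/(1 + f²). The minimal Python sketch below (function names are our own, for illustration) checks that f = .40 corresponds to ηp² ≈ .14. It does not reproduce the sample-size computation itself, which in G*Power also depends on design settings, such as the number of repeated measurements and their assumed correlation, that are not reported here.

```python
import math


def f_to_partial_eta_sq(f: float) -> float:
    """Cohen's f -> partial eta squared, via eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


def partial_eta_sq_to_f(eta_sq: float) -> float:
    """Partial eta squared -> Cohen's f, via f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_sq / (1 - eta_sq))


eta_sq = f_to_partial_eta_sq(0.40)
print(f"f = .40 -> partial eta^2 = {eta_sq:.2f}")  # prints 0.14
```

The two functions are inverses, so the same check can be run in the other direction when a study reports ηp² instead of f.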

Stimuli

Ten human avatars (six males and four females) were created using the 3D animation software Poser 6® (E Frontier, Scotts Valley, California, USA). Three head orientations for each avatar were produced: one facing straight ahead (and therefore looking directly at participants when presented in the middle of a computer screen), one with the head turned 30° leftward, and one with the head turned 30° rightward. All faces showed a neutral expression. The gaze orientation was always the same as the head orientation.

All 10 avatars were arranged in three rows of three or four avatars each (the whole picture measured 12° high and 17° wide). Each avatar in the back, middle, and front rows measured 3° × 6° with a 1° × 1.5° face, 3.5° × 7° with a 1.25° × 1.75° face, and 4° × 8° with a 1.5° × 2° face, respectively. The upper body and face of every avatar were clearly displayed and not occluded by any other avatar.

A capital letter T, subtending 0.4° of visual angle and centered 13° to the left or right of a central fixation cross, served as the target. All stimuli were presented on a gray background (RGB: 80, 80, 80).

Procedure and design

Participants were seated in an electrically shielded and sound-attenuated recording chamber at a distance of 70 cm from a 17-inch CRT monitor (with a 100-Hz refresh rate). We used the Presentation® software to control stimulus presentation and response acquisition. Participants were asked to keep their eyes centrally fixated and were given clear instructions on how to perform the experimental trials.
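The stimulus dimensions above are given in degrees of visual angle. As a rough illustration, assuming a flat screen viewed perpendicularly at the 70-cm distance stated above, a stimulus subtending θ and centered on the line of sight spans 2d·tan(θ/2), and a point at eccentricity θ lies d·tan(θ) from fixation. The sketch below applies these standard formulas; the monitor’s pixel pitch is not reported, so results are given in centimeters only.

```python
import math

VIEWING_DISTANCE_CM = 70.0  # viewing distance stated in the Procedure


def extent_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical size of a stimulus centered on the line of sight: 2 * d * tan(theta / 2)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def offset_cm(eccentricity_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen distance from fixation to a point at a given eccentricity: d * tan(theta)."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))


print(f"target letter (0.4 deg):     {extent_cm(0.4):.2f} cm")  # ~0.49 cm
print(f"target offset (13 deg):      {offset_cm(13):.2f} cm")   # ~16.16 cm
print(f"avatar array width (17 deg): {extent_cm(17):.2f} cm")   # ~20.92 cm
```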

The experimental procedure is illustrated in Fig. 1. Each trial began with a fixation cross presented at the center of the screen for a random duration (500–1,000 ms), which was then replaced with an array of all 10 avatars looking straight ahead. After 600 ms, a cue array was displayed for 300 ms, in which each avatar was equally likely to turn his or her head to the left or right. Then, the target appeared on the left or right of the display. Participants indicated the position of the target by pressing F or J on the keyboard. If no response was made within 1,500 ms, the trial was coded as a missing response, and the next trial began. The intertrial interval varied randomly between 1,000 and 1,500 ms. Both accuracy and reaction time (RT) were recorded.
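The trial sequence can be expressed in screen frames, since the 100-Hz monitor redraws every 10 ms. The Python sketch below is illustrative only and is not the authors’ Presentation script; it assumes that the jittered intervals were drawn from uniform distributions and that the target appeared at cue offset (a 300-ms cue–target interval), neither of which is stated explicitly.

```python
import random
from dataclasses import dataclass

REFRESH_HZ = 100              # CRT refresh rate reported in the Procedure
FRAME_MS = 1000 / REFRESH_HZ  # 10 ms per frame


def ms_to_frames(ms: float) -> int:
    """Round a duration in milliseconds to the nearest whole number of frames."""
    return round(ms / FRAME_MS)


@dataclass
class TrialTiming:
    fixation: int         # jittered 500-1,000 ms
    forward_array: int    # avatars facing ahead, 600 ms
    cue: int              # head-turn cue array, 300 ms
    response_window: int  # maximum wait for a response, 1,500 ms
    iti: int              # jittered 1,000-1,500 ms


def sample_trial_timing(rng: random.Random) -> TrialTiming:
    # Assumption: jittered intervals are uniform, and the target appears at
    # cue offset, so the cue-target interval equals the cue duration.
    return TrialTiming(
        fixation=ms_to_frames(rng.uniform(500, 1000)),
        forward_array=ms_to_frames(600),
        cue=ms_to_frames(300),
        response_window=ms_to_frames(1500),
        iti=ms_to_frames(rng.uniform(1000, 1500)),
    )


print(sample_trial_timing(random.Random(42)))
```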

Fig. 1

An example of a trial in the 6vs4 condition

The experimental design involved two within-subjects factors: consistency (all-consistent, 9vs1, 8vs2, 7vs3, 6vs4, and 5vs5) and validity (majority-valid and majority-invalid).

The consistency factor manipulated the avatars’ head orientations in the cue array and comprised six conditions: one in which all 10 avatars turned their heads leftward or rightward (the all-consistent condition), and five in which nine, eight, seven, six, or five heads shared the same orientation and the remaining heads (one, two, three, four, or five, respectively) had the opposite orientation (the 9vs1, 8vs2, 7vs3, 6vs4, and 5vs5 conditions, respectively). To prevent perceptual subgrouping and to minimize repetitions of identical pictures within a condition, the positions of the avatars turning in each direction were randomized. Specifically, we produced eight versions of the picture for each consistency condition, in each of which the positions of the minority avatars were randomly selected.
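One way to generate the eight picture versions per condition is sketched below. It assumes that avatars are indexed 0 to 9 and that versions should be distinct whenever enough distinct configurations exist; the text says only that minority positions were randomly selected. In the 5vs5 condition, the “minority” set simply denotes one of the two equal-sized orientation groups.

```python
import math
import random

N_AVATARS = 10
# Number of avatars turning against the majority in each condition.
MINORITY_SIZE = {"all-consistent": 0, "9vs1": 1, "8vs2": 2, "7vs3": 3, "6vs4": 4, "5vs5": 5}


def make_versions(n_minority: int, n_versions: int = 8, seed: int = 0) -> list:
    """Return n_versions sets of avatar indices that turn against the majority.

    Versions are distinct whenever enough configurations exist; only the
    all-consistent condition (a single, empty configuration) repeats.
    """
    rng = random.Random(seed)
    n_distinct = min(n_versions, math.comb(N_AVATARS, n_minority))
    distinct = set()
    while len(distinct) < n_distinct:
        distinct.add(frozenset(rng.sample(range(N_AVATARS), n_minority)))
    versions = sorted(distinct, key=sorted)
    while len(versions) < n_versions:  # pad by repetition if too few exist
        versions.append(rng.choice(versions))
    return versions


pictures = {cond: make_versions(k) for cond, k in MINORITY_SIZE.items()}
print(pictures["8vs2"][0])
```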

Majority-valid and majority-invalid trials existed only in the all-consistent, 9vs1, 8vs2, 7vs3, and 6vs4 conditions. On majority-valid trials, the letter appeared on the side toward which the majority of heads were oriented; on majority-invalid trials, it appeared on the opposite side. In the 5vs5 condition, in which equal numbers of heads were oriented to the left and right, the letter appeared on the left side in 50% of trials and on the right side in the remaining trials.

For each of the six consistency conditions, participants completed 48 trials, divided evenly between the two validity conditions (i.e., 24 majority-valid and 24 majority-invalid trials) or, in the 5vs5 condition, between the two target sides, for a total of 288 randomly presented trials. To ensure adequate rest, the experiment was divided into six sessions with a 2-min break between sessions. Before the formal experiment, participants completed at least 20 practice trials to ensure that they understood the instructions.
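The trial structure described above can be sketched as follows. The trial counts (48 per condition, 24 per validity level, 288 in total) come from the text; balancing the majority’s direction (left vs. right) within each validity level and cutting the shuffled list into six equal sessions are our assumptions.

```python
import random
from typing import Optional

CONDITIONS = ["all-consistent", "9vs1", "8vs2", "7vs3", "6vs4", "5vs5"]


def other_side(side: str) -> str:
    return "right" if side == "left" else "left"


def build_trial_list(trials_per_condition: int = 48, seed: Optional[int] = None) -> list:
    """Build and shuffle the full trial list (48 per condition gives 288 trials)."""
    rng = random.Random(seed)
    half = trials_per_condition // 2
    trials = []
    for condition in CONDITIONS:
        for i in range(trials_per_condition):
            if condition == "5vs5":
                # No majority: only the target side is balanced (50% left, 50% right).
                trials.append({"condition": condition, "validity": None,
                               "majority_side": None,
                               "target_side": "left" if i < half else "right"})
                continue
            validity = "majority-valid" if i < half else "majority-invalid"
            # Assumption: alternate the majority's side so left and right cues
            # are equally frequent within each validity level.
            majority_side = "left" if i % 2 == 0 else "right"
            target_side = majority_side if validity == "majority-valid" else other_side(majority_side)
            trials.append({"condition": condition, "validity": validity,
                           "majority_side": majority_side, "target_side": target_side})
    rng.shuffle(trials)
    return trials


def split_into_sessions(trials: list, n_sessions: int = 6) -> list:
    """Cut the shuffled list into equal consecutive sessions (breaks fall between them)."""
    size = len(trials) // n_sessions
    return [trials[k * size:(k + 1) * size] for k in range(n_sessions)]


trials = build_trial_list(seed=1)
print(len(trials), [len(s) for s in split_into_sessions(trials)])  # 288 [48, 48, 48, 48, 48, 48]
```

Calling build_trial_list(72) gives the 432-trial list used in Experiment 2.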

After completing all trials, participants filled out a survey in which they rated task difficulty (from 1 = extremely easy to 7 = extremely difficult), reported whether they had followed the instructions (e.g., keeping their eyes centrally fixated) and any strategy they had adopted during the experiment, and guessed the experimental objective.

Results

Because no trial could be defined as majority valid or majority invalid in the 5vs5 condition, we divided those trials into two halves based on the position of the target, namely the left visual field and right visual field, and conducted paired t tests to determine whether participants showed a prior attention bias to a certain visual field. For the remaining trials, a 5 (consistency) × 2 (validity) two-way analysis of variance (ANOVA) was conducted for RT. Significant main effects were always followed by Bonferroni post-hoc contrasts.

All participants’ mean accuracies exceeded 98%, indicating a ceiling effect: the task was easy enough that all participants performed it very well.

For the RT analyses, trials with RTs shorter than 100 ms or deviating more than three standard deviations from the mean were excluded (Friesen & Kingstone, 1998; Hayward & Ristic, 2015), leaving 97.50% of trials. There was no significant difference between the left and right visual fields in the 5vs5 condition, M left-visual-field = 354 ms, M right-visual-field = 357 ms, t(15) = 0.92, p = .37, Cohen’s d = 0.08, suggesting that attention was evenly distributed when the two orientation subgroups contained equal numbers of avatars.
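The exclusion rule can be written as a short function. The text does not state whether the mean and SD were computed per participant or per condition, or whether anticipations were removed first; the sketch below assumes one participant’s RTs and computes the mean and SD after dropping RTs under 100 ms. The RT values are hypothetical.

```python
import statistics


def trim_rts(rts: list, floor_ms: float = 100.0, n_sd: float = 3.0) -> list:
    """Drop anticipations (< floor_ms), then RTs more than n_sd SDs from the mean.

    Assumptions: applied to one participant's RTs; mean and SD are computed
    after anticipations have been removed.
    """
    kept = [rt for rt in rts if rt >= floor_ms]
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    return [rt for rt in kept if abs(rt - mean) <= n_sd * sd]


def percent_retained(original: list, trimmed: list) -> float:
    return 100 * len(trimmed) / len(original)


# Hypothetical RTs (ms): 30 ordinary responses, one anticipation, one very slow response.
rts = [340, 350, 360] * 10 + [50, 2000]
trimmed = trim_rts(rts)
print(len(trimmed), f"{percent_retained(rts, trimmed):.2f}%")  # prints: 30 93.75%
```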

For the other conditions, the two-way ANOVA for RT revealed a significant main effect of validity, M majority-valid = 345 ms, M majority-invalid = 361 ms, F(1, 15) = 76.32, p < .001, ηp² = 0.84. The Consistency × Validity interaction was also significant, F(4, 60) = 8.38, p < .001, ηp² = 0.36. Bonferroni post hoc tests revealed that the RT difference between majority-valid and majority-invalid trials was significant in each of the all-consistent, 9vs1, 8vs2, and 7vs3 conditions, ps < .006, but not in the 6vs4 condition, p = .96 (see Fig. 2a). No significant main effect of consistency was found, F(4, 60) = 2.10, p = .09, ηp² = 0.12.
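Partial η² can be recovered from any reported F ratio and its degrees of freedom via ηp² = (F · df_effect)/(F · df_effect + df_error). The sketch below applies this standard identity to the Experiment 1 statistics as a consistency check; the same function works for the Experiment 2 values.

```python
def partial_eta_sq_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: (F * df1) / (F * df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for Experiment 1.
reported = {
    "validity": (76.32, 1, 15),
    "interaction": (8.38, 4, 60),
    "consistency": (2.10, 4, 60),
}
for effect, (f_value, df1, df2) in reported.items():
    print(f"{effect}: partial eta^2 = {partial_eta_sq_from_f(f_value, df1, df2):.2f}")
# validity 0.84, interaction 0.36, consistency 0.12
```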

Fig. 2

Results of Experiment 1. a Reaction times (RTs) for the five consistency conditions. The asterisks represent a significant difference (p < .05) between the majority-valid and majority-invalid conditions. The numbers represent RT values of the nearest data point. The error bars represent one SEM. b RT differences for the five consistency conditions. The asterisks represent a significant difference (p < .05) between two corresponding conditions

To examine the interaction in more detail, we took the RT difference between majority-valid and majority-invalid trials in each consistency condition as the dependent variable and performed t tests comparing each pair of consistency conditions (see Fig. 2b). The RT difference in the all-consistent condition was larger than those in the 9vs1 condition, t(15) = 2.37, p = .031, Cohen’s d = 0.64; 8vs2, t(15) = 3.48, p = .003, Cohen’s d = 1.18; 7vs3, t(15) = 4.46, p < .001, Cohen’s d = 1.34; and 6vs4, t(15) = 4.02, p = .001, Cohen’s d = 1.71. Furthermore, the RT difference was larger in the 9vs1 condition than in the 7vs3 condition, t(15) = 2.14, p = .049, Cohen’s d = 0.60, and the 6vs4 condition, t(15) = 2.45, p = .027, Cohen’s d = 1.06, and larger in the 8vs2 condition than in the 6vs4 condition, t(15) = 3.22, p = .006, Cohen’s d = 0.97. No significant differences were found between the remaining conditions, ps > .11.
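The pairwise comparisons operate on per-participant cueing effects (majority-invalid minus majority-valid RT). The sketch below computes a paired t statistic and d_z (the mean difference divided by the SD of the differences). The text does not state which variant of Cohen’s d was reported, so d_z need not match the values above, and p values would additionally require the t distribution (e.g., scipy.stats.ttest_rel). The data are hypothetical.

```python
import math
import statistics
from itertools import combinations


def paired_t(x: list, y: list) -> tuple:
    """Paired t statistic, degrees of freedom, and d_z for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t, n - 1, d_z


def compare_all_pairs(effects: dict) -> dict:
    """effects maps condition -> per-participant RT differences (invalid - valid)."""
    return {(a, b): paired_t(effects[a], effects[b]) for a, b in combinations(effects, 2)}


# Hypothetical cueing effects (ms) for four participants; not the study's data.
effects = {"all-consistent": [30, 25, 20, 35], "9vs1": [10, 12, 8, 15]}
t, df, d_z = paired_t(effects["all-consistent"], effects["9vs1"])
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")  # t(3) = 7.47, d_z = 3.74
```

With five consistency conditions, compare_all_pairs returns the 10 pairwise comparisons.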

Discussion

The results suggest that when faced with two subgroups with diverging gaze cues, participants followed the majority’s gaze orientation, and that the cueing advantage of the majority increased monotonically as the size difference between the two subgroups grew.

However, there are reasons to be cautious about this interpretation. First, the head orientation of the avatars might have primed a corresponding left or right response rather than causing a shift of attention to the cued location, because the response dimension of the task (left vs. right) coincided with the cue direction. A corroborating measure would therefore help clarify the level of processing at which the observed biases occur.

A second concern is that, because participants were instructed to maintain central fixation, they might have selected and processed only the central avatar on each trial. If so, the RT differences between majority-valid and majority-invalid trials would merely reflect how likely this central face was to be a member of the majority or of the minority in each condition. We modified the experimental design in Experiment 2 to address this issue.

Experiment 2

In Experiment 2, we replaced the localization task of Experiment 1 with a target identification task to confirm this attentional effect, because head orientation is unrelated to the identity of the target and therefore should not prime the response.

To test whether the central face dominated the processing of crowd gaze following in Experiment 1, we focused on pairs of conditions whose RT differences had differed significantly, such as the all-consistent and 9vs1 conditions. In Experiment 2, the central face in the 9vs1 condition was always a member of the majority, as it necessarily is in the all-consistent condition. If participants encoded only the central face, the RT differences in the all-consistent and 9vs1 conditions should be the same. In addition, a condition in which RTs had been faster on majority-valid than on majority-invalid trials, such as 8vs2, could provide further evidence once its central face was rearranged. We therefore made the central face in the 8vs2 condition equally likely to be a member of the majority or the minority. In this case, mean RTs for majority-valid and majority-invalid trials should be equal if only the central face is processed; otherwise, the RT difference in the 8vs2 condition should remain, as in Experiment 1. To improve the reliability of this analysis, more trials per condition were used in Experiment 2.

Method

Participants

We followed the same data collection rule as in Experiment 1. Two participants were excluded from the analyses because they reported after the experiment that they had been aware of its objective, leaving 14 participants (seven females; mean age 21.0 ± 1.67 years). None had a history of neurological problems, and all had normal or corrected-to-normal vision. The participants provided their written informed consent before the experiment, and all experimental procedures were approved by the Research Ethics Board of Zhejiang University.

Stimuli

The avatars were the same as those in Experiment 1. The arrangements of the 10 avatars in the 9vs1, 8vs2, and 6vs4 conditions were modified slightly to fit the corresponding design (see the Procedure and design section).

A capital letter L or T, subtending 0.4° of visual angle and centered 13° to the left or right of a central fixation cross, served as the target.

Procedure and design

The procedure was the same as in Experiment 1 except for the target task: participants indicated whether the target was an L or a T by pressing F or J on the keyboard.

Consistency (all-consistent, 9vs1, 8vs2, 7vs3, 6vs4, and 5vs5) and validity (majority-valid and majority-invalid) were again the two within-subjects factors. For each of the six consistency conditions, participants completed 72 trials, divided evenly between the two validity conditions (i.e., 36 majority-valid and 36 majority-invalid trials), for a total of 432 randomly presented trials. In the 9vs1 condition, the central face (the middle face in the second row) was always a member of the majority; in the 8vs2 condition, the central face was equally likely to be a member of the majority or the minority. In the other conditions, whether the central face belonged to the majority or the minority was pseudorandom and not explicitly manipulated. After completing all trials, participants filled out the same survey as in Experiment 1.
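The central-face constraints of Experiment 2 can be sketched as a modification of the minority-position sampling. Two details are assumptions: the index assigned to the central face (the mapping from screen layout to indices is not reported) and balancing the central face’s role within each validity level of the 8vs2 condition (the text states only that both roles were equally likely).

```python
import random
from typing import Optional

N_AVATARS = 10
# Hypothetical index of "the middle face in the second row"; the actual
# mapping from screen layout to indices is not reported.
CENTRAL_INDEX = 5


def sample_minority(n_minority: int, central_role: Optional[str], rng: random.Random) -> frozenset:
    """Choose the avatars that turn against the majority.

    central_role: "majority" keeps the central face out of the minority,
    "minority" forces it in, and None leaves it unconstrained.
    """
    others = [i for i in range(N_AVATARS) if i != CENTRAL_INDEX]
    if central_role == "majority":
        return frozenset(rng.sample(others, n_minority))
    if central_role == "minority":
        return frozenset([CENTRAL_INDEX, *rng.sample(others, n_minority - 1)])
    return frozenset(rng.sample(range(N_AVATARS), n_minority))


def central_roles_8vs2(n_per_validity: int = 36) -> list:
    """(validity, central_role) pairs, half of each validity level per role (assumed balancing)."""
    half = n_per_validity // 2
    return [
        (validity, "majority" if k < half else "minority")
        for validity in ("majority-valid", "majority-invalid")
        for k in range(n_per_validity)
    ]


rng = random.Random(0)
print(sorted(sample_minority(1, "majority", rng)))  # 9vs1: central face never turns away
print(sorted(sample_minority(2, "minority", rng)))  # 8vs2 trial with the central face in the minority
```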

Results

Mean accuracy across participants was 96%, indicating a ceiling effect, as in Experiment 1.

For the RT analyses, trials with RTs shorter than 100 ms or deviating more than three standard deviations from the mean were excluded, leaving 95.19% of trials. The pattern of results matched that of Experiment 1. First, there was no significant difference between the left and right visual fields in the 5vs5 condition, M left-visual-field = 688 ms, M right-visual-field = 690 ms, t(13) = 0.23, p = .82, Cohen’s d = 0.02, suggesting that attention was evenly distributed when the two orientation subgroups contained equal numbers of avatars. Second, the two-way ANOVA for RT in the remaining conditions revealed a significant main effect of validity, M majority-valid = 662 ms, M majority-invalid = 713 ms, F(1, 13) = 49.60, p < .001, ηp² = 0.79, and a significant Consistency × Validity interaction, F(4, 52) = 16.50, p < .001, ηp² = 0.56. Bonferroni post hoc tests revealed that the RT difference between majority-valid and majority-invalid trials was significant in each of the all-consistent, 9vs1, 8vs2, and 7vs3 conditions, ps < .04, but not in the 6vs4 condition, p = .40 (see Fig. 3a). No significant main effect of consistency was found, F(4, 52) = 0.36, p = .84, ηp² = 0.03.

Fig. 3

Results of Experiment 2. a Reaction times (RTs) for the five consistency conditions. The asterisks represent a significant difference (p < .05) between the majority-valid and majority-invalid conditions. The numbers represent RT values of the nearest data point. The error bars represent one SEM. b RT differences for the five consistency conditions. The asterisks represent a significant difference (p < .05) between two corresponding conditions

Again taking the RT difference between majority-valid and majority-invalid trials in each consistency condition as the dependent variable, we performed t tests comparing each pair of consistency conditions (see Fig. 3b). The RT difference in the all-consistent condition was larger than those in the 9vs1 condition, t(13) = 5.02, p < .001, Cohen’s d = 1.27; 8vs2, t(13) = 6.22, p < .001, Cohen’s d = 1.54; 7vs3, t(13) = 5.10, p < .001, Cohen’s d = 1.73; and 6vs4, t(13) = 7.45, p < .001, Cohen’s d = 2.75. Furthermore, the RT difference in the 6vs4 condition was significantly smaller than those in the 9vs1 condition, t(13) = 4.47, p = .001, Cohen’s d = 1.61; 8vs2, t(13) = 3.27, p = .006, Cohen’s d = 1.24; and 7vs3, t(13) = 2.27, p = .04, Cohen’s d = 0.88. No significant differences were found between the remaining conditions, ps > .16.

As predicted, RTs in the 8vs2 condition were faster overall on majority-valid than on majority-invalid trials. Because the central face was equally likely to belong to the majority or the minority in this condition, we further analyzed separately the trials in which the central face was consistent or inconsistent with the majority. This yielded two central-majority consistency subconditions: central-as-majority (the central face was oriented in the same direction as the majority) and central-as-minority (the central face was oriented in the opposite direction). A 2 (central-majority consistency) × 2 (validity) ANOVA for RT in the 8vs2 condition revealed a main effect of validity, F(1, 13) = 13.28, p = .003, ηp² = 0.51, consistent with Fig. 3a, but neither the main effect of central-majority consistency, F(1, 13) = 0.14, p = .72, ηp² = 0.01, nor the interaction, F(1, 13) = 1.71, p = .21, ηp² = 0.12, was significant (see Fig. 4).
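The 2 × 2 split can be sketched as a grouping step that labels each 8vs2 trial by whether the central face turned with the majority and then averages RTs per cell. The per-participant cell means would then enter the repeated-measures ANOVA (for instance via statsmodels’ AnovaRM, not shown). Field names and RTs below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def central_majority_consistency(central_side: str, majority_side: str) -> str:
    """Label an 8vs2 trial by whether the central face turns with the majority."""
    return "central-as-majority" if central_side == majority_side else "central-as-minority"


def cell_means_8vs2(trials: list) -> dict:
    """Mean RT per (central-majority consistency, validity) cell for one participant.

    Each trial is a dict with hypothetical keys 'central_side', 'majority_side',
    'validity', and 'rt' (ms).
    """
    cells = defaultdict(list)
    for t in trials:
        key = (central_majority_consistency(t["central_side"], t["majority_side"]), t["validity"])
        cells[key].append(t["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# Hypothetical trials for one participant.
trials = [
    {"central_side": "left", "majority_side": "left", "validity": "majority-valid", "rt": 650},
    {"central_side": "right", "majority_side": "left", "validity": "majority-valid", "rt": 660},
    {"central_side": "left", "majority_side": "left", "validity": "majority-invalid", "rt": 700},
    {"central_side": "right", "majority_side": "left", "validity": "majority-invalid", "rt": 710},
]
print(cell_means_8vs2(trials))
```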

Fig. 4

Results in the 8vs2 condition of Experiment 2. The asterisks represent a significant difference (p < .05) between the majority-valid and majority-invalid conditions. The error bars represent one SEM

Discussion

Using an identification task, we replicated the results of Experiment 1, confirming that RTs were facilitated for targets appearing at the location toward which the majority of faces were looking, an effect attributable to shifts of attention rather than response priming.

More importantly, the RT difference between majority-valid and majority-invalid trials was larger in the all-consistent condition than in the 9vs1 condition, even though the central face was always a member of the majority in both conditions. A clear RT difference was also found in the 8vs2 condition, in which the central face was equally likely to be a member of the majority or the minority. Moreover, the split analysis of the 8vs2 trials showed that participants followed the gaze orientation of the majority even when the central face was oriented in the opposite direction, implying that they did not rely exclusively on the central gaze. These comparisons also indicate that the majority-valid advantage in the 8vs2 condition was not driven entirely by the 50% of trials in which the central face was consistent with the majority. Hence, participants clearly did not process only the central avatar.

General discussion

In this study, we explored the gaze following effect in scenes with conflicting gazes and obtained three main results. First, when all gazes shared the same orientation, individuals tended to follow them (thereby shifting their attention to the corresponding location) and consequently responded faster to a target presented on the cued side than on the other side. Second, when the numbers of gazes in the two orientations differed markedly, individuals automatically followed the majority’s gaze orientation; in contrast, when the numbers were equal or only slightly different, the gaze following effect vanished. Third, the gaze cueing effect was strongest when all gazes shared the same orientation, and the response advantage at the location indicated by the majority diminished monotonically as the number of gazes with the opposite orientation increased.

The majority rule of gaze following with conflicting gaze cues

Our results showed that people follow the majority’s gaze orientation when faced with a crowd with diverging gaze cues. This phenomenon corresponds to the majority rule in group decision making (Kerr & Tindale, 2004; Hastie & Kameda, 2005). The majority rule, in which groups decide according to the agreement of most members, has been widely used across the full spectrum of human groups, from the Stone Age (Boehm, 1996; Boyd & Richerson, 1985) to the modern era (Mueller, 1989; Sorkin, West, & Robinson, 1998). A similar rule has been demonstrated in laboratory studies with highly controlled environments and parameters. For instance, when participants made yes-or-no decisions in a visual detection task, groups using a simple-majority rule performed better in terms of detection sensitivity (Sorkin, Hays, & West, 2001). Arguably, the greatest benefit of the majority rule is that it yields good decision-making performance for comparatively little cognitive effort (Kameda & Hastie, 1999).

In the current study, the majority’s gaze orientation did not predict the location of the target, and participants were instructed to ignore the avatars. Nevertheless, the gaze following effect still occurred, which implies that it was an automatic process. This suggests that the majority rule of gaze following may be an evolutionary adaptation that is beneficial in two respects. First, social animals instinctively strive for the unity and stability of the group. The majority rule is one strategy by which self-interested behaviors can be constrained in complex decision-making scenarios (Henrich & Boyd, 1998; Kameda, Takezawa, Tindale, & Smith, 2002), which helps maintain an adaptive advantage and achieve optimal species fitness (Conradt & Roper, 2009; Sumpter & Pratt, 2009).

Second, information from the majority is generally reliable. For instance, during foraging or group migration (Galef & Giraldeau, 2001; Griffin, 2004), conspecifics’ gazes can serve as an initial alert toward a target. Thus, when gaze orientations within a group conflict, the majority’s gaze orientation is rather likely to indicate the correct location of a food source or a safe destination. To ensure that a consensus is reached on the selection of a target, natural selection may have favored strategies of following the majority’s gaze; otherwise, the group might be misdirected and lose its adaptive advantage. Indeed, there is evidence for a similar majority-rule following effect in nonhuman animal groups, such as birds or fish, where it is believed to allow groups to resolve conflicts and make collective decisions (Conradt & Roper, 2003; Couzin et al., 2005; Seeley & Buhrman, 1999).

Taken together, this majority rule in the visual system might be shaped by evolutionary contingencies.

The strength of gaze cues is modulated by the difference in orientation subgroup size

The results indicated that the size of the gap between the numbers of majority and minority members influenced the cueing strength of the majority group. Notably, this relationship was monotonic, which allows us to reject the “all or none” hypothesis outlined in the introduction. More specifically, as the gap between the two subgroups widened, the RT difference increased concomitantly, reflecting a greater cueing advantage for the majority group.

One might argue that participants did not have time to process all 10 gazes and instead used a sampling strategy in which one or several avatars were selected on each trial. Given the probabilistic distribution of the 10 gaze orientations, such a strategy would, once all trials were completed, produce the same pattern of results, roughly matching the gaze distribution of the avatars.

This alternative possibility can be largely ruled out for two reasons. First, the human visual system has sufficient capacity to receive and process all gazes in a group. Studies of summary statistical perception have found that humans can process multiple stimuli as rapidly and precisely as a single stimulus (e.g., Ariely, 2001; Haberman & Whitney, 2009), and findings on gaze perception by Sweeny and Whitney (2014) also confirmed the ability to process multiple gazes simultaneously. Second, participants were asked to maintain central fixation during the experiment, and all of them reported in the postexperiment survey that they had followed this instruction (see Footnote 1). As discussed in Experiment 1, if participants had adopted a sampling strategy, the gaze orientation of the central face would have been the most likely to be sampled and encoded, so the RT differences between majority-valid and majority-invalid trials would simply reflect how likely the central face was to be a member of the majority or the minority. Our analysis of Experiment 2 showed that, although the central face was always a member of the majority in both the all-consistent and 9vs1 conditions, the RT differences in these two conditions still differed. Together with the significant cueing effect in the 8vs2 condition after the central face was rearranged, this led us to conclude that participants did not adopt this sampling strategy toward the central face.
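The logic of the sampling account can be made explicit under simplifying assumptions of our own: on each trial, attention shifts fully toward the side indicated by a single encoded face, and RTs take one value (fast) when the target is on the attended side and another (slow) otherwise. If the encoded face belongs to the majority with probability p, the expected majority-invalid minus majority-valid difference is (2p − 1)(slow − fast). Sampling one random avatar (strategy A) predicts a linear decline from the all-consistent effect to zero at 5vs5, whereas encoding only the central face (strategy B) predicts, for Experiment 2, a 9vs1 effect equal to the all-consistent effect and no effect in 8vs2; both of these strategy B predictions run against the Experiment 2 results.

```python
def predicted_cueing_effect(p_majority: float, full_effect: float = 1.0) -> float:
    """Expected (majority-invalid minus majority-valid) RT under a single-face strategy.

    Assumes attention shifts fully toward the side indicated by one encoded face,
    which belongs to the majority with probability p_majority. Then
    valid RT   = p * fast + (1 - p) * slow
    invalid RT = p * slow + (1 - p) * fast
    so invalid - valid = (2p - 1) * (slow - fast), with slow - fast = full_effect.
    """
    return (2 * p_majority - 1) * full_effect


# Strategy A: one avatar sampled at random, so p = n_majority / 10.
for n_majority in (10, 9, 8, 7, 6, 5):
    print(f"{n_majority}vs{10 - n_majority}: {predicted_cueing_effect(n_majority / 10):.1f}")

# Strategy B (Experiment 2): only the central face is encoded.
print("9vs1, central face always in majority:", predicted_cueing_effect(1.0))
print("8vs2, central face in majority on half the trials:", predicted_cueing_effect(0.5))
```

Effects are expressed in units of the all-consistent effect (full_effect = 1.0).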

The current results are consistent with previous studies at the individual level. In gaze cueing studies, cueing strength has been found to be modulated by social status (Dalmaso, Galfano, Coricelli, & Castelli, 2014; Dalmaso, Pavan, Castelli, & Galfano, 2012), political temperament (Carraro, Dalmaso, Castelli, & Galfano, 2015; Liuzza et al., 2011), facial expression (Lassalle & Itier, 2013; Tipples, 2006), facial dominance (Jones et al., 2010), and even social interaction history (Capozzi, Becchio, Willemse, & Bayliss, 2016; Dalmaso, Edwards, & Bayliss, 2016). Consider social status: high-status individuals, who might be regarded as more relevant or reliable sources of information, exerted stronger cueing and thus produced a greater gaze cueing effect than low-status individuals (Dalmaso et al., 2012). Analogously, in our study it may have been adaptive for participants to orient relatively quickly toward objects that captured the majority of gazes in a scene, because a growing number of gazes oriented in the same direction may signal greater cueing strength and greater group reliability in predicting where a target will appear, thereby further raising the likelihood that observers will follow. This monotonic tendency is also supported by the results of Milgram et al.'s (1969) field study described in the Introduction.

Nevertheless, a choice between the majority and the minority does not always occur. In the 6vs4 condition, participants showed no selective gaze-following response, even though the two subgroups differed in size. This suggests that the majority rule of gaze following applies only when the size difference between the two subgroups reaches a certain level. In this sense, the relationship between the distribution of attention and the size difference between subgroups may be quorum-like (Franks, Dornhaus, Fitzsimmons, & Stevens, 2003; Ward, Sumpter, Couzin, Hart, & Krause, 2008). Animals typically change their own behavior only when the number of individuals performing that behavior reaches a threshold (i.e., the quorum); otherwise, others' behavior is disregarded (Pratt, 2005). From an adaptive perspective, a quorum response could lower the likelihood that nonadaptive behavior spreads and may therefore help maintain group stability (Sumpter & Pratt, 2009). However, the current results provide only preliminary evidence for such a quorum, and further research is needed to verify it.
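To illustrate how a quorum-like response differs from a simple proportional one, the sketch below compares a Hill-type sigmoid, x^k / (T^k + x^k), which is one common way to model quorum responses in the collective-behavior literature, with a linear response. It is a hypothetical illustration only: the threshold `THRESHOLD` and steepness `STEEPNESS` are arbitrary values we chose, not parameters estimated from the present data. The input is the size difference between subgroups (majority minus minority) for the conditions named in the text.

```python
def quorum_response(x, threshold, steepness):
    """Hill-type sigmoid x**k / (T**k + x**k): close to 0 well below the
    threshold T, exactly 0.5 at T, and close to 1 well above it."""
    return x**steepness / (threshold**steepness + x**steepness)


def proportional_response(x, x_max=10):
    """Linear alternative: response grows in direct proportion to x."""
    return x / x_max


# Size difference (majority minus minority) for the conditions in the text.
size_differences = {"all-consistent": 10, "9vs1": 8, "8vs2": 6, "6vs4": 2}
THRESHOLD = 4  # hypothetical quorum threshold, for illustration only
STEEPNESS = 4  # hypothetical steepness, for illustration only

for label, diff in size_differences.items():
    q = quorum_response(diff, THRESHOLD, STEEPNESS)
    p = proportional_response(diff)
    print(f"{label}: quorum={q:.2f}, proportional={p:.2f}")
```

Both functions increase monotonically with the size difference, but only the quorum-like function stays near zero for a small difference such as 6vs4. Under these assumed parameters, that pattern would be compatible with both the monotonic trend and the absent effect in the 6vs4 condition.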

Conclusion

By manipulating the size difference between the majority and minority subgroups in an array, we explored how the number of conflicting gazes in a crowd influences the gaze-following effect. Our findings suggest that individuals automatically follow the majority's gaze orientation, and that the response advantage for the location cued by the majority diminishes monotonically as the size difference between the two subgroups narrows.

Because conflicting multigaze scenes in real life do not always contain two divergent subgroups, other types of conflicting scenes must be examined to fully understand this issue. Follow-up studies that narrow the gap between picture stimuli and real human groups are also needed; these could be realized with technologies such as virtual reality.