RETRACTED ARTICLE: The plateau in mnemonic resolution across large set sizes indicates discrete resource limits in visual working memory

Anderson, David E.; Awh, Edward

doi:10.3758/s13414-012-0292-1

Published: 04 April 2012

Volume 74, pages 891–910 (2012)

Attention, Perception, & Psychophysics

This article was retracted on 18 August 2015

Abstract

The precision of visual working memory (WM) representations declines monotonically with increasing storage load. Two distinct models of WM capacity predict different shapes for this precision-by-set-size function. Flexible-resource models, which assert a continuous allocation of resources across an unlimited number of items, predict a monotonic decline in precision across a large range of set sizes. Conversely, discrete-resource models, which assert a relatively small item limit for WM storage, predict that precision will plateau once this item limit is exceeded. Recent work has demonstrated such a plateau in mnemonic precision. Moreover, the set size at which mnemonic precision reached asymptote has been strongly predicted by estimated item limits in WM. In the present work, we extend this evidence in three ways. First, we show that this empirical pattern generalizes beyond orientation memory to color memory. Second, we rule out encoding limits as the source of discrete limits by demonstrating equivalent performance across simultaneous and sequential presentations of the memoranda. Finally, we demonstrate that the analytic approach commonly used to estimate precision yields flawed parameter estimates when the range of stimulus space is narrowed (e.g., a 180° rather than a 360° orientation space) and typical numbers of observations are collected. Such errors in parameter estimation reconcile an apparent conflict between our findings and others based on different stimuli. These findings provide further support for discrete-resource models of WM capacity.

Working memory (WM) is a limited-capacity system that enables the active storage of representations in an online state. Two broad classes of models have been proposed to characterize the nature of capacity limits in WM. Discrete-resource models assert a fixed item limit for storage in working memory, such that once a relatively small number of items have been stored, no further information can be stored from additional items (Cowan, 2001; Zhang & Luck, 2008). By contrast, flexible-resource models suggest that WM can store an unlimited number of items, although the fidelity of each stored representation declines as the number of stored items increases (e.g., Bays & Husain, 2008; Wilken & Ma, 2004).

Multiple studies have demonstrated an inverse relationship between the number of stored items and the precision of the stored representations (Anderson, Vogel, & Awh, 2011; Barton, Ester, & Awh, 2009; Bays & Husain, 2008; Zhang & Luck, 2008). Although flexible-resource models naturally explain this empirical pattern, it can also be reconciled with discrete-resource models that allow for variations in mnemonic precision within subspan displays. For example, Barton et al. proposed a hybrid model in which a discrete number of “slots” constrain the allocation of a separate resource that determines mnemonic precision. Critically, this version of the discrete-resource model predicts that declines in mnemonic precision should cease once the number of to-be-stored items exceeds the putative item limit, because such supraspan displays do not lead to the storage of additional items. By contrast, flexible-resource models predict that monotonic declines in precision should be observable across much larger set sizes. Zhang and Luck disconfirmed the latter prediction by demonstrating equivalent mnemonic precision for set sizes that exceeded estimated item limits. Extending this observation, Anderson et al. showed that individual differences in the number of items that could be stored—estimated using both behavioral performance and storage-related neural activity—strongly predicted the set size at which WM precision reached a plateau for each observer. Thus, the resolution-by-set-size function in visual WM strengthens the case for discrete item limits in WM storage.
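To make the contrasting predictions concrete, the toy calculation below computes SD-by-set-size curves under each class of model. It is illustrative only: the square-root scaling (precision proportional to each stored item's share of a fixed resource), the item limit of three, and the single-item SD of 12° are assumptions chosen for the example, not the Barton et al. model or parameters estimated in this article.

```python
import numpy as np

def slots_resource_sd(set_size, item_limit, sd_one_item):
    """Toy discrete-resource (slots + resource) prediction of SD in degrees.

    Assumption: a fixed resource is split equally among the stored items,
    at most `item_limit` items are stored, and precision (1 / variance)
    is proportional to each stored item's share of the resource.
    """
    n_stored = np.minimum(set_size, item_limit)
    return sd_one_item * np.sqrt(n_stored)

def flexible_resource_sd(set_size, sd_one_item):
    """Toy flexible-resource prediction: every item is stored, so the
    same resource is divided among all items in the display."""
    return sd_one_item * np.sqrt(set_size)

set_sizes = np.arange(1, 9)
discrete = slots_resource_sd(set_sizes, item_limit=3, sd_one_item=12.0)
flexible = flexible_resource_sd(set_sizes, sd_one_item=12.0)
# `discrete` stops changing once set size reaches the item limit;
# `flexible` keeps rising across the whole range.
```

Under these assumptions, the discrete curve is bilinear (rising, then flat at the item limit), whereas the flexible curve keeps increasing across all set sizes.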

The goal of the present work was to replicate this basic empirical pattern, test its boundary conditions, and address some key alternative accounts of the Anderson et al. (2011) findings. In Experiment 1, we demonstrate that the original results obtained with orientation memoranda replicate with color memoranda. In Experiment 2, we rule out limitations in the encoding of simultaneous stimulus displays as the source of the apparent item limits in memory storage. To test whether encoding is a limiting factor, we examined the precision-by-set-size function for both simultaneously and sequentially presented displays. The rationale was that if the plateau in mnemonic precision observed by Anderson et al. was due to limits in the number of stimuli that could be encoded simultaneously, this plateau should shift to larger set sizes when the stimuli were presented sequentially and the simultaneous encoding demands were cut in half. As will be shown, the precision-by-set-size functions in the simultaneous and sequential conditions had identical shapes, ruling out encoding limits as the source of the plateau. Finally, in Experiment 3 we used a combination of behavioral data and simulations to show that the approach we have used to estimate item limits and precision (Zhang & Luck, 2008) produces systematic errors in parameter estimates under specific stimulus conditions. Specifically, when typical numbers of observations are collected, narrowing the range of stimulus values (from 360° to 180°) yields inaccurate parameter estimates and changes the shape of the precision-by-set-size function from bilinear to logarithmic. This is a crucial aspect of the results, because a bilinear shape implies a discrete item limit, whereas a logarithmic shape implies a continuous resource that is divided across much larger numbers of items. Our finding that the logarithmic function is an artifact of flawed parameter estimates therefore explains the discrepancy between our findings and those from studies using 180° stimuli (Gorgoraptis, Catalao, Bays, & Husain, 2011; Ma & Chou, 2010; Rademaker & Tong, 2010). Thus, the present work strengthens the case for discrete resource limits in visual WM.

Experiment 1

To test whether the previously reported relationship between asymptotes in precision and WM capacity generalizes across other stimulus dimensions, we replicated Anderson et al. (2011) with a color recall task (Wilken & Ma, 2004; Zhang & Luck, 2008).

Method

Subjects

A total of 22 undergraduates at the University of Oregon completed the experiment for course credit. All had normal or corrected-to-normal visual acuity and gave informed consent according to procedures approved by the University of Oregon institutional review board.

Stimulus displays

The stimuli were generated in MATLAB using the Psychophysics Toolbox extension (Brainard, 1997; Pelli, 1997) and were presented on a 17-in. flat CRT computer screen (refresh rate of 120 Hz). Viewing distances were approximately 77 cm.

Our tasks required participants to remember the colors of a set of solid discs. Colors were randomly selected from a color wheel consisting of 180 color values that were evenly distributed in CIE L*a*b* color space and centered at the color (L* = 70, a* = 20, b* = 38); see Fig. 1. All objects had radii of 0.93° of visual angle.
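A color wheel of this kind can be sketched as 180 evenly spaced points on a circle in the a*/b* plane at constant L*. In the sketch below, the circle's radius of 60 units is a hypothetical value that the article does not report, and converting the L*a*b* coordinates to monitor RGB values would require display calibration, which is not shown.

```python
import numpy as np

def lab_color_wheel(n_colors=180, center=(70.0, 20.0, 38.0), radius=60.0):
    """Return an (n_colors, 3) array of CIE L*a*b* coordinates.

    Colors lie on a circle in the a*/b* plane at constant L*, centered on
    `center` and evenly spaced in hue angle. `radius` is a hypothetical
    value; the article does not report it.
    """
    lightness, a_center, b_center = center
    angles = np.linspace(0.0, 2.0 * np.pi, n_colors, endpoint=False)
    a_star = a_center + radius * np.cos(angles)
    b_star = b_center + radius * np.sin(angles)
    return np.column_stack([np.full(n_colors, lightness), a_star, b_star])

wheel = lab_color_wheel()
# Adjacent colors are separated by 360 / 180 = 2 degrees of wheel angle.
step_deg = 360.0 / wheel.shape[0]
```

Because the colors are evenly spaced, each color index maps directly onto an angle on the wheel, which is what allows response errors to be expressed in degrees.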

In Experiment 1, objects were presented within a square region subtending 10.7° × 10.7° of visual angle, and subjects fixated on a central fixation point that subtended 0.37° × 0.37°. The objects were positioned randomly, with the constraint that no two objects could fall within 1.43° of one another, resulting in a between-object separation of at least two-thirds of an object. At the end of each trial, subjects were cued to recall the color of a single item. A specific object was probed by outlining its position with a thick, white ring with a radius of 0.93° and a rim thickness of 0.37° (see Fig. 1).
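Random placement under a minimum-separation constraint is commonly done by rejection sampling. The sketch below assumes two things the article does not specify: that the 1.43° constraint is a minimum edge-to-edge gap between discs, and that each disc must lie entirely inside the 10.7° square. The function name `place_discs` is hypothetical.

```python
import numpy as np

def place_discs(n_items, rng, region=10.7, radius=0.93, min_gap=1.43,
                max_attempts=1000):
    """Randomly place disc centers (degrees of visual angle, fixation at 0, 0).

    Assumptions: `min_gap` is an edge-to-edge gap, and discs must lie
    fully inside the square region.
    """
    half = region / 2.0 - radius            # keep the whole disc inside the square
    min_center_dist = 2.0 * radius + min_gap
    for _ in range(max_attempts):
        centers = []
        for _ in range(n_items):
            for _ in range(100):             # rejection sampling for one disc
                candidate = rng.uniform(-half, half, size=2)
                if all(np.hypot(*(candidate - c)) >= min_center_dist for c in centers):
                    centers.append(candidate)
                    break
            else:
                break                        # this disc could not be placed; restart display
        if len(centers) == n_items:
            return np.array(centers)
    raise RuntimeError("could not place discs under the spacing constraint")
```

Restarting the whole display when a disc cannot be placed avoids getting stuck with an early configuration that leaves no room for the remaining items.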

Procedures

Experiment 1 took approximately 1.5 h to complete and consisted of 12 blocks of 60 trials each. Each trial proceeded as follows. First, subjects saw a central fixation point, followed by the presentation of one, two, three, four, five, or six colored discs for 200 ms. A 1,000-ms delay period followed the offset of the discs. After the delay, a probe ring appeared at the position of a randomly selected disc. This probe was presented for 500 ms, and then a color wheel was presented. This sequence was used to reduce any masking that simultaneous presentation of the memory probe and color wheel might cause. The color wheel, which had a radius of 9° and a thickness of 0.74°, consisted of 180 color values that were evenly distributed along the perimeter of the wheel in CIE L*a*b* color space and centered at the color (L* = 70, a* = 20, b* = 38). Subjects clicked on the perimeter of this color wheel in an unspeeded response to indicate the color of the sample item that had appeared in the same position as the probe. Each response was followed by a 750-ms blank intertrial interval.
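On a 120-Hz display, the fixed event durations translate into whole numbers of screen refreshes. The sketch below converts them and builds a trial list; the balanced, shuffled-within-block design in `build_trial_list` is a hypothetical assumption, since the article does not describe how set sizes were ordered within blocks.

```python
import numpy as np

REFRESH_HZ = 120  # refresh rate reported for the CRT display

# Fixed event durations (ms) reported for Experiment 1. The fixation
# period and the unspeeded response period have no fixed duration in
# the article, so they are left out.
EVENT_DURATIONS_MS = {"sample": 200, "delay": 1000, "probe": 500, "iti": 750}

def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Convert a duration in ms to the nearest whole number of screen refreshes."""
    return int(round(duration_ms * refresh_hz / 1000.0))

frames = {name: ms_to_frames(ms) for name, ms in EVENT_DURATIONS_MS.items()}

def build_trial_list(rng, n_blocks=12, trials_per_block=60, set_sizes=range(1, 7)):
    """Hypothetical balanced design: every set size appears equally often in
    each block, in random order (the article does not state the randomization)."""
    set_sizes = list(set_sizes)
    reps = trials_per_block // len(set_sizes)
    blocks = [rng.permutation(np.repeat(set_sizes, reps)) for _ in range(n_blocks)]
    return np.concatenate(blocks)
```

All four durations divide evenly into 8.33-ms refreshes (24, 120, 60, and 90 frames), and a balanced design of 12 blocks × 60 trials would give 120 trials per set size.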

Modeling response error distributions

Response offsets were defined as the difference between a subject’s response and the correct color of the probed sample stimulus, expressed as an angle on the color wheel (ranging from –180° to 180°). Frequency histograms of response offsets for each set size were analyzed to determine the probability of storage and mnemonic precision at each set size.
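Because the color wheel is circular, a raw difference between response and target angles must be wrapped so that, for example, a response at 350° to a target at 10° counts as a –20° error rather than a 340° error. A minimal sketch, assuming the interval is half-open at –180°:

```python
import numpy as np

def response_offset(response_deg, target_deg):
    """Signed angular error in degrees, wrapped to the interval [-180, 180)."""
    diff = np.asarray(response_deg, dtype=float) - np.asarray(target_deg, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0
```

With this convention an error of exactly 180° is reported as –180°; either endpoint convention gives the same histogram once offsets are binned.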

Maximum-likelihood estimation was used to fit the distribution of the response offsets. Three parameters were estimated: μ, the mean of a von Mises distribution corresponding to trials on which the subject had selected the target location; SD, the width of the same von Mises distribution (used to operationalize mnemonic precision); and p(failure), denoted P_f. The latter parameter is the height of a uniform distribution, which captures trials on which subjects failed to store the probed item. P_mem refers simply to the probability that the critical item was stored (1 – P_f).
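A minimal sketch of this mixture fit, assuming NumPy and SciPy, follows. The optimizer (Nelder–Mead), the log and logit reparameterizations, and the starting values are our choices and are not necessarily those used in the original analysis; we also assume SD is the circular standard deviation of the von Mises component, √(−2 ln(I₁(κ)/I₀(κ))).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e, i1e

def von_mises_pdf(x, mu, kappa):
    """Von Mises density in radians; i0e keeps large kappa from overflowing."""
    return np.exp(kappa * (np.cos(x - mu) - 1.0)) / (2.0 * np.pi * i0e(kappa))

def kappa_to_sd_deg(kappa):
    """Circular SD, sqrt(-2 ln(I1(kappa) / I0(kappa))), in degrees."""
    mean_resultant = i1e(kappa) / i0e(kappa)
    return np.degrees(np.sqrt(-2.0 * np.log(mean_resultant)))

def fit_mixture(errors_deg):
    """Maximum-likelihood fit of a von Mises + uniform mixture.

    Returns mu and SD in degrees plus P_f (uniform weight) and P_mem = 1 - P_f.
    kappa and P_f are optimized on log and logit scales so they stay in range.
    """
    x = np.radians(np.asarray(errors_deg, dtype=float))

    def negative_log_likelihood(params):
        mu, log_kappa, logit_pf = params
        kappa = np.exp(log_kappa)
        p_f = 1.0 / (1.0 + np.exp(-logit_pf))
        likelihood = (1.0 - p_f) * von_mises_pdf(x, mu, kappa) + p_f / (2.0 * np.pi)
        return -np.sum(np.log(likelihood))

    result = minimize(negative_log_likelihood, x0=[0.1, np.log(5.0), -0.5],
                      method="Nelder-Mead", options={"maxiter": 2000})
    mu, log_kappa, logit_pf = result.x
    p_f = 1.0 / (1.0 + np.exp(-logit_pf))
    mu = (mu + np.pi) % (2.0 * np.pi) - np.pi   # wrap the mean to [-pi, pi)
    return {"mu_deg": np.degrees(mu), "sd_deg": kappa_to_sd_deg(np.exp(log_kappa)),
            "p_f": p_f, "p_mem": 1.0 - p_f}
```

The flexible-resource alternative corresponds to the same likelihood with P_f fixed at zero, leaving only μ and SD free.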

Results and discussion

The discrete-resource and flexible-resource models were fitted to the aggregate distributions of response offsets for each set size (Fig. 2, top panel). To compare the relative fits of the discrete- and flexible-resource models, we computed the adjusted R² statistic from histograms of the data with 15 bins, each 24° wide. The adjusted R² statistic reflects the proportion of variance explained by a model, adjusted for its number of parameters: models with more parameters are penalized relative to models with fewer parameters. This statistic ensures a fair comparison between the three-parameter discrete-resource model and the two-parameter flexible-resource model. The mixture model representing the predictions of the discrete-resource model (red) was more effective than the flexible-resource model (green) in explaining variance in the response distributions for each set size [discrete-resource R² values: .99 (set size 1 [SS1]), .99 (SS2), .99 (SS3), .99 (SS4), .99 (SS5), .97 (SS6); flexible-resource R² values: .88 (SS1), .87 (SS2), .83 (SS3), .79 (SS4), .81 (SS5), .79 (SS6)]. Moreover, Kolmogorov–Smirnov tests revealed a significant difference between the predictions of the flexible-resource model and the observed distributions at every set size, whereas the same tests revealed no significant difference between the predictions of the discrete-resource model and the observed distributions.
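The binning and model comparison can be sketched as follows. We assume the standard adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − p − 1), with n = 15 bins and p the number of free parameters (three for the mixture model, two for the von Mises-only model); the article does not spell out its exact formula, and the predicted counts would come from integrating each fitted density over the bins.

```python
import numpy as np

def binned_counts(errors_deg, n_bins=15):
    """Counts of response offsets in equal-width bins spanning [-180, 180]."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)   # 15 bins -> 24 deg each
    counts, _ = np.histogram(errors_deg, bins=edges)
    return counts, edges

def adjusted_r2(observed, predicted, n_params):
    """Standard adjusted R^2: R^2 penalized for the number of model parameters.

    `observed` and `predicted` are counts (or proportions) for the same bins.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = observed.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
```

With only 15 bins, the penalty for one extra parameter is modest, so a large gap in adjusted R² (e.g., .99 vs. .79) reflects a real difference in fit rather than the extra parameter alone.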

We also assessed whether the discrete- and flexible-resource models could account for the distribution of response errors in the individual subject data. Here again, the discrete-resource model was more effective on average than the flexible-resource model in characterizing the response distributions and in explaining distribution variance for each subject at every set size. Dependent-samples t tests revealed a significant advantage for the discrete-resource model in explaining variance (R²) in the distributions for all set sizes (p < .001) except set size 1 (p = .4). To summarize, at both the group and individual-subject levels, the discrete-resource model was superior to the flexible-resource model in its ability to account for the distributions of response offsets. This replicates previous findings (Anderson et al., 2011; Zhang & Luck, 2008) and suggests that observers could store a subset of the items in the sample array while maintaining no information about the remaining items.

We first examined the probability of storing the probed item (P_mem) across set sizes (SS1 = .98, SS2 = .93, SS3 = .82, SS4 = .63, SS5 = .49, SS6 = .38), which revealed a significant effect of set size [F(5, 16) = 41.25, p < .001]. The product of P_mem and set size provides an estimate of the number of items stored; this estimate rose from set sizes 1 to 3 and reached a statistical plateau for subsequent set sizes [set size 3 (M = 2.46) vs. 4 (M = 2.52), t(21) = –0.45, p = .6; set size 4 vs. 5 (M = 2.46), t(21) = 0.39, p = .69; set size 5 vs. 6 (M = 2.30), t(21) = 1.2, p = .25]. This result notwithstanding, we note that past studies have often documented declining capacity estimates as set sizes move farther past putative item limits. For example, Cusack, Lehmann, Veldsman, and Mitchell (2009) found lower capacity estimates for set size 8 than for set size 4; moreover, the size of the drop from set size 4 to 8 was a good predictor of fluid intelligence. Indeed, in our prior work (Anderson et al., 2011), we also observed declining capacity estimates from set sizes 3 to 8. Thus, the stable plateau in capacity estimates in the present experiment is somewhat anomalous relative to the extant data. Although discrete-resource models may seem to predict stable capacity estimates across large set sizes, this view rests on the strong assumption that the number of items available for report depends only on the total amount of “space” in working memory. However, a growing literature suggests that limits in the number of items stored may instead result from limitations in observers’ ability to filter irrelevant items from relevant ones during encoding (Engle, 2002; Fukuda & Vogel, 2009). This perspective provides a plausible explanation of why capacity estimates may decline at larger set sizes: it may be harder to select a manageable subset of items to be stored when putative item limits are exceeded. One possible reason is that supraspan displays make it harder for observers to individuate the to-be-stored items from those that exceed storage capacity (e.g., Ester, Vogel, & Awh, 2012). Clearly, more work is needed to test these hypotheses. For now, we note that discrete-resource models do not entail the prediction that capacity estimates remain constant across large set sizes.
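The capacity estimate described in this paragraph is simple arithmetic: K = P_mem × set size. The sketch applies it to the group-level P_mem values reported above. Because the reported means (e.g., 2.46 and 2.30 for set sizes 5 and 6) average per-subject products, they can differ slightly from products of the group means.

```python
import numpy as np

set_sizes = np.arange(1, 7)
# Group-level P_mem estimates reported for Experiment 1 (SS1 to SS6).
p_mem = np.array([0.98, 0.93, 0.82, 0.63, 0.49, 0.38])

# Capacity estimate: expected number of items stored at each set size.
k_estimates = p_mem * set_sizes
```

The products come to about 0.98, 1.86, 2.46, 2.52, 2.45, and 2.28 items, rising through set size 3 and then leveling off, in line with the statistical plateau described above.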

Recently, Bays, Catalao, and Husain (2009) suggested an alternative explanation of the mixture distributions first reported by Zhang and Luck (2008). They questioned whether the flat component of this mixture distribution was really caused by trials on which no information had been retained about the probed item, arguing instead that it might reflect trials on which observers mislocalized the probed object and reported the value of a different item in the display. Because the color of each item varied randomly with respect to every other item, this kind of mislocalization would yield a random (i.e., flat) distribution of response errors relative to the probed item. Indeed, Bays et al. (2009) found that a substantial proportion of responses in their study could be attributed to the erroneous report of nontarget values. To test this possibility in our own data, we calculated the offset between each response and each of the nontarget values from the same trial. If subjects had consistently reported distractor values, this analysis should reveal a central tendency in the resulting error histogram, indicating that subjects reported nontarget values at greater-than-chance rates. Figure 2 (bottom panel) shows the aggregate distribution of distractor response offsets for each set size; these plots show no evidence of a central tendency, suggesting that observers did not consistently report distractor values in this experiment. Consistent with this, a swapping analysis in which we estimated the proportion of trials on which observers erroneously reported a distractor value (Bays et al., 2009) revealed small proportions of swapping errors across set sizes (SS2 = .01, SS3 = .03, SS4 = .05, SS5 = .07, SS6 = .06). We note an apparent depression in the error histograms that depict response offsets relative to distractor values, but this drop is not statistically reliable: the frequency of reporting a distractor (i.e., a distractor offset of 0) is not significantly different from the mean frequency across all distractor offset bins (p > .27). In sum, this analysis suggests that the uniform component of the target distribution reflects a failure to store the probed item rather than mislocalizations.
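The nontarget-offset analysis can be sketched as below. The input layout (a trials × set-size array of sample values plus the probed index per trial) and both function names are hypothetical. `central_bin_ratio` is only a descriptive check of central tendency, comparing the bin centered on 0 with the mean bin count; it is not the swap model of Bays et al. (2009), which fits an additional mixture component.

```python
import numpy as np

def nontarget_offsets(responses_deg, item_values_deg, target_index):
    """Offsets of each response from every nontarget value on the same trial.

    responses_deg:   (n_trials,) reported values in degrees
    item_values_deg: (n_trials, set_size) sample values in degrees
    target_index:    (n_trials,) index of the probed item on each trial
    Returns a flat array of offsets wrapped to [-180, 180).
    """
    responses = np.asarray(responses_deg, dtype=float)
    items = np.asarray(item_values_deg, dtype=float)
    n_trials, set_size = items.shape
    is_target = np.zeros((n_trials, set_size), dtype=bool)
    is_target[np.arange(n_trials), target_index] = True
    wrapped = (responses[:, None] - items + 180.0) % 360.0 - 180.0
    return wrapped[~is_target]

def central_bin_ratio(offsets_deg, n_bins=15):
    """Count in the bin centered on 0 divided by the mean count across bins.
    Values near 1 suggest no excess of nontarget reports."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    counts, _ = np.histogram(offsets_deg, bins=edges)
    return counts[n_bins // 2] / counts.mean()
```

If observers frequently reported distractor values, the pooled offsets would pile up near 0 and the ratio would rise well above 1.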

As in Anderson, Vogel, and Awh (2011), we found that mnemonic precision monotonically declined up to set size 3 [set size 1 (M = 11.6) vs. 2 (M = 13.7), t(21) = –7.58, p < .001; set size 2 vs. 3 (M = 16.2), t(21) = –2.42, p < .05] and then reached an apparent asymptote after set size 3 [set size 3 vs. 4 (M = 15.5), t(21) = 0.539, p = .60; set size 4 vs. 5 (M = 17.4), t(21) = –1.15, p = .26; set size 5 vs. 6 (M = 17.9), t(21) = –0.367, p = .72]. The aggregate precision-by-set-size function was fitted with a bilinear function to estimate the point of asymptote, which was 3.32 items (Fig. 3a; R² = .94, p < .01). This is consistent with previous results demonstrating an asymptote in precision at approximately three items (Anderson et al., 2011; Zhang & Luck, 2008).
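One way to estimate the asymptote is to fit SD = a + b·min(n, c), a line that rises until a breakpoint c and is flat afterward, choosing c by grid search and a and b by least squares. This is a sketch under that assumed parameterization; the article does not specify its exact bilinear form or fitting procedure, so applying it to the group means above need not reproduce the reported 3.32 items.

```python
import numpy as np

def fit_bilinear(set_sizes, sd, breakpoints=None):
    """Fit SD = a + b * min(n, c): linear rise up to breakpoint c, flat afterward.

    The breakpoint c is found by grid search; a and b by ordinary least
    squares at each candidate c.
    """
    n = np.asarray(set_sizes, dtype=float)
    y = np.asarray(sd, dtype=float)
    if breakpoints is None:
        breakpoints = np.linspace(n.min() + 0.01, n.max(), 500)
    best = None
    for c in breakpoints:
        X = np.column_stack([np.ones_like(n), np.minimum(n, c)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((y - X @ coef) ** 2)
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    sse, c, (a, b) = best
    return {"intercept": a, "slope": b, "asymptote": c, "sse": sse}
```

Fitting each observer's SD values separately yields the individual asymptotes that can then be related to independent capacity estimates.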

Following the approach of Anderson et al. (2011), we fitted each individual observer’s precision function with a bilinear function to estimate individual asymptotes in precision. The bilinear function fit the data well (average r = .59), and the average point of asymptote was 3.20 items. These fits can be compared with those of a logarithmic function, which is the shape of the precision-by-set-size function predicted by the flexible-resource model (Bays & Husain, 2008). When the same precision functions were fitted with a logarithmic function, the fits were significantly worse than with the bilinear function [average rs = .45 for logarithmic, .59 for bilinear; t(21) = 2.66, p < .05]. Thus, the shapes of the precision-by-set-size functions were better fit by the predictions of a discrete-resource model than by those of a flexible-resource model. Moreover, we found a strong relationship between the inflection points of the bilinear precision functions and an independent capacity estimate (R² = .43, p < .001; Fig. 3b) obtained from a separate color experiment within the same session. A separate measure of P_mem was used to avoid reporting relationships between nonindependent measures of SD and P_mem from the same data set (Brady, Fougnie, & Alvarez, 2011). Indeed, the relationship is artificially inflated when nonindependent measures are correlated, such as when P_mem for set size 6 is taken from the same data used to construct the precision-by-set-size function (R² = .72). The correlation between independent measures (R² = .43) nonetheless confirmed the clear prediction of discrete-resource models: putative item limits within each observer robustly predict the shape of the precision-by-set-size function. We note that this correlation is also inconsistent with a mislocalization explanation of the flat component of the mixture distribution (Bays et al., 2009), because there is no reason why the probability of mislocalizations should correlate with the inflection point of the precision-by-set-size function. Thus, Experiment 1 demonstrates that the basic empirical pattern from Anderson et al. generalizes across both orientation and color memoranda: working memory for both orientations and colors is subject to discrete item limits.
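The logarithmic comparison can be sketched as a least-squares fit of SD = a + b·ln(n), with fit quality summarized by the correlation r between observed and fitted values. Using r in this way is our assumption about how the reported rs were computed; per-observer r values from the bilinear and logarithmic fits could then be compared with a paired t test (for example, `scipy.stats.ttest_rel`).

```python
import numpy as np

def fit_logarithmic(set_sizes, sd):
    """Least-squares fit of SD = a + b * ln(n); returns coefficients and fit r."""
    n = np.asarray(set_sizes, dtype=float)
    y = np.asarray(sd, dtype=float)
    X = np.column_stack([np.ones_like(n), np.log(n)])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ np.array([a, b])
    r = np.corrcoef(y, predicted)[0, 1]   # correlation of observed and fitted SDs
    return {"intercept": a, "slope": b, "r": r}
```

A logarithmic curve keeps rising at every set size, only more slowly, so data with a sharp plateau after three items are fit less well by it than by a bilinear function.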

Experiment 2

The purpose of Experiment 2 was to test the alternative hypothesis that asymptotes in precision (Anderson et al., 2011; Zhang & Luck, 2008) are a consequence of encoding limits rather than storage limits. If observers were unable to encode all of the sample stimuli simultaneously, then apparent item limits in those experiments might be explained without recourse to limits in memory storage per se. To address this hypothesis, we measured mnemonic precision for memoranda that were presented simultaneously (in one display) or sequentially (across two displays). Previous work had used longer exposure durations, rather than sequential designs, to address potential encoding limitations in a WM task (Bays, Gorgoraptis, Wee, Marshall, & Husain, 2011). Longer exposure durations, however, are known to enhance long-term memory for memoranda (e.g., Glanzer & Cunitz, 1966), which could inflate estimates of online storage capacity. In addition, large increases in exposure duration would also encourage verbal labeling or other recoding strategies. To avoid these methodological pitfalls, we presented the memoranda across two sequential displays, so that the duration of each display remained constant while encoding time per item doubled. If the reported asymptotes in mnemonic precision are a consequence of encoding limits, precision should plateau at a larger set size in the sequential condition.
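The logic of this test can be made concrete with a toy count of items that reach memory under each account. The sketch assumes that sequential trials split the set into two halves (the larger half first when the set size is odd) and uses a limit of three items purely for illustration; `predicted_items_stored` is a hypothetical helper, not part of the original analysis.

```python
def predicted_items_stored(set_size, sequential, encoding_limit=None, storage_limit=None):
    """Toy count of items stored under an encoding-limit or storage-limit account.

    Sequential trials split the set into two displays (larger half first).
    Under a pure encoding limit, each display contributes at most
    `encoding_limit` items; under a pure storage limit, the total is capped
    at `storage_limit` regardless of how the items are presented.
    """
    if sequential:
        displays = [set_size - set_size // 2, set_size // 2]
    else:
        displays = [set_size]
    stored = sum(min(d, encoding_limit) if encoding_limit is not None else d
                 for d in displays)
    if storage_limit is not None:
        stored = min(stored, storage_limit)
    return stored

sizes = range(1, 9)
encoding_sim = [predicted_items_stored(n, False, encoding_limit=3) for n in sizes]
encoding_seq = [predicted_items_stored(n, True, encoding_limit=3) for n in sizes]
storage_seq = [predicted_items_stored(n, True, storage_limit=3) for n in sizes]
```

Under the encoding-limit account, the number of stored items (and hence the point where precision stops declining) shifts from three items in the simultaneous condition to six in the sequential condition; under the storage-limit account, it stays at three in both.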