The mental lexicon, defined by Jackendoff (2002) as the store of words in long-term memory from which the grammar constructs phrases and sentences, contains information such as part of speech (house is a noun), denotation (a dog is an animal), pronunciation (balloon is pronounced bə-lōōn’), affective meaning (cake is something I like), and so forth. In studies of word meaning, the mental lexicon is sometimes portrayed as a semantic network, in which nodes correspond to words and connections indicate a meaningful relation between them (Collins & Loftus, 1975; Collins & Quillian, 1969).

While connections between concepts often reflect semantic relationships (e.g., synonymy, hyponymy, meronymy; Murphy, 2003), research suggests that the properties of a word itself correlate with connectivity as well. In particular, a small corpus of studies indicates that the probability that two words are connected correlates with the presence of similar lexical or psychological properties. In network terms, this tendency for connected nodes to exhibit similar covariates is called assortativity or assortative mixing Footnote 1 (Newman, 2010; Vitevitch, 2008; Vitevitch, Chan, & Goldstein, 2014).

To study assortative mixing, word association data are often used. In a word association task, the probability of producing a certain response to a cue is a measure of the associative strength between the cue and response in the lexicon (De Deyne, Navarro, & Storms, 2015; Nelson, McEvoy, & Schreiber, 2004). As such, a cue-response correspondence on some factor would be indicative of that factor displaying assortative mixing in the mental lexicon. Using this approach, word association research has identified several factors that exhibit assortativity, that is, several properties that tend to be shared between connected concepts.
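As an illustration of the measure described above, the following minimal sketch estimates associative strength as the proportion of responses to a cue that consist of a given word. The function name and the toy pairs are hypothetical and are not taken from any association corpus.

```python
from collections import Counter, defaultdict


def associative_strength(pairs):
    """Estimate P(response | cue) from a list of (cue, response) pairs."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1

    strength = {}
    for cue, responses in counts.items():
        total = sum(responses.values())
        # Relative frequency of each response among all responses to this cue.
        strength[cue] = {r: n / total for r, n in responses.items()}
    return strength


# Hypothetical toy data, for illustration only.
toy_pairs = [("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark")]
toy_strength = associative_strength(toy_pairs)
```

In this toy example, "cat" accounts for two of the four responses to "dog", so its estimated associative strength is 0.5.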

First, there is evidence for assortative mixing by syntax: in a word association task, cues tend to elicit responses with the same syntactic properties (Cramer, 1968; Deese, 1962, 1966). These results are corroborated by the finding that processing an utterance with a specific syntactic form facilitates processing utterances with a similar syntax (a phenomenon named syntactic priming; Bock, 1986; Pickering & Branigan, 1998, 1999), by the finding that word selection errors frequently preserve part of speech (Hotopf, 1980), and by noun- or verb-specific deficits in patient studies (Mätzig, Druks, Masterson, & Vigliocco, 2009).

There is also evidence that valence (i.e., how positive a word is considered, cf. Osgood, Suci, & Tannenbaum, 1957) exhibits assortativity, as research shows a positive cue-response correlation on this dimension (Cramer, 1968; Pollio, 1964; Staats & Staats, 1959) and activation of a specific evaluative attitude (e.g., good) facilitates processing of information that shares that evaluation (a concept called affective priming; see Klauer, 1997, for an overview).

Similarly, word association studies show evidence for assortative mixing by dominance (whether a word refers to a strong or dominant concept, e.g., power) and arousal (whether a word refers to an active or aroused concept, e.g., explosion), again evidenced by positive cue-response correlations on these aspects (Pollio, 1964; Staats & Staats, 1959).

Finally, research on concreteness (the extent to which words are imageable, i.e., refer to something perceptible) suggests this factor may exhibit assortativity as well, as processing a concept with a specific degree of concreteness facilitates processing of concepts with similar imageability (Bleasdale, 1987).

Research on the structure of the mental lexicon has not been limited to assessments of assortativity. A separate line of inquiry has focused on uncovering which word properties contribute to the overall number of connections a word has, that is, what aspects determine which nodes are highly connected or central in the mental lexicon, and which are not. Some of this research examined the same word properties described above—observing, for example, that words with a high valence show increased connectivity (Cramer, 1968; Johnson & Lim, 1964; Matlin & Stang, 1978; Pollio, 1964), as do highly imageable words (de Groot, 1989). Other researchers investigated the role of statistical word properties that are not related directly to meaning but are inferred from the environment in which a word is acquired. They find that concepts that are learned at a young age show higher network connectivity (Barabási & Albert, 1999; Steyvers & Tenenbaum, 2005) and that a person’s exposure to a particular word is involved as well: words with a high word frequency show higher network connectivity (Steyvers & Tenenbaum, 2005), as do words with a high contextual diversity (the number of different contexts in which a word is seen; Hills, Maouene, Riordan, & Smith, 2010). Clearly, these distributional word properties are linked to the structure of the mental lexicon, yet to our knowledge, no research has assessed whether they exhibit assortative mixing, which considers similarity of connected concepts and is distinct from a relation between these factors and overall connectivity.

Current Study

As indicated above, a number of studies have identified several word covariates that display assortativity in the mental lexicon: part of speech, valence, dominance, arousal, and concreteness. Yet, none of these studies have investigated these factors simultaneously, which makes it very hard to evaluate whether they exert an independent contribution. Potentially, these factors depend on one another; it is conceivable, for example, that after controlling for one factor, the effects of some other factor(s) disappear. In the same vein, the lack of common ground between these studies makes it hard to estimate the relative importance of each factor.

A second problem is that part of the research that looked into these factors made use of very small sample sizes, mostly due to technical limitations of their time, making generalizations towards the entire mental lexicon somewhat unfeasible. For example, the study of Staats & Staats (1959) was based on 10 words, and the study of Pollio (1964) comprised 52 words; these small stimulus sets are likely to misrepresent the variability captured by a combination of the investigated factors.

In this study, we use word association data to investigate the linguistic and subjective factors that underlie the configuration of the mental lexicon by examining the extent to which cue words and their associative responses exhibit similar properties. We investigate part of speech, valence, dominance, arousal, and concreteness: five factors that have previously been established to display assortativity. We also examine word frequency, contextual diversity, and age-of-acquisition: three aspects that have been found to be involved with the structure of the mental lexicon, but for which assortativity has not yet been assessed.

Our main goals were to (a) establish which of these factors display assortativity in the mental lexicon, (b) investigate their relative contribution, and (c) examine whether these findings hold up for a large variability of cue stimuli.

Method

Materials

Word association corpus

To derive the associative strength for a large set of items, we made use of the Dutch Small World of Words project,Footnote 2 which comprises 3.8 million cue-response pairs (see De Deyne, Navarro, & Storms, 2013, for full details). Briefly, these associations were gathered in response to more than 12,571 cues; each cue was presented to 100 participants, who gave up to three responses to a number of cues in a continued word association task.

Lexical and psycho-affective variables

Three norming databases were used to gather lexical and psycho-affective measures of a large set of words. Word frequency, contextual diversity, and syntactic form (part of speech) for 437,000 Dutch words were obtained from Keuleers, Brysbaert, and New (2010). Word frequency was derived from the raw word count in the subtitles of 8,070 films and television show episodes, contextual diversity was based on the number of films or episodes a word occurred in, and part of speech was estimated using an integrated Dutch morphosyntactic analyzer and part of speech tagger (Tadpole: Van Den Bosch, Busser, Canisius, & Daelemans, 2007).
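To make the distinction between the two distributional measures concrete, the sketch below computes both from a small set of tokenized documents: word frequency as the raw token count, and contextual diversity as the number of documents in which a word occurs at least once. The function name and the toy documents are hypothetical, and the sketch does not reproduce the processing pipeline of the norming database.

```python
from collections import Counter


def frequency_and_diversity(documents):
    """Return (word_frequency, contextual_diversity) for tokenized documents.

    word_frequency counts every token; contextual_diversity counts each word
    at most once per document (here standing in for a film or episode).
    """
    word_frequency = Counter()
    contextual_diversity = Counter()
    for tokens in documents:
        word_frequency.update(tokens)
        contextual_diversity.update(set(tokens))
    return word_frequency, contextual_diversity


# Hypothetical toy "subtitle files", for illustration only.
toy_documents = [
    ["de", "hond", "blaft", "de", "hond"],
    ["de", "kat", "slaapt"],
    ["hond", "en", "kat"],
]
toy_frequency, toy_diversity = frequency_and_diversity(toy_documents)
```

In the toy data, "hond" occurs three times but in only two documents, which shows how the two measures can diverge for the same word.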

Age-of-acquisition estimates and concreteness ratings for 30,000 Dutch words were taken from the dataset by Brysbaert, Stevens, De Deyne, Voorspoels, and Storms (2014). Age-of-acquisition was estimated in years, while concreteness was rated on a 5-point Likert scale, where a value of 1 corresponded to “very abstract” and a value of 5 to “very concrete.”

Valence, arousal, and dominance ratings for 4,300 Dutch words were available through Moors et al. (2013). Each dimension was rated on a 7-point Likert scale, where a value of 1 corresponded to “very negative/unpleasant,” “very passive/calm,” and “very weak/submissive,” respectively, and a value of 7 to “very positive/pleasant,” “very active/aroused,” and “very strong/dominant.” Cues in this database were selected from various sources and consisted of mostly nouns, adjectives, and verbs.

Procedure

Of the 3.8 million cue-response pairs in the Dutch Small World of Words project, 665,461 consist of a cue and response both present in all three norming databases described above. These word pairs contain 4,151 unique words (2,472 nouns, 764 verbs, 814 adjectives, and 101 words of other types, based on the dominant syntactical role described by Keuleers, Brysbaert, & New, 2010).
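The selection step described above can be sketched as follows: a pair is retained only if both its cue and its response appear in every norming vocabulary. The function name, the toy pairs, and the toy vocabularies are hypothetical; the actual data format of the databases is not assumed here.

```python
def pairs_with_full_norms(pairs, *norm_vocabularies):
    """Keep (cue, response) pairs whose words occur in every norming vocabulary."""
    # Words for which all norming databases provide a value.
    covered = set.intersection(*(set(vocab) for vocab in norm_vocabularies))
    kept = [(cue, resp) for cue, resp in pairs if cue in covered and resp in covered]
    unique_words = {word for pair in kept for word in pair}
    return kept, unique_words


# Hypothetical toy data, for illustration only.
toy_pairs = [("hond", "kat"), ("hond", "blaffen"), ("kat", "hond"), ("zon", "warm")]
lexical_norms = {"hond", "kat", "blaffen", "zon", "warm"}
acquisition_norms = {"hond", "kat", "zon", "warm"}
affective_norms = {"hond", "kat", "warm"}

kept_pairs, kept_words = pairs_with_full_norms(
    toy_pairs, lexical_norms, acquisition_norms, affective_norms
)
```

Here only the pairs built from "hond" and "kat" survive, because "blaffen" and "zon" are each missing from at least one toy vocabulary.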

Results

To investigate the extent to which part of speech, valence, arousal, dominance, concreteness, word frequency, contextual diversity, and age-of-acquisition display assortativity in the mental lexicon, we assessed how cues and associative responses correspond on these factors. Our main objective was to inspect correspondence within one dimension—that is, how much of the variance in associative responses’ values on some factor is explained by cue values on that factor. A secondary goal was to examine the extent to which the different factors depend on each other.

To this end, we fitted seven multiple linear regression models, each of which predicts response values on one factor using cue values on all seven measures. The relative contribution of each predictor in the regression model was assessed using the metric lmg in the R package relaimpo (Grömping, 2006), which takes into account predictor collinearity and handles the issue of predictor order by averaging across all possible orders. The resulting R² values are described in Table 1.
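The idea of averaging across predictor orders can be illustrated with a brute-force sketch: for every ordering of the predictors, each predictor is credited with the increase in R² it produces when it enters the model, and these increments are averaged over all orderings. This is a plain-Python illustration under simplifying assumptions, not the relaimpo implementation; all function names and the toy ratings are hypothetical.

```python
from itertools import permutations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R² of an ordinary least squares fit of y on the columns plus an intercept."""
    if not columns:
        return 0.0
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y_i for row, y_i in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_total = sum((y_i - mean_y) ** 2 for y_i in y)
    ss_residual = sum(
        (y_i - sum(b * x for b, x in zip(beta, row))) ** 2
        for row, y_i in zip(design, y)
    )
    return 1.0 - ss_residual / ss_total


def lmg(predictors, y):
    """Average each predictor's sequential R² increment over all entry orders."""
    names = list(predictors)
    shares = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        entered, previous = [], 0.0
        for name in order:
            entered.append(predictors[name])
            current = r_squared(entered, y)
            shares[name] += current - previous
            previous = current
    return {name: total / len(orders) for name, total in shares.items()}


# Hypothetical toy ratings, for illustration only.
toy_cues = {
    "valence": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "arousal": [2.0, 1.0, 4.0, 3.0, 6.0, 4.0],
}
toy_response_valence = [1.5, 2.5, 2.5, 4.5, 5.0, 5.5]
toy_shares = lmg(toy_cues, toy_response_valence)
```

By construction, the shares sum to the R² of the full model, so each share can be read as a proportion of explained variance attributed to one predictor. Enumerating all orders grows factorially with the number of predictors (5,040 orders for seven), which is manageable here but motivates more efficient implementations.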

Table 1 Proportion of variance (> .001) in response values on various psychological and lexical dimensions explained by cue values on those dimensions

For affective dimensions, we find that response values are by far best predicted by cue values on that same measure, as one might expect if these aspects display assortativity. Cues and responses correspond most strongly on valence, with cue valence explaining 31 % of the variance in response valence. We found a smaller but still considerable cue-response correspondence on arousal, dominance, and concreteness, with cue properties explaining between 15 % and 20 % of variance in response values.

We find almost no cue-response correspondence on word frequency and contextual diversity, with cue properties explaining at most 1 % of variance in response values. Lastly, we find a small effect for age-of-acquisition, with cue age-of-acquisition explaining 4 % of variance in response age-of-acquisition.

Scatterplots of cue and response values reveal distributions that are somewhat skewed, at least for some of the examined variables (Fig. 1). As such, it is possible that the cue-response correspondence displayed in Table 1 is the result of the distributional properties of the used data, instead of being indicative of assortative mixing. To investigate this alternate explanation, we performed the above regression analysis after permuting the cue-association pairs (so responses are not matched to “their” cue but to a random cue). This approach yields R² values less than .001 for all predictors in all seven models, which indicates that the R² values reported in Table 1 are not a result of the properties of the used dataset, but rather indicate that when presented with a cue, people tend to respond with associations of similar valence, arousal, dominance, and concreteness, and to a small extent, age-of-acquisition.Footnote 3
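The logic of this permutation check can be sketched in a simplified form: responses are shuffled so that they are paired with random cues, and the explained variance is recomputed. The sketch below uses a single predictor and the squared Pearson correlation rather than the full regression models, and repeats the shuffle several times; the function names, the simulated ratings, and the number of permutations are all assumptions made for illustration.

```python
import random


def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def permuted_r_squared(cues, responses, n_permutations=200, seed=0):
    """R² values obtained after randomly re-pairing responses with cues."""
    rng = random.Random(seed)
    shuffled = list(responses)
    values = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks the cue-response pairing, keeps both distributions
        values.append(pearson_r_squared(cues, shuffled))
    return values


# Simulated toy ratings (hypothetical): responses track cues plus random noise.
toy_rng = random.Random(1)
toy_cue_values = [float(i % 7 + 1) for i in range(60)]
toy_response_values = [c + toy_rng.gauss(0.0, 1.0) for c in toy_cue_values]

observed_r2 = pearson_r_squared(toy_cue_values, toy_response_values)
permuted_r2 = permuted_r_squared(toy_cue_values, toy_response_values)
mean_permuted_r2 = sum(permuted_r2) / len(permuted_r2)
```

Because shuffling preserves the marginal distributions of cue and response values, any skew in the data is still present in the permuted pairs; a permuted R² near zero therefore suggests that the observed correspondence is not a by-product of those distributions.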

Fig. 1 Regression lines and scatterplots (with semitransparent markers) of cue-response correspondence on various psychological and linguistic ratings (n = 665,461)

Finally, to investigate cue-response correspondence on part of speech, we included a part of speech contingency table (Table 2). Overall, 57.50 % of responses match the syntactical role of their corresponding cue. Combining the six smallest categories into one (adverbs, pronouns, prepositions, interjections, determiners, and numerals) allows us to perform a chi-squared test on the contingencies, which indicates that part of speech of responses is significantly related to part of speech of their corresponding cue (χ² = 82,469, df = 9, p < .001, Cramér’s C = .205).
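For readers who want to trace the test statistic, the sketch below computes Pearson's chi-squared statistic and its degrees of freedom for a contingency table, together with the effect size sqrt(χ² / (n · (k − 1))), where k is the smaller table dimension (a formula often referred to as Cramér's V). The toy counts are hypothetical. The last lines plug in the reported χ² under two assumptions that are not stated explicitly in the text: that the collapsed table has 4 × 4 cells (consistent with df = 9) and that n is the 654,484 pairs named in the caption of Table 2.

```python
import math


def chi_squared(table):
    """Return (statistic, degrees of freedom, n) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if cue and response part of speech were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, n


def effect_size(statistic, n, n_rows, n_cols):
    """sqrt(chi-squared / (n * (k - 1))), with k the smaller table dimension."""
    return math.sqrt(statistic / (n * (min(n_rows, n_cols) - 1)))


# Hypothetical 4 x 4 counts (rows: cue category, columns: response category).
toy_table = [
    [50, 10, 8, 2],
    [12, 30, 6, 2],
    [9, 7, 25, 4],
    [3, 2, 4, 6],
]
toy_statistic, toy_df, toy_n = chi_squared(toy_table)
toy_effect = effect_size(toy_statistic, toy_n, 4, 4)

# Reported statistic, with the assumed 4 x 4 table and n = 654,484.
reported_effect = effect_size(82469, 654484, 4, 4)  # approximately 0.205
```

Under these assumptions the formula reproduces the reported effect size of .205.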

Table 2 Contingency table denoting part of speech of 654,484 cue-response pairs

Discussion

The present research used word association data to assess the assortativity of various linguistic and psycho-affective factors. Using an approach that allows us to compare the relative importance of each factor, we examined valence, arousal, dominance, concreteness, word frequency, contextual diversity, age-of-acquisition, and part of speech.

In investigating cue-response correspondence on these dimensions, we find a very strong assortative effect of valence. This pivotal role of evaluative attitude is in line with existing word association research; for example, Deese (1966) identified valence as the dominant factor in determining which concepts people consider related, and a study of our own found valence to account for over 83 % of the variance in a spatial representation of the mental lexicon (De Deyne et al., 2013). The vital importance of evaluative attitude is corroborated in other domains as well, such as in word recognition research (Kuperman, Estes, Brysbaert, & Warriner, 2014), categorization tasks (Niedenthal, Halberstadt, & Innes-Ker, 1999), or affective priming (Klauer, 1997).

We also find a high cue-response correspondence on dominance and arousal, again in line with existing research (Pollio, 1964; Staats & Staats, 1959). This central role of the affective dimensions valence, dominance, and arousal is in agreement with the traditional view on semantic meaning. In an attempt to quantify connotative meaning, Osgood and colleagues performed a factor analysis on ratings of concepts on a large number of semantic dimensions (Osgood, Suci, & Tannenbaum, 1957). They found that evaluation (valence), potency (dominance), and activity (arousal) are by far the most powerful aspects in differentiating subjective meaning. Moreover, the importance of these dimensions seems to be near universal, as follow-up studies have replicated these results across dozens of cultures (see Heise, 2010, or Osgood, 1975, for an overview).

In examining concreteness, we find that the level of abstractness of cues is highly predictive of that of their corresponding responses, indicating that this factor, too, is involved with the structure of the mental lexicon. Some research on concreteness-based priming reports similar findings (Bleasdale, 1987), although in general, this factor has received little attention in the literature on the mental lexicon. Considering the strong effect we report, inclusion of this factor in future research on the structure of the lexicon might be merited.

Overall, we find that all investigated subjective dimensions show a high cue-response correspondence, indicative of assortative mixing. This is clear evidence for the idea that subjective/affective dimensions are involved with the structure of the mental lexicon and likely play an important role in shaping the chain of thought more generally.

We also examined the role of syntactic information. We found that cues tend to elicit associative responses with similar syntactic properties, in concordance with existing research (see Deese, 1966, for an overview). This effect was highly significant; in fact, we find that more than half of all associations share the part of speech of their corresponding cue, evidence that syntax exhibits network assortativity as well. We also assessed whether the effects of the psycho-affective and statistical word properties that we investigated were mediated by cue part of speech. In comparing results for verb cues, adjective cues, and noun cues, we find some small baseline differences, although all correspondence patterns described above remained true in all three cases.

As described above, existing research also reports evidence for assortative mixing by valence, arousal, dominance, concreteness, and part of speech. Most of these aspects were studied separately; as such, these existing studies cannot rule out the possibility that some of these factors depend on one another. By investigating all aspects simultaneously, we were able to establish that the assortativity effects reported both by us and in this previous literature cannot be explained by any codependence between the different factors; rather, each of these investigated aspects displays assortative mixing independently of any relation to the remaining factors.

A separate concern with existing research on assortativity in the mental lexicon is that these studies often made use of stimulus sets of (very) limited size, making generalizations towards the entire lexicon somewhat unfeasible. The current study employs a much larger dataset, comprising 4,151 unique words (contained in 665,461 word-pairs). With this, we were able to ascertain that the assortativity effects reported in existing research hold up for a large variability of cue stimuli.

Finally, we investigated word frequency, contextual diversity, and age-of-acquisition, three factors that are not related directly to the meaning of concepts, but rather reflect how a word is acquired by a speaker. Existing research reports that these aspects are all involved with connectivity in the mental lexicon: concepts that are learned at a young age show higher connectivity (Barabási & Albert, 1999; Steyvers & Tenenbaum, 2005), as do words with a high word frequency (Steyvers & Tenenbaum, 2005) and words with a high contextual diversity (Hills, Maouene, Riordan, & Smith, 2010). Note that while this indicates that these aspects are involved with the structure of the mental lexicon, we do not necessarily expect them to exhibit assortativity, which considers similarity between connected concepts and is distinct from overall connectivity. Indeed, our results show only a small cue-response correspondence for age-of-acquisition and virtually no correspondence on word frequency and contextual diversity, indicating that these aspects do not display assortativity in the mental lexicon.

From the previous discussion, it should be clear that assortativity describes how the mental lexicon is structured, but in itself does not directly inform us about causality. This raises the question whether factors that display assortativity actually influence response tendencies or whether they simply co-vary with the type of responses made in an association task. In other words, do we produce a negative response to a negative cue because of their congruency in valence, or because they have similar (negative) meanings? It often is assumed that semantic similarity is the strongest determinant of response tendencies (Mollin, 2009), yet this does not necessarily rule out any influence of the psycho-affective properties of a word: these properties could correspond to semantic features, in which case the likelihood that the response depends on similarity to the cue would increase.

An alternative is to consider the word association process as reflecting learned co-occurrences derived from the linguistic environment. In this view, valence assortativity reflects negative or positive words co-occurring in language. The validity of this perspective could be addressed easily by examining assortativity in text corpora and should be part of future investigations. However, we are very cautious about presenting this as a comprehensive explanation, as it has been pointed out on several occasions that by virtue of not being propositional, word associations capture different information than what can be inferred from a linguistic environment that conveys communicative constraints, such as pragmatics (McRae, Khalkhali, & Hare, 2012; Szalay & Deese, 1978; De Deyne, Verheyen, & Storms, 2014).

Assortativity effects have implications for studies in other domains, such as in research on priming. First, assortativity as measured through word associations can be used to predict which factors will exhibit prime-target congruency effects, and which factors will not. For example, our findings are in line with the affective priming effect, where an affectively congruent prime facilitates processing more than an affectively incongruent prime (Fazio, 2001; Klauer, 1997; Spruyt, Hermans, De Houwer, Vandekerckhove, & Eelen, 2007). However, the current findings also indicate that not all types of congruencies are equally strong and that other factors can enhance or diminish these effects. For example, our findings suggest a larger congruency effect for valence than for concreteness; while these factors have been investigated separately in the priming literature, to our knowledge, they have not been compared directly. Moreover, our results also suggest strong effects for part of speech, which suggests that this factor should be controlled for when investigating congruency effects of other factors, such as in affective priming. Conversely, this relation between cue-target assortativity and congruency effects in priming research also might lead to new factors being included in future investigations of assortativity; for example, because a congruency effect of modality has been established in the priming literature (Pecher, Zeelenberg, & Barsalou, 2003), one might expect cue-target pairs to correspond on this dimension, too.

Common to all these cases is the idea that affectivity, modality, and concreteness might be part of a hierarchy of semantic properties, where valence is relevant to most words in the lexicon, while modality (visual, haptic) applies only to a subset of words, and specific semantic properties (e.g., “is an animal”) to even smaller regions of the lexicon.

In summary, the present research investigated the extent to which various word covariates exhibit assortativity in the mental lexicon. We find assortative mixing by valence, dominance, arousal, concreteness, and part of speech, but not by word frequency, contextual diversity, and age-of-acquisition.