Abstract
Usenet newsgroups provide a popular means of scientific communication. We demonstrate striking order in the diversity of biology newsgroups: submissions to newsgroups obey a form of Zipf's law, a simple power law for the frequency of posts as a function of each contributor's rank by number of posts. We show that a simple stochastic process, due to Günther et al. (1992, 1996), Levitin and Schapiro (1993), and Schapiro (1994), accounts for this pattern and reproduces many of the properties of newsgroups. This model successfully predicts the relative contribution of each poster from the size of the newsgroup, measured by its number of posters and its total number of posts.
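The rank-frequency relationship described above can be illustrated with a minimal sketch. It assumes the generic Zipf form f(r) ≈ c · r^(−a), where r is a contributor's rank by number of posts, and estimates a and c by ordinary least squares on log-log axes. The function name `zipf_fit` and the synthetic counts are hypothetical and are not taken from the article; the sketch is not the stochastic model of Günther et al., and log-log regression is only a rough estimator for power laws.

```python
import math


def zipf_fit(post_counts):
    """Fit f(r) = c * r**(-a) to posts-per-contributor counts.

    Contributors are ranked by number of posts (rank 1 = most posts),
    then a straight line is fitted to log(count) against log(rank).
    Returns (a, c). Illustrative only; not the authors' procedure.
    """
    # Rank contributors from most to fewest posts; drop non-posters.
    counts = sorted((c for c in post_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two contributors with posts")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Zipf exponent is the negated slope; scale is exp(intercept).
    return -slope, math.exp(intercept)


# Synthetic, exactly Zipfian counts (hypothetical data, given unordered).
synthetic_counts = [1000.0 / rank for rank in range(50, 0, -1)]
exponent, scale = zipf_fit(synthetic_counts)
```

On real newsgroup data the fitted exponent would depend on the group, and the low-frequency tail (many contributors with a single post) tends to dominate an unweighted fit of this kind.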
References
Baayen, R. H. (2001), Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht, Netherlands.
Bar-Ilan, J. (1997), The “mad cow” disease, Usenet newsgroups and bibliometric laws. Scientometrics, 39: 29–55.
David, H. A., Hartley, H. O., Pearson, E. S. (1954), The distribution of the ratio, in a single normal sample, of range to standard deviation. Biometrika, 41: 482–493.
Frontier, S. (1985), Diversity and structure in aquatic ecosystems. Oceanography and Marine Biology: An Annual Review, 23: 253–312.
Günther, R., Levitin, L., Schapiro, B., Wagner, P. (1996), Zipf's law and the effect of ranking on probability distributions. International Journal of Theoretical Physics, 35: 395–417.
Günther, R., Schapiro, B., Wagner, P. (1992), Physical complexity and Zipf's law. International Journal of Theoretical Physics, 31: 525–543.
Hauben, M., Hauben, R. (1997), Netizens: On the History and Impact of Usenet and the Internet. IEEE Computer Society Press, Los Alamitos, California, USA.
Hubbell, S. P. (2001), The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, New Jersey, USA.
Huberman, B. A., Pirolli, P. L. T., Pitkow, J. E., Lukose, R. M. (1998), Strong regularities in World Wide Web surfing. Science, 280: 95–97.
Kanji, G. K. (1999), 100 Statistical Tests. Sage Publications, London, UK.
Levitin, L. B., Schapiro, B. (1993), Zipf's law and information complexity in an evolutionary system. Proceedings IEEE International Symposium on Information Theory, 76.
Li, W. (1992), Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38: 1842–1845.
Mandelbrot, B. (1953), An information theory of the statistical structure of language. In: W. E. Jackson (Ed.), Communication Theory, Academic Press, New York, New York, USA, pp. 486–502.
Mandelbrot, B. (1961), On the theory of word frequencies and on related Markovian models of discourse. In: R. Jakobson (Ed.), Structure of Language and its Mathematical Aspects, American Mathematical Society, Providence, Rhode Island, USA, pp. 190–219.
Magurran, A. E. (1988), Ecological Diversity and Its Measurement. Princeton University Press, Princeton, New Jersey, USA.
Marsili, M., Zhang, Y.-C. (1998), Interacting individuals leading to Zipf's law. Physical Review Letters, 80: 2741–2744.
Miller, G. A., Newman, E. B., Friedman, E. A. (1957), Some effects of intermittent silence. American Journal of Psychology, 70: 311–313.
Okuyama, K., Takayasu, M., Takayasu, H. (1999), Zipf's law in income distribution of companies. Physica A, 269: 125–131.
Osborne, L. N. (1998), Topic development in USENET newsgroups. Journal of the American Society for Information Science, 49: 1010–1016.
Schapiro, B. (1994), An approach to the physics of complexity. Chaos, Solitons and Fractals, 4: 115–123.
Simon, H. A. (1955), On a class of skew distribution functions. Biometrika, 42: 425–440.
Smith, M. A. (1999), Invisible crowds in cyberspace: mapping the social structure of the Usenet. In: M. A. Smith, P. Kollock (Eds.), Communities in Cyberspace, Routledge, London, UK, pp. 195–219.
Tokeshi, M. (1993), Species abundance patterns and community structure. Advances in Ecological Research, 24: 111–186.
Wilson, J. B., Wells, T. C. E., Trueman, I. C., Jones, G., Atkinson, M. D., Crawley, M. J., Dodd, M. E., Silvertown, J. (1996), Are there assembly rules for plant species abundance? An investigation in relation to soil resources and successional trends. Journal of Ecology, 84: 527–538.
Yule, G. U. (1924), A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions B, 213: 21.
Zipf, G. K. (1935), The Psycho-Biology of Language. Houghton Mifflin, Boston, Massachusetts, USA.
Zipf, G. K. (1949), Human Behavior and the Principle of Least Effort. Addison-Wesley Publishing Company, Cambridge, Massachusetts, USA.
Cite this article
Kot, M., Silverman, E. & Berg, C.A. Zipf's law and the diversity of biology newsgroups. Scientometrics 56, 247–257 (2003). https://doi.org/10.1023/A:1021971212438
DOI: https://doi.org/10.1023/A:1021971212438