Abstract
While scientific inquiry crucially relies on the extraction of patterns from data, we still have a far from perfect understanding of the metaphysics of patterns—and, in particular, of what makes a pattern real. In this paper we derive a criterion of real-patternhood from the notion of conditional Kolmogorov complexity. The resulting account belongs to the philosophical tradition, initiated by Dennett (J Philos 88(1):27–51, 1991), that links real-patternhood to data compressibility, but is simpler and formally more perspicuous than other proposals previously defended in the literature. It also successfully enforces a non-redundancy principle, suggested by Ladyman and Ross (Every thing must go: metaphysics naturalized, Oxford University Press, Oxford, 2007), that aims to exclude from real-patternhood those patterns that can be ignored without loss of information about the target dataset, and which their own account fails to enforce.
Notes
The above informal characterization of non-redundancy will be sharpened in Sect. 4.
Speech compression is an instance of lossy compression, where faithfulness of compression is judged by a certain distortion measure, or loss function (Cover and Thomas 2006, ch. 10; Shannon 1959). The main notion of compression we rely on in what follows, on the other hand, is lossless compression, in which the original file and its decoded version are identical. TIFF, FLAC and DEFLATE (the last typically used in zip files) are widely used lossless formats and algorithms. We note that the very existence of lossless compression algorithms appears to be in some tension with McAllister's (2003a) claim that empirical datasets are incompressible, insofar as empirical datasets can contain, for example, photographs or audio recordings. We won't pursue this topic here.
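The defining property of lossless compression described in this note (the decoded version is identical to the original) can be illustrated with a minimal sketch. It assumes Python's standard zlib module, which implements DEFLATE; the function name roundtrip and the sample string are hypothetical and chosen only for illustration.

```python
import zlib


def roundtrip(data: bytes) -> tuple:
    """Compress and decompress `data`; report both sizes and whether
    the decoded bytes are identical to the original."""
    compressed = zlib.compress(data, 9)    # 9 = strongest DEFLATE setting
    restored = zlib.decompress(compressed)
    return len(data), len(compressed), restored == data


# A highly repetitive (hence compressible) illustrative string.
patterned = b"AB" * 5000
original_len, compressed_len, identical = roundtrip(patterned)

print("original bytes:  ", original_len)
print("compressed bytes:", compressed_len)
print("lossless:        ", identical)
```

The compressed length obtained this way is only an upper bound on the shortest description of the string: a general-purpose compressor may miss regularities that a cleverer program would exploit.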
In formal presentations of Kolmogorov complexity (e.g. Li and Vitányi 2008, p. 107), the programs we have been alluding to are inputs to a reference universal Turing machine (UTM). For the purposes of this paper, we can just think of the reference UTM as implementing one of the many popular Turing-complete programming languages (say, Python or JavaScript). Petersen (2018, p. 2) discusses whether the choice of UTM introduces a bias in the resulting account of patterns (for example, by making any arbitrary dataset, however big and random, compressible and hence patterned) and concludes, with Li and Vitányi (2008, p. 112), that a small enough UTM will make any such potential bias negligible.
The foregoing few paragraphs only scratch the surface of the algorithmic approach to model selection, whose systematic development is the aim of so-called algorithmic statistics. We point the interested reader to Gács et al. (2001), Vereshchagin and Shen (2017) and references therein for in-depth discussion and alternatives to the structure–function two-part code.
To be clear: this is a problem insofar as we want to use model selection as a method for identifying patterns. Model selection is a perfectly clear goal in algorithmic statistics, and the structure–function approach has much to recommend it, when used for its intended purpose.
Keep in mind that patterns simpliciter for L&R are just what we have called “strings”. This is what Fn. 51 in Ladyman and Ross (2007) amounts to saying: “A mere pattern is a locatable address associated with no projectible or non-redundant object” (ibid., p. 231). See also ibid., p. 229: “From the ontological point of view, a non-projectible pattern exactly resembles the traditional philosophical individual.” For the meaning of “locator” and cognates in L&R's system, see ibid., p. 121ff. For the related notion of “perspective” see ibid., p. 224.
We have changed the variable names to align them with the ones we use in this paper.
A megabyte (MB) is one million bytes.
For ease of discussion, we have designed our example so that redundancies in the dataset are readily apparent. Of course, in more realistic examples, sophisticated coding might be needed in order to squeeze the redundant material out of our target string. The kind of argument we develop here applies to more realistic cases as well.
In what follows we will give our example algorithms in pseudocode—i.e. a dialect that does not correspond to any particular programming language, but can be readily translated to many of them and is tailored to maximize readability for humans. It may have occurred to some readers that the choice of coding scheme or programming language used to describe an object can condition the minimum achievable length for describing it and thus, seemingly, that different languages will introduce different Ks for the same object. This is correct, but the apparent relativity thus introduced does not affect the objectivity of K as a measure of compressibility: K is equal for every programming language up to an additive constant that is independent of the string to be compressed itself (Grünwald 2007, p. 10).
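The language-independence claim in this note is usually stated as the invariance theorem. The formulation below is a sketch in notation that is assumed here rather than taken from the paper: $K_{L}(x)$ stands for the length of the shortest program in language $L$ that outputs the string $x$, and $c_{L_1,L_2}$ for a constant that depends only on the two languages.

```latex
% Invariance theorem (sketch; the symbols K_L and c_{L_1,L_2} are assumed notation).
% For any two Turing-complete languages L_1 and L_2 there is a constant
% c_{L_1,L_2}, independent of x, such that for every string x:
\[
  \bigl| K_{L_1}(x) - K_{L_2}(x) \bigr| \;\le\; c_{L_1,L_2}
\]
```

Intuitively, the constant can be taken to be roughly the length of an interpreter for one language written in the other, which is why it does not grow with the string being compressed.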
At least on our laptop. The exact number will vary slightly from platform to platform.
In fact, if, as we said, scientists have not yet learned that A2 and A3 are repeated in D, these two strings should occur twice in listing 2, once for each repetition. This detail does not interfere with our point, and we have omitted it so as not to complicate the structure of the example.
Put to us by an anonymous referee.
References
Andersen, H. K. (2017). Patterns, information, and causation. The Journal of Philosophy, 114(11), 592–622.
Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., et al. (2013). NCBI GEO: Archive for functional genomics data sets—Update. Nucleic Acids Research, 41(D1), D991–D995. https://doi.org/10.1093/nar/gks1193.
Bennett, C. H. (1988). Logical depth and physical complexity. In R. Herken (Ed.), The universal Turing machine: A half-century survey (pp. 227–257). Oxford: Oxford University Press.
Bird, A. P. (1986). CpG-rich islands and the function of DNA methylation. Nature, 321(6067), 209. https://doi.org/10.1038/321209a0.
Bogen, J., & Woodward, J. (1988). Saving the phenomena. The Philosophical Review, 97(3), 303–352.
Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM (JACM), 13(4), 547–569.
Collier, J. (2001). Dealing with the unexpected. In Partial proceedings of CASYS 2000: Fourth international conference on computing anticipatory systems, international journal of computing anticipatory systems (vol. 10, pp. 21–30).
Collier, J., & Hooker, C. A. (1999). Complexly organised dynamical systems. Open Systems and Information Dynamics, 6(3), 241–302.
Cover, T., & Thomas, J. (2006). Elements of information theory (Wiley Series in Telecommunications and Signal Processing). New York, NY: Wiley-Interscience.
Dennett, D. C. (1991). Real patterns. The Journal of Philosophy, 88(1), 27–51.
Feigelson, E. D., & Babu, G. J. (2012). Big data in astronomy. Significance, 9, 22–25. https://doi.org/10.1111/j.1740-9713.2012.00587.x.
Frøkjær-Jensen, C., Jain, N., Hansen, L., Davis, M. W., Li, Y., Zhao, D., et al. (2016). An abundant class of non-coding DNA can prevent stochastic gene silencing in the C. elegans germline. Cell, 166(2), 343–357. https://doi.org/10.1016/j.cell.2016.05.072.
Gács, P., Tromp, J. T., & Vitányi, P. (2001). Algorithmic statistics. IEEE Transactions on Information Theory, 47(6), 2443–2463.
Griffith, V., Chong, E. K., James, R. G., Ellison, C. J., & Crutchfield, J. P. (2014). Intersection information based on common randomness. Entropy, 16(4), 1985–2000.
Grünwald, P. (2007). The minimum description length principle. Cambridge, MA: The MIT Press.
Grünwald, P. D., & Vitányi, P. M. B. (2008). Algorithmic information theory. In P. Adriaans & J. van Benthem (Eds.), Philosophy of information. Handbook of the philosophy of science (Vol. 8, pp. 281–320). Amsterdam: North-Holland.
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 1–7.
Ladyman, J., & Ross, D. (2007). Every thing must go: Metaphysics naturalized. Oxford: Oxford University Press.
Larsen, F., Gundersen, G., Lopez, R., & Prydz, H. (1992). CpG islands as gene markers in the human genome. Genomics, 13(4), 1095–1107. https://doi.org/10.1016/0888-7543(92)90024-M.
Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications. Texts in computer science (Vol. 9). New York: Springer.
Martínez, M. (2015). Informationally-connected property clusters, and polymorphism. Biology and Philosophy, 30(1), 99–117.
Martínez, M. (2017). Synergic kinds. Synthese. https://doi.org/10.1007/s11229-017-1480-2.
McAllister, J. W. (2003a). Algorithmic randomness in empirical data. Studies in History and Philosophy of Science Part A, 34(3), 633–646.
McAllister, J. W. (2003b). Effective complexity as a measure of information content. Philosophy of Science, 70(2), 302–307.
McAllister, J. W. (2011). What do patterns in empirical data tell us about the structure of the world? Synthese, 182(1), 73–87.
Petersen, S. (2013). Toward an algorithmic metaphysics. In D. Dowe (Ed.), Algorithmic probability and friends: Bayesian prediction and artificial intelligence (pp. 306–317). Berlin: Springer.
Petersen, S. (2018). Composition as pattern. Philosophical Studies, 176, 1119. https://doi.org/10.1007/s11098-018-1050-6.
Rappaport, T. S. (1996). Wireless communications: Principles and practice. Upper Saddle River, NJ: Prentice Hall PTR.
Rissanen, J. (1998). Stochastic complexity in statistical inquiry (Vol. 15). Singapore: World Scientific.
Ross, D. (2000). Rainforest realism: A Dennettian theory of existence. In D. Ross, D. Thomson, & A. Brook (Eds.), Dennett’s philosophy: A comprehensive assessment (pp. 147–168). Cambridge, MA: The MIT Press.
Shannon, C. E. (1959). Coding theorems for a discrete source with a fidelity criterion. Institute of Radio Engineers, International Convention Record, Part 4, 142–163.
Solomonoff, R. J. (1964a). A formal theory of inductive inference. Part I. Information and Control, 7(1), 1–22.
Solomonoff, R. J. (1964b). A formal theory of inductive inference. Part II. Information and Control, 7(2), 224–254.
Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural description of the human brain. PLoS Computational Biology, 1(4), e42. https://doi.org/10.1371/journal.pcbi.0010042.
Vereshchagin, N., & Shen, A. (2017). Algorithmic statistics: Forty years later. In A. Day, M. Fellows, N. Greenberg, B. Khoussainov, A. Melnikov, & F. Rosamond (Eds.), Computability and complexity: Essays dedicated to Rodney G. Downey on the occasion of his 60th birthday. Lecture Notes in Computer Science (pp. 669–737). Cham: Springer. https://doi.org/10.1007/978-3-319-50062-1_41.
Vereshchagin, N., & Vitányi, P. (2010). Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Transactions on Information Theory, 56(7), 3438–3454. https://doi.org/10.1109/TIT.2010.2048491.
Vereshchagin, N., & Vitányi, P. (2006). On algorithmic rate-distortion function. In 2006 IEEE international symposium on information theory (pp. 798–802). IEEE.
Vitányi, P. M. (2006). Meaningful information. IEEE Transactions on Information Theory, 52(10), 4617–4626.
Williams, P. L., & Beer, R. D. (2010). Nonnegative decomposition of multivariate information. ArXiv Preprint arXiv:1004.2515.
Acknowledgements
We would like to thank James Ladyman for his very generous discussion of the topics of this paper. We would also like to thank the participants of the reading group on Real Patterns held at the University of Barcelona in 2018. Two very detailed reviews from two anonymous referees helped us to significantly improve the paper. Abel Suñé also wishes to thank J.P. Grodniewicz for valuable discussion and Pepa Toribio for her support. Manolo Martínez would like to acknowledge research funding awarded by the Spanish Ministry of Economy, Industry and Competitiveness, in the form of grants PGC2018-101425-B-I00 and RYC-2016-20642.
Cite this article
Suñé, A., Martínez, M. Real patterns and indispensability. Synthese 198, 4315–4330 (2021). https://doi.org/10.1007/s11229-019-02343-1
DOI: https://doi.org/10.1007/s11229-019-02343-1