Theory choice, non-epistemic values, and machine learning

Dotan, Ravit

doi:10.1007/s11229-020-02773-2


Published: 13 August 2020

Synthese, volume 198, pages 11081–11101 (2021)

Ravit Dotan ORCID: orcid.org/0000-0002-9646-8315¹


Abstract

I use a theorem from machine learning, the “No Free Lunch” theorem (NFL), to support the claim that non-epistemic values are essential to theory choice. I argue that NFL entails that predictive accuracy is insufficient to favor a given theory over others, and that NFL challenges our ability to give a purely epistemic justification for using other traditional epistemic virtues in theory choice. In addition, I argue that the natural way to overcome NFL’s challenge is to use non-epistemic values. If my argument holds, non-epistemic values are entangled in theory choice regardless of human limitations and regardless of the subject matter. My argument thereby overcomes objections to the main lines of argument revealing the role of values in theory choice. At the end of the paper, I argue that, contrary to a common conception, the epistemic challenge arising from NFL is distinct from Hume’s problem of induction and other forms of underdetermination.


Notes

For a more technical, yet accessible, introduction to machine learning algorithms see Russell and Norvig (2010).
To be more precise, Wolpert’s (1996) theorem applies to supervised learning algorithms.
This thought experiment is loosely based on the adversary argument from Culberson (1998) and the OR/XOR example from Wilson and Martinez (1997).
Since NFL permits any error measure that is a function only of the relevant values, and prominent distance measures are functions of the same values, NFL’s results can be made to bear on popular error measures. For example, suppose we use squared Euclidean distance as our error measure for NFL: $\left| Y_{F}(x) - Y_{H}(x) \right|^{2}$, where $Y_{H}(x)$ is the algorithm’s prediction for input $x$ and $Y_{F}(x)$ is the true output. Then, according to NFL, all algorithms have the same average expected error: $\sum_{x \in X} \left| Y_{F}(x) - Y_{H}(x) \right|^{2} / \left| X \right|$ (where $X$ is the set of all relevant inputs). However, since $\left| X \right|$ is just the number of items in $X$, and so is the same for every algorithm, the quantity $\sum_{x \in X} \left| Y_{F}(x) - Y_{H}(x) \right|^{2}$ is also the same for all algorithms. But $\sum_{x \in X} \left| Y_{F}(x) - Y_{H}(x) \right|^{2}$ is the Brier inaccuracy measure. Therefore, the predictions of all algorithms are equally inaccurate relative to the Brier inaccuracy measure.
See Dotan (forthcoming) for more discussion of the implications of the No Free Lunch theorem for using accuracy in theory choice.
Based on the OR/XOR example from Wilson and Martinez (1997).

References

Arlot, S., and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40–79.
Bird, A. (2012). The structure of scientific revolutions and its significance: An essay review of the fiftieth anniversary edition. The British Journal for the Philosophy of Science, 63(4), 859–883. https://doi.org/10.1093/bjps/axs031.
Boghossian, P. A. (2006). Fear of knowledge: Against relativism and constructivism. Oxford: Clarendon Press. https://doi.org/10.15713/ins.mmj.3.
Culberson, J. (1998). On the futility of blind search: An algorithmic view of “no free lunch”. Evolutionary Computation, 6(2), 109–127. https://doi.org/10.1162/evco.1998.6.2.109.
Davidson, D. (1973). On the very idea of conceptual scheme. Proceedings and Addresses of the American Philosophical Association, 47, 5–20. https://doi.org/10.1075/pc.3.1.12bus.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.
Dotan, R. (forthcoming). What can we learn about accuracy from machine learning? Philosophy of Science.
Douglas, H. (2009). Science, policy, and the value free ideal. Pittsburgh: University of Pittsburgh Press.
Elliott, K., & Steel, D. (Eds.). (2017). Current controversies in values and science. Oxford: Taylor & Francis.
Fernández-Delgado, M., Cernadas, E., Barro, S., et al. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15, 3133–3181. https://doi.org/10.1016/j.csda.2008.10.033.
Giraud-Carrier, C., & Provost, F. (2005). Toward a justification of meta-learning: Is the no free lunch theorem a show-stopper? In Proceedings of the ICML-2005 Workshop on Meta-Learning.
Gómez, D., & Rojas, A. (2015). An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification. Neural Computation, 28, 105.
Henderson, L. (2020). The problem of induction. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Spring 2020 ed.). https://plato.stanford.edu/archives/spr2020/entries/induction-problem.
Igel, C., & Toussaint, M. (2005). A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modelling and Algorithms, 3(4), 313–322.
Korb, K. B. (2004). Introduction: Machine learning as philosophy of science. Minds and Machines, 14(4), 433–440. https://doi.org/10.1023/B:MIND.0000045986.90956.7f.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: The University of Chicago Press.
Lacey, H. (1999). Is science value free? Values and scientific understanding. London: Routledge.
Lacey, H. (2017). Distinguishing between cognitive and social values. In K. Elliott & D. Steel (Eds.), Current controversies in values and science. New York: Routledge.
Lattimore, T., & Hutter, M. (2011). No free lunch versus Occam’s razor in supervised learning. [ArXiv preprint available at arXiv:1111.3846].
Lauc, D. (2018). How gruesome are the no-free-lunch theorems for machine learning? Croatian Journal of Philosophy, 18(54), 479–485.
Laudan, L. (1990). Science and relativism: Some key controversies in the philosophy of science. Chicago: The University of Chicago Press.
Levi, I. (1962). On the seriousness of mistakes. Philosophy of Science, 29(1), 47–65.
Lipton, P. (2004). Inference to the best explanation (2nd ed.). New York: Routledge.
Longino, H. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton: Princeton University Press.
Longino, H. (1996). Cognitive and non-cognitive values in science: Rethinking the dichotomy. In L. H. Nelson & J. Nelson (Eds.), Feminism, science, and the philosophy of science (pp. 39–58). New York: Kluwer Academic Publishers.
Longino, H. (2002). The fate of knowledge. Princeton: Princeton University Press.
Longino, H. (2014). Values, heuristics, and politics of knowledge. In M. Carrier (Ed.), The challenge of the social and the pressure of the practice: Science and values revisited. Pittsburgh: University of Pittsburgh Press.
McMullin, E. (1982). Values in science. PSA Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2, 3–28.
Montañez, G.D. (2017). Why machine learning works. Carnegie Mellon.
Okruhlik, K. (1994). Gender and the biological sciences. Canadian Journal of Philosophy, 24(sup1), 21–42.
Pettigrew, R. (2016). Accuracy and the laws of credence. Oxford: Oxford University Press.
Rolin, K. (2017). Can social diversity be best incorporated into science by adopting the social value management ideal? In K. Elliott & D. Steel (Eds.), Current controversies in values and science (pp. 113–129). New York: Routledge.
Rudner, R. (1953). The scientist qua scientist makes value judgments. Philosophy of Science, 20(1), 1–6.
Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). New Jersey: Pearson Education Inc.
Schaffer, C. (1993a). Overfitting avoidance as bias. Machine Learning, 10(2), 153–178.
Schaffer, C. (1993b). Selecting a classification method by cross validation. Machine Learning, 13(1), 135–143.
Schaffer, C. (1994). A conservation law for generalization performance. In Machine learning: Proceedings of the eleventh international conference.
Steel, D. (2013). Acceptance, values, and inductive risk. Philosophy of Science, 80(5), 818–828. https://doi.org/10.1086/673936.
Strawson, P. F. (1952). Introduction to logical theory. London: Methuen.
Swinburne, R. (1997). Simplicity as evidence for truth. Milwaukee: Marquette University Press.
The Biology and Gender Study Group. (1988). The importance of feminist critique for contemporary cell biology. Hypatia, 3(1), 61–76.
Toulmin, S. (1970). Does the distinction between normal and revolutionary science hold water? In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge. Cambridge: Cambridge University Press.
van Fraassen, B. C. (1980). The scientific image. New York: Oxford University Press.
Wilson, D. R., & Martinez, T. R. (1997). Bias and the probability of generalization. In Proceedings Intelligent Information Systems. IIS’97 (pp. 108–114). https://doi.org/10.1109/iis.1997.645199.
Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), 1391–1420. https://doi.org/10.1162/neco.1996.8.7.1391.
Wolpert, D. H. (2012). What the no free lunch theorems really mean; how to improve search algorithms (pp. 1–13).
Wolpert, D. H. (1993). On overfitting avoidance as bias (Technical Report SFI TR 92-03-5001). Santa Fe, NM: The Santa Fe Institute.


Acknowledgements

For extensive feedback on this paper, I would like to thank Lara Buchak and Shamik Dasgupta. For comments on earlier drafts, I would like to thank Greyson Abid, Michael Arsenault, Nick French, Alvin Goldman, Tyler Haddow, Daniel Harman, Dan Hicks, John MacFarlane, Sven Neth, Emily Perry, Daniel Warren, and two anonymous referees for Synthese. For extensive conversations, I thank Gil Rosenthal. I am also grateful for comments and discussion at the conferences where versions of this paper were presented, including the 2020 Eastern APA, the 2019 Congress on Logic, Methodology, and Philosophy of Science and Technology, the 2019 Canadian Society for the History and Philosophy of Science conference, the 2019 Society of Exact Philosophy conference, the 2019 Values in Medicine, Science, and Technology conference, and the 2018 Philosophy of Science Association conference.

Author information

Authors and Affiliations

UC Berkeley (Philosophy), Berkeley, CA, USA
Ravit Dotan


Corresponding author

Correspondence to Ravit Dotan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: The no free lunch theorem(s)

“No Free Lunch” is the name of a family of theorems, which differ, among other things, in the kinds of algorithms they consider. For example, No Free Lunch theorems were initially proven for optimization algorithms (Wolpert and Macready 1993). Wolpert (1996, 2001, 2012) proves No Free Lunch theorems for supervised learning algorithms, and these are what I have focused on in this paper. Schaffer (1994) gives an elegant formulation of Wolpert’s main No Free Lunch theorem for classification learning algorithms, based on a preprint of Wolpert (1996). In this appendix, I state Schaffer’s formulation, which he calls the “Law of Conservation of Generalization Performance”, to illustrate more formally what NFL theorems say. See Montañez (2017, chapter 2) for a review of various No Free Lunch results, and see Schaffer (1994) and Wolpert (1996) for proofs of the theorem that I state here.

We start by defining cases in a classification problem. Each case in a classification problem, $A_{i}$, is a vector of attributes. For simplicity, we assume that each component of the vector is a finite number. $\left\{ {A_{1} , \ldots ,A_{m} } \right\}$ is the set of all possible attribute vectors, where m is finite. C is a class probability vector, which defines the relationship between attribute vectors and classes. Each component of C, $C_{i}$, is the probability that a case with attribute vector $A_{i}$ belongs to class 1. We assume that data are generated in the same way when training and when testing a learner: attribute vectors are sampled with replacement according to an arbitrary distribution D, and a class is assigned to each of them using C. We also assume that the training set contains n samples. A learning situation S is a triple (D, C, n).

The generalization accuracy of a learner, $GA_{L}$, is the expected prediction performance of the learner on cases with attribute vectors not represented in the training set. For example, the generalization accuracy of a random guesser in a two-class problem is 1/2 for every D and C. We use the generalization accuracy of a random guesser as a baseline and define the generalization performance of a learner, $GP_{L}$, as the difference between its generalization accuracy and the generalization accuracy of a random guesser:

$$GP_{L} = GA_{L} - GA_{random\,guesser}$$

Generalization performance greater than zero means better than chance performance. $GP_{L} \left( S \right)$ is the generalization performance of learner L in learning situation S.
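As a rough illustration of these two definitions, the Python sketch below computes generalization accuracy and generalization performance in a single noise-free learning situation. It rests on assumptions that are not part of the source: the function names are hypothetical, the learner is a fixed rule that always predicts class 1, the training set is summarized by the set of attribute-vector indices it contains, and generalization accuracy is taken to be the D-weighted accuracy over the attribute vectors that are absent from the training set.

```python
from fractions import Fraction

# Baseline from the text: a random guesser in a two-class problem has GA = 1/2.
RANDOM_GUESSER_GA = Fraction(1, 2)

def generalization_accuracy(predict, c, d, seen):
    """D-weighted accuracy of `predict` on attribute vectors not in the training set.

    predict: function from attribute-vector index to predicted class (0 or 1)
    c:       noise-free class vector, c[i] is the true class of A_i
    d:       sampling distribution over attribute vectors
    seen:    indices of attribute vectors represented in the training set
    """
    unseen = [i for i in range(len(c)) if i not in seen]
    mass = sum(d[i] for i in unseen)
    correct = sum(d[i] for i in unseen if predict(i) == c[i])
    return correct / mass

def generalization_performance(predict, c, d, seen):
    """GP = GA of the learner minus GA of the random guesser."""
    return generalization_accuracy(predict, c, d, seen) - RANDOM_GUESSER_GA

# Illustrative learning situation (values chosen for this sketch only).
c = (1, 1, 1, 0)
d = (Fraction(1, 4), Fraction(1, 4), Fraction(3, 8), Fraction(1, 8))
seen = {0, 1}

def always_one(i):
    """Hypothetical learner: predicts class 1 for every case."""
    return 1

ga = generalization_accuracy(always_one, c, d, seen)
gp = generalization_performance(always_one, c, d, seen)
print(ga, gp)  # prints: 3/4 1/4
```

In this made-up situation the learner is right on the heavier of the two unseen attribute vectors, so it does better than chance. In a situation where the unseen cases all belong to class 0, the same learner would have a generalization performance of -1/2.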

Using the notation above we can write Schaffer’s Law of Conservation of Generalization Performance:

$$\mathop \sum \limits_{S} GP_{L} \left( S \right) = 0,\;{\text{for}}\;{\text{every}}\;{\text{D}},{\text{n}}$$

In words, this law says that any positive performance by a learner in a certain learning situation must be exactly balanced by negative performance in other learning situations.
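The balance described here can be checked by brute force on a toy problem. The following Python sketch is only an illustration under simplifying assumptions, not Schaffer's proof: it considers the noise-free case, fixes which attribute vectors appear in the training set instead of averaging over samples drawn from D, and uses two hypothetical learners (predict the majority training label, or the minority one). For each learner it enumerates every class vector C and sums the resulting generalization performance.

```python
from fractions import Fraction
from itertools import product

def majority_learner(train):
    """Hypothetical learner: predict the most common training label (ties go to 1)."""
    ones = sum(train.values())
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda i: guess

def minority_learner(train):
    """Hypothetical learner: predict the opposite of the majority training label."""
    ones = sum(train.values())
    guess = 0 if 2 * ones >= len(train) else 1
    return lambda i: guess

def gp_per_situation(learner, d, seen):
    """Map each noise-free class vector C to GP_L, for fixed D and training cases."""
    m = len(d)
    unseen = [i for i in range(m) if i not in seen]
    mass = sum(d[i] for i in unseen)
    result = {}
    for c in product((0, 1), repeat=m):
        # The learner only sees the labels of the training cases.
        predict = learner({i: c[i] for i in seen})
        correct = sum(d[i] for i in unseen if predict(i) == c[i])
        result[c] = correct / mass - Fraction(1, 2)
    return result

# Illustrative distribution and training cases (chosen for this sketch only).
d = (Fraction(1, 4), Fraction(1, 4), Fraction(3, 8), Fraction(1, 8))
seen = (0, 1)

for learner in (majority_learner, minority_learner):
    gps = gp_per_situation(learner, d, seen)
    # Total over all situations, then the best and worst single situation.
    print(learner.__name__, sum(gps.values()), max(gps.values()), min(gps.values()))
```

Both learners reach a generalization performance of +1/2 in some situations and -1/2 in others, and in both cases the total over all 16 class vectors is zero.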

If we allow for the possibility of noise, then the law is properly written with an integral instead of a summation:

$$\mathop \int \limits_{S} GP_{L} \left( S \right)dS = 0,\;{\text{for}}\;{\text{every}}\;{\text{D}},{\text{n}}$$

In this case, the components of C are taken from the real interval [0,1] and the integral runs over the space $[0,1]^{m}$ of class probability vectors. Without noise, the components of C are taken from {0,1} and the summation runs over the $2^{m}$ possible class probability vectors.
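To see why the integral version can also come out to zero, it may help to look at a single unseen attribute vector. The Python sketch below is a simplified illustration under assumptions that go beyond the source: the learner's prediction for that attribute vector is held fixed, the class probability c for it ranges over [0,1], and the integral is approximated with the midpoint rule, which is exact here because the integrand is linear in c. The function names are hypothetical.

```python
from fractions import Fraction

def expected_accuracy(prediction, c):
    """Probability that a fixed prediction is correct when P(class 1) = c."""
    return c if prediction == 1 else 1 - c

def integrated_gp(prediction, steps=1000):
    """Midpoint-rule integral over c in [0, 1] of (expected accuracy - 1/2)."""
    width = Fraction(1, steps)
    total = Fraction(0)
    for k in range(steps):
        c = (k + Fraction(1, 2)) * width  # midpoint of the k-th subinterval
        total += (expected_accuracy(prediction, c) - Fraction(1, 2)) * width
    return total

print(integrated_gp(1), integrated_gp(0))
```

Whichever class is predicted, the gain over chance when c favors that class is offset by the loss when c favors the other class, so both integrals evaluate to zero in this sketch.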

From the conservation law, it follows that all learners have the same average generalization performance if we average over all possible learning situations (or, as I put it, that all algorithms have the same expected error if we make no assumptions about the problem we are trying to solve). Here’s why.

For any learner:

$$\mathop \sum \limits_{S} GP_{L} \left( S \right) = \mathop \sum \limits_{S} \left( {GA_{L} \left( S \right) - GA_{random\,guesser} \left( S \right)} \right) = 0$$

Add $\mathop \sum \nolimits_{S} GA_{random\,guesser} \left( S \right)$ to both sides and get:

$$\mathop \sum \limits_{S} GA_{L} \left( S \right) = \mathop \sum \limits_{S} GA_{random \,guesser} \left( S \right)$$

Dividing by the number of learning situations, $\# S$, gives the formulation used in this paper, namely that the average generalization accuracy is the same for every learner L and, in particular, equal to that of the random guesser:

$$\frac{{\mathop \sum \nolimits_{S} GA_{L} \left( S \right)}}{\# S} = \frac{{\mathop \sum \nolimits_{S} GA_{random\, guesser} \left( S \right)}}{\# S}$$


About this article

Cite this article

Dotan, R. Theory choice, non-epistemic values, and machine learning. Synthese 198, 11081–11101 (2021). https://doi.org/10.1007/s11229-020-02773-2


Received: 09 February 2020
Accepted: 29 June 2020
Published: 13 August 2020
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11229-020-02773-2
