Understanding the Sampling Bias: A Case Study on NBA Drafts

Economou, Polychronis; Batsidis, Apostolos; Tzavelas, George; Malefaki, Sonia

doi:10.1007/s42519-021-00167-2

Original Article
Published: 23 March 2021

Volume 15, article number 45, (2021)
Journal of Statistical Theory and Practice

Polychronis Economou ORCID: orcid.org/0000-0001-6452-5920¹,
Apostolos Batsidis²,
George Tzavelas³ &
…
Sonia Malefaki⁴

Abstract

In several real data applications, a biased sample arises naturally from the selection procedure. Recently, Economou et al. (Biom J 62:238–249, 2020) used the concept of bivariate weighted distributions and proposed four families of weight functions to describe cases in which the bias in a bivariate sample is caused by sampling schemes that over- or under-represent individuals with specific properties. The current paper focuses on revealing the contribution of each variable to the bias in the bivariate sample. More specifically, from a Bayesian perspective, Approximate Bayesian Computation (ABC) methods are used to sample approximately from the posterior distribution, and the Deviance Information Criterion (DIC) is employed to compare the fit of the models obtained with different weight functions. The proposed method is illustrated on a real data set concerning NBA draft players.

References

Afonso L, Corte Real P (2016) Using weighted distributions to model operational risk. ASTIN Bull 46(2):469–485
Arnold B, Nagaraja H (1991) On some properties of bivariate weighted distributions. Commun Stat Theory Methods 20(5–6):1853–1860
Berkson J (1946) Limitations of the application of fourfold table analysis to hospital data. Biom Bull 2:47–53
Celeux G, Forbes F, Robert CP, Titterington DM (2006) Deviance information criteria for missing data models. Bayesian Anal 1(4):651–673
Duong T, Goud B, Schauer K (2012) Closed-form density-based framework for automatic detection of cellular morphology changes. Proc Natl Acad Sci 109(22):8382–8387
Economou P, Batsidis A, Tzavelas G, Alexopoulos P, ADNI (2020) Berkson’s paradox and weighted distributions: an application to Alzheimer’s disease. Biom J 62:238–249
Economou P, Tzavelas G, Batsidis A (2020) Robust inference under r-size-biased sampling without replacement from finite population. J Appl Stat 47(13–15):2808–2824
Fisher R (1934) The effect of methods of ascertainment upon the estimation of frequencies. Ann Eugen 6(1):13–25
Geneletti S, Best N, Toledano MB, Elliott P, Richardson S (2013) Uncovering selection bias in case-control studies using Bayesian post-stratification. Stat Med 32:2555–2570
Greenland S (2003) Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 14:300–306
Gupta RC, Kirmani S (1990) The role of weighted distributions in stochastic modeling. Commun Stat Theory Methods 19(9):3147–3162
Hernan M, Hernandez-Diaz S, Robins J (2004) A structural approach to selection bias. Epidemiology 15:615–625
Jain K, Nanda A (1995) On multivariate weighted distributions. Commun Stat Theory Methods 24(10):2517–2519
Kacprzak T, Herbel J, Amara A, Réfrégier A (2018) Accelerating approximate Bayesian computation with quantile regression: application to cosmological redshift distributions. J Cosmol Astropart Phys 2018(02):042
Kavetski D, Fenicia F, Reichert P, Albert C (2018) Signature-domain calibration of hydrological models using approximate Bayesian computation: theory and comparison to existing applications. Water Resour Res 54(6):4059–4083
McKinley T, Vernon I, Andrianakis I, McCreesh N, Oakley J, Nsubuga R, Goldstein M, White R (2018) Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models. Stat Sci 33(1):4–18. https://doi.org/10.1214/17-STS618
Nanda A, Jain K (1999) Some weighted distribution results on univariate and bivariate cases. J Stat Plan Inference 77(2):169–180
Navarro J, Ruiz J, Aguila YD (2006) Multivariate weighted distributions: a review and some extensions. Statistics 40(1):51–64
Patil G, Rao C (1978) Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 34(2):179–189
Pearl J (1995) Causal diagrams for empirical research. Biometrika 82(4):669–688
Rao C (1965) On discrete distributions arising out of methods of ascertainment. Sankhya Indian J Stat Ser A (1961–2002) 27(2/4):311–324
Raynal L, Marin J, Pudlo P, Ribatet M, Robert CP, Estoup A (2018) ABC random forests for Bayesian parameter inference. Bioinformatics 35(10):1720–1728
Richard L, Berg K, Thomas B (1994) Physical and performance characteristics of NCAA Division I male basketball players. J Strength Cond Res 8(4):214–218
Rotnitzky A, Robins J (2005) Inverse probability weighted estimation in survival analysis. In: Encyclopedia of Biostatistics. Wiley, London
Samuelsen S, Anestad H, Skrondal A (2007) Stratified case-cohort analysis of general cohort sampling designs. Scand J Stat 34(1):103–119
Sarabia JM, Gomez-Deniz E (2008) Construction of multivariate distributions: a review of some recent results. SORT 32(1):3–36
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Ser B (Stat Methodol) 64(4):583–639
Spirtes P, Glymour C, Scheines R (1993) Causation, prediction, and search. MIT Press, Cambridge
Tzavelas G, Douli M, Economou P (2017) Model misspecification effects for biased samples. Metrika 80(2):171–185
VanderWeele T, Hernan M, Robins J (2008) Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 19:720–728
Ziv G, Lidor R (2010) Vertical jump in female and male basketball players: a review of observational and experimental studies. J Sci Med Sport 13(3):332–339

Author information

Authors and Affiliations

Department of Civil Engineering, University of Patras, 265 00, Rion-Patras, Greece
Polychronis Economou
Department of Mathematics, University of Ioannina, 45 110, Ioannina, Greece
Apostolos Batsidis
Department of Statistics and Insurance Science, University of Piraeus, 80, M. Karaoli and A. Dimitriou St., 18534, Piraeus, Greece
George Tzavelas
Department of Mechanical Engineering and Aeronautics, University of Patras, 265 00, Rion-Patras, Greece
Sonia Malefaki

Corresponding author

Correspondence to Polychronis Economou.

Ethics declarations

Conflict of interest:

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (ZIP 6 kb)

Appendix

In this Appendix the posterior density is reported for the general case and in detail for the special case of the application.

The likelihood function of a biased bivariate sample $D = \{(x_j, y_j),\ j=1,\ldots ,n\}$ from a parent population with known pdf $f(x,y;\theta )$, where $\theta $ is the vector of unknown parameters, when the bias in the sample is described by the weight function $w_{i}(x,y;\theta ,\gamma _X,\gamma _Y)$, is

$$\begin{aligned} \prod _{j=1}^{n}f_{w_{i}}(x_j,y_j;\theta ,\gamma _X,\gamma _Y)=\prod _{j=1}^{n}f_{w_{i}}(x_j,y_j;\zeta )= \frac{\prod _{j=1}^{n} w_{i}(x_j, y_j;\zeta )f(x_j, y_j;\theta )}{E_{f}^n[w_i(X,Y;\zeta )]}. \end{aligned}$$
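As an illustration only, the likelihood above can be evaluated numerically along the following lines. The sketch assumes a bivariate normal parent density (the choice made later in this Appendix for the application), takes the weight as an arbitrary Python callable with its parameters already fixed, and replaces $E_{f}[w_i(X,Y;\zeta )]$ by a plain Monte Carlo average. The function names and the Monte Carlo step are assumptions of this sketch, not part of the paper.

```python
import math
import random


def bvn_logpdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Log density of the (assumed) bivariate normal parent f(x, y; theta)."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    quad = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    log_norm = math.log(2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho))
    return -log_norm - 0.5 * quad


def bvn_draw(rng, mu_x, mu_y, sd_x, sd_y, rho):
    """One draw (x, y) from the bivariate normal parent."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x, y


def weighted_loglik(data, weight, theta, n_mc=20000, seed=0):
    """log prod_j w(x_j, y_j) f(x_j, y_j; theta) - n * log E_f[w(X, Y)].

    weight: callable (x, y) -> positive number, with zeta already fixed.
    theta:  (mu_x, mu_y, sd_x, sd_y, rho).
    """
    rng = random.Random(seed)
    # Monte Carlo estimate of the normalizing constant E_f[w(X, Y)].
    e_w = sum(weight(*bvn_draw(rng, *theta)) for _ in range(n_mc)) / n_mc
    log_num = sum(math.log(weight(x, y)) + bvn_logpdf(x, y, *theta) for x, y in data)
    return log_num - len(data) * math.log(e_w)
```

With a constant weight the correction term cancels and the function returns the ordinary bivariate normal log-likelihood, which gives a quick sanity check of the normalization.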

Let $\pi (\zeta )$ be the joint prior density of the parameters of the model, where $\zeta = (\theta , \gamma _X, \gamma _Y)$. Then, the posterior density of the model has the form:

$$\begin{aligned} \pi (\zeta |{\mathrm{data}})\propto & {} \frac{\prod _{j=1}^{n} w_{i}(x_j, y_j;\zeta )f(x_j, y_j;\theta )}{E_{f}^n[w_i(X,Y;\zeta )]} \cdot \pi (\zeta ). \end{aligned}$$

Based on the discussion of Sect. 4.2, the joint distribution of height and vertical jump in the population of interest is bivariate normal. Moreover, the parameters of the model are assumed independent a priori, and a prior distribution is adopted for each of $\mu _X$, $\mu _Y$, $\sigma ^2_{X}$, $\sigma ^2_{Y}$, $\rho $, $\gamma _X$ and $\gamma _Y$. Then, the posterior density takes the form:

$$\begin{aligned} \pi (\zeta |data)\propto & {} \frac{\prod _{j=1}^{n} w_{i}(x_j, y_j;\zeta )f(x_j, y_j;\theta )}{E_{f}^n[w_i(X,Y;\zeta )]}\\&\pi (\mu _X) \pi (\mu _Y) \pi (\sigma ^2_X) \pi (\sigma ^2_Y) \pi (\rho ) \pi (\gamma _X)\pi (\gamma _Y). \end{aligned}$$

Using the priors described in Sect. 4.2 the following relation is obtained:

$$\begin{aligned} \pi (\zeta |data)\propto & {} \frac{\prod _{j=1}^{n} w_{i}(x_j, y_j;\zeta )}{E_{f}^n[w_i(X,Y;\zeta )]} \cdot \\&\exp \left[ -\frac{1}{2(1-\rho ^2)} \sum _{j=1}^n \left[ \frac{(x_j-\mu _X)^2}{\sigma ^2_X}\right. \right. +\\&\left. \left. \frac{(y_j-\mu _Y)^2}{\sigma ^2_Y}-2\rho \frac{(x_j-\mu _X)(y_j-\mu _Y)}{\sigma _X\sigma _Y}\right] \right] \\&\exp \left[ -\frac{1}{2}\left( \frac{(\mu _X-76.5)^2}{4.167^2}+ \frac{(\mu _Y-30)^2}{4^2}\right) \right] \\&\cdot (1+\rho )^{25-1} (1-\rho )^{30-1} (1-\rho ^2)^{-n/2}\\&\left( \frac{1}{\sigma ^2_X}\right) ^{2+1+n/2}\exp \left[ -\frac{4.167^2}{\sigma ^2_X}\right] \left( \frac{1}{\sigma ^2_Y}\right) ^{2+1+n/2}\exp \left[ -\frac{4^2}{\sigma ^2_Y}\right] \cdot \\&\exp \left[ -\frac{1}{2}\left( \frac{(\gamma _X-1)^2}{10}+\frac{(\gamma _Y-1)^2}{10}\right) \right] I(\gamma _X>0) \cdot I(\gamma _Y>0) \end{aligned}$$

which can be expressed equivalently as

$$\begin{aligned} \pi (\zeta |data)\propto & {} \frac{\prod _{j=1}^{n} w_{i}(x_j, y_j;\zeta )}{E_{f}^n[w_i(X,Y;\zeta )]} \cdot \\&\exp \left[ -\frac{1}{2(1-\rho ^2)} \sum _{j=1}^n \left[ \frac{(x_j-\mu _X)^2}{\sigma ^2_X}\right. \right. \\&\left. \left. + \frac{(y_j-\mu _Y)^2}{\sigma ^2_Y}-2\rho \frac{(x_j-\mu _X)(y_j-\mu _Y)}{\sigma _X\sigma _Y}\right] \right] \\&\exp \left[ -\frac{1}{2}\left( \frac{(\mu _X-76.5)^2}{4.167^2}+ \frac{(\mu _Y-30)^2}{4^2}\right) \right] \cdot \\&(1+\rho )^{24-n/2} (1-\rho )^{29-n/2}\\&\left( \frac{1}{\sigma ^2_X\sigma ^2_Y}\right) ^{3+n/2} \exp \left[ -\frac{4.167^2}{\sigma ^2_X}-\frac{4^2}{\sigma ^2_Y}\right] \cdot \\&\exp \left[ -\frac{1}{2}\left( \frac{(\gamma _X-1)^2}{10}+\frac{(\gamma _Y-1)^2}{10}\right) \right] I(\gamma _X>0) \cdot I(\gamma _Y>0). \end{aligned}$$
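The equivalence of the last two displays follows by collecting powers of $1\pm \rho $ and of the variances, using $1-\rho ^2=(1+\rho )(1-\rho )$:

$$\begin{aligned} (1+\rho )^{25-1} (1-\rho )^{30-1} (1-\rho ^2)^{-n/2}&= (1+\rho )^{24} (1-\rho )^{29} \left[ (1+\rho )(1-\rho )\right] ^{-n/2} = (1+\rho )^{24-n/2} (1-\rho )^{29-n/2},\\ \left( \frac{1}{\sigma ^2_X}\right) ^{2+1+n/2} \left( \frac{1}{\sigma ^2_Y}\right) ^{2+1+n/2}&= \left( \frac{1}{\sigma ^2_X\sigma ^2_Y}\right) ^{3+n/2}. \end{aligned}$$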

For the model $\mathcal {M}_{1f}$, i.e., $i=1$ and $\gamma _X, \ \gamma _Y$ strictly positive, the posterior density has the form

$$\begin{aligned} \pi (\zeta |data)\propto & {} \frac{\prod _{j=1}^{n} \left( 1 - \left( 1-\Phi \left( \frac{x_j-\mu _X}{\sigma _X}\right) ^{\gamma _X} \right) \left( 1-\Phi \left( \frac{y_j-\mu _Y}{\sigma _Y}\right) ^{\gamma _Y} \right) \right) }{E_{f}^n\left[ \left( 1 - \left( 1-\Phi \left( \frac{X-\mu _X}{\sigma _X}\right) ^{\gamma _X} \right) \left( 1-\Phi \left( \frac{Y-\mu _Y}{\sigma _Y}\right) ^{\gamma _Y} \right) \right) \right] } \cdot \\&\exp \left[ -\frac{1}{2(1-\rho ^2)} \sum _{j=1}^n \left[ \frac{(x_j-\mu _X)^2}{\sigma ^2_X}\right. \right. \\&\left. \left. +\frac{(y_j-\mu _Y)^2}{\sigma ^2_Y}-2\rho \frac{(x_j-\mu _X)(y_j-\mu _Y)}{\sigma _X\sigma _Y}\right] \right] \\&\exp \left[ -\frac{1}{2}\left( \frac{(\mu _X-76.5)^2}{4.167^2}+ \frac{(\mu _Y-30)^2}{4^2}\right) \right] \cdot \\&(1+\rho )^{24-n/2} (1-\rho )^{29-n/2}\\&\left( \frac{1}{\sigma ^2_X\sigma ^2_Y}\right) ^{3+n/2} \exp \left[ -\frac{4.167^2}{\sigma ^2_X}-\frac{4^2}{\sigma ^2_Y}\right] \cdot \\&\exp \left[ -\frac{1}{2}\left( \frac{(\gamma _X-1)^2}{10}+\frac{(\gamma _Y-1)^2}{10}\right) \right] I(\gamma _X>0) \cdot I(\gamma _Y>0). \end{aligned}$$
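For concreteness, the weight function appearing in the numerator above can be written in a few lines of Python. This is a sketch using only the standard library; the function name `w1` and its argument order are hypothetical.

```python
from statistics import NormalDist


def w1(x, y, mu_x, mu_y, sd_x, sd_y, gamma_x, gamma_y):
    """Weight 1 - (1 - Phi_X(x)^gamma_X) * (1 - Phi_Y(y)^gamma_Y)."""
    fx = NormalDist(mu_x, sd_x).cdf(x) ** gamma_x
    fy = NormalDist(mu_y, sd_y).cdf(y) ** gamma_y
    return 1.0 - (1.0 - fx) * (1.0 - fy)


# At the means both cdfs equal 0.5, so with gamma_X = gamma_Y = 1
# the weight is 1 - 0.5 * 0.5 = 0.75.
print(w1(76.5, 30.0, 76.5, 30.0, 3.0, 4.0, 1.0, 1.0))
```

The weight lies in $[0,1]$ and is increasing in both arguments, so larger values of either variable receive more weight. Raising $\gamma _X$ lowers $\Phi (\cdot )^{\gamma _X}$ at every $x$, which is how the exponents control the contribution of each variable to the bias.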

Because of the form of the posterior, sampling from it directly, or even with a standard MCMC method, is not an easy task. Thus, ABC methods are used.
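To make the idea concrete, a minimal rejection-ABC sketch for model $\mathcal {M}_{1f}$ is given below. It is not the authors' algorithm: the ABC variant, the summary statistics (sample means, standard deviations and correlation), the Euclidean distance and the tolerance `eps` are all assumptions of this sketch, and the priors are simply read off the posterior kernel above (normal for the means, inverse gamma with shape 2 for the variances, a shifted Beta(25, 30) for $\rho $, and normals with mean 1 and variance 10 truncated to $(0,\infty )$ for $\gamma _X$ and $\gamma _Y$).

```python
import math
import random
from statistics import NormalDist, fmean


def w1(x, y, z):
    """Weight of model M_1f; z = (mu_x, mu_y, sd_x, sd_y, rho, g_x, g_y)."""
    fx = NormalDist(z[0], z[2]).cdf(x) ** z[5]
    fy = NormalDist(z[1], z[3]).cdf(y) ** z[6]
    return 1.0 - (1.0 - fx) * (1.0 - fy)


def positive_normal(rng, mean, sd):
    """Normal draw truncated to (0, inf), by simple rejection."""
    while True:
        g = rng.gauss(mean, sd)
        if g > 0.0:
            return g


def draw_prior(rng):
    """One draw of zeta from the priors read off the posterior kernel."""
    mu_x = rng.gauss(76.5, 4.167)
    mu_y = rng.gauss(30.0, 4.0)
    # Inverse gamma(shape 2, scale b) draw, obtained as b / Gamma(2, 1).
    sd_x = math.sqrt(4.167 ** 2 / rng.gammavariate(2.0, 1.0))
    sd_y = math.sqrt(4.0 ** 2 / rng.gammavariate(2.0, 1.0))
    # Kernel (1 + rho)^24 (1 - rho)^29 means (1 + rho) / 2 ~ Beta(25, 30).
    rho = 2.0 * rng.betavariate(25.0, 30.0) - 1.0
    g_x = positive_normal(rng, 1.0, math.sqrt(10.0))
    g_y = positive_normal(rng, 1.0, math.sqrt(10.0))
    return (mu_x, mu_y, sd_x, sd_y, rho, g_x, g_y)


def simulate_biased(rng, z, n):
    """n draws from the weighted density: propose from f, keep with prob. w1 <= 1."""
    out = []
    while len(out) < n:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z[0] + z[2] * z1
        y = z[1] + z[3] * (z[4] * z1 + math.sqrt(1.0 - z[4] ** 2) * z2)
        if rng.random() < w1(x, y, z):
            out.append((x, y))
    return out


def summaries(sample):
    """Means, standard deviations and correlation of a bivariate sample."""
    xs = [p[0] for p in sample]
    ys = [p[1] for p in sample]
    mx, my = fmean(xs), fmean(ys)
    sx = math.sqrt(fmean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(fmean([(y - my) ** 2 for y in ys]))
    r = fmean([(x - mx) * (y - my) for x, y in sample]) / (sx * sy)
    return (mx, my, sx, sy, r)


def abc_rejection(observed, n_draws, eps, seed=0):
    """Keep prior draws whose simulated summaries fall within eps of the observed ones."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    kept = []
    for _ in range(n_draws):
        z = draw_prior(rng)
        s_sim = summaries(simulate_biased(rng, z, len(observed)))
        if math.dist(s_sim, s_obs) <= eps:
            kept.append(z)
    return kept
```

Here `observed` would be a list of (height, vertical jump) pairs. The draws returned by `abc_rejection` approximate the posterior only when `eps` is small enough that a small fraction of the prior draws is kept, so the tolerance has to be tuned to the data at hand. The simulation step never needs $E_{f}[w_1(X,Y;\zeta )]$, which is what makes a likelihood-free approach attractive for this posterior.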

About this article

Cite this article

Economou, P., Batsidis, A., Tzavelas, G. et al. Understanding the Sampling Bias: A Case Study on NBA Drafts. J Stat Theory Pract 15, 45 (2021). https://doi.org/10.1007/s42519-021-00167-2

Accepted: 19 January 2021
Published: 23 March 2021
DOI: https://doi.org/10.1007/s42519-021-00167-2
