
On Efficient Design of Pilot Experiment for Generalized Linear Models

  • Original Article
  • Journal of Statistical Theory and Practice

Abstract

The experimental design for a generalized linear model (GLM) is important but challenging since the design criterion often depends on model specification including the link function, the linear predictor, and the unknown regression coefficients. Prior to constructing locally or globally optimal designs, a pilot experiment is usually conducted to provide some insights on the model specifications. In pilot experiments, little information on the model specification of GLM is available. Surprisingly, there is very limited research on the design of pilot experiments for GLMs. In this work, we obtain some theoretical understanding of the design efficiency in pilot experiments for GLMs. Guided by the theory, we propose to adopt a low-discrepancy design with respect to some target distribution for pilot experiments. The performance of the proposed design is assessed through several numerical examples.



Acknowledgements

The authors would like to sincerely thank the Associate Editor and reviewers for their insightful comments. Deng’s work was partly supported by National Science Foundation CISE Expedition Grant CCF-1918770.

Author information

Corresponding author

Correspondence to Xinwei Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Special Issue: State of the art in research on design and analysis of experiments” guest edited by John Stufken, Abhyuday Mandal, and Rakhi Singh.

Appendix

Derivation of the discrepancy in (10).

We first consider the case \(d=1\). We integrate the kernel once:

$$\begin{aligned} \int _{-1}^1 K(t,x) \, \mathrm{d}F_{unif }(t)=&\frac{1}{2}\int _{-1}^1 \left[ 1+\frac{1}{2}(|t|+|x|-|t-x|)\right] \, \mathrm{d}t\\ =&\frac{1}{2}\left[ 2+|x|+\frac{1}{2}-\frac{1}{2}\left( \int _{-1}^x(x-t)\mathrm{d}t+\int _x^1 (t-x)\mathrm{d}t\right) \right] \\ =&\frac{1}{2}\left[ \frac{5}{2}+|x|-\frac{1}{2}\left( x^2+1\right) \right] \\ =&\frac{1}{2}\left( 2+|x|-\frac{1}{2}x^2\right) . \end{aligned}$$

Then we integrate once more:

$$\begin{aligned} {\int _{-1}^1 \int _{-1}^1 K(t,x) \, \mathrm{d}F_{unif }(t) \mathrm{d}F_{unif }(x)}&= \int _{-1}^{1} \frac{1}{4}\left( 2+|x|-\frac{1}{2}x^2\right) \, \mathrm{d}x\\&= \frac{7}{6}. \end{aligned}$$
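The two one-dimensional integrals above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the midpoint grid are illustrative choices made here, not part of the paper.

```python
import numpy as np

def kernel(t, x):
    """One-dimensional kernel K(t, x) = 1 + (|t| + |x| - |t - x|) / 2."""
    return 1.0 + 0.5 * (np.abs(t) + np.abs(x) - np.abs(t - x))

def single_integral_closed_form(x):
    """Closed form derived above: (1/2) * (2 + |x| - x^2 / 2)."""
    return 0.5 * (2.0 + np.abs(x) - 0.5 * x**2)

# Midpoint rule on [-1, 1]. Under the uniform distribution (density 1/2),
# each of the N cells of width 2/N carries probability mass 1/N.
N = 2000
t = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)
mass = 1.0 / N

def single_integral_numeric(x):
    """Approximate the integral of K(t, x) dF_unif(t) for a fixed x."""
    return np.sum(kernel(t, x)) * mass

# Approximate the double integral over the N x N grid of midpoints.
double_integral_numeric = np.sum(kernel(t[:, None], t[None, :])) * mass * mass

if __name__ == "__main__":
    print(single_integral_numeric(0.3), single_integral_closed_form(0.3))
    print(double_integral_numeric, 7.0 / 6.0)
```

Both printed pairs should agree to several decimal places, which supports the closed forms without replacing the derivation.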

Generalizing this to the \(d\)-dimensional case yields

$$\begin{aligned}&\int _{[-1,1]^d\times [-1,1]^d} K({\varvec{x}},{\varvec{t}}) \, \mathrm{d}F_{unif }({\varvec{x}})\mathrm{d}F_{unif }({\varvec{t}}) = \left( \frac{7}{6}\right) ^d, \\&\int _{[-1,1]^d}K({\varvec{x}},{\varvec{x}}_i) \, \mathrm{d}F_{unif }({\varvec{x}}) = \frac{1}{2^d}\prod \limits _{j=1}^d \left( 2+|x_{ij}|-\frac{1}{2}x_{ij}^2\right) . \end{aligned}$$
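The step from one dimension to \(d\) dimensions can be spelled out as follows. This is a sketch under the assumption that the \(d\)-dimensional kernel is the product of one-dimensional kernels (as the products over \(j\) in the discrepancy formulas indicate) and that the coordinates are independent under \(F_{unif }\). The symbols \(K_1\) and \(F_1\), denoting the one-dimensional kernel and marginal distribution, are notation introduced here for illustration.

$$\begin{aligned} \int _{[-1,1]^d\times [-1,1]^d} K({\varvec{x}},{\varvec{t}}) \, \mathrm{d}F_{unif }({\varvec{x}})\mathrm{d}F_{unif }({\varvec{t}})&= \prod _{j=1}^d \int _{-1}^1\int _{-1}^1 K_1(x_j,t_j) \, \mathrm{d}F_1(x_j)\mathrm{d}F_1(t_j) = \left( \frac{7}{6}\right) ^d, \\ \int _{[-1,1]^d} K({\varvec{x}},{\varvec{x}}_i) \, \mathrm{d}F_{unif }({\varvec{x}})&= \prod _{j=1}^d \int _{-1}^1 K_1(x_j,x_{ij}) \, \mathrm{d}F_1(x_j) = \prod _{j=1}^d \frac{1}{2}\left( 2+|x_{ij}|-\frac{1}{2}x_{ij}^2\right) . \end{aligned}$$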

Thus, the discrepancy of a design \(\xi \) for the uniform distribution on \([-1,1]^d\) is

$$\begin{aligned} D^2(\xi ; F_{unif })&= \left( \frac{7}{6}\right) ^d - \frac{1}{2^{d-1}n}\sum _{i=1}^m n_i \prod _{j=1}^d \left[ 2+|x_{ij}|-\frac{x_{ij}^2}{2} \right] \\&\qquad + \frac{1}{n^2}\sum _{i,k=1}^m n_in_k\prod _{j=1}^d\left[ 1+\frac{1}{2}\left( |x_{ij}|+|x_{kj}|-|x_{ij}-x_{kj}| \right) \right] . \end{aligned}$$
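The closed-form discrepancy for the uniform target can be evaluated directly for a given design. The sketch below assumes NumPy and reads \(m\) as the number of distinct design points, \(n_i\) as the number of replicates at \({\varvec{x}}_i\), and \(n=\sum _i n_i\). The function name and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def discrepancy_uniform(points, counts=None):
    """Squared discrepancy D^2(xi; F_unif) of a design on [-1, 1]^d.

    points : array-like of shape (m, d), the distinct design points x_i.
    counts : optional length-m replicate counts n_i (default: one run each).
    Argument names are illustrative assumptions.
    """
    X = np.atleast_2d(np.asarray(points, dtype=float))
    m, d = X.shape
    n_i = np.ones(m) if counts is None else np.asarray(counts, dtype=float)
    n = n_i.sum()

    # First term: double integral of the kernel, (7/6)^d.
    term1 = (7.0 / 6.0) ** d

    # Second term: (1 / (2^(d-1) n)) * sum_i n_i prod_j [2 + |x_ij| - x_ij^2 / 2].
    single = np.prod(2.0 + np.abs(X) - 0.5 * X**2, axis=1)
    term2 = (n_i @ single) / (2.0 ** (d - 1) * n)

    # Third term: (1 / n^2) * sum_{i,k} n_i n_k prod_j K(x_ij, x_kj).
    A = np.abs(X)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    K = np.prod(1.0 + 0.5 * (A[:, None, :] + A[None, :, :] - diff), axis=2)
    term3 = (n_i @ K @ n_i) / n**2

    return term1 - term2 + term3

if __name__ == "__main__":
    print(discrepancy_uniform([[0.0]]))            # one run at the center, d = 1
    print(discrepancy_uniform([[0.5], [-0.5]]))    # two spread-out runs, d = 1
```

For \(d=1\), a single run at 0 gives \(7/6-2+1=1/6\), while the two-point design at \(\pm 0.5\) gives \(1/24\), so spreading the runs lowers the discrepancy as expected.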

Derivation of the discrepancy in (11).

Following the same procedure as the derivation of \(D^2(\xi ; F_{unif })\),

$$\begin{aligned}&\int _{-1}^1 K(t,x) \, \mathrm{d}F_{asin }(t) = 1+\frac{1}{\pi }+\frac{1}{2}|x|-\frac{1}{\pi }\left( x\arcsin (x)+\sqrt{1-x^2}\right) , \\&{\int _{-1}^1 \int _{-1}^1 K(t,x) \, \mathrm{d}F_{asin }(t) \mathrm{d}F_{asin }(x)} = 1+\frac{2}{\pi }-\frac{4}{\pi ^2}, \end{aligned}$$

and thus, the discrepancy of a design \(\xi \) for the arcsine distribution on \([-1,1]^d\) is

$$\begin{aligned} D^2(\xi ; F_{asin })&= \left( 1+\frac{2}{\pi }-\frac{4}{\pi ^2}\right) ^d - \frac{2}{n}\sum _{i=1}^m n_i \prod _{j=1}^d \left[ 1+\frac{1}{\pi }\right. \\&\qquad \left. +\frac{1}{2}|x_{ij}|-\frac{1}{\pi }\left( x_{ij}\arcsin (x_{ij}) +\sqrt{1-x_{ij}^2} \right) \right] \\&\quad + \frac{1}{n^2}\sum _{i,k=1}^m n_in_k\prod _{j=1}^d\left[ 1+\frac{1}{2}\left( |x_{ij}|+|x_{kj}|-|x_{ij}-x_{kj}| \right) \right] . \end{aligned}$$
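The arcsine formulas can be evaluated and checked in the same way. The sketch below assumes NumPy and assumes that \(F_{asin }\) is the standard arcsine distribution on \([-1,1]\) with density \(1/(\pi \sqrt{1-t^2})\), which is the distribution of \(\sin \theta \) for \(\theta \) uniform on \((-\pi /2,\pi /2)\). That substitution is used only for the numerical check. The function and variable names are hypothetical.

```python
import numpy as np

def discrepancy_arcsine(points, counts=None):
    """Squared discrepancy D^2(xi; F_asin) of a design on [-1, 1]^d.

    points : array-like of shape (m, d), the distinct design points x_i.
    counts : optional length-m replicate counts n_i (default: one run each).
    Argument names are illustrative assumptions.
    """
    X = np.atleast_2d(np.asarray(points, dtype=float))
    m, d = X.shape
    n_i = np.ones(m) if counts is None else np.asarray(counts, dtype=float)
    n = n_i.sum()

    # First term: (1 + 2/pi - 4/pi^2)^d.
    term1 = (1.0 + 2.0 / np.pi - 4.0 / np.pi**2) ** d

    # Second term: (2 / n) * sum_i n_i prod_j [single-integral closed form].
    single = np.prod(
        1.0 + 1.0 / np.pi + 0.5 * np.abs(X)
        - (X * np.arcsin(X) + np.sqrt(1.0 - X**2)) / np.pi,
        axis=1,
    )
    term2 = 2.0 * (n_i @ single) / n

    # Third term: (1 / n^2) * sum_{i,k} n_i n_k prod_j K(x_ij, x_kj).
    A = np.abs(X)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    K = np.prod(1.0 + 0.5 * (A[:, None, :] + A[None, :, :] - diff), axis=2)
    term3 = (n_i @ K @ n_i) / n**2

    return term1 - term2 + term3

# Numerical check of the double integral for d = 1: midpoints in theta are
# mapped through sin, so each point carries arcsine probability mass 1/N.
N = 1000
theta = -np.pi / 2 + (np.arange(N) + 0.5) * (np.pi / N)
s = np.sin(theta)
K1 = 1.0 + 0.5 * (np.abs(s)[:, None] + np.abs(s)[None, :]
                  - np.abs(s[:, None] - s[None, :]))
double_integral_numeric = K1.sum() / N**2
double_integral_closed_form = 1.0 + 2.0 / np.pi - 4.0 / np.pi**2

if __name__ == "__main__":
    print(double_integral_numeric, double_integral_closed_form)
    print(discrepancy_arcsine([[0.0]]))
```

For \(d=1\) and a single run at 0, the single-integral factor equals 1, so the discrepancy reduces to \(2/\pi -4/\pi ^2\).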


Cite this article

Li, Y., Deng, X. On Efficient Design of Pilot Experiment for Generalized Linear Models. J Stat Theory Pract 15, 83 (2021). https://doi.org/10.1007/s42519-021-00222-y
