Handling covariates subject to limits of detection in regression

Environmental and Ecological Statistics

Abstract

In the environmental health sciences, measurements of toxic exposures are often constrained by a lower limit called the limit of detection (LOD), and observations below this limit are called non-detects. Although valid inference may be obtained by excluding non-detects when estimating exposure effects, this practice can substantially reduce the power to detect a significant effect, depending on the proportion of censoring and how close the effect size is to the null value. A variety of methods are therefore commonly used in the environmental science literature to substitute values for the non-detects, including ad hoc values such as \(LOD/2\), \(LOD/\sqrt{2}\), and LOD. Another method substitutes the expected value of the non-detects, i.e., \(E[X \mid X \le LOD]\), but this requires that the inference be robust to mild misspecification of the distribution of the exposure variable. In this paper, we demonstrate that the estimate of the exposure effect is extremely sensitive to ad hoc substitutions and to moderate distributional misspecification when the sample size is large and the effect size is moderate, potentially leading to biased estimates. We propose instead using the generalized gamma distribution to estimate imputed values for the non-detects, and show that this method avoids the risk of distributional misspecification within the class of distributions represented by the generalized gamma distribution. A multiple imputation-based procedure is employed to estimate the regression parameters. Compared with excluding non-detects, the proposed method can substantially increase the power to detect a significant effect when the effect size is close to the null value in small samples with moderate levels of censoring (≤ 50%), without compromising the coverage and relative bias of the estimates.
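To make the fill-in options named in the abstract concrete, the following minimal Python sketch compares the ad hoc substitutions with the conditional mean \(E[X \mid X \le LOD]\) and with a single imputation draw from the distribution truncated at the LOD. It is not the paper's procedure: for simplicity it assumes a lognormal exposure with known parameters `mu` and `sigma`, whereas the paper fits a generalized gamma distribution. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and inverse cdf


def substitution_values(lod):
    """Ad hoc fill-in values for non-detects listed in the abstract."""
    return {"LOD/2": lod / 2, "LOD/sqrt(2)": lod / math.sqrt(2), "LOD": lod}


def lognormal_mean_below(lod, mu, sigma):
    """E[X | X <= lod] when log(X) ~ Normal(mu, sigma^2).

    The lognormal model is an assumption of this sketch, not the paper's choice.
    """
    z = (math.log(lod) - mu) / sigma
    return math.exp(mu + sigma ** 2 / 2) * STD_NORMAL.cdf(z - sigma) / STD_NORMAL.cdf(z)


def draw_below_lod(lod, mu, sigma, rng):
    """One imputation draw from the lognormal truncated to (0, lod].

    Uses the inverse-CDF method: sample a probability in (0, P(X <= lod)]
    and map it back through the lognormal quantile function.
    """
    p_lod = STD_NORMAL.cdf((math.log(lod) - mu) / sigma)
    u = p_lod * (1.0 - rng.random())  # rng.random() is in [0, 1), so u is in (0, p_lod]
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))


if __name__ == "__main__":
    lod, mu, sigma = 1.0, 0.0, 1.0  # hypothetical values for illustration
    rng = random.Random(0)
    print(substitution_values(lod))
    print("conditional mean:", lognormal_mean_below(lod, mu, sigma))
    print("five imputed values:", [draw_below_lod(lod, mu, sigma, rng) for _ in range(5)])
```

In a multiple imputation analysis, draws like those from `draw_below_lod` would be generated separately for each imputed data set, the regression fitted to each, and the results combined. The sketch stops short of that step and ignores uncertainty in the fitted distribution parameters.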

Author information

Correspondence to Srikesh G. Arunajadai.

About this article

Cite this article

Arunajadai, S.G., Rauh, V.A. Handling covariates subject to limits of detection in regression. Environ Ecol Stat 19, 369–391 (2012). https://doi.org/10.1007/s10651-012-0191-6

  • DOI: https://doi.org/10.1007/s10651-012-0191-6
