Abstract
In the environmental health sciences, measurements of toxic exposures are often constrained by a lower limit called the limit of detection (LOD), with observations below this limit called non-detects. Although valid inference may be obtained by excluding non-detects when estimating exposure effects, this practice can substantially reduce the power to detect a significant effect, depending on the proportion of censoring and the closeness of the effect size to the null value. A variety of methods have therefore been used in the environmental science literature to substitute values for the non-detects when estimating exposure effects, including ad hoc values such as \(\mathrm{LOD}/2\), \(\mathrm{LOD}/\sqrt{2}\), and LOD. Another method substitutes the expected value of the non-detects, i.e., \(E[X \mid X \le \mathrm{LOD}]\), but this requires that the inference be robust to mild misspecification of the distribution of the exposure variable. In this paper, we demonstrate that the estimate of the exposure effect is extremely sensitive to ad hoc substitutions and to moderate distributional misspecification when sample sizes are large and the effect size is moderate, potentially leading to biased estimates. We propose instead using the generalized gamma distribution to estimate imputed values for the non-detects, and show that this method avoids the risk of distributional misspecification within the class of distributions represented by the generalized gamma distribution. A multiple imputation-based procedure is employed to estimate the regression parameters. Compared with excluding non-detects, the proposed method can substantially increase the power to detect a significant effect when the effect size is close to the null value in small samples with moderate levels of censoring (≤ 50%), without compromising the coverage and relative bias of the estimates.
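The substitution rules named in the abstract can be illustrated with a minimal sketch. The function names are hypothetical, and the conditional mean is computed under an assumed lognormal exposure distribution with known parameters `mu` and `sigma`. The lognormal is chosen here only because it has a closed form; it is not the generalized gamma imputation procedure the paper proposes.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ad_hoc_substitutions(lod):
    """The three ad hoc fill-in values listed in the abstract."""
    return {
        "LOD/2": lod / 2.0,
        "LOD/sqrt(2)": lod / math.sqrt(2.0),
        "LOD": lod,
    }


def lognormal_mean_below(lod, mu, sigma):
    """E[X | X <= lod] when log(X) ~ Normal(mu, sigma^2).

    ASSUMPTION: the lognormal form and known (mu, sigma) are
    illustrative only; in practice they would be estimated from
    the censored sample.
    """
    a = (math.log(lod) - mu) / sigma
    return math.exp(mu + 0.5 * sigma ** 2) * norm_cdf(a - sigma) / norm_cdf(a)


if __name__ == "__main__":
    lod = 1.0
    for name, value in ad_hoc_substitutions(lod).items():
        print(name, value)
    print("E[X | X <= LOD]", lognormal_mean_below(lod, mu=0.0, sigma=1.0))
```

If the true exposure distribution is not lognormal, the conditional mean returned here would be off, which is the kind of misspecification sensitivity the abstract describes.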
Cite this article
Arunajadai, S.G., Rauh, V.A. Handling covariates subject to limits of detection in regression. Environ Ecol Stat 19, 369–391 (2012). https://doi.org/10.1007/s10651-012-0191-6
DOI: https://doi.org/10.1007/s10651-012-0191-6