
A semi-parametric method for transforming data to normality

Statistics and Computing

Abstract

A non-parametric transformation function is introduced to transform data to any continuous distribution. When transformation of data to normality is desired, the use of a suitable parametric pre-transformation function improves the performance of the proposed non-parametric transformation function. The resulting semi-parametric transformation function is shown empirically, via a Monte Carlo study, to perform at least as well as any parametric transformation currently available in the literature.
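The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator, whose details are not given in this preview. The sketch assumes that the non-parametric step is a probability integral transform: estimate the distribution function of the data with a Gaussian-kernel smoother, then map the estimated probabilities through the standard normal quantile function. The optional parametric pre-transformation is assumed here to be a Box–Cox power transform with a user-supplied exponent. The function names (`box_cox`, `kernel_cdf`, `to_normal`), the bandwidth rule and the clipping constant are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def box_cox(x, lam):
    """Box-Cox power transform for x > 0 (assumed pre-transformation)."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def kernel_cdf(sample, h):
    """Smooth distribution function estimate: the average of Gaussian
    kernel CDFs centred at the observations, with bandwidth h."""
    n = len(sample)

    def cdf(x):
        return sum(STD_NORMAL.cdf((x - xi) / h) for xi in sample) / n

    return cdf


def to_normal(sample, lam=None):
    """Transform a sample towards standard normality.

    lam=None -> purely non-parametric transform.
    lam=number -> Box-Cox pre-transform first (semi-parametric variant).
    """
    data = list(sample) if lam is None else [box_cox(x, lam) for x in sample]
    n = len(data)
    # Assumption: a simple n^(-1/3) bandwidth, the usual rate for kernel
    # distribution function estimation; not a tuned selector.
    h = stdev(data) * n ** (-1.0 / 3.0)
    cdf = kernel_cdf(data, h)
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for inv_cdf
    return [STD_NORMAL.inv_cdf(min(max(cdf(x), eps), 1.0 - eps)) for x in data]


# Usage: right-skewed (log-normal) data, with and without a log pre-transform.
random.seed(0)
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
z_nonparametric = to_normal(skewed)            # non-parametric only
z_semiparametric = to_normal(skewed, lam=0)    # log pre-transform, then smooth
```

Replacing `STD_NORMAL.inv_cdf` with the quantile function of another continuous distribution would give a transformation to that distribution instead, which matches the generality claimed in the first sentence of the abstract. Both steps are monotone increasing, so the ordering of the observations is preserved.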


Author information

Corresponding author

Correspondence to Gerhard Koekemoer.

About this article

Cite this article

Koekemoer, G., Swanepoel, J.W.H. A semi-parametric method for transforming data to normality. Stat Comput 18, 241–257 (2008). https://doi.org/10.1007/s11222-008-9053-3

  • DOI: https://doi.org/10.1007/s11222-008-9053-3
