Abstract
Marketing research relies on multivariate data to solve problems such as market segmentation, estimating the purchasing power of a market sector, and modeling attrition. In many cases, the data collected or supplied for these purposes have a number of missing entries. This paper is devoted to an empirical evaluation of methods for imputing missing data within the so-called nearest neighbour least-squares approximation approach, a non-parametric, computationally efficient multidimensional technique. We contribute to each of the two components of the experimental setting: (a) an empirical evaluation of the nearest neighbour least-squares data imputation algorithm for marketing research, and (b) experimental comparisons with the expectation–maximization (EM) algorithm and multiple imputation (MI) using real marketing data sets. Specifically, we review “global” methods for least-squares data imputation and propose extensions to them based on the nearest neighbours (NN) approach. The results suggest that the NN least-squares data imputation algorithm almost always outperforms the EM algorithm and is comparable to the multiple imputation approach.
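To make the idea of extending a "global" least-squares method with nearest neighbours more concrete, the following is a minimal sketch and not the chapter's actual algorithm. It assumes that the global step is an iterative low-rank SVD fit and that neighbours are chosen by mean squared difference over commonly observed entries. The function names `ils_impute` and `nn_ils_impute` and the parameters `k`, `rank`, `n_iter`, and `tol` are hypothetical and introduced only for illustration.

```python
import numpy as np


def ils_impute(X, rank=1, n_iter=100, tol=1e-8):
    """Global least-squares imputation (illustrative assumption):
    alternate between a rank-`rank` SVD fit and refilling missing cells."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    # Start from column means of the observed entries (0 if a column is empty).
    sums = np.where(mask, 0.0, X).sum(axis=0)
    counts = (~mask).sum(axis=0)
    col_means = sums / np.maximum(counts, 1)
    filled = np.where(mask, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed cells stay fixed; only missing cells take the fitted values.
        new = np.where(mask, approx, X)
        converged = np.linalg.norm(new - filled) < tol
        filled = new
        if converged:
            break
    return filled


def nn_ils_impute(X, k=5, rank=1):
    """Nearest-neighbour variant (illustrative assumption): for each row with
    missing cells, run the global method on that row plus its k nearest rows."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    out = X.copy()
    for i in np.where(mask.any(axis=1))[0]:
        # Mean squared difference over entries observed in both rows.
        both = ~mask[i] & ~mask
        diff = np.where(both, X - X[i], 0.0)
        counts = both.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        ok = counts > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / counts[ok]
        dist[i] = np.inf  # a row is not its own neighbour
        order = np.argsort(dist)
        neighbours = order[np.isfinite(dist[order])][:k]
        # The target row goes first so its imputed values are easy to read back.
        local = X[np.concatenate(([i], neighbours))]
        local_filled = ils_impute(local, rank=rank)
        out[i, mask[i]] = local_filled[0, mask[i]]
    return out
```

Calling `nn_ils_impute(X, k=5)` on a NumPy array whose missing entries are encoded as `np.nan` returns a copy with only those entries replaced. Restricting the fit to a row's neighbours lets the low-rank model adapt to local structure, which is the intuition behind moving from a "global" to an NN-based method.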
Acknowledgements
The author gratefully acknowledges many comments by reviewers that have been very helpful in improving the presentation.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Wasito, I. (2014). Nearest Neighbour in Least Squares Data Imputation Algorithms for Marketing Data. In: Aleskerov, F., Goldengorin, B., Pardalos, P. (eds) Clusters, Orders, and Trees: Methods and Applications. Springer Optimization and Its Applications, vol 92. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0742-7_19
DOI: https://doi.org/10.1007/978-1-4939-0742-7_19
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0741-0
Online ISBN: 978-1-4939-0742-7
eBook Packages: Mathematics and Statistics (R0)