Clustering and Prediction of Rankings Within a Kemeny Distance Framework

Heiser, Willem J.; D’Ambrosio, Antonio

doi:10.1007/978-3-319-00035-0_2

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work in the classification community that uses the typical properties of rankings. We review the broader literature that we are aware of, and identify a common building block for both prediction of rankings and clustering of rankings, which is also valid for partial rankings. This building block is the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one (partial) ranking into another. The Kemeny distance is equivalent to Kendall’s τ for complete rankings, but for partial rankings it is equivalent to Emond and Mason’s extension of τ. For clustering, we use the flexible class of methods proposed by Ben-Israel and Iyigun (Journal of Classification 25: 5–26, 2008), and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction, we build a prediction tree by recursive partitioning, and define the impurity measure of the subgroups formed as the sum of all within-node Kemeny distances. The median ranking characterizes subgroups in both cases.
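
The three ingredients named in the abstract (the Kemeny distance, the within-node impurity, and the median ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Kemeny-Snell score-matrix convention, in which an adjacent interchange between two complete rankings counts 2 and a tie against a strict preference counts 1; other sources normalize differently. The function names are hypothetical, the impurity is read here as the sum over all pairs of rankings in a node, and the median search is restricted to complete rankings and done by brute force, which is only feasible for a handful of objects.

```python
from itertools import combinations, permutations


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(r, s):
    """Kemeny-Snell distance between two rank vectors (ties allowed).

    r[i] is the rank of object i; a smaller rank means more preferred,
    and equal ranks mean a tie. Assumed convention: half the sum of
    absolute differences between the two score matrices, which equals
    the sum over unordered pairs of objects computed below.
    """
    if len(r) != len(s):
        raise ValueError("rankings must cover the same objects")
    return sum(
        abs(_sign(r[j] - r[i]) - _sign(s[j] - s[i]))
        for i, j in combinations(range(len(r)), 2)
    )


def within_node_impurity(rankings):
    """Sum of Kemeny distances over all pairs of rankings in a node.

    Assumption: 'all within-node distances' means all unordered pairs.
    """
    return sum(kemeny_distance(a, b) for a, b in combinations(rankings, 2))


def median_ranking(rankings):
    """Brute-force median among complete rankings (no ties considered).

    Returns the rank vector minimizing the total Kemeny distance to the
    given rankings, together with that total.
    """
    n = len(rankings[0])

    def total(candidate):
        return sum(kemeny_distance(candidate, r) for r in rankings)

    best = min(permutations(range(1, n + 1)), key=total)
    return best, total(best)


if __name__ == "__main__":
    node = [[1, 2, 3], [1, 2, 3], [2, 1, 3]]
    print(kemeny_distance([1, 2, 3], [1, 3, 2]))  # one adjacent swap -> 2
    print(within_node_impurity(node))
    print(median_ranking(node))
```

A split in a prediction tree would then be scored by how much it reduces the impurity of the child nodes relative to the parent, and each resulting node would be labelled by its median ranking.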

Notes

1. During the Frankfurt DAGM-GfKl-2011 conference, Eyke Hüllermeier kindly pointed out that there is related work in the computer science community under the name “preference learning” (in particular, Cheng et al. 2009, and more generally, Fürnkranz and Hüllermeier 2010).

References

Barthélemy, J. P., Guénoche, A., & Hudry, O. (1989). Median linear orders: Heuristics and a branch and bound algorithm. European Journal of Operational Research, 42, 313–325.
Ben-Israel, A., & Iyigun, C. (2008). Probabilistic distance clustering. Journal of Classification, 25, 5–26.
Böckenholt, U. (1992). Thurstonian representation for partial ranking data. British Journal of Mathematical and Statistical Psychology, 45, 31–49.
Böckenholt, U. (2001). Mixed-effects analysis of rank-ordered data. Psychometrika, 66, 45–62.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs, I. Biometrika, 39, 324–345.
Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54, 181–202.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth Publishing Co.
Busing, F. M. T. A. (2009). Some advances in multidimensional unfolding. Doctoral Dissertation, Leiden, The Netherlands: Leiden University.
Busing, F. M. T. A., Groenen, P., & Heiser, W. J. (2005). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Psychometrika, 70, 71–98.
Busing, F. M. T. A., Heiser, W. J., & Cleaver, G. (2010). Restricted unfolding: Preference analysis with optimal transformations of preferences and attributes. Food Quality and Preference, 21, 82–92.
Cappelli, C., Mola, F., & Siciliano, R. (2002). A statistical approach to growing a reliable honest tree. Computational Statistics and Data Analysis, 38, 285–299.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard et al. (Eds.), Multidimensional scaling, Vol. I theory (pp. 105–155). New York: Seminar Press.
Chan, W., & Bentler, P. M. (1998). Covariance structure analysis of ordinal ipsative data. Psychometrika, 63, 369–399.
Chapman, R. G., & Staelin, R. (1982). Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research, 19, 288–301.
Cheng, W., Hühn, J., & Hüllermeier, E. (2009). Decision tree and instance-based learning for label ranking. In Proceedings of the 26th international conference on machine learning (pp. 161–168). Montreal, Canada.
Cohen, A., & Mallows, C. L. (1980). Analysis of ranking data (Tech. Rep.). Murray Hill: Bell Telephone Laboratories.
Cook, W. D. (2006). Distance-based and ad hoc consensus models in ordinal preference ranking. European Journal of Operational Research, 172, 369–385.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145–158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Critchlow, D. E., & Fligner, M. A. (1991). Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on GLIM. Psychometrika, 56, 517–533.
Critchlow, D. E., Fligner, M. A., & Verducci, J. S. (1991). Probability models on rankings. Journal of Mathematical Psychology, 35, 294–318.
Croon, M. A. (1989). Latent class models for the analysis of rankings. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 99–121). North-Holland, Elsevier.
D’Ambrosio, A. (2007). Tree-based methods for data editing and preference rankings. Doctoral dissertation. Naples, Italy: Department of Mathematics and Statistics.
D’Ambrosio, A., & Heiser, W. J. (2011). Distance-based multivariate trees for rankings. Technical report.
Daniels, H. E. (1950). Rank correlation and population models. Journal of the Royal Statistical Society, Series B, 12, 171–191.
Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17, 949–979.
Dittrich, R., Katzenbeisser, W., & Reisinger, H. (2000). The analysis of rank ordered preference data based on Bradley-Terry type models. OR-Spektrum, 22, 117–134.
Emond, E. J., & Mason, D. W. (2002). A new rank correlation coefficient with application to the consensus ranking problem. Journal of Multi-Criteria Decision Analysis, 11, 17–28.
Fligner, M. A., & Verducci, J. S. (1986). Distance based ranking models. Journal of the Royal Statistical Society, Series B, 48, 359–369.
Fligner, M. A., & Verducci, J. S. (1988). Multistage ranking models. Journal of the American Statistical Association, 83, 892–901.
Francis, B., Dittrich, R., Hatzinger, R., & Penn, R. (2002). Analysing partial ranks by using smoothed paired comparison methods: An investigation of value orientation in Europe. Applied Statistics, 51, 319–336.
Fürnkranz, J., & Hüllermeier, E. (Eds.). (2010). Preference learning. Heidelberg: Springer.
Gormley, I. C., & Murphy, T. B. (2008a). Exploring voting blocs within the Irish electorate: A mixture modeling approach. Journal of the American Statistical Association, 103, 1014–1027.
Gormley, I. C., & Murphy, T. B. (2008b). A mixture of experts model for rank data with applications in election studies. The Annals of Applied Statistics, 2, 1452–1477.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Heiser, W. J. (2004). Geometric representation of association between categories. Psychometrika, 69, 513–546.
Heiser, W. J., & Busing, F. M. T. A. (2004). Multidimensional scaling and unfolding of symmetric and asymmetric proximity relations. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 25–48). Thousand Oaks: Sage.
Heiser, W. J., & D’Ambrosio, A. (2011). K-Median cluster component analysis. Technical report.
Heiser, W. J., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathématiques et Sciences Humaines, 19, 39–96.
Hojo, H. (1997). A marginalization model for the multidimensional unfolding analysis of ranking data. Japanese Psychological Research, 39, 33–42.
Hojo, H. (1998). Multidimensional unfolding analysis of ranking data for groups. Japanese Psychological Research, 40, 166–171.
Iyigun, C., & Ben-Israel, A. (2008). Probabilistic distance clustering adjusted for cluster size. Probability in the Engineering and Informational Sciences, 22, 603–621.
Iyigun, C., & Ben-Israel, A. (2010). Semi-supervised probabilistic distance clustering and the uncertainty of classification. In A. Fink et al. (Eds.), Advances in data analysis, data handling and business intelligence (pp. 3–20). Heidelberg: Springer.
Kamakura, W. A., & Srivastava, R. K. (1986). An ideal-point probabilistic choice model for heterogeneous preferences. Marketing Science, 5, 199–218.
Kamiya, H., & Takemura, A. (1997). On rankings generated by pairwise linear discriminant analysis of m populations. Journal of Multivariate Analysis, 61, 1–28.
Kamiya, H., & Takemura, A. (2005). Characterization of rankings generated by linear discriminant analysis. Journal of Multivariate Analysis, 92, 343–358.
Kamiya, H., Orlik, P., Takemura, A., & Terao, H. (2006). Arrangements and ranking patterns. Annals of Combinatorics, 10, 219–235.
Kamiya, H., Takemura, A., & Terao, H. (2011). Ranking patterns of unfolding models of codimension one. Advances in Applied Mathematics, 47, 379–400.
Kemeny, J. G. (1959). Mathematics without numbers. Daedalus, 88, 577–591.
Kemeny, J. G., & Snell, J. L. (1962). Preference rankings: An axiomatic approach. In J. G. Kemeny & J. L. Snell (Eds.), Mathematical models in the social sciences (pp. 9–23). New York: Blaisdell.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81–93.
Kendall, M. G. (1948). Rank correlation methods. London: Charles Griffin.
Kruskal, W. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53, 814–861.
Kruskal, J. B., & Carroll, J. D. (1969). Geometrical models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. 2, pp. 639–671). New York: Academic.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Mallows, C. L. (1957). Non-null ranking models, I. Biometrika, 44, 114–130.
Marden, J. I. (1995). Analyzing and modeling rank data. New York: Chapman & Hall.
Maydeu-Olivares, A. (1999). Thurstonian modeling of ranking data via mean and covariance structure analysis. Psychometrika, 64, 325–340.
Meulman, J. J., Van Der Kooij, A. J., & Heiser, W. J. (2004). Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 49–70). Thousand Oaks: Sage.
Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4, 227–243.
Morgan, K. O., & Morgan, S. (2010). State rankings 2010: A statistical view of America. Washington, DC: CQ Press.
Murphy, T. B., & Martin, D. (2003). Mixtures of distance-based models for ranking data. Computational Statistics and Data Analysis, 41, 645–655.
Roskam, Ed. E. C. I. (1968). Metric analysis of ordinal data in psychology: Models and numerical methods for metric analysis of conjoint ordinal data in psychology. Doctoral dissertation, Voorschoten, The Netherlands: VAM.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.
Skrondal, A., & Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68, 267–287.
Slater, P. (1960). The analysis of personal preferences. British Journal of Statistical Psychology, 13, 119–135.
Thompson, G. L. (1993). Generalized permutation polytopes and exploratory graphical methods for ranked data. The Annals of Statistics, 21, 1401–1430.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Thurstone, L. L. (1931). Rank order as a psychophysical method. Journal of Experimental Psychology, 14, 187–201.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications (pp. 155–167). New York: Wiley.
Van Blokland-Vogelesang, A. W. (1989). Unfolding and consensus ranking: A prestige ladder for technical occupations. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 237–258). Amsterdam, The Netherlands: North-Holland.
van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
Van Deun, K. (2005). Degeneracies in multidimensional unfolding. Doctoral dissertation, Leuven, Belgium: Catholic University of Leuven.
Yao, G., & Böckenholt, U. (1999). Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. British Journal of Mathematical and Statistical Psychology, 52, 79–92.
Zhang, J. (2004). Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron. Journal of Mathematical Psychology, 48, 107–134.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327–350.

Author information

Authors and Affiliations

Institute of Psychology, Leiden University, 2300 RB, Leiden, The Netherlands
Willem J. Heiser
Department of Industrial Engineering, University of Naples Federico II, Piazzale Tecchio, 80125, Naples, Italy
Antonio D’Ambrosio

Corresponding author

Correspondence to Willem J. Heiser.

Editor information

Editors and Affiliations

University of Essex Department of Mathematical Sciences, Colchester, United Kingdom
Berthold Lausen
Ghent University Department of Marketing, Ghent, Belgium
Dirk Van den Poel
University of Marburg Databionics, FB 12, Marburg, Germany
Alfred Ultsch

Cite this paper

Heiser, W.J., D’Ambrosio, A. (2013). Clustering and Prediction of Rankings Within a Kemeny Distance Framework. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_2

DOI: https://doi.org/10.1007/978-3-319-00035-0_2
Published: 16 July 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
