Clustering and Prediction of Rankings Within a Kemeny Distance Framework

  • Conference paper
  • In: Algorithms from and for Nature and Life

Abstract

Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work in the classification community that uses the typical properties of rankings. We review the broader literature that we are aware of, and identify a common building block for both prediction of rankings and clustering of rankings, which is also valid for partial rankings. This building block is the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one (partial) ranking into another. The Kemeny distance is equivalent to Kendall’s τ for complete rankings, but for partial rankings it is equivalent to Emond and Mason’s extension of τ. For clustering, we use the flexible class of methods proposed by Ben-Israel and Iyigun (Journal of Classification 25: 5–26, 2008), and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction, we build a prediction tree by recursive partitioning, and define the impurity measure of the subgroups formed as the sum of all within-node Kemeny distances. The median ranking characterizes subgroups in both cases.
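
The quantities named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes the Kemeny and Snell score-matrix form of the distance, in which a pair of items contributes 2 when the two rankings order it in opposite ways and 1 when one ranking ties the pair and the other does not; the scaling used in the paper may differ by a constant factor. Rankings are assumed to be given as rank positions (1 = best, equal values = tie). The function names are hypothetical, and the median search is brute force over complete rankings only, so it is usable only for a handful of items.

```python
from itertools import combinations, permutations


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(r1, r2):
    """Kemeny-Snell style distance between two (partial) rankings.

    r1, r2: sequences of rank positions for the same items
    (1 = best); equal positions within a ranking denote a tie.
    """
    if len(r1) != len(r2):
        raise ValueError("rankings must cover the same items")
    total = 0
    for i, j in combinations(range(len(r1)), 2):
        a = sign(r1[j] - r1[i])  # +1 if item i is preferred to item j in r1
        b = sign(r2[j] - r2[i])  # same comparison in r2
        total += abs(a - b)      # 0 = agree, 1 = tie vs. strict, 2 = reversed
    return total


def node_impurity(rankings):
    """Sum of all pairwise Kemeny distances among the rankings in one node."""
    return sum(kemeny_distance(a, b) for a, b in combinations(rankings, 2))


def median_ranking(rankings):
    """Brute-force median: the complete ranking (no ties) that minimizes
    the summed Kemeny distance to the given rankings.
    Assumption: only tie-free candidates are searched."""
    n = len(rankings[0])
    candidates = permutations(range(1, n + 1))
    return min(candidates,
               key=lambda c: sum(kemeny_distance(c, r) for r in rankings))


if __name__ == "__main__":
    sample = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
    print(kemeny_distance((1, 2, 3), (1, 2, 2)))  # strict order vs. one tie
    print(node_impurity(sample))
    print(median_ranking(sample))
```

For complete rankings this distance is twice the number of discordant pairs, which is where the link to Kendall’s τ comes from. In a recursive-partitioning setting, `node_impurity` would be evaluated on the rankings falling into each candidate child node.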

Notes

  1. During the Frankfurt DAGM-GfKl-2011 conference, Eyke Hüllermeier kindly pointed out that there is related work in the computer science community under the name “preference learning” (in particular, Cheng et al. 2009, and more generally, Fürnkranz and Hüllermeier 2010).

References

  • Barthélemy, J. P., Guénoche, A., & Hudry, O. (1989). Median linear orders: Heuristics and a branch and bound algorithm. European Journal of Operational Research, 42, 313–325.

  • Ben-Israel, A., & Iyigun, C. (2008). Probabilistic distance clustering. Journal of Classification, 25, 5–26.

  • Böckenholt, U. (1992). Thurstonian representation for partial ranking data. British Journal of Mathematical and Statistical Psychology, 45, 31–49.

  • Böckenholt, U. (2001). Mixed-effects analysis of rank-ordered data. Psychometrika, 66, 45–62.

  • Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs, I. Biometrika, 39, 324–345.

  • Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54, 181–202.

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.

  • Busing, F. M. T. A. (2009). Some advances in multidimensional unfolding. Doctoral Dissertation, Leiden, The Netherlands: Leiden University.

  • Busing, F. M. T. A., Groenen, P., & Heiser, W. J. (2005). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Psychometrika, 70, 71–98.

  • Busing, F. M. T. A., Heiser, W. J., & Cleaver, G. (2010). Restricted unfolding: Preference analysis with optimal transformations of preferences and attributes. Food Quality and Preference, 21, 82–92.

  • Cappelli, C., Mola, F., & Siciliano, R. (2002). A statistical approach to growing a reliable honest tree. Computational Statistics and Data Analysis, 38, 285–299.

  • Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard et al. (Eds.), Multidimensional scaling, Vol. I theory (pp. 105–155). New York: Seminar Press.

  • Chan, W., & Bentler, P. M. (1998). Covariance structure analysis of ordinal ipsative data. Psychometrika, 63, 369–399.

  • Chapman, R. G., & Staelin, R. (1982). Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research, 19, 288–301.

  • Cheng, W., Hühn, J., & Hüllermeier, E. (2009). Decision tree and instance-based learning for label ranking. In Proceedings of the 26th international conference on machine learning (pp. 161–168). Montreal, Canada.

  • Cohen, A., & Mallows, C. L. (1980). Analysis of ranking data (Tech. Rep.). Murray Hill: Bell Telephone Laboratories.

  • Cook, W. D. (2006). Distance-based and ad hoc consensus models in ordinal preference ranking. European Journal of Operational Research, 172, 369–385.

  • Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145–158.

  • Coombs, C. H. (1964). A theory of data. New York: Wiley.

  • Critchlow, D. E., & Fligner, M. A. (1991). Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on GLIM. Psychometrika, 56, 517–533.

  • Critchlow, D. E., Fligner, M. A., & Verducci, J. S. (1991). Probability models on rankings. Journal of Mathematical Psychology, 35, 294–318.

  • Croon, M. A. (1989). Latent class models for the analysis of rankings. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 99–121). North-Holland, Elsevier.

  • D’Ambrosio, A. (2007). Tree-based methods for data editing and preference rankings. Doctoral dissertation. Naples, Italy: Department of Mathematics and Statistics.

  • D’Ambrosio, A., & Heiser, W. J. (2011). Distance-based multivariate trees for rankings. Technical report.

  • Daniels, H. E. (1950). Rank correlation and population models. Journal of the Royal Statistical Society, Series B, 12, 171–191.

  • Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17, 949–979.

  • Dittrich, R., Katzenbeisser, W., & Reisinger, H. (2000). The analysis of rank ordered preference data based on Bradley-Terry type models. OR-Spektrum, 22, 117–134.

  • Emond, E. J., & Mason, D. W. (2002). A new rank correlation coefficient with application to the consensus ranking problem. Journal of Multi-Criteria Decision Analysis, 11, 17–28.

  • Fligner, M. A., & Verducci, J. S. (1986). Distance based ranking models. Journal of the Royal Statistical Society, Series B, 48, 359–369.

  • Fligner, M. A., & Verducci, J. S. (1988). Multistage ranking models. Journal of the American Statistical Association, 83, 892–901.

  • Francis, B., Dittrich, R., Hatzinger, R., & Penn, R. (2002). Analysing partial ranks by using smoothed paired comparison methods: An investigation of value orientation in Europe. Applied Statistics, 51, 319–336.

  • Fürnkranz, J., & Hüllermeier, E. (Eds.). (2010). Preference learning. Heidelberg: Springer.

  • Gormley, I. C., & Murphy, T. B. (2008a). Exploring voting blocs within the Irish electorate: A mixture modeling approach. Journal of the American Statistical Association, 103, 1014–1027.

  • Gormley, I. C., & Murphy, T. B. (2008b). A mixture of experts model for rank data with applications in election studies. The Annals of Applied Statistics, 2, 1452–1477.

  • Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

  • Heiser, W. J. (2004). Geometric representation of association between categories. Psychometrika, 69, 513–546.

  • Heiser, W. J., & Busing, F. M. T. A. (2004). Multidimensional scaling and unfolding of symmetric and asymmetric proximity relations. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 25–48). Thousand Oaks: Sage.

  • Heiser, W. J., & D’Ambrosio, A. (2011). K-Median cluster component analysis. Technical report.

  • Heiser, W. J., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathématiques et Sciences Humaines, 19, 39–96.

  • Hojo, H. (1997). A marginalization model for the multidimensional unfolding analysis of ranking data. Japanese Psychological Research, 39, 33–42.

  • Hojo, H. (1998). Multidimensional unfolding analysis of ranking data for groups. Japanese Psychological Research, 40, 166–171.

  • Iyigun, C., & Ben-Israel, A. (2008). Probabilistic distance clustering adjusted for cluster size. Probability in the Engineering and Informational Sciences, 22, 603–621.

  • Iyigun, C., & Ben-Israel, A. (2010). Semi-supervised probabilistic distance clustering and the uncertainty of classification. In A. Fink et al. (Eds.), Advances in data analysis, data handling and business intelligence (pp. 3–20). Heidelberg: Springer.

  • Kamakura, W. A., & Srivastava, R. K. (1986). An ideal-point probabilistic choice model for heterogeneous preferences. Marketing Science, 5, 199–218.

  • Kamiya, H., & Takemura, A. (1997). On rankings generated by pairwise linear discriminant analysis of m populations. Journal of Multivariate Analysis, 61, 1–28.

  • Kamiya, H., & Takemura, A. (2005). Characterization of rankings generated by linear discriminant analysis. Journal of Multivariate Analysis, 92, 343–358.

  • Kamiya, H., Orlik, P., Takemura, A., & Terao, H. (2006). Arrangements and ranking patterns. Annals of Combinatorics, 10, 219–235.

  • Kamiya, H., Takemura, A., & Terao, H. (2011). Ranking patterns of unfolding models of codimension one. Advances in Applied Mathematics, 47, 379–400.

  • Kemeny, J. G. (1959). Mathematics without numbers. Daedalus, 88, 577–591.

  • Kemeny, J. G., & Snell, J. L. (1962). Preference rankings: An axiomatic approach. In J. G. Kemeny & J. L. Snell (Eds.), Mathematical models in the social sciences (pp. 9–23). New York: Blaisdell.

  • Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81–93.

  • Kendall, M. G. (1948). Rank correlation methods. London: Charles Griffin.

  • Kruskal, W. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53, 814–861.

  • Kruskal, J. B., & Carroll, J. D. (1969). Geometrical models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. 2, pp. 639–671). New York: Academic.

  • Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

  • Mallows, C. L. (1957). Non-null ranking models, I. Biometrika, 44, 114–130.

  • Marden, J. I. (1995). Analyzing and modeling rank data. New York: Chapman & Hall.

  • Maydeu-Olivares, A. (1999). Thurstonian modeling of ranking data via mean and covariance structure analysis. Psychometrika, 64, 325–340.

  • Meulman, J. J., Van Der Kooij, A. J., & Heiser, W. J. (2004). Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 49–70). Thousand Oaks: Sage.

  • Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4, 227–243.

  • Morgan, K. O., & Morgan, S. (2010). State rankings 2010: A statistical view of America. Washington, DC: CQ Press.

  • Murphy, T. B., & Martin, D. (2003). Mixtures of distance-based models for ranking data. Computational Statistics and Data Analysis, 41, 645–655.

  • Roskam, Ed. E. C. I. (1968). Metric analysis of ordinal data in psychology: Models and numerical methods for metric analysis of conjoint ordinal data in psychology. Doctoral dissertation, Voorschoten, The Netherlands: VAM.

  • Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.

  • Skrondal, A., & Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68, 267–287.

  • Slater, P. (1960). The analysis of personal preferences. British Journal of Statistical Psychology, 13, 119–135.

  • Thompson, G. L. (1993). Generalized permutation polytopes and exploratory graphical methods for ranked data. The Annals of Statistics, 21, 1401–1430.

  • Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.

  • Thurstone, L. L. (1931). Rank order as a psychophysical method. Journal of Experimental Psychology, 14, 187–201.

  • Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications (pp. 155–167). New York: Wiley.

  • Van Blokland-Vogelesang, A. W. (1989). Unfolding and consensus ranking: A prestige ladder for technical occupations. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 237–258). Amsterdam, The Netherlands: North-Holland.

  • van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.

  • Van Deun, K. (2005). Degeneracies in multidimensional unfolding. Doctoral dissertation, Leuven, Belgium: Catholic University of Leuven.

  • Yao, G., & Böckenholt, U. (1999). Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. British Journal of Mathematical and Statistical Psychology, 52, 79–92.

  • Zhang, J. (2004). Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron. Journal of Mathematical Psychology, 48, 107–134.

  • Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327–350.

Author information

Corresponding author

Correspondence to Willem J. Heiser.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Heiser, W.J., D’Ambrosio, A. (2013). Clustering and Prediction of Rankings Within a Kemeny Distance Framework. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_2
