
Evaluating Recommender Systems with User Experiments

Chapter in: Recommender Systems Handbook

Abstract

Proper evaluation of the user experience of recommender systems requires conducting user experiments. This chapter is a guideline for students and researchers aspiring to conduct user experiments with their recommender systems. It first covers the theory of user-centric evaluation of recommender systems, and gives an overview of recommender system aspects to evaluate. It then provides a detailed practical description of how to conduct user experiments, covering the following topics: formulating hypotheses, sampling participants, creating experimental manipulations, measuring subjective constructs with questionnaires, and statistically evaluating the results.

The author contributed to this chapter while he was at the University of California, Irvine.


Notes

  1. We use the term “user experiment” to denote the use of experimental conditions and formal measurement as a means of testing theories about users interacting with recommender systems. This is as opposed to “user studies”, which are typically smaller observational studies used to iteratively improve the usability of a recommender system.

  2. See [45] for a taxonomy of different types of theory.

  3. Like Hassenzahl [46, 47], our framework describes the formation of experiences during technology use rather than the longer-term phenomenon of technology acceptance, but it extends this model to behavioral consequences using attitude-behavior theories [24, 37] (a theoretical structure that is prominent in technology acceptance models [26, 116]).

  4. The paths from Personal and Situational Characteristics to Subjective System Aspects were added to the original framework (as presented in [67]) based on insights from various experiments with the framework.

  5. In some cases PCs and SCs can be inferred from user behavior, e.g. observing the click-stream can tell us the market segment a user belongs to [44]. SCs can also be manipulated, e.g. by priming users to approach the recommender with either a concrete or abstract mindset [71, 120].

  6. Mechanical Turk is currently only available to researchers in the United States, but various alternatives exist for non-US researchers.

  7. http://www.statmodel.com/.

  8. http://lavaan.ugent.be/.

  9. Or, multiple measurement scales for the different constructs (e.g. system satisfaction, ease of use, and recommendation quality), each measured with multiple items.

  10. MPlus and Lavaan use a different parameterization by default, fixing the loading of the first item of each factor to 1. We free up these loadings by including an asterisk after the first item (MPlus) or NA* before it (Lavaan). This alternative solution conveniently standardizes the factor scores; a minimal Lavaan sketch follows below.
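
    As a rough illustration of the Lavaan variant, here is a minimal sketch; the factor and item names (quality measured by qual1–qual3, satisfaction by sat1–sat3) and the data frame survey_data are hypothetical placeholders rather than the chapter’s actual measurement model:

      # Minimal lavaan (R) sketch with hypothetical items and data.
      library(lavaan)

      model <- '
        # NA* frees the first loading instead of fixing it to 1 ...
        quality      =~ NA*qual1 + qual2 + qual3
        satisfaction =~ NA*sat1 + sat2 + sat3

        # ... so the model is identified by fixing the factor variances
        # to 1, which conveniently standardizes the factors.
        quality      ~~ 1*quality
        satisfaction ~~ 1*satisfaction
      '

      fit <- cfa(model, data = survey_data)  # survey_data holds the questionnaire items
      summary(fit, fit.measures = TRUE, standardized = TRUE)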

  11. Moreover, even if you are more or less certain about the factor structure of a CFA model, it pays to consult the modification indices of the model (see the example below). The use of modification indices in CFA goes beyond the current chapter, but is thoroughly explained in Kline’s [59] practical primer on Structural Equation Models.
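
    Continuing the hypothetical Lavaan sketch from the previous note, the modification indices of a fitted model can be listed as follows:

      # Show the ten largest modification indices of the fitted model;
      # large values flag constrained parameters (e.g. omitted cross-
      # loadings or correlated errors) whose freeing would improve fit.
      mi <- modindices(fit)
      head(mi[order(-mi$mi), ], 10)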

  12. An important property of the “interval” data type is that differences between values are comparable. This is for instance not true for a rating score: the difference between 1 and 2 stars is not necessarily the same as the difference between 3 and 4 stars (cf. [74]).

  13. Here we do not discuss the interaction effect between inspectability and control. This interaction can be tested by multiplying their dummies, creating cgraphitem and cgraphfriend (as sketched below). These dummies represent the additional effect of item- and friend-control in the graph condition (and likewise, the additional effect of the graph in the item- and friend-control conditions).
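
    A rough sketch of this dummy coding in R; the data frame d, the outcome satisfaction, and the base dummies cgraph (graph vs. list interface), citem, and cfriend (item- and friend-control) are hypothetical:

      # The interaction dummies are simply products of the condition dummies.
      d$cgraphitem   <- d$cgraph * d$citem
      d$cgraphfriend <- d$cgraph * d$cfriend

      # Regression with the main effects and the two interaction terms:
      summary(lm(satisfaction ~ cgraph + citem + cfriend +
                 cgraphitem + cgraphfriend, data = d))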

  14. By design, experimental manipulations can only be independent variables (i.e. they never have incoming arrows), so they always start the causal chain.

  15. Like in CFA, more exploratory modeling efforts can be assisted by the use of modification indices. Please consult [59] for examples.

References

  1. Adomavicius, G., Kwon, Y.: Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24(5), 896–911 (2012). DOI 10.1109/TKDE.2011.15


  2. Ajzen, I.: From intentions to actions: A theory of planned behavior. In: J. Kuhl, J. Beckmann (eds.) Action Control, SSSP Springer Series in Social Psychology, pp. 11–39. Springer Berlin Heidelberg (1985).


  3. Ajzen, I.: The theory of planned behavior. Organizational Behavior and Human Decision Processes 50(2), 179–211 (1991).


  4. Ajzen, I., Fishbein, M.: Understanding attitudes and predicting social behaviour. Prentice-Hall, Englewood Cliffs, NJ (1980)


  5. Amatriain, X., Pujol, J.M., Tintarev, N., Oliver, N.: Rate it again: Increasing recommendation accuracy by user re-rating. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 173–180. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639744

  6. Basartan, Y.: Amazon versus the shopbot: An experiment about how to improve the shopbots (2001)


  7. Bennett, J., Lanning, S.: The Netflix prize. In: KDD Cup and Workshop in conjunction with KDD. San Jose, CA, USA (2007). URL http://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/The-Netflix-Prize-Bennett.pdf

  8. Bentler, P.M., Bonett, D.G.: Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88(3), 588–606 (1980). DOI 10.1037/0033-2909.88.3.588


  9. Bettman, J.R., Luce, M.F., Payne, J.W.: Constructive consumer choice processes. Journal of Consumer Research 25(3), 187–217 (1998). DOI 10.1086/209535


  10. Bilgic, M., Mooney, R.J.: Explaining recommendations: Satisfaction vs. promotion. In: IUI Workshop: Beyond Personalization. San Diego, CA (2005)


  11. Blackwelder, W.C.: “Proving the null hypothesis” in clinical trials. Controlled Clinical Trials 3(4), 345–353 (1982). DOI 10.1016/0197-2456(82)90024-1


  12. Bollen, D., Knijnenburg, B.P., Willemsen, M.C., Graus, M.: Understanding choice overload in recommender systems. In: Proceedings of the fourth ACM conference on Recommender systems, pp. 63–70. Barcelona, Spain (2010). DOI 10.1145/1864708.1864724

  13. Bollen, K.A.: Structural equation models. In: Encyclopedia of Biostatistics. John Wiley & Sons, Ltd (2005)


  14. Bostandjiev, S., O’Donovan, J., Höllerer, T.: TasteWeights: a visual interactive hybrid recommender system. In: Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ‘12, pp. 35–42. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365964

  15. Cena, F., Vernero, F., Gena, C.: Towards a customization of rating scales in adaptive systems. In: P.D. Bra, A. Kobsa, D. Chin (eds.) User Modeling, Adaptation, and Personalization, no. 6075 in Lecture Notes in Computer Science, pp. 369–374. Springer Berlin Heidelberg (2010). DOI 10.1007/978-3-642-13470-8_34


  16. Chen, L., Pu, P.: Interaction design guidelines on critiquing-based recommender systems. User Modeling and User-Adapted Interaction 19(3), 167–206 (2009). DOI 10.1007/s11257-008-9057-x


  17. Chen, L., Pu, P.: Experiments on the preference-based organization interface in recommender systems. ACM Transactions on Computer-Human Interaction 17(1), 5:1–5:33 (2010). DOI 10.1145/1721831.1721836

  18. Chen, L., Pu, P.: Eye-tracking study of user behavior in recommender interfaces. In: P.D. Bra, A. Kobsa, D. Chin (eds.) User Modeling, Adaptation, and Personalization, no. 6075 in Lecture Notes in Computer Science, pp. 375–380. Springer Berlin Heidelberg (2010). DOI 10.1007/978-3-642-13470-8_35


  19. Chen, L., Pu, P.: Critiquing-based recommenders: survey and emerging trends. User Modeling and User-Adapted Interaction 22(1–2), 125–150 (2012). DOI 10.1007/s11257-011-9108-6


  20. Chen, L., Tsoi, H.K.: Users’ decision behavior in recommender interfaces: Impact of layout design. In: RecSys’11 Workshop on Human Decision Making in Recommender Systems, pp. 21–26. Chicago, IL, USA (2011). URL http://ceur-ws.org/Vol-811/paper4.pdf

  21. Chin, D.N.: Empirical evaluation of user models and user-adapted systems. User Modeling and User-Adapted Interaction 11(1–2), 181–194 (2001). DOI 10.1023/A:1011127315884


  22. Cohen, J.: Statistical power analysis for the behavioral sciences. Psychology Press (1988)


  23. Cosley, D., Lam, S.K., Albert, I., Konstan, J.A., Riedl, J.: Is seeing believing?: How recommender system interfaces affect users’ opinions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ‘03, pp. 585–592. ACM, Ft. Lauderdale, Florida, USA (2003). DOI 10.1145/642611.642713

  24. Cramer, H., Evers, V., Ramlal, S., Someren, M., Rutledge, L., Stash, N., Aroyo, L., Wielinga, B.: The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction 18(5), 455–496 (2008). DOI 10.1007/s11257-008-9051-3


  25. Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “Good” recommendations: A comparative evaluation of recommender systems. In: P. Campos, N. Graham, J. Jorge, N. Nunes, P. Palanque, M. Winckler (eds.) Human-Computer Interaction – INTERACT 2011, no. 6948 in Lecture Notes in Computer Science, pp. 152–168. Springer Berlin Heidelberg (2011). DOI 10.1007/978-3-642-23765-2_11


  26. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13(3), 319–340 (1989). DOI 10.2307/249008


  27. DeVellis, R.F.: Scale development: theory and applications. SAGE, Thousand Oaks, CA (2011)


  28. Dooms, S., De Pessemier, T., Martens, L.: An online evaluation of explicit feedback mechanisms for recommender systems. In: 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pp. 391–394. Noordwijkerhout, The Netherlands (2011). URL https://biblio.ugent.be/publication/2039743/file/2039745.pdf

  29. Dooms, S., De Pessemier, T., Martens, L.: A user-centric evaluation of recommender algorithms for an event recommendation system. In: RecSys 2011 Workshop on Human Decision Making in Recommender Systems (Decisions@RecSys’11) and User-Centric Evaluation of Recommender Systems and Their Interfaces-2 (UCERSTI 2) affiliated with the 5th ACM Conference on Recommender Systems (RecSys 2011), pp. 67–73. Chicago, IL, USA (2011). URL http://ceur-ws.org/Vol-811/paper10.pdf

  30. Downs, J.S., Holbrook, M.B., Sheng, S., Cranor, L.F.: Are your participants gaming the system?: screening mechanical turk workers. In: Proceedings of the 28th SIGCHI conference on Human factors in computing systems, pp. 2399–2402. Atlanta, Georgia, USA (2010). DOI 10.1145/1753326.1753688

  31. Ekstrand, M.D., Harper, F.M., Willemsen, M.C., Konstan, J.A.: User perception of differences in recommender algorithms. In: Proceedings of the eighth ACM conference on Recommender systems. Foster City, CA (2014). DOI 10.1145/2645710.2645737


  32. Erickson, B.H.: Some problems of inference from chain data. Sociological Methodology 10(1), 276–302 (1979)


  33. Farzan, R., Brusilovsky, P.: Encouraging user participation in a course recommender system: An impact on user behavior. Computers in Human Behavior 27(1), 276–284 (2011). DOI 10.1016/j.chb.2010.08.005


  34. Fasolo, B., Hertwig, R., Huber, M., Ludwig, M.: Size, entropy, and density: What is the difference that makes the difference between small and large real-world assortments? Psychology and Marketing 26(3), 254–279 (2009). DOI 10.1002/mar.20272


  35. Faul, F., Erdfelder, E., Lang, A.G., Buchner, A.: G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods 39(2), 175–191 (2007). DOI 10.3758/BF03193146


  36. Felfernig, A.: Knowledge-based recommender technologies for marketing and sales. International Journal of Pattern Recognition and Artificial Intelligence 21(2), 333–354 (2007). DOI 10.1142/S0218001407005417


  37. Fishbein, M., Ajzen, I.: Belief, attitude, intention, and behavior: an introduction to theory and research. Addison-Wesley Pub. Co., Reading, MA (1975)


  38. Fisher, R.A.: The design of experiments, vol. xi. Oliver & Boyd, Oxford, England (1935)


  39. Freyne, J., Jacovi, M., Guy, I., Geyer, W.: Increasing engagement through early recommender intervention. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 85–92. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639730

  40. Friedrich, G., Zanker, M.: A taxonomy for generating explanations in recommender systems. AI Magazine 32(3), 90–98 (2011). DOI 10.1609/aimag.v32i3.2365

  41. Gedikli, F., Jannach, D., Ge, M.: How should I explain? A comparison of different explanation types for recommender systems. International Journal of Human-Computer Studies 72(4), 367–382 (2014). DOI 10.1016/j.ijhcs.2013.12.007

  42. Gena, C., Brogi, R., Cena, F., Vernero, F.: The impact of rating scales on user’s rating behavior. In: D. Hutchison, T. Kanade, J. Kittler, J.M. Kleinberg, F. Mattern, J.C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M.Y. Vardi, G. Weikum, J.A. Konstan, R. Conejo, J.L. Marzo, N. Oliver (eds.) User Modeling, Adaption and Personalization, vol. 6787, pp. 123–134. Springer, Berlin, Heidelberg (2011). DOI 10.1007/978-3-642-22362-4_11

  43. Ghose, A., Ipeirotis, P.G., Li, B.: Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science 31(3), 493–520 (2012). DOI 10.1287/mksc.1110.0700

  44. Graus, M.P., Willemsen, M.C., Swelsen, K.: Understanding real-life website adaptations by investigating the relations between user behavior and user experience. In: F. Ricci, K. Bontcheva, O. Conlan, S. Lawless (eds.) User Modeling, Adaptation and Personalization, vol. 9146, pp. 350–356. Springer, Berlin, Heidelberg (2015)


  45. Gregor, S.: The nature of theory in information systems. MIS Quarterly 30(3), 611–642 (2006). URL http://www.jstor.org/stable/25148742

  46. Hassenzahl, M.: The thing and I: understanding the relationship between user and product. In: M. Blythe, K. Overbeeke, A. Monk, P. Wright (eds.) Funology, From Usability to Enjoyment, pp. 31–42. Kluwer Academic Publishers, Dordrecht, The Netherlands (2005). DOI 10.1007/1-4020-2967-5_4

  47. Hassenzahl, M.: User experience (UX). In: Proceedings of the 20th International Conference of the Association Francophone d’Interaction Homme-Machine on - IHM ‘08, pp. 11–15. Metz, France (2008). DOI 10.1145/1512714.1512717

  48. Häubl, G., Trifts, V.: Consumer decision making in online shopping environments: The effects of interactive decision aids. Marketing Science 19(1), 4–21 (2000). URL http://www.jstor.org/stable/193256

  49. Heckathorn, D.D.: Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Social Problems 49(1), 11–34 (2002). DOI 10.1525/sp.2002.49.1.11

  50. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: Proc. of the 2000 ACM conference on Computer supported cooperative work, pp. 241–250. ACM Press, Philadelphia, PA (2000). DOI 10.1145/358916.358995

  51. Hu, L., Bentler, P.M.: Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6(1), 1–55 (1999). DOI 10.1080/10705519909540118

  52. Hu, R., Pu, P.: Enhancing recommendation diversity with organization interfaces. In: Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI ‘11, pp. 347–350. ACM, Palo Alto, CA, USA (2011). DOI 10.1145/1943403.1943462

  53. Iivari, J.: Contributions to the theoretical foundations of systemeering research and the PIOCO model. Ph.D. thesis, University of Oulu, Finland (1983)


  54. Jacko, J.A.: The human-computer interaction handbook: fundamentals, evolving technologies, and emerging applications. CRC Press, Boca Raton, FL (2012)


  55. Jackson, D.L.: Revisiting sample size and number of parameter estimates: Some support for the n:q hypothesis. Structural Equation Modeling: A Multidisciplinary Journal 10(1), 128–141 (2003). DOI 10.1207/S15328007SEM1001_6

  56. Kahneman, D.: Thinking, fast and slow. Macmillan (2011)


  57. Kammerer, Y., Gerjets, P.: How the interface design influences users’ spontaneous trustworthiness evaluations of web search results: Comparing a list and a grid interface. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ‘10, pp. 299–306. ACM, Austin, TX, USA (2010). DOI 10.1145/1743666.1743736

  58. Kittur, A., Chi, E.H., Suh, B.: Crowdsourcing user studies with mechanical turk. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 453–456. ACM Press, Florence, Italy (2008). DOI 10.1145/1357054.1357127

  59. Kline, R.B.: Principles and practice of structural equation modeling. Guilford Press, New York (2011)


  60. Kluver, D., Nguyen, T.T., Ekstrand, M., Sen, S., Riedl, J.: How many bits per rating? In: Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ‘12, pp. 99–106. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365974

  61. Knijnenburg, B.P.: Simplifying privacy decisions: Towards interactive and adaptive solutions. In: Proceedings of the RecSys 2013 Workshop on Human Decision Making in Recommender Systems (Decisions@RecSys’13), pp. 40–41. Hong Kong, China (2013). URL http://ceur-ws.org/Vol-1050/paper7.pdf

  62. Knijnenburg, B.P., Bostandjiev, S., O’Donovan, J., Kobsa, A.: Inspectability and control in social recommenders. In: Proceedings of the sixth ACM conference on Recommender systems, RecSys ‘12, pp. 43–50. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365966

  63. Knijnenburg, B.P., Kobsa, A.: Making decisions about privacy: Information disclosure in context-aware recommender systems. ACM Transactions on Interactive Intelligent Systems 3(3), 20:1–20:23 (2013). DOI 10.1145/2499670

  64. Knijnenburg, B.P., Kobsa, A., Jin, H.: Dimensionality of information disclosure behavior. International Journal of Human-Computer Studies 71(12), 1144–1162 (2013). DOI 10.1016/j.ijhcs.2013.06.003

  65. Knijnenburg, B.P., Reijmer, N.J., Willemsen, M.C.: Each to his own: how different users call for different interaction methods in recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, pp. 141–148. ACM Press, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043960

  66. Knijnenburg, B.P., Willemsen, M.C.: Understanding the effect of adaptive preference elicitation methods on user satisfaction of a recommender system. In: Proceedings of the third ACM conference on Recommender systems, pp. 381–384. New York, NY (2009). DOI 10.1145/1639714.1639793

  67. Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22(4–5), 441–504 (2012). DOI 10.1007/s11257-011-9118-4

  68. Knijnenburg, B.P., Willemsen, M.C., Hirtbach, S.: Receiving recommendations and providing feedback: The user-experience of a recommender system. In: F. Buccafurri, G. Semeraro (eds.) E-Commerce and Web Technologies, vol. 61, pp. 207–216. Springer, Berlin, Heidelberg (2010). DOI 10.1007/978-3-642-15208-5_19

  69. Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, RecSys ‘11, pp. 321–324. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043993

  70. Kobsa, A., Cho, H., Knijnenburg, B.P.: An attitudinal and behavioral model of personalization at different providers. Journal of the Association for Information Science and Technology. http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2330-1643/earlyview (In press)

  71. Köhler, C.F., Breugelmans, E., Dellaert, B.G.C.: Consumer acceptance of recommendations by interactive decision aids: The joint role of temporal distance and concrete versus abstract communications. Journal of Management Information Systems 27(4), 231–260 (2011). DOI 10.2753/MIS0742-1222270408

  72. Konstan, J., Riedl, J.: Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction 22(1), 101–123 (2012). DOI 10.1007/s11257-011-9112-x

  73. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009). DOI 10.1109/MC.2009.263

  74. Koren, Y., Sill, J.: OrdRec: An ordinal model for predicting personalized item rating distributions. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 117–124. ACM, New York, NY, USA (2011). DOI 10.1145/2043932.2043956

  75. Landsberger, H.A.: Hawthorne revisited: Management and the worker, its critics, and developments in human relations in industry. Cornell University (1958)


  76. Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘10, pp. 210–217. ACM, Geneva, Switzerland (2010). DOI 10.1145/1835449.1835486

  77. Lee, Y.E., Benbasat, I.: The influence of trade-off difficulty caused by preference elicitation methods on user acceptance of recommendation agents across loss and gain conditions. Information Systems Research 22(4), 867–884 (2011). DOI 10.1287/isre.1100.0334

  78. Lopes, C.S., Rodrigues, L.C., Sichieri, R.: The lack of selection bias in a snowball sampled case-control study on drug abuse. International Journal of Epidemiology 25(6), 1267–1270 (1996). DOI 10.1093/ije/25.6.1267

  79. MacCallum, R.C., Widaman, K.F., Zhang, S., Hong, S.: Sample size in factor analysis. Psychological Methods 4(1), 84–99 (1999). DOI 10.1037/1082-989X.4.1.84

  80. MacKenzie, I.S.: Human-Computer Interaction: An Empirical Research Perspective, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2013)


  81. Martin, F.J.: RecSys’09 industrial keynote: Top 10 lessons learned developing, deploying, and operating real-world recommender systems. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 1–2. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639715

  82. McNee, S.M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S.K., Rashid, A.M., Konstan, J.A., Riedl, J.: On the recommending of citations for research papers. In: Proceedings of the 2002 ACM conference on Computer supported cooperative work, pp. 116–125. New Orleans, LA (2002). DOI 10.1145/587078.587096

  83. McNee, S.M., Riedl, J., Konstan, J.A.: Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: Extended abstracts on Human factors in computing systems, pp. 1097–1101. Montréal, Québec, Canada (2006). DOI 10.1145/1125451.1125659

  84. McNee, S.M., Riedl, J., Konstan, J.A.: Making recommendations better: An analytic model for human-recommender interaction. In: Extended Abstracts on Human Factors in Computing Systems, CHI EA ‘06, pp. 1103–1108. ACM, Montréal, Québec, Canada (2006). DOI 10.1145/1125451.1125660

  85. Mogilner, C., Rudnick, T., Iyengar, S.S.: The mere categorization effect: How the presence of categories increases choosers’ perceptions of assortment variety and outcome satisfaction. Journal of Consumer Research 35(2), 202–215 (2008). DOI 10.1086/586908

  86. Neter, J., Kutner, M.H., Nachtsheim, C.J., Wasserman, W.: Applied linear statistical models, vol. 4. Irwin, Chicago (1996)


  87. Nguyen, T.T., Kluver, D., Wang, T.Y., Hui, P.M., Ekstrand, M.D., Willemsen, M.C., Riedl, J.: Rating support interfaces to improve user experience and recommender accuracy. In: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ‘13, pp. 149–156. ACM, Hong Kong, China (2013). DOI 10.1145/2507157.2507188

  88. Nuzzo, R.: Scientific method: Statistical errors. Nature 506(7487), 150–152 (2014). DOI 10.1038/506150a

  89. Oestreicher-Singer, G., Sundararajan, A.: Recommendation networks and the long tail of electronic commerce. Management Information Systems Quarterly 36(1), 65–83 (2012). URL http://aisel.aisnet.org/misq/vol36/iss1/7

  90. Oestreicher-Singer, G., Sundararajan, A.: The visible hand? Demand effects of recommendation networks in electronic markets. Management Science 58(11), 1963–1981 (2012). DOI 10.1287/mnsc.1120.1536

  91. Orne, M.T.: On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist 17(11), 776–783 (1962). DOI 10.1037/h0043424

  92. Paolacci, G., Chandler, J., Ipeirotis, P.: Running experiments on amazon mechanical turk. Judgment and Decision Making 5(5), 411–419 (2010). URL http://www.sjdm.org/journal/10/10630a/jdm10630a.pdf

  93. Podsakoff, P.M., MacKenzie, S.B., Lee, J.Y., Podsakoff, N.P.: Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology 88(5), 879–903 (2003). DOI 10.1037/0021-9010.88.5.879

  94. Pu, P., Chen, L.: Trust-inspiring explanation interfaces for recommender systems. Knowledge-Based Systems 20(6), 542–556 (2007). DOI 10.1016/j.knosys.2007.04.004

  95. Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 157–164. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043962

  96. Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22(4), 317–355 (2012). DOI 10.1007/s11257-011-9115-7

  97. Purchase, H.C.: Experimental Human-Computer Interaction: A Practical Guide with Visual Examples, 1st edn. Cambridge University Press, New York, NY, USA (2012)


  98. Randall, T., Terwiesch, C., Ulrich, K.T.: User design of customized products. Marketing Science 26(2), 268–280 (2007). DOI 10.1287/mksc.1050.0116

  99. Said, A., Fields, B., Jain, B.J., Albayrak, S.: User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ‘13, pp. 1399–1408. ACM, New York, NY, USA (2013). DOI 10.1145/2441776.2441933

  100. Said, A., Jain, B.J., Narr, S., Plumbaum, T., Albayrak, S., Scheel, C.: Estimating the magic barrier of recommender systems: A user study. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘12, pp. 1061–1062. ACM, Portland, Oregon (2012). DOI 10.1145/2348283.2348469

  101. Salganik, M.J., Heckathorn, D.D.: Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34(1), 193–240 (2004). DOI 10.1111/j.0081-1750.2004.00152.x

  102. Schaeffer, N.C., Presser, S.: The science of asking questions. Annual Review of Sociology 29(1), 65–88 (2003). DOI 10.1146/annurev.soc.29.110702.110112

  103. Scheibehenne, B., Greifeneder, R., Todd, P.M.: Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research 37(3), 409–425 (2010). DOI 10.1086/651235

  104. Sinha, R., Swearingen, K.: Comparing recommendations made by online systems and friends. In: Proceedings of the DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries (2001)


  105. Smith, N.C., Goldstein, D.G., Johnson, E.J.: Choice without awareness: Ethical and policy implications of defaults. Journal of Public Policy & Marketing 32(2), 159–172 (2013). DOI 10.1509/jppm.10.114

  106. Sparling, E.I., Sen, S.: Rating: How difficult is it? In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 149–156. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043961

  107. Steele-Johnson, D., Beauregard, R.S., Hoover, P.B., Schmidt, A.M.: Goal orientation and task demand effects on motivation, affect, and performance. Journal of Applied Psychology 85(5), 724–738 (2000). DOI 10.1037/0021-9010.85.5.724

  108. Symeonidis, P., Nanopoulos, A., Manolopoulos, Y.: Providing justifications in recommender systems. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 38(6), 1262–1272 (2008). DOI 10.1109/TSMCA.2008.2003969

  109. Tam, K.Y., Ho, S.Y.: Web personalization: is it effective? IT Professional 5(5), 53–57 (2003). DOI 10.1109/MITP.2003.1235611

  110. Tintarev, N., Masthoff, J.: A survey of explanations in recommender systems. In: Data Engineering Workshop, pp. 801–810. IEEE, Istanbul, Turkey (2007). DOI 10.1109/ICDEW.2007.4401070

  111. Tintarev, N., Masthoff, J.: Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22(4–5), 399–439 (2012). DOI 10.1007/s11257-011-9117-5

  112. Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens+. In: Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries - JCDL ‘04, pp. 228–236. Tucson, AZ, USA (2004). DOI 10.1145/996350.996402

  113. Utts, J.: Seeing Through Statistics. Cengage Learning (2004)


  114. Van Velsen, L., Van Der Geest, T., Klaassen, R., Steehouder, M.: User-centered evaluation of adaptive and adaptable systems: a literature review. The Knowledge Engineering Review 23(3), 261–281 (2008). DOI 10.1017/S0269888908001379

  115. Vargas, S., Castells, P.: Rank and relevance in novelty and diversity metrics for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 109–116. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043955

  116. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information technology: Toward a unified view. MIS Quarterly 27(3), 425–478 (2003). URL http://www.jstor.org/stable/30036540

  117. Vig, J., Sen, S., Riedl, J.: Tagsplanations: Explaining recommendations using tags. In: Proceedings of the 14th International Conference on Intelligent User Interfaces, IUI ‘09, pp. 47–56. ACM, Sanibel Island, Florida, USA (2009). DOI 10.1145/1502650.1502661

  118. Wang, H.C., Doong, H.S.: Argument form and spokesperson type: The recommendation strategy of virtual salespersons. International Journal of Information Management 30(6), 493–501 (2010). DOI 10.1016/j.ijinfomgt.2010.03.006

  119. Wang, W., Benbasat, I.: Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs. Journal of Management Information Systems 23(4), 217–246 (2007). DOI 10.2753/MIS0742-1222230410

  120. Willemsen, M.C., Graus, M.P., Knijnenburg, B.P.: Understanding the role of latent feature diversification on choice difficulty and satisfaction (manuscript, under review)


  121. Willemsen, M.C., Knijnenburg, B.P., Graus, M.P., Velter-Bremmers, L.C., Fu, K.: Using latent features diversification to reduce choice difficulty in recommendation lists. In: RecSys’11 Workshop on Human Decision Making in Recommender Systems, CEUR-WS, vol. 811, pp. 14–20. Chicago, IL (2011). URL http://ceur-ws.org/Vol-811/paper3.pdf

  122. Xiao, B., Benbasat, I.: E-commerce product recommendation agents: Use, characteristics, and impact. MIS Quarterly 31(1), 137–209 (2007). URL http://www.jstor.org/stable/25148784

  123. Xiao, B., Benbasat, I.: Research on the use, characteristics, and impact of e-commerce product recommendation agents: A review and update for 2007–2012. In: F.J. Martínez-López (ed.) Handbook of Strategic e-Business Management, Progress in IS, pp. 403–431. Springer Berlin Heidelberg (2014). DOI 10.1007/978-3-642-39747-9_18

  124. Zhang, M., Hurley, N.: Avoiding monotony: Improving the diversity of recommendation lists. In: Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys ‘08, pp. 123–130. ACM, Lausanne, Switzerland (2008). DOI 10.1145/1454008.1454030

  125. Zhou, T., Kuscsik, Z., Liu, J.G., Medo, M., Wakeling, J.R., Zhang, Y.C.: Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107(10), 4511–4515 (2010). DOI 10.1073/pnas.1000488107

  126. Ziegler, C.N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the 14th international conference on World Wide Web - WWW ‘05, pp. 22–32. Chiba, Japan (2005). DOI 10.1145/1060745.1060754


Author information

Correspondence to Bart P. Knijnenburg.


Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Knijnenburg, B.P., Willemsen, M.C. (2015). Evaluating Recommender Systems with User Experiments. In: Ricci, F., Rokach, L., Shapira, B. (eds) Recommender Systems Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7637-6_9

Download citation

  • DOI: https://doi.org/10.1007/978-1-4899-7637-6_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4899-7636-9

  • Online ISBN: 978-1-4899-7637-6

  • eBook Packages: Computer Science, Computer Science (R0)
