
Evaluating Recommender Systems with User Experiments

Chapter in: Recommender Systems Handbook

Abstract

Proper evaluation of the user experience of recommender systems requires conducting user experiments. This chapter is a guideline for students and researchers aspiring to conduct user experiments with their recommender systems. It first covers the theory of user-centric evaluation of recommender systems, and gives an overview of recommender system aspects to evaluate. It then provides a detailed practical description of how to conduct user experiments, covering the following topics: formulating hypotheses, sampling participants, creating experimental manipulations, measuring subjective constructs with questionnaires, and statistically evaluating the results.

The author contributed to this chapter while he was at the University of California, Irvine.


Notes

  1. We use the term “user experiment” to denote the use of experimental conditions and formal measurement as a means of testing theories about users interacting with recommender systems. This is as opposed to “user studies”, which are typically smaller observational studies used to iteratively improve the usability of a recommender system.

  2. See [45] for a taxonomy of different types of theory.

  3. Like Hassenzahl [46, 47], our framework describes the formation of experiences during technology use rather than the longer-term phenomenon of technology acceptance, but it extends this model to behavioral consequences using attitude-behavior theories [24, 37] (a theoretical structure that is prominent in technology acceptance models [26, 116]).

  4. The paths from Personal and Situational Characteristics to Subjective System Aspects were added to the original framework (as presented in [67]) based on insights from various experiments with the framework.

  5. In some cases PCs and SCs can be inferred from user behavior, e.g. observing the click-stream can tell us the market segment a user belongs to [44]. SCs can also be manipulated, e.g. by priming users to approach the recommender with either a concrete or abstract mindset [71, 120].

  6. Mechanical Turk is currently only available to researchers in the United States, but various alternatives exist for non-US researchers.

  7. http://www.statmodel.com/.

  8. http://lavaan.ugent.be/.

  9. Or, multiple measurement scales for the different constructs (e.g. system satisfaction, ease of use, and recommendation quality), each measured with multiple items.

  10. MPlus and Lavaan use a different parameterization by default, fixing the loading of the first item of each factor to 1. We free up these loadings by including an asterisk after the first item (MPlus) or NA* before it (Lavaan). This alternative solution conveniently standardizes the factor scores; a minimal Lavaan sketch follows below.
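
    As a rough illustration of the Lavaan variant, here is a minimal sketch; the factor and item names (quality measured by qual1–qual3, satisfaction by sat1–sat3) and the data frame survey_data are hypothetical placeholders rather than the chapter’s actual measurement model:

      # Minimal lavaan (R) sketch with hypothetical items and data.
      library(lavaan)

      model <- '
        # NA* frees the first loading instead of fixing it to 1 ...
        quality      =~ NA*qual1 + qual2 + qual3
        satisfaction =~ NA*sat1 + sat2 + sat3

        # ... so the model is identified by fixing the factor variances
        # to 1, which conveniently standardizes the factors.
        quality      ~~ 1*quality
        satisfaction ~~ 1*satisfaction
      '

      fit <- cfa(model, data = survey_data)  # survey_data holds the questionnaire items
      summary(fit, fit.measures = TRUE, standardized = TRUE)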

  11. Moreover, even if you are more or less certain about the factor structure of a CFA model, it pays to consult the modification indices of the model (see the example below). The use of modification indices in CFA goes beyond the current chapter, but is thoroughly explained in Kline’s [59] practical primer on Structural Equation Models.
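
    Continuing the hypothetical Lavaan sketch from the previous note, the modification indices of a fitted model can be listed as follows:

      # Show the ten largest modification indices of the fitted model;
      # large values flag constrained parameters (e.g. omitted cross-
      # loadings or correlated errors) whose freeing would improve fit.
      mi <- modindices(fit)
      head(mi[order(-mi$mi), ], 10)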

  12. An important property of the “interval” data type is that differences between values are comparable. This is for instance not true for a rating score: the difference between 1 and 2 stars is not necessarily the same as the difference between 3 and 4 stars (cf. [74]).

  13. Here we do not discuss the interaction effect between inspectability and control. This interaction can be tested by multiplying their dummies, creating cgraphitem and cgraphfriend (as sketched below). These dummies represent the additional effect of item- and friend-control in the graph condition (and likewise, the additional effect of the graph in the item- and friend-control conditions).
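
    A rough sketch of this dummy coding in R; the data frame d, the outcome satisfaction, and the base dummies cgraph (graph vs. list interface), citem, and cfriend (item- and friend-control) are hypothetical:

      # The interaction dummies are simply products of the condition dummies.
      d$cgraphitem   <- d$cgraph * d$citem
      d$cgraphfriend <- d$cgraph * d$cfriend

      # Regression with the main effects and the two interaction terms:
      summary(lm(satisfaction ~ cgraph + citem + cfriend +
                 cgraphitem + cgraphfriend, data = d))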

  14. By design, experimental manipulations can only be independent variables (i.e. they never have incoming arrows), so they always start the causal chain.

  15. Like in CFA, more exploratory modeling efforts can be assisted by the use of modification indices. Please consult [59] for examples.

References

  1. Adomavicius, G., Kwon, Y.: Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24(5), 896–911 (2012). DOI 10.1109/TKDE.2011.15


  2. Ajzen, I.: From intentions to actions: A theory of planned behavior. In: J. Kuhl, J. Beckmann (eds.) Action Control, SSSP Springer Series in Social Psychology, pp. 11–39. Springer Berlin Heidelberg (1985).


  3. Ajzen, I.: The theory of planned behavior. Organizational Behavior and Human Decision Processes 50(2), 179–211 (1991).


  4. Ajzen, I., Fishbein, M.: Understanding attitudes and predicting social behaviour. Prentice-Hall, Englewood Cliffs, NJ (1980)


  5. Amatriain, X., Pujol, J.M., Tintarev, N., Oliver, N.: Rate it again: Increasing recommendation accuracy by user re-rating. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 173–180. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639744

  6. Basartan, Y.: Amazon versus the shopbot: An experiment about how to improve the shopbots (2001)


  7. Bennett, J., Lanning, S.: The Netflix prize. In: KDD Cup and Workshop in conjunction with KDD. San Jose, CA, USA (2007). URL http://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/The-Netflix-Prize-Bennett.pdf

  8. Bentler, P.M., Bonett, D.G.: Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88(3), 588–606 (1980). DOI 10.1037/0033-2909.88.3.588


  9. Bettman, J.R., Luce, M.F., Payne, J.W.: Constructive consumer choice processes. Journal of Consumer Research 25(3), 187–217 (1998). DOI 10.1086/209535


  10. Bilgic, M., Mooney, R.J.: Explaining recommendations: Satisfaction vs. promotion. In: IUI Workshop: Beyond Personalization. San Diego, CA (2005)


  11. Blackwelder, W.C.: “Proving the null hypothesis” in clinical trials. Controlled Clinical Trials 3(4), 345–353 (1982). DOI 10.1016/0197-2456(82)90024-1


  12. Bollen, D., Knijnenburg, B.P., Willemsen, M.C., Graus, M.: Understanding choice overload in recommender systems. In: Proceedings of the fourth ACM conference on Recommender systems, pp. 63–70. Barcelona, Spain (2010). DOI 10.1145/1864708.1864724

  13. Bollen, K.A.: Structural equation models. In: Encyclopedia of Biostatistics. John Wiley & Sons, Ltd (2005)


  14. Bostandjiev, S., O’Donovan, J., Höllerer, T.: TasteWeights: a visual interactive hybrid recommender system. In: Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ‘12, pp. 35–42. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365964

  15. Cena, F., Vernero, F., Gena, C.: Towards a customization of rating scales in adaptive systems. In: P.D. Bra, A. Kobsa, D. Chin (eds.) User Modeling, Adaptation, and Personalization, no. 6075 in Lecture Notes in Computer Science, pp. 369–374. Springer Berlin Heidelberg (2010). DOI 10.1007/978-3-642-13470-8_34


  16. Chen, L., Pu, P.: Interaction design guidelines on critiquing-based recommender systems. User Modeling and User-Adapted Interaction 19(3), 167–206 (2009). DOI 10.1007/s11257-008-9057-x


  17. Chen, L., Pu, P.: Experiments on the preference-based organization interface in recommender systems. ACM Transactions on Computer-Human Interaction 17(1), 5:1–5:33 (2010). DOI 10.1145/1721831.1721836

  18. Chen, L., Pu, P.: Eye-tracking study of user behavior in recommender interfaces. In: P.D. Bra, A. Kobsa, D. Chin (eds.) User Modeling, Adaptation, and Personalization, no. 6075 in Lecture Notes in Computer Science, pp. 375–380. Springer Berlin Heidelberg (2010). DOI 10.1007/978-3-642-13470-8_35


  19. Chen, L., Pu, P.: Critiquing-based recommenders: survey and emerging trends. User Modeling and User-Adapted Interaction 22(1–2), 125–150 (2012). DOI 10.1007/s11257-011-9108-6


  20. Chen, L., Tsoi, H.K.: Users’ decision behavior in recommender interfaces: Impact of layout design. In: RecSys’11 Workshop on Human Decision Making in Recommender Systems, pp. 21–26. Chicago, IL, USA (2011). URL http://ceur-ws.org/Vol-811/paper4.pdf

  21. Chin, D.N.: Empirical evaluation of user models and user-adapted systems. User Modeling and User-Adapted Interaction 11(1–2), 181–194 (2001). DOI 10.1023/A:1011127315884


  22. Cohen, J.: Statistical power analysis for the behavioral sciences. Psychology Press (1988)


  23. Cosley, D., Lam, S.K., Albert, I., Konstan, J.A., Riedl, J.: Is seeing believing?: How recommender system interfaces affect users’ opinions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ‘03, pp. 585–592. ACM, Ft. Lauderdale, Florida, USA (2003). DOI 10.1145/642611.642713

  24. Cramer, H., Evers, V., Ramlal, S., Someren, M., Rutledge, L., Stash, N., Aroyo, L., Wielinga, B.: The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction 18(5), 455–496 (2008). DOI 10.1007/s11257-008-9051-3


  25. Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “Good” recommendations: A comparative evaluation of recommender systems. In: P. Campos, N. Graham, J. Jorge, N. Nunes, P. Palanque, M. Winckler (eds.) Human-Computer Interaction – INTERACT 2011, no. 6948 in Lecture Notes in Computer Science, pp. 152–168. Springer Berlin Heidelberg (2011). DOI 10.1007/978-3-642-23765-2_11


  26. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13(3), 319–340 (1989). DOI 10.2307/249008


  27. DeVellis, R.F.: Scale development: theory and applications. SAGE, Thousand Oaks, CA (2011)


  28. Dooms, S., De Pessemier, T., Martens, L.: An online evaluation of explicit feedback mechanisms for recommender systems. In: 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pp. 391–394. Noordwijkerhout, The Netherlands (2011). URL https://biblio.ugent.be/publication/2039743/file/2039745.pdf

  29. Dooms, S., De Pessemier, T., Martens, L.: A user-centric evaluation of recommender algorithms for an event recommendation system. In: RecSys 2011 Workshop on Human Decision Making in Recommender Systems (Decisions@RecSys’11) and User-Centric Evaluation of Recommender Systems and Their Interfaces-2 (UCERSTI 2) affiliated with the 5th ACM Conference on Recommender Systems (RecSys 2011), pp. 67–73. Chicago, IL, USA (2011). URL http://ceur-ws.org/Vol-811/paper10.pdf

  30. Downs, J.S., Holbrook, M.B., Sheng, S., Cranor, L.F.: Are your participants gaming the system?: screening mechanical turk workers. In: Proceedings of the 28th SIGCHI conference on Human factors in computing systems, pp. 2399–2402. Atlanta, Georgia, USA (2010). DOI 10.1145/1753326.1753688

  31. Ekstrand, M.D., Harper, F.M., Willemsen, M.C., Konstan, J.A.: User perception of differences in recommender algorithms. In: Proceedings of the eighth ACM conference on Recommender systems. Foster City, CA (2014). DOI 10.1145/2645710.2645737


  32. Erickson, B.H.: Some problems of inference from chain data. Sociological Methodology 10(1), 276–302 (1979)


  33. Farzan, R., Brusilovsky, P.: Encouraging user participation in a course recommender system: An impact on user behavior. Computers in Human Behavior 27(1), 276–284 (2011). DOI 10.1016/j.chb.2010.08.005


  34. Fasolo, B., Hertwig, R., Huber, M., Ludwig, M.: Size, entropy, and density: What is the difference that makes the difference between small and large real-world assortments? Psychology and Marketing 26(3), 254–279 (2009). DOI 10.1002/mar.20272


  35. Faul, F., Erdfelder, E., Lang, A.G., Buchner, A.: G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods 39(2), 175–191 (2007). DOI 10.3758/BF03193146


  36. Felfernig, A.: Knowledge-based recommender technologies for marketing and sales. International Journal of Pattern Recognition and Artificial Intelligence 21(2), 333–354 (2007). DOI 10.1142/S0218001407005417


  37. Fishbein, M., Ajzen, I.: Belief, attitude, intention, and behavior: an introduction to theory and research. Addison-Wesley Pub. Co., Reading, MA (1975)


  38. Fisher, R.A.: The design of experiments, vol. xi. Oliver & Boyd, Oxford, England (1935)


  39. Freyne, J., Jacovi, M., Guy, I., Geyer, W.: Increasing engagement through early recommender intervention. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 85–92. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639730

  40. Friedrich, G., Zanker, M.: A taxonomy for generating explanations in recommender systems. AI Magazine 32(3), 90–98 (2011). DOI 10.1609/aimag.v32i3.2365

  41. Gedikli, F., Jannach, D., Ge, M.: How should I explain? A comparison of different explanation types for recommender systems. International Journal of Human-Computer Studies 72(4), 367–382 (2014). DOI 10.1016/j.ijhcs.2013.12.007

  42. Gena, C., Brogi, R., Cena, F., Vernero, F.: The impact of rating scales on user’s rating behavior. In: D. Hutchison, T. Kanade, J. Kittler, J.M. Kleinberg, F. Mattern, J.C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M.Y. Vardi, G. Weikum, J.A. Konstan, R. Conejo, J.L. Marzo, N. Oliver (eds.) User Modeling, Adaption and Personalization, vol. 6787, pp. 123–134. Springer, Berlin, Heidelberg (2011). DOI 10.1007/978-3-642-22362-4_11

  43. Ghose, A., Ipeirotis, P.G., Li, B.: Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science 31(3), 493–520 (2012). DOI 10.1287/mksc.1110.0700

  44. Graus, M.P., Willemsen, M.C., Swelsen, K.: Understanding real-life website adaptations by investigating the relations between user behavior and user experience. In: F. Ricci, K. Bontcheva, O. Conlan, S. Lawless (eds.) User Modeling, Adaptation and Personalization, vol. 9146, pp. 350–356. Springer, Berlin, Heidelberg (2015)


  45. Gregor, S.: The nature of theory in information systems. MIS Quarterly 30(3), 611–642 (2006). URL http://www.jstor.org/stable/25148742

  46. Hassenzahl, M.: The thing and I: understanding the relationship between user and product. In: M. Blythe, K. Overbeeke, A. Monk, P. Wright (eds.) Funology, From Usability to Enjoyment, pp. 31–42. Kluwer Academic Publishers, Dordrecht, The Netherlands (2005). DOI 10.1007/1-4020-2967-5_4

  47. Hassenzahl, M.: User experience (UX). In: Proceedings of the 20th International Conference of the Association Francophone d’Interaction Homme-Machine on - IHM ‘08, pp. 11–15. Metz, France (2008). DOI 10.1145/1512714.1512717

  48. Häubl, G., Trifts, V.: Consumer decision making in online shopping environments: The effects of interactive decision aids. Marketing Science 19(1), 4–21 (2000). URL http://www.jstor.org/stable/193256

  49. Heckathorn, D.D.: Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Social Problems 49(1), 11–34 (2002). DOI 10.1525/sp.2002.49.1.11

  50. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: Proc. of the 2000 ACM conference on Computer supported cooperative work, pp. 241–250. ACM Press, Philadelphia, PA (2000). DOI 10.1145/358916.358995

  51. Hu, L., Bentler, P.M.: Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6(1), 1–55 (1999). DOI 10.1080/10705519909540118

  52. Hu, R., Pu, P.: Enhancing recommendation diversity with organization interfaces. In: Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI ‘11, pp. 347–350. ACM, Palo Alto, CA, USA (2011). DOI 10.1145/1943403.1943462

  53. Iivari, J.: Contributions to the theoretical foundations of systemeering research and the PIOCO model. Ph.D. thesis, University of Oulu, Finland (1983)


  54. Jacko, J.A.: The human-computer interaction handbook: fundamentals, evolving technologies, and emerging applications. CRC Press, Boca Raton, FL (2012)


  55. Jackson, D.L.: Revisiting sample size and number of parameter estimates: Some support for the n:q hypothesis. Structural Equation Modeling: A Multidisciplinary Journal 10(1), 128–141 (2003). DOI 10.1207/S15328007SEM1001_6

  56. Kahneman, D.: Thinking, fast and slow. Macmillan (2011)


  57. Kammerer, Y., Gerjets, P.: How the interface design influences users’ spontaneous trustworthiness evaluations of web search results: Comparing a list and a grid interface. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ‘10, pp. 299–306. ACM, Austin, TX, USA (2010). DOI 10.1145/1743666.1743736

  58. Kittur, A., Chi, E.H., Suh, B.: Crowdsourcing user studies with mechanical turk. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 453–456. ACM Press, Florence, Italy (2008). DOI 10.1145/1357054.1357127

  59. Kline, R.B.: Principles and practice of structural equation modeling. Guilford Press, New York (2011)


  60. Kluver, D., Nguyen, T.T., Ekstrand, M., Sen, S., Riedl, J.: How many bits per rating? In: Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ‘12, pp. 99–106. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365974

  61. Knijnenburg, B.P.: Simplifying privacy decisions: Towards interactive and adaptive solutions. In: Proceedings of the RecSys 2013 Workshop on Human Decision Making in Recommender Systems (Decisions@RecSys’13), pp. 40–41. Hong Kong, China (2013). URL http://ceur-ws.org/Vol-1050/paper7.pdf

  62. Knijnenburg, B.P., Bostandjiev, S., O’Donovan, J., Kobsa, A.: Inspectability and control in social recommenders. In: Proceedings of the sixth ACM conference on Recommender systems, RecSys ‘12, pp. 43–50. ACM, Dublin, Ireland (2012). DOI 10.1145/2365952.2365966

  63. Knijnenburg, B.P., Kobsa, A.: Making decisions about privacy: Information disclosure in context-aware recommender systems. ACM Transactions on Interactive Intelligent Systems 3(3), 20:1–20:23 (2013). DOI 10.1145/2499670

  64. Knijnenburg, B.P., Kobsa, A., Jin, H.: Dimensionality of information disclosure behavior. International Journal of Human-Computer Studies 71(12), 1144–1162 (2013). DOI 10.1016/j.ijhcs.2013.06.003

  65. Knijnenburg, B.P., Reijmer, N.J., Willemsen, M.C.: Each to his own: how different users call for different interaction methods in recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, pp. 141–148. ACM Press, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043960

  66. Knijnenburg, B.P., Willemsen, M.C.: Understanding the effect of adaptive preference elicitation methods on user satisfaction of a recommender system. In: Proceedings of the third ACM conference on Recommender systems, pp. 381–384. New York, NY (2009). DOI 10.1145/1639714.1639793

  67. Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22(4–5), 441–504 (2012). DOI 10.1007/s11257-011-9118-4

  68. Knijnenburg, B.P., Willemsen, M.C., Hirtbach, S.: Receiving recommendations and providing feedback: The user-experience of a recommender system. In: F. Buccafurri, G. Semeraro (eds.) E-Commerce and Web Technologies, vol. 61, pp. 207–216. Springer, Berlin, Heidelberg (2010). DOI 10.1007/978-3-642-15208-5_19

  69. Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, RecSys ‘11, pp. 321–324. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043993

  70. Kobsa, A., Cho, H., Knijnenburg, B.P.: An attitudinal and behavioral model of personalization at different providers. Journal of the Association for Information Science and Technology. http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2330-1643/earlyview (In press)

  71. Köhler, C.F., Breugelmans, E., Dellaert, B.G.C.: Consumer acceptance of recommendations by interactive decision aids: The joint role of temporal distance and concrete versus abstract communications. Journal of Management Information Systems 27(4), 231–260 (2011). DOI 10.2753/MIS0742-1222270408

  72. Konstan, J., Riedl, J.: Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction 22(1), 101–123 (2012). DOI 10.1007/s11257-011-9112-x

  73. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009). DOI 10.1109/MC.2009.263

  74. Koren, Y., Sill, J.: OrdRec: An ordinal model for predicting personalized item rating distributions. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 117–124. ACM, New York, NY, USA (2011). DOI 10.1145/2043932.2043956

  75. Landsberger, H.A.: Hawthorne revisited: Management and the worker, its critics, and developments in human relations in industry. Cornell University (1958)


  76. Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘10, pp. 210–217. ACM, Geneva, Switzerland (2010). DOI 10.1145/1835449.1835486

  77. Lee, Y.E., Benbasat, I.: The influence of trade-off difficulty caused by preference elicitation methods on user acceptance of recommendation agents across loss and gain conditions. Information Systems Research 22(4), 867–884 (2011). DOI 10.1287/isre.1100.0334

  78. Lopes, C.S., Rodrigues, L.C., Sichieri, R.: The lack of selection bias in a snowball sampled case-control study on drug abuse. International Journal of Epidemiology 25(6), 1267–1270 (1996). DOI 10.1093/ije/25.6.1267

  79. MacCallum, R.C., Widaman, K.F., Zhang, S., Hong, S.: Sample size in factor analysis. Psychological Methods 4(1), 84–99 (1999). DOI 10.1037/1082-989X.4.1.84

  80. MacKenzie, I.S.: Human-Computer Interaction: An Empirical Research Perspective, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2013)


  81. Martin, F.J.: RecSys’09 industrial keynote: Top 10 lessons learned developing, deploying, and operating real-world recommender systems. In: Proceedings of the Third ACM Conference on Recommender Systems, RecSys ‘09, pp. 1–2. ACM, New York, NY, USA (2009). DOI 10.1145/1639714.1639715

  82. McNee, S.M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S.K., Rashid, A.M., Konstan, J.A., Riedl, J.: On the recommending of citations for research papers. In: Proceedings of the 2002 ACM conference on Computer supported cooperative work, pp. 116–125. New Orleans, LA (2002). DOI 10.1145/587078.587096

  83. McNee, S.M., Riedl, J., Konstan, J.A.: Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: Extended abstracts on Human factors in computing systems, pp. 1097–1101. Montréal, Québec, Canada (2006). DOI 10.1145/1125451.1125659

  84. McNee, S.M., Riedl, J., Konstan, J.A.: Making recommendations better: An analytic model for human-recommender interaction. In: Extended Abstracts on Human Factors in Computing Systems, CHI EA ‘06, pp. 1103–1108. ACM, Montréal, Québec, Canada (2006). DOI 10.1145/1125451.1125660

  85. Mogilner, C., Rudnick, T., Iyengar, S.S.: The mere categorization effect: How the presence of categories increases choosers’ perceptions of assortment variety and outcome satisfaction. Journal of Consumer Research 35(2), 202–215 (2008). DOI 10.1086/586908

  86. Neter, J., Kutner, M.H., Nachtsheim, C.J., Wasserman, W.: Applied linear statistical models, vol. 4. Irwin, Chicago (1996)


  87. Nguyen, T.T., Kluver, D., Wang, T.Y., Hui, P.M., Ekstrand, M.D., Willemsen, M.C., Riedl, J.: Rating support interfaces to improve user experience and recommender accuracy. In: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ‘13, pp. 149–156. ACM, Hong Kong, China (2013). DOI 10.1145/2507157.2507188

  88. Nuzzo, R.: Scientific method: Statistical errors. Nature 506(7487), 150–152 (2014). DOI 10.1038/506150a

  89. Oestreicher-Singer, G., Sundararajan, A.: Recommendation networks and the long tail of electronic commerce. Management Information Systems Quarterly 36(1), 65–83 (2012). URL http://aisel.aisnet.org/misq/vol36/iss1/7

  90. Oestreicher-Singer, G., Sundararajan, A.: The visible hand? Demand effects of recommendation networks in electronic markets. Management Science 58(11), 1963–1981 (2012). DOI 10.1287/mnsc.1120.1536

  91. Orne, M.T.: On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist 17(11), 776–783 (1962). DOI 10.1037/h0043424

  92. Paolacci, G., Chandler, J., Ipeirotis, P.: Running experiments on amazon mechanical turk. Judgment and Decision Making 5(5), 411–419 (2010). URL http://www.sjdm.org/journal/10/10630a/jdm10630a.pdf

  93. Podsakoff, P.M., MacKenzie, S.B., Lee, J.Y., Podsakoff, N.P.: Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology 88(5), 879–903 (2003). DOI 10.1037/0021-9010.88.5.879

  94. Pu, P., Chen, L.: Trust-inspiring explanation interfaces for recommender systems. Knowledge-Based Systems 20(6), 542–556 (2007). DOI 10.1016/j.knosys.2007.04.004

  95. Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 157–164. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043962

  96. Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22(4), 317–355 (2012). DOI 10.1007/s11257-011-9115-7

  97. Purchase, H.C.: Experimental Human-Computer Interaction: A Practical Guide with Visual Examples, 1st edn. Cambridge University Press, New York, NY, USA (2012)


  98. Randall, T., Terwiesch, C., Ulrich, K.T.: User design of customized products. Marketing Science 26(2), 268–280 (2007). DOI 10.1287/mksc.1050.0116

  99. Said, A., Fields, B., Jain, B.J., Albayrak, S.: User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ‘13, pp. 1399–1408. ACM, New York, NY, USA (2013). DOI 10.1145/2441776.2441933

  100. Said, A., Jain, B.J., Narr, S., Plumbaum, T., Albayrak, S., Scheel, C.: Estimating the magic barrier of recommender systems: A user study. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘12, pp. 1061–1062. ACM, Portland, Oregon (2012). DOI 10.1145/2348283.2348469

  101. Salganik, M.J., Heckathorn, D.D.: Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34(1), 193–240 (2004). DOI 10.1111/j.0081-1750.2004.00152.x

  102. Schaeffer, N.C., Presser, S.: The science of asking questions. Annual Review of Sociology 29(1), 65–88 (2003). DOI 10.1146/annurev.soc.29.110702.110112

  103. Scheibehenne, B., Greifeneder, R., Todd, P.M.: Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research 37(3), 409–425 (2010). DOI 10.1086/651235

  104. Sinha, R., Swearingen, K.: Comparing recommendations made by online systems and friends. In: Proceedings of the DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries (2001)


  105. Smith, N.C., Goldstein, D.G., Johnson, E.J.: Choice without awareness: Ethical and policy implications of defaults. Journal of Public Policy & Marketing 32(2), 159–172 (2013). DOI 10.1509/jppm.10.114

  106. Sparling, E.I., Sen, S.: Rating: How difficult is it? In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 149–156. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043961

  107. Steele-Johnson, D., Beauregard, R.S., Hoover, P.B., Schmidt, A.M.: Goal orientation and task demand effects on motivation, affect, and performance. Journal of Applied Psychology 85(5), 724–738 (2000). DOI 10.1037/0021-9010.85.5.724

  108. Symeonidis, P., Nanopoulos, A., Manolopoulos, Y.: Providing justifications in recommender systems. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 38(6), 1262–1272 (2008). DOI 10.1109/TSMCA.2008.2003969

  109. Tam, K.Y., Ho, S.Y.: Web personalization: is it effective? IT Professional 5(5), 53–57 (2003). DOI 10.1109/MITP.2003.1235611

  110. Tintarev, N., Masthoff, J.: A survey of explanations in recommender systems. In: Data Engineering Workshop, pp. 801–810. IEEE, Istanbul, Turkey (2007). DOI 10.1109/ICDEW.2007.4401070

  111. Tintarev, N., Masthoff, J.: Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22(4–5), 399–439 (2012). DOI 10.1007/s11257-011-9117-5

  112. Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens+. In: Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries - JCDL ‘04, pp. 228–236. Tucson, AZ, USA (2004). DOI 10.1145/996350.996402

  113. Utts, J.: Seeing Through Statistics. Cengage Learning (2004)


  114. Van Velsen, L., Van Der Geest, T., Klaassen, R., Steehouder, M.: User-centered evaluation of adaptive and adaptable systems: a literature review. The Knowledge Engineering Review 23(3), 261–281 (2008). DOI 10.1017/S0269888908001379

  115. Vargas, S., Castells, P.: Rank and relevance in novelty and diversity metrics for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ‘11, pp. 109–116. ACM, Chicago, IL, USA (2011). DOI 10.1145/2043932.2043955

  116. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information technology: Toward a unified view. MIS Quarterly 27(3), 425–478 (2003). URL http://www.jstor.org/stable/30036540

  117. Vig, J., Sen, S., Riedl, J.: Tagsplanations: Explaining recommendations using tags. In: Proceedings of the 14th International Conference on Intelligent User Interfaces, IUI ‘09, pp. 47–56. ACM, Sanibel Island, Florida, USA (2009). DOI 10.1145/1502650.1502661

  118. Wang, H.C., Doong, H.S.: Argument form and spokesperson type: The recommendation strategy of virtual salespersons. International Journal of Information Management 30(6), 493–501 (2010). DOI 10.1016/j.ijinfomgt.2010.03.006

  119. Wang, W., Benbasat, I.: Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs. Journal of Management Information Systems 23(4), 217–246 (2007). DOI 10.2753/MIS0742-1222230410

  120. Willemsen, M.C., Graus, M.P., Knijnenburg, B.P.: Understanding the role of latent feature diversification on choice difficulty and satisfaction (manuscript, under review)


  121. Willemsen, M.C., Knijnenburg, B.P., Graus, M.P., Velter-Bremmers, L.C., Fu, K.: Using latent features diversification to reduce choice difficulty in recommendation lists. In: RecSys’11 Workshop on Human Decision Making in Recommender Systems, CEUR-WS, vol. 811, pp. 14–20. Chicago, IL (2011). URL http://ceur-ws.org/Vol-811/paper3.pdf

  122. Xiao, B., Benbasat, I.: E-commerce product recommendation agents: Use, characteristics, and impact. MIS Quarterly 31(1), 137–209 (2007). URL http://www.jstor.org/stable/25148784

  123. Xiao, B., Benbasat, I.: Research on the use, characteristics, and impact of e-commerce product recommendation agents: A review and update for 2007–2012. In: F.J. Martínez-López (ed.) Handbook of Strategic e-Business Management, Progress in IS, pp. 403–431. Springer Berlin Heidelberg (2014). DOI 10.1007/978-3-642-39747-9_18

  124. Zhang, M., Hurley, N.: Avoiding monotony: Improving the diversity of recommendation lists. In: Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys ‘08, pp. 123–130. ACM, Lausanne, Switzerland (2008). DOI 10.1145/1454008.1454030

  125. Zhou, T., Kuscsik, Z., Liu, J.G., Medo, M., Wakeling, J.R., Zhang, Y.C.: Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107(10), 4511–4515 (2010). DOI 10.1073/pnas.1000488107

  126. Ziegler, C.N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the 14th international conference on World Wide Web - WWW ‘05, pp. 22–32. Chiba, Japan (2005). DOI 10.1145/1060745.1060754


Author information

Correspondence to Bart P. Knijnenburg.


Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Knijnenburg, B.P., Willemsen, M.C. (2015). Evaluating Recommender Systems with User Experiments. In: Ricci, F., Rokach, L., Shapira, B. (eds) Recommender Systems Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7637-6_9

Download citation

  • DOI: https://doi.org/10.1007/978-1-4899-7637-6_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4899-7636-9

  • Online ISBN: 978-1-4899-7637-6

  • eBook Packages: Computer Science, Computer Science (R0)
