Abstract
The analysis of clickstream data facilitates the understanding and prediction of customer behavior in e-commerce. Companies can leverage such data to increase revenue. For customers and website users, on the other hand, the collection of behavioral data entails an invasion of privacy. The objective of the paper is to shed light on the trade-off between privacy and the business value of customer information. To that end, the authors review approaches to convert clickstream data into behavioral traits, which they call clickstream features, and propose a categorization of these features according to the potential threat they pose to user privacy. The authors then examine the extent to which different categories of clickstream features facilitate predictions of online user shopping patterns and approximate the marginal utility of using more privacy-adverse information in behavioral prediction models. Thus, the paper links the literature on user privacy to that on e-commerce analytics and takes a step toward an economic analysis of privacy costs and benefits. In particular, the results of empirical experimentation with large real-world e-commerce data suggest that the inclusion of short-term customer behavior based on session-related information leads to large gains in predictive accuracy and business performance, while storing and aggregating usage behavior over longer horizons has comparatively less value.
Notes
For example, see the Health Insurance Portability and Accountability Act of 1996 or the California Online Privacy Protection Act of 2003 for the US, or the General Data Protection Regulation for the EU.
The calculations are based on the actual number of correctly and incorrectly classified customers across the 50 (2 shops × 5 feature sets × 5 conversion rates) settings. Interested readers can find results at this level of detail in the Appendix.
Additional information
Accepted after two revisions by Prof. Dr. Suhl.
About this article
Cite this article
Baumann, A., Haupt, J., Gebert, F. et al. The Price of Privacy. Bus Inf Syst Eng 61, 413–431 (2019). https://doi.org/10.1007/s12599-018-0528-2
DOI: https://doi.org/10.1007/s12599-018-0528-2