
Invisible market for online personal data: An examination

  • Research Paper
  • Electronic Markets

Abstract

Despite the widespread knowledge that corporations collect and exchange user online personal data (OPD) between themselves in a market for OPD, there have been few attempts to systematically understand the nature and structure of these markets or answer basic questions about the behavior of parties in these markets. This paper addresses these questions using records of data sharing behavior by 218 websites across eight economic sectors. Two datasets, collected 4 years apart, are analyzed using social network analysis (SNA). Findings indicate linear preferential attachment is the most likely coordinating mechanism in the OPD market. Further, this market has a much higher number of brokers (intermediary corporations that facilitate exchange between other corporations) than comparable markets. Building on these findings, implications for research and practice are presented along with future research directions.




Author information

Corresponding author

Correspondence to David Agogo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Dataset 1: Spring 2016

Lightbeam was created by Atul Varma, a software developer at Mozilla, and was originally called Collusion. The application makes it possible to visualize the network of websites that collect data about a user's browsing behavior on each page the user visits. In February 2012, Gary Kovacs, Mozilla's CEO at the time, spoke about Collusion in a TED talk, which led to the plugin going viral. Starting in September 2012, Mozilla, together with faculty and student researchers at Emily Carr University of Art + Design, extended the plugin and relaunched it as Lightbeam in 2013. The application was supported by the Ford Foundation and the Natural Sciences and Engineering Research Council (NSERC). The data format documentation, hosted alongside the source code, can be accessed from: https://github.com/mozilla/lightbeam/blob/master/doc/data_format.v1.1.md

The data retrieved from Lightbeam has the following layout:

[source, target, timestamp, contentType, cookie, sourceVisited, secure, sourcePathDepth, sourceQueryDepth, sourceSub, targetSub, method, status, cacheable]

For instance, ["nytimes.com", "doubleclick.net", 1456366106722, "text\/html", true, false, true, 1, 0, "www.", "cm.g.", "GET", 204, true, false]
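
As a minimal sketch of how such a record can be read, the snippet below maps one record onto the field names from the layout above. The function name `parse_record` and the `_extra` key are hypothetical. The example record carries one more value than the layout names, so the sketch keeps that trailing value unnamed rather than guessing its meaning.

```python
import json

# Field names copied, in order, from the layout listed above.
FIELDS = [
    "source", "target", "timestamp", "contentType", "cookie",
    "sourceVisited", "secure", "sourcePathDepth", "sourceQueryDepth",
    "sourceSub", "targetSub", "method", "status", "cacheable",
]

def parse_record(line):
    """Map one Lightbeam record (a JSON array) onto the named fields."""
    values = json.loads(line)
    record = dict(zip(FIELDS, values))
    # Values beyond the listed fields are kept but deliberately left unnamed.
    record["_extra"] = values[len(FIELDS):]
    return record

# The example record from the text, written as a JSON array.
example = ('["nytimes.com", "doubleclick.net", 1456366106722, "text\\/html", '
           'true, false, true, 1, 0, "www.", "cm.g.", "GET", 204, true, false]')

rec = parse_record(example)
print(rec["source"], "->", rec["target"], rec["method"], rec["status"])
```

Only the `source` and `target` fields are needed to build the network of which website contacted which third party; the remaining fields describe the individual request.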

WHOIS is a query-and-response protocol used to query internet registry databases that store the registered users or assignees of an internet resource, such as a domain name, an IP address block, or an autonomous system.
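
To illustrate the protocol, the sketch below sends a single WHOIS query over TCP port 43 and then searches the text response for an organization field. It is not the lookup tool used for the dataset. The default server, the `Registrant Organization` label, and the function names are assumptions; response formats differ between registries, and many omit or redact registrant details.

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send one WHOIS query over TCP and return the raw text response.

    The default server is an assumption suitable for .com/.net names only.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # A WHOIS request is the query string terminated by CRLF.
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when finished
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def extract_field(response, label="Registrant Organization"):
    """Return the first value for `label` in a 'Label: value' response, or None."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == label.lower():
            return value.strip() or None
    return None

# Synthetic response for illustration only; not real registry output.
sample = "Domain Name: EXAMPLE.COM\r\nRegistrant Organization: Example Corp\r\n"
print(extract_field(sample))
```

Calling `whois_query("example.com")` requires network access; the parsing step can be checked offline against saved responses.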

Steps in creating the dataset:

  1. Selected eight economic sectors.
  2. Identified the top 20–25 ranked websites in each sector using Alexa.
  3. Visited only the homepage of each top-ranked website.
  4. Saved the data on websites that ‘talked’ to the visited page using Lightbeam.
  5. Retrieved the name of the corporation that owns each website in the dataset using a WHOIS lookup tool.
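
The last two steps can be sketched as follows: website-level (source, target) pairs are relabeled with the owning corporation and counted as edges. All data here is hypothetical (the owner names are placeholders, not WHOIS results), and dropping pairs whose two ends share an owner is an assumption of this sketch, not a rule stated in the steps above.

```python
from collections import Counter

# Hypothetical (source, target) pairs, i.e. the first two fields of
# Lightbeam records; real pairs would come from the saved export.
requests = [
    ("nytimes.com", "doubleclick.net"),
    ("nytimes.com", "doubleclick.net"),
    ("cnn.com", "doubleclick.net"),
]

# Placeholder domain -> owner mapping standing in for WHOIS lookups.
owner = {
    "nytimes.com": "Owner A",
    "cnn.com": "Owner B",
    "doubleclick.net": "Owner C",
}

def corporate_edges(pairs, owner_of):
    """Count directed corporation-to-corporation edges."""
    edges = Counter()
    for source, target in pairs:
        # Fall back to the domain itself when no owner is known.
        src = owner_of.get(source, source)
        dst = owner_of.get(target, target)
        if src != dst:  # assumption: same-owner traffic is not an exchange
            edges[(src, dst)] += 1
    return edges

edges = corporate_edges(requests, owner)
for (src, dst), count in sorted(edges.items()):
    print(src, "->", dst, count)
```

The resulting weighted edge list is the kind of input a social network analysis package expects.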

Dataset 2: Spring 2020

OpenWPM is an automated web privacy measurement framework that makes it easy to collect data from thousands to millions of websites. It is built on top of Firefox and runs in either a windowed or a windowless state, automatically crawling a supplied list of websites according to the configuration provided. The tool is still in active development. The full reference for this application and its source code can be accessed from: https://github.com/mozilla/OpenWPM

The data retrieved from OpenWPM takes the form of an SQLite database with several tables. More information about the HTTP requests table can be found at this link: https://github.com/mozilla/OpenWPM/wiki/Instrumentation-Schema-Documentation#http-requests
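
A minimal sketch of reading such a database is shown below. It assumes the table is named `http_requests` and exposes `top_level_url` and `url` columns; those names should be checked against the schema documentation linked above. The helper `naive_domain` keeps only the last two host labels, a simplification that mishandles suffixes such as `.co.uk`. The rows are hypothetical stand-ins for a real crawl.

```python
import sqlite3
from urllib.parse import urlparse

def naive_domain(url):
    """Return the last two labels of the host name (simplified)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def third_party_pairs(conn):
    """Return sorted (visited site, contacted site) pairs, third parties only."""
    rows = conn.execute("SELECT top_level_url, url FROM http_requests")
    pairs = set()
    for top_level_url, url in rows:
        if not top_level_url or not url:
            continue
        source, target = naive_domain(top_level_url), naive_domain(url)
        if source and target and source != target:
            pairs.add((source, target))
    return sorted(pairs)

# In-memory stand-in for a crawl database, filled with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE http_requests (top_level_url TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO http_requests VALUES (?, ?)",
    [
        ("https://www.nytimes.com/", "https://cm.g.doubleclick.net/pixel"),
        ("https://www.nytimes.com/", "https://static.nytimes.com/app.js"),
    ],
)
pairs = third_party_pairs(conn)
print(pairs)
```

For a real crawl, `sqlite3.connect` would point at the database file written by the crawler instead of `":memory:"`.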

Steps in creating the dataset:

  1. Crawled the websites from dataset 1 using an OpenWPM script.
  2. Extracted the same information as used for dataset 1.
  3. Retrieved the name of the corporation that owns each website in the dataset using the WHOIS records collected for dataset 1.
  4. Updated the WHOIS records for websites that were new in this dataset.
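
Steps 3 and 4 amount to reusing the earlier ownership records and finding the domains that still need a lookup. A small sketch of that split follows; the function name, domain names, and owner labels are all hypothetical.

```python
def split_known_and_new(domains, whois_cache):
    """Separate domains with a stored owner from those needing a new lookup."""
    known = {d: whois_cache[d] for d in domains if d in whois_cache}
    new = sorted(d for d in set(domains) if d not in whois_cache)
    return known, new

# Placeholder records standing in for the dataset 1 WHOIS results.
whois_dataset1 = {"tracker-a.example": "Owner A", "tracker-b.example": "Owner B"}

# Placeholder domains observed in the second crawl.
observed_dataset2 = ["tracker-a.example", "tracker-c.example", "tracker-c.example"]

known, new = split_known_and_new(observed_dataset2, whois_dataset1)
print(known)  # owners reused from dataset 1
print(new)    # domains that still need a WHOIS lookup
```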

Complete List of Websites Crawled.

Adult

adam4adam.com

adultfriendfinder.com

cam4.com

cams.com

clips4sale.com

digitalplayground.com

ebaumsworld.com

fetlife.com

flirt4free.com

freeones.com

imlive.com

literotica.com

livejasmin.com

mrskin.com

newgrounds.com

nudevista.com

playboy.com

xnxx.com

youporn.com

planetsuzy.org

squirt.org

furaffinity.net

e-hentai.org

manhunt.net

nhentai.net

eCommerce

amazon.com

bestbuy.com

bhphotovideo.com

costco.com

ebay.com

gap.com

hm.com

homedepot.com

etsy.com

groupon.com

ikea.com

kohls.com

lowes.com

macys.com

netflix.com

newegg.com

nike.com

nordstrom.com

overstock.com

sears.com

steampowered.com

target.com

walmart.com

wayfair.com

amazon.co.uk

Health

drugs.com

medscape.com

express-scripts.com

health.com

healthgrades.com

medicinenet.com

medscape.com

mensfitness.com

menshealth.com

mercola.com

myfitnesspal.com

prevention.com

psychologytoday.com

webmd.com

weightwatchers.com

cdc.gov

fda.gov

kaiserpermanente.org

mayoclinic.org

mayoclinic.org/diseases-conditions

ncbi.nlm.nih.gov/pmc/

nhs.uk

nih.gov

who.int

Hotels

conradhotels3.hilton.com/en/index.html

courtyard.marriott.com

doubletree3.hilton.com/en/index.html

embassysuites3.hilton.com/en/index.html

hamptoninn3.hilton.com/en/index.html

hiltongardeninn3.hilton.com/en/index.html

homewoodsuites3.hilton.com/en/index.html

www.ihg.com/crowneplaza/hotels/us/en/reservation

ihg.com/intercontinental/hotels/gb/en/reservation

marriott.com/towneplace-suites/travel.mi

starwoodhotels.com

starwoodhotels.com/alofthotels/index.html

starwoodhotels.com/design/index.html

starwoodhotels.com/element/index.html

starwoodhotels.com/fourpoints/index.html

starwoodhotels.com/lemeridien/index.html

starwoodhotels.com/luxury/index.html

starwoodhotels.com/sheraton/index.html

starwoodhotels.com/stregis/index.html

starwoodhotels.com/tributeportfolio/index.html

starwoodhotels.com/whotels/index.html

http://www3.hilton.com/en/index.html

http://jw.marriott.com/

http://renaissance-hotels.marriott.com

fairfieldinn.com/

hiltongrandvacations.com/

ihg.com/candlewood/hotels/us/en/reservation

ihg.com/holidayinn/hotels/us/en/reservation

ihg.com/holidayinnexpress/hotels/us/en/reservation

ihg.com/hotelindigo/hotels/us/en/reservation

ihg.com/staybridge/hotels/us/en/reservation

marriott.com

residenceinn.com

ritzcarlton.com

springhillsuites.com

News

go.com

accuweather.com

bloomberg.com

cbsnews.com

cnn.com

drudgereport.com

indiatimes.com

forbes.com

foxnews.com

google.com

reddit.com

huffingtonpost.com

cnn.com

nbcnews.com

yahoo.com

nytimes.com

reuters.com

shutterstock.com

theguardian.com

indiatimes.com

usatoday.com

weather.com

wsj.com

wunderground.com

bbc.co.uk

Social Media

badoo.com

classmates.com

facebook.com

fiverr.com

flickr.com

foursquare.com

hi5.com

hootsuite.com

myspace.com

twitter.com

twitter.com

couchsurfing.com

facebook.com

linkedin.com

pinterest.com

livejournal.com

meetup.com

ning.com

okcupid.com

google.com

skyrock.com

stumbleupon.com

tagged.com

xing.com

last.fm

Society

ancestry.com

yahoo.com

biblegateway.com

complex.com

correios.com.br

dailykos.com

digg.com

esquire.com

legacy.com

match.com

salon.com

siteadvisor.com

slate.com

sulekha.com

theguardian.com

aarp.org

europa.eu

europa.eu

change.org

irs.gov

jw.org

lds.org

nih.gov

japanpost.jp

state.gov

Insurance

aetna.com

aflac.com

allstate.com

anthem.com

aon.com

bcbsm.com

carefirst.com

cigna.com

esurance.com

farmers.com

geico.com

travelers.com

humana.com

libertymutual.com

massmutual.com

metlife.com

nationwide.com

progressive.com

prudential.com

statefarm.com

thehartford.com

usaa.com

vsp.com

fepblue.org

kaiserpermanente.org

Appendix 2

A t-test was performed to compare the incidence of each form of brokerage between the observed network and the commensurate random networks. The full results of this test, by brokerage type and for each observed network, are shown in Tables 6 and 7. Table 8 contains details of the companies with the most brokerage positions at both times.

Table 6 T-tests comparing observed networks to random networks (Time 1)
Table 7 T-tests comparing observed networks to random networks (Time 2)
Table 8 Companies and the number of Brokerage positions occupied (Time 1)
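
The comparison described in this appendix can be sketched as a one-sample t-test, in which the brokerage count of the observed network is tested against counts from a set of random networks. The exact form of the test is not specified here, so the one-sample form is an assumption, and the counts below are made up purely to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(random_counts, observed):
    """t statistic and degrees of freedom for observed vs. random-network counts."""
    n = len(random_counts)
    standard_error = stdev(random_counts) / sqrt(n)  # sample standard deviation
    t = (observed - mean(random_counts)) / standard_error
    return t, n - 1

# Made-up brokerage counts from five random networks and one observed network.
random_counts = [10, 12, 11, 9, 13]
observed = 20

t, df = one_sample_t(random_counts, observed)
print(f"t = {t:.2f}, df = {df}")
```

A large positive t indicates that the observed network contains more brokerage positions of that type than random networks of comparable size would be expected to produce.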


About this article


Cite this article

Agogo, D. Invisible market for online personal data: An examination. Electron Markets 31, 989–1010 (2021). https://doi.org/10.1007/s12525-020-00437-0


  • DOI: https://doi.org/10.1007/s12525-020-00437-0
