Understanding and Addressing Online Tracking

dc.contributor.advisorMazurek, Michelle L.en_US
dc.contributor.authorReitinger, Nathanen_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2025-09-15T05:43:58Z
dc.date.issued2025en_US
dc.description.abstractDespite the growing international conversation around data and privacy protection, driven by regulations like the European Union’s (EU) General Data Protection Regulation and the California Consumer Privacy Act, online privacy today remains at risk. How should users protect themselves from online tracking that they cannot see and have little control over (i.e., surreptitious or stateless tracking practices)? What, according to the individuals being tracked, counts as a socially acceptable or unacceptable tracking practice? And where can policymakers look to learn how their new data and privacy protection laws are working on the ground, among the actual companies tasked with complying with those laws? To address these issues, I first set out to provide users with a tool to help them protect their online identities. My collaborators and I focused on a cookie-less form of tracking known as canvas fingerprinting. We measured the web’s state of canvas fingerprinting in a scrape of half a million websites, finding that use of the canvas for fingerprinting was increasing: from 5% to 7% to 11% of popular websites in 2014, 2016, and 2018, respectively. We then built a state-of-the-art supervised machine learning model, ML-CB, to predict when websites engage in this type of tracking. Our new tool outperformed the prior state of the art by approximately ten percentage points in F1 score (the harmonic mean of precision and recall). In short, we offer an accurate and robust tracking blocker that users can leverage to keep themselves more private online. ML-CB was a state-of-the-art canvas tracking blocker, but the tool raises a question I addressed in follow-up work: is the kind of tracker blocking ML-CB offers a kind that users actually want?
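Canvas fingerprinting works by drawing content to a hidden canvas, reading back the rendered output (which varies subtly across GPU, driver, and font stacks), and hashing it into a stable, cookie-less identifier. The following is a minimal illustrative sketch of that mechanism, not code from ML-CB or any real tracker: the function names are hypothetical, the browser-only canvas calls are shown as comments, and FNV-1a stands in for whatever digest a real tracker would use. The F1 helper shows the evaluation metric mentioned above.

```javascript
// Illustrative sketch of canvas fingerprinting; all names are
// hypothetical, not taken from ML-CB or any real tracking script.

// FNV-1a: a small, dependency-free hash standing in for the digest
// a tracker would apply to the serialized canvas output.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit FNV prime multiply
  }
  return h.toString(16);
}

// In a browser, the serialized rendering would come from calls like:
//   const canvas = document.createElement("canvas");
//   const ctx = canvas.getContext("2d");
//   ctx.textBaseline = "top";
//   ctx.font = "14px Arial";
//   ctx.fillText("Cwm fjordbank glyphs vext quiz", 2, 2);
//   const pixelData = canvas.toDataURL(); // serialized pixels
// Machines with different rendering stacks serialize slightly
// different pixels, so the hash acts as a device identifier.
function fingerprint(pixelData) {
  return fnv1a(pixelData);
}

// F1 score: harmonic mean of precision and recall, the metric used
// to compare a classifier like ML-CB against prior blockers.
function f1(precision, recall) {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}
```

The key property the sketch illustrates is that the identifier is derived from rendering behavior rather than stored state: there is no cookie to clear, which is why detection tools like ML-CB target the scripts that perform this extraction.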
To address this question, I next turned to understanding the norms surrounding online tracking. My collaborators and I conducted a two-part user study canvassing participants’ opinions toward online tracking after exposing them to tracking artifacts, using a custom Chrome browser extension that visualized “the tracker’s perspective” on the participant. Participants used the extension for one week and gave their opinions on visualizations such as when a tracker might infer they go to bed at night, or what potentially sensitive interests a tracker may think they have. The work provided strong empirical evidence that users still dislike tracking, even after many years of tracking proliferating on the web: over 80% of participants found at least one of the visualizations “creepy.” At the same time, the work also found that users largely disagree about which aspects of tracking (i.e., which visualizations) are creepy; even for the visualization rated creepy by the most participants, agreement reached only 66%. This makes the problem ripe for regulation: if users generally perceive tracking as creepy but cannot themselves pinpoint which aspects should be limited, that is exactly where a regulator can step in. And this is what is happening with the worldwide efforts to enact data and privacy protection laws. The problem, however, is that protecting privacy and data (tracking-adjacent properties) is a difficult task. As I found in my work on tracking transparency, some efforts to protect data are necessary, but the difficulty lies in how those efforts take shape. A growing field of research guides these legislative efforts by measuring compliance with privacy and data protection laws.
These research efforts, however, face many obstacles: law is nuanced and ambiguous, and it carries consequences that can be monetary or reputational; even when a company complies, the specter of non-compliance can be damaging. Research in this field could help address the complex problems of privacy and data protection regulation, but it would benefit from procedure-focused, field-level systematization. To address this issue, my collaborators and I systematized prior work measuring legal compliance with privacy and data protection laws. We found that most prior work focuses on web analysis (43%) and that a large majority of researchers focus on the GDPR (77%). Some researchers note legal exceptions as possibly applicable (26%), but few investigate these exceptions. Fewer than half of the papers (40%) detailed how they approached ethics, and those that did most frequently mentioned institutional ethics review. I end this work with a number of recommendations: future researchers should clearly articulate the goals of studying compliance, address legal ambiguity with legal resources rather than homegrown ones, and responsibly disclose results of non-compliance.en_US
dc.identifierhttps://doi.org/10.13016/nssg-ixfv
dc.identifier.urihttp://hdl.handle.net/1903/34691
dc.language.isoenen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledComplianceen_US
dc.subject.pquncontrolledEmpiricalen_US
dc.subject.pquncontrolledLawen_US
dc.subject.pquncontrolledPrivacyen_US
dc.subject.pquncontrolledRegulationen_US
dc.subject.pquncontrolledSurveillanceen_US
dc.titleUnderstanding and Addressing Online Trackingen_US
dc.typeDissertationen_US

Files

Original bundle

Name: Reitinger_umd_0117E_25526.pdf
Size: 23.71 MB
Format: Adobe Portable Document Format