Modeling, Quantifying, and Limiting Adversary Knowledge

Thumbnail Image
Publication or External Link
Mardziel, Piotr
Hicks, Michael
Users participating in online services are required to relinquish control over potentially sensitive personal information, exposing them to intentional or unintentional miss-use of said information by the service providers. Users wishing to avoid this must either abstain from often extremely useful services, or provide false information which is usually contrary to the terms of service they must abide by. An attractive middle-ground alternative is to maintain control in the hands of the users and provide a mechanism with which information that is necessary for useful services can be queried. Users need not trust any external party in the management of their information but are now faced with the problem of judging when queries by service providers should be answered or when they should be refused due to revealing too much sensitive information. Judging query safety is difficult. Two queries may be benign in isolation but might reveal more than a user is comfortable with in combination. Additionally malicious adversaries who wish to learn more than allowed might query in a manner that attempts to hide the flows of sensitive information. Finally, users cannot rely on human inspection of queries due to its volume and the general lack of expertise. This thesis tackles the automation of query judgment, giving the self-reliant user a means with which to discern benign queries from dangerous or exploitive ones. The approach is based on explicit modeling and tracking of the knowledge of adversaries as they learn about a user through the queries they are allowed to observe. The approach quantifies the absolute risk a user is exposed, taking into account all the information that has been revealed already when determining to answer a query. Proposed techniques for approximate but sound probabilistic inference are used to tackle the tractability of the approach, letting the user tradeoff utility (in terms of the queries judged safe) and efficiency (in terms of the expense of knowledge tracking), while maintaining the guarantee that risk to the user is never underestimated. We apply the approach to settings where user data changes over time and settings where multiple users wish to pool their data to perform useful collaborative computations without revealing too much information. By addressing one of the major obstacles preventing the viability of personal information control, this work brings the attractive proposition closer to reality.