New Notions and Mechanisms for Statistical Privacy
Groce, Adam Dowlin
Many large databases of personal information currently exist in the hands of corporations, nonprofits, and governments. The data in these databases could be used to answer any number of important questions, aiding in everything from basic research to day-to-day corporate decision-making. These questions must be answered while respecting the privacy of the individuals whose data are being used. However, even defining privacy in this setting can be difficult. The standard definition in the field is differential privacy. During the years since its introduction, a wide variety of query algorithms have been found that can achieve meaningful utility while at the same time protecting the privacy of individuals. However, differential privacy is a very strong definition, and in some settings it can seem too strong. Given the difficulties involved in getting differentially private output to all desirable queries, many have looked for ways to weaken differential privacy without losing its meaningful privacy guarantees. Here we discuss two such weakenings. The first is computational differential privacy, originally defined by Mironov et al. We find the promise of this weakening to be limited. We show two results that severely curtail the potential for computationally private mechanisms to add any utility over those that achieve standard differential privacy when working in the standard setting with all data held by a single entity. We then propose our own weakening, coupled-worlds privacy. This definition is meant to capture the cases where reasonable bounds can be placed on the adversary's certainty about the data (or, equivalently, the adversary's auxiliary information). We discuss the motivation for the definition, its relationship to other definitions in the literature, and its useful properties. Coupled-worlds privacy is actually a framework with which specific definitions can be instantiated, and we discuss a particular instantiation, distributional differential privacy, which we believe is of particular interest. Having introduced this definition, we then seek new distributionally differentially private query algorithms that can release useful information without the need to add noise, as is necessary when satisfying differential privacy. We show that one can release a variety of query output with distributional differential privacy, including histograms, sums, and least-squares regression lines.