Data and Distribution Privacy Under Function Recoverability
Files
Publication or External Link
Date
Authors
Advisor
Citation
Abstract
Many data-driven applications entail function computation based on user data, with accompanying requirements of data privacy. In such an application, a user holding data must publicly furnish accurate attributes of the data (function recoverability or utility) while simultaneously protecting the data or the data-generating mechanism (privacy). Two broad formulations of function computation with privacy are considered: "data privacy'" which requires that data or a sensitive attribute of it be protected; and "distribution privacy'" in which the workings of an underlying data-generating algorithm must remain under wraps.
The concept of differential privacy rules the data privacy landscape. Our new approach can be viewed as a complement. It enables exact characterizations of optimal utility versus privacy performance tradeoffs, and specifies explicit randomized privacy mechanisms for attaining them.
We consider the following set of problems in the data privacy category. A user’s data is represented by a finite-valued random variable. Given a function of the data, a querier is required to recover, with at least a prescribed probability, the value of the function on the basis of a query response provided by the user. The user devises the query response, subject to the recoverability condition, so as to maximize privacy of the data from the querier. Privacy is measured by the probability of error incurred by the querier in estimating the data from the query response. We analyze single and multiple independent query responses, with each response satisfying the recoverability requirement, which provide maximum privacy to the user. In the former setting, we also consider privacy for a predicate of the user’s data. Achievability schemes that constitute explicit randomization mechanisms for query responses are given, and their privacy compared with converse upper bounds.
Under the same recoverability constraint, we introduce a more demanding measure of privacy: list privacy, measured by the probability that the user data is not contained in a list formed by the querier based on the query response. We obtain a general converse upper bound for the maximum list privacy which is shown to be tight for the special case of a binary-valued function.
In another line of work, we consider analog user data represented by a Gaussian random variable. Given a linear function of the data, a querier is required to recover, with at least a specified accuracy level, the function value from a query response obtained from the user. The user devises the query response, subject to the recoverability requirement, so as to maximize privacy of the data from the querier. Both recoverability and privacy are measured by $\ell_2$-distance criteria. An exact characterization of maximum user data privacy under the recoverability condition is derived. An explicit optimal privacy mechanism for the user is given whose privacy is shown to match a converse upper bound.
Finally, turning to distribution privacy, a user generates $n$ independent and identically distributed data random variables with a probability mass function (a stand-in for a data-generating probabilistic algorithm) that must be guarded from a querier. The querier must recover, with a prescribed accuracy, a given function of the data from each of the $n$ query responses upon eliciting them from the user. The user chooses the data probability mass function and devises the random query responses to maximize distribution privacy as gauged by the (Kullback-Leibler) divergence between the former and the querier’s best estimate of it based on the $n$ query responses. A basic achievable lower bound for distribution privacy is provided that does not depend on $n$ and corresponds to worst-case privacy. Next, upper and lower bounds for distribution privacy, dependent on $n$, are developed. The former improves upon worst-case privacy and the latter does so under suitable assumptions; both converge to it as $n$ grows. The converse and achievability proofs bring out explicit strategies for the user and the querier.
A special feature of all our optimal privacy mechanisms throughout this dissertation is that they are of the "add-noise" variety and afford ready implementability. Specifically, the user designs a query response by adding to the function value an appropriate value-dependent noise.