Data and Distribution Privacy Under Function Recoverability
Date: 2023
Authors: Nageswaran, Ajaykrishnan
Advisor: Narayan, Prakash
Abstract
Many data-driven applications entail function computation based on user data, with accompanying requirements of data privacy. In such an application, a user holding data must publicly furnish accurate
attributes of the data (function recoverability or utility) while simultaneously protecting the data or the
data-generating mechanism (privacy). Two broad formulations of function computation with privacy are
considered: "data privacy", which requires that the data or a sensitive attribute of it be protected; and
"distribution privacy", in which the workings of an underlying data-generating algorithm must remain
concealed.
The concept of differential privacy dominates the data privacy landscape. Our new approach can be
viewed as a complement: it enables exact characterizations of optimal tradeoffs between utility and
privacy, and specifies explicit randomized privacy mechanisms for attaining them.
We consider the following set of problems in the data privacy category. A user’s data is represented by
a finite-valued random variable. Given a function of the data, a querier is required to recover, with at
least a prescribed probability, the value of the function on the basis of a query response provided by
the user. The user devises the query response, subject to the recoverability condition, so as to maximize
privacy of the data from the querier. Privacy is measured by the probability of error incurred by the querier
in estimating the data from the query response. We analyze single and multiple independent query
responses, each satisfying the recoverability requirement, that provide maximum privacy to the user. In
the former setting, we also consider privacy for a predicate of the user’s data. Achievability schemes that
constitute explicit randomization mechanisms for query responses are given, and their privacy is
compared with converse upper bounds.
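
In symbols (a schematic formalization with notation of our choosing: $X$ for the data, $f$ for the given function, $Z$ for the query response, and $\rho$ for the prescribed recoverability level), the user selects the randomization $P_{Z|X}$ to solve
\[
\max_{P_{Z|X}} \;\; \min_{\hat{x}} \; \Pr\{\hat{x}(Z) \neq X\}
\qquad \text{subject to} \qquad
\max_{\hat{f}} \; \Pr\{\hat{f}(Z) = f(X)\} \,\geq\, \rho,
\]
where $\hat{x}$ and $\hat{f}$ range over the querier's estimators of the data and of the function value, respectively.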
Under the same recoverability constraint, we introduce a more demanding measure of privacy: list
privacy, measured by the probability that the user data is not contained in a list formed by the querier
based on the query response. We obtain a general converse upper bound for the maximum list privacy,
which is shown to be tight for the special case of a binary-valued function.
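
In the same schematic notation, with $L$ an assumed bound on the list size (again our notation), list privacy replaces the querier's point estimate by a list $\mathcal{L}(Z)$:
\[
\max_{P_{Z|X}} \;\; \min_{\mathcal{L}\,:\,|\mathcal{L}(Z)| \leq L} \; \Pr\{X \notin \mathcal{L}(Z)\},
\]
subject to the same recoverability constraint. Evading every entry of a list is harder than evading a single guess, which is what makes this measure more demanding.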
In another line of work, we consider analog user data represented by a Gaussian random variable.
Given a linear function of the data, a querier is required to recover, with at least a specified accuracy level,
the function value from a query response obtained from the user. The user devises the query response,
subject to the recoverability requirement, so as to maximize privacy of the data from the querier. Both
recoverability and privacy are measured by $\ell_2$-distance criteria. An exact characterization of maximum
user data privacy under the recoverability condition is derived. An explicit optimal privacy mechanism for
the user is given, and its privacy is shown to match a converse upper bound.
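
The add-noise idea in this Gaussian setting can be illustrated with a minimal simulation sketch. The scalar model, the parameter values, and the choice of independent Gaussian noise below are our assumptions for illustration, not the dissertation's optimal mechanism:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar instance: data X ~ N(0, sigma2), linear function F = a*X;
# recoverability asks for a querier estimate F_hat with E[(F_hat - F)^2] <= eps2.
sigma2, a, eps2 = 1.0, 2.0, 0.5
n = 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=n)
f = a * x

# Add-noise query response: Z = F + N, with N ~ N(0, eps2) independent of X.
z = f + rng.normal(0.0, np.sqrt(eps2), size=n)

# Recoverability: (F, Z) are jointly Gaussian, so the querier's best (MMSE)
# estimator is E[F | Z] = (Var F / (Var F + eps2)) * Z, with MSE below eps2.
var_f = a ** 2 * sigma2
f_hat = (var_f / (var_f + eps2)) * z
print("recoverability MSE:", np.mean((f_hat - f) ** 2))

# Privacy: the querier's MMSE estimate of the data X itself from Z.
x_hat = (a * sigma2 / (var_f + eps2)) * z
print("privacy (MSE of data estimate):", np.mean((x_hat - x) ** 2))

Increasing the noise variance raises the querier's data-estimation error (privacy) at the cost of the recoverability guarantee, which is the tradeoff the exact characterization pins down.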
Finally, turning to distribution privacy, a user generates $n$ independent and identically distributed
data random variables with a probability mass function (a stand-in for a data-generating probabilistic
algorithm) that must be guarded from a querier. The querier must recover, with a prescribed accuracy,
a given function of the data from each of the $n$ query responses upon eliciting them from the user. The
user chooses the data probability mass function and devises the random query responses to maximize
distribution privacy as gauged by the (Kullback-Leibler) divergence between the former and the querier’s
best estimate of it based on the $n$ query responses. A basic achievable lower bound for distribution
privacy is provided that does not depend on $n$ and corresponds to worst-case privacy. Next, upper
and lower bounds for distribution privacy, dependent on $n$, are developed. The former improves
upon worst-case privacy, and the latter does so under suitable assumptions; both converge to worst-case
privacy as $n$ grows. The converse and achievability proofs bring out explicit strategies for the user and the querier.
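
Schematically (our notation; the exact optimization order and the sense in which the divergence is evaluated, e.g., on average over the responses, are as specified in the dissertation), with $p$ the user's pmf and $\hat{p}$ the querier's estimator based on $Z^n = (Z_1, \ldots, Z_n)$:
\[
\max_{p,\,\text{mechanism}} \;\; \min_{\hat{p}} \; D\big(p \,\big\|\, \hat{p}(Z^n)\big),
\]
subject to each response meeting the prescribed recoverability of the function value.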
A special feature of all our optimal privacy mechanisms throughout this dissertation is that they are of the
"add-noise" variety and afford ready implementability. Specifically, the user designs a query response by
adding to the function value an appropriate value-dependent noise.
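
A minimal sketch of the add-noise idea for finite-valued data follows. The instance (uniform data on eight values, a parity function, recoverability level 0.9) is of our choosing, and for simplicity the noise below is value-independent, whereas the dissertation's optimal mechanisms use value-dependent noise:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical instance: X uniform on {0,...,7}, binary function f(x) = x mod 2,
# recoverability level rho = 0.9, i.e., the response must satisfy P(Z = f(X)) >= rho.
k, rho, n = 8, 0.9, 100_000
x = rng.integers(0, k, size=n)
f = x % 2

# Add-noise response on the function's binary alphabet: Z = f(X) + N (mod 2),
# with P(N = 0) = rho, so the recoverability constraint holds with equality.
noise = (rng.random(n) > rho).astype(int)
z = (f + noise) % 2
print("recoverability P(Z = f(X)):", np.mean(z == f))

# Privacy: given Z, each of the k/2 data values with the likelier parity is
# equally probable, so a MAP querier (e.g., the rule "guess x = z") errs often.
x_map = z
print("privacy P(MAP estimate != X):", np.mean(x_map != x))  # about 1 - 2*rho/k

Even though the function value is recoverable with probability 0.9, the data itself is misidentified with probability about 0.775 in this instance.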