Towards Secure Computation and Learning from Symmetric Private Information Retrieval

Loading...
Thumbnail Image

Files

Publication or External Link

Date

2023

Citation

Abstract

This dissertation focuses on secure computation and learning from the perspective of information-theoretic symmetric private information retrieval (SPIR). Private information retrieval (PIR) is an elementary and classical problem in computer science, aiming to protect the privacy of users. In a typical PIR setting, a user wishes to correctly download the desired message out of a set of messages from multiple non-colluding and replicated databases. This constraint is known as the reliability constraint. Meanwhile, no individual database should learn anything about which message the user has downloaded. This constraint is known as the user privacy constraint. In SPIR, further, the user should learn nothing beyond the particular message it has downloaded. This constraint is known as the database privacy constraint.

As a fundamental problem in secure multi-party computation, two-party private set intersection (PSI) refers to the problem where two parties wish to collaboratively determine the common elements without leaking any further information about the remaining elements to each other. There are three requirements for two-party PSI: At least one of the two parties should be able to decode the intersection correctly (reliability), queried party should not learn the information being queried (user privacy) and the querying party should not learn anything further than what it has queried (database privacy). The last two constraints protect the privacy of the remaining elements in two parties. Thus, the constraints in SPIR and two-party PSI are equivalent and have a one-to-one correspondence. Further, SPIR is a distributed (multi-database) version of $1$-out-of-$K$ oblivious transfer (OT). It is well-known that reliability, user privacy and database privacy contradict each other, and thus, basic single-database SPIR, and basic two-party PSI are infeasible. In order to achieve a valid two-party PSI protocol, our primary solutions are to relax the privacy constraints via use of multiple databases and utilize auxiliary randomness data. Further, we study the secure computation problem via SPIR and apply it to the secure learning problem.

First, we investigate the problem of two-party PSI by using SPIR. It is well-known that single-database SPIR is not feasible. Hence, in our two-party PSI setting, we assume that each party stores its data set across multiple non-colluding and replicated databases instead of a single database. As a relaxation of the previous privacy requirements, the remaining elements of each party only needs to be kept private from each individual database in the other party. As a consequence, we propose a new valid two-party PSI achievable scheme under this new privacy requirement. More concretely, we consider the multi-message SPIR (MM-SPIR) problem, which is an extension of the conventional single-message SPIR (SM-SPIR). In MM-SPIR, the user retrieves multiple messages at a time from the databases. We obtain the capacity of MM-SPIR as a stand-alone result and show that the MM-SPIR capacity equals the SM-SPIR capacity. We also unify the schemes of multi-message PIR (MM-PIR) and MM-SPIR by proposing a new capacity-achieving MM-SPIR scheme. Finally, we establish the equivalence between two-party PSI and MM-SPIR with i.i.d.~messages of length $1$ through an incidence vector mapping, which implies the minimum download cost for two-party PSI.

Second, we investigate the problem of SPIR with user-side common randomness where the user is provided with a random subset of the shared database common randomness, which is unknown to the databases. We determine the exact capacity region of the triple $(d, \rho_S, \rho_U)$, where $d$ is the download cost, $\rho_S$ is the amount of shared database common randomness, and $\rho_U$ is the amount of available user-side common randomness. With an appropriate amount of $\rho_U$, this new SPIR can achieve the conventional PIR capacity. As a corollary, single-database SPIR becomes feasible. Consequently, the user-side common randomness in SPIR can be deemed as auxiliary randomness data and then applied to two-party PSI problem where each party now possesses only a single database. Likewise, the minimum download cost for two-party PSI in this case is derived.

Third, we consider the problem of multi-party PSI (MP-PSI). In MP-SPI, several parties wish to jointly determine the intersection of their respective elements while protecting the information about the remaining elements. MP-PSI is a non-trivial extension of the two-party PSI as it cannot be implemented via multiple use of two-party PSI. We propose a new information-theoretic MP-PSI scheme that builds on the connection between PSI and MM-SPIR. Our scheme is a non-trivial generalization of the two-party PSI scheme since it needs an intricate design of the shared common randomness among the parties. With the aid of this auxiliary randomness data, we show that our scheme does not incur any penalty, in terms of the download cost, due to the more stringent privacy constraints in MP-PSI compared to two-party PSI.

Fourth, as the communication cost in the PSI problem consists of upload cost and download cost, we consider the total (upload plus download) communication cost of SPIR with a focus on two-database setting through its relationships to conditional disclosure of secrets (CDS) and conditional disclosure of multiple secrets (CDMS). In CDS, two parties each holding an individual input and sharing a common secret wish to disclose this secret to an external party efficiently when their inputs satisfy some condition. As a natural extension of CDS, in CDMS, two parties share multiple i.i.d.~common secrets. Inspired by the equivalence between a special configuration of CDMS and two-database SPIR, we design download cost efficient SPIR schemes for given upload cost using bipartite graph representations of CDS and CDMS.

Fifth, as a novel and interesting extension of SPIR, we consider the problem of random SPIR (RSPIR) where the user does not pick a specific message to download, instead is content with any one of the messages stored in the databases. This is digital blind box, also known as gachapon, which implements this specified setting with physical objects for entertainment. This is also the blind version of $1$-out-of-$K$ OT, an important cryptographic primitive. We explore the capacity of two-database RSPIR and provide its corresponding achievable scheme.

Sixth, we consider the private set union (PSU) problem and then apply it to the federated submodel learning (FSL) problem. As a dual problem of two-party PSI, two-party PSU refers to the problem where two parties wish to collaboratively compute the union of their elements without revealing anything beyond this union to each other. Hence, we also establish the equivalence between two-party PSU and MM-SPIR. In FSL, the full learning model in the server is divided into multiple submodels such that a large number of clients collectively update the model by downloading only the needed submodels and uploading the corresponding increments while keeping clients' local training data private from the server. As a conventional approach in FSL, secure aggregation ensures that the server can only learn the aggregate model from the clients and nothing beyond that. By contrast, if one of the parties is selected as a leader party to derive the union, multi-party PSU (MP-PSU) requires that this leader party obtains only the union from the remaining parties and nothing beyond that. By unifying secure aggregation and MP-PSU in the same framework, we propose a new FSL scheme that achieves low communication cost, and is also robust against client drop-outs, client late-arrivals and database drop-outs.

Finally, seventh, we consider the FSL problem in a distributed storage system. The server now comprises multiple independent databases and the full model is stored through ramp secure regenerating coding (RSRC) across these databases. This novel RSRC mechanism is proposed to resolve passive eavesdropper and database failure issues together in an efficient manner by allowing the eavesdropper to learn a controllable amount of submodel information. Our new RSRC-based distributed FSL approach is constructed on the basis of our previous two-database FSL scheme which uses PSU. In addition to the previous robustness against client drop-outs, client late-arrivals and database drop-outs, and newly-added robustness against database failures and a passive eavesdropper, this advanced FSL approach also guarantees robustness against an active adversary.

Notes

Rights