Private Information Retrieval and Security in Networks
Abstract
This dissertation focuses on privacy and security issues in networks from an information-theoretic perspective. Protecting privacy requires protecting the identity of the desired message from the data source. This is highly desirable in next-generation networks, where data-mining techniques are present everywhere. Ensuring security requires that the data content is not interpretable by non-authorized nodes. This is critical in wireless networks, which are inherently open.
We first focus on the privacy issue by investigating the private information retrieval (PIR) problem. PIR is a canonical problem to study the privacy of the downloaded content from public databases. In PIR, a user wishes to retrieve a file from distributed databases, in such a way that no database can know the identity of the user's desired file. PIR schemes need to be designed to be more efficient than the trivial scheme of downloading all the files stored in the databases. Fundamentally, PIR lies at the intersection of computer science, information theory, coding theory, and signal processing.
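As a rough illustration of how retrieval can hide the desired index, the following sketch implements the classic two-database XOR scheme from the PIR literature. It is a textbook example, not one of the schemes developed in this dissertation, and the function names are hypothetical. It assumes two non-colluding databases holding identical copies of equal-length files.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(num_files: int, wanted: int):
    """Build one query per database (hypothetical helper).

    Query 1 is a uniformly random 0/1 mask over the file indices.
    Query 2 is the same mask with the bit of the wanted file flipped.
    Each mask on its own is uniformly distributed, so a single
    database learns nothing about `wanted`.
    """
    q1 = [secrets.randbelow(2) for _ in range(num_files)]
    q2 = list(q1)
    q2[wanted] ^= 1
    return q1, q2


def answer(files, query) -> bytes:
    """Database side: XOR together the files selected by the mask."""
    out = bytes(len(files[0]))  # all-zero string of one file length
    for f, bit in zip(files, query):
        if bit:
            out = xor_bytes(out, f)
    return out


def retrieve(files, wanted: int) -> bytes:
    """User side: query both databases and combine the two answers."""
    q1, q2 = make_queries(len(files), wanted)
    a1 = answer(files, q1)  # answer from database 1
    a2 = answer(files, q2)  # answer from database 2
    # The masks differ only at `wanted`, so every other file cancels.
    return xor_bytes(a1, a2)
```

The user downloads two file-lengths in total regardless of the number of files, compared with downloading every file in the trivial scheme. This corresponds to a retrieval rate of 1/2, which capacity-achieving schemes improve upon.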
The classical PIR formulation makes the following assumptions: The content is exactly replicated across the databases; the user wishes to retrieve a single file privately; the databases do not collude; the databases answer the user queries truthfully; the database answers go through noiseless orthogonal channels; there are no external security threats; the answer strings have unconstrained symmetric lengths. These assumptions are too idealistic to be practical in modern systems. In this thesis, we introduce extended versions of the classical PIR problem that are relevant to modern applications, namely: PIR from coded databases, multi-message PIR, PIR from colluding and Byzantine databases, PIR under asymmetric traffic constraints, noisy PIR, and PIR from wiretap channel II. We characterize the fundamental limits of such problems from an information-theoretic perspective. This involves two parts: first, we devise a practical scheme that retrieves the desired file(s) correctly and privately; second, we mathematically prove that no other retrieval scheme can achieve a higher rate than the proposed scheme. The optimal retrieval rate is called the PIR capacity, by analogy with the capacity of communication channels.
First, we consider PIR from MDS-coded databases. Due to node failures and erasures that arise naturally in any storage system, redundancy should be introduced. However, replicating the content across the databases incurs a high storage cost. This motivates coding the content of the databases instead of merely replicating it. We determine the optimal retrieval scheme for this problem, and characterize the exact PIR capacity. The result implies a fundamental tradeoff between the retrieval cost and the storage cost.
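The tradeoff can be made concrete with a small calculation. The sketch below assumes the capacity expression reported in the published literature for an (N, K) MDS code storing M messages, C = (1 - K/N) / (1 - (K/N)^M); the abstract itself does not state the formula, and the function name is hypothetical.

```python
from fractions import Fraction


def mds_pir_capacity(n: int, k: int, m: int) -> Fraction:
    """PIR capacity for an (n, k) MDS-coded system with m messages.

    Assumed formula: (1 - r) / (1 - r**m) with code rate r = k / n.
    It is evaluated as 1 / (1 + r + ... + r**(m - 1)), which is the
    same quantity and also covers r = 1 without dividing by zero.
    """
    if not (1 <= k <= n) or m < 1:
        raise ValueError("need 1 <= k <= n and m >= 1")
    r = Fraction(k, n)
    return 1 / sum(r**i for i in range(m))


if __name__ == "__main__":
    # Storage overhead is n / k; a larger k stores less but retrieves less efficiently.
    for k in (1, 2, 3, 4):
        print(k, Fraction(4, k), mds_pir_capacity(4, k, 2))
```

With K = 1 the code reduces to replication and the expression matches the classical PIR capacity. With K = N there is no redundancy and the rate falls to 1/M, which is the rate of downloading everything.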
Second, we consider the multi-message PIR. In this problem, the user is interested in retrieving multiple files from the databases without revealing the identities of these messages. We show that multiple messages can be retrieved more efficiently than retrieving them one-by-one in a sequence. When the user wishes to retrieve at least half of the files stored in the databases, we characterize the exact capacity of the problem by proposing a novel scheme that downloads MDS-coded mixtures of all messages. For all other cases, we develop a near-optimal scheme which is optimal if the ratio between the total number of files and the number of desired files is an integer.
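To give a feel for the gain from joint retrieval, the sketch below compares sum rates for N replicated databases, M files, and P desired files. The expressions are assumptions taken from the published multi-message PIR literature rather than from this abstract: 1 / (1 + (M - P) / (P N)) when P is at least M/2, and (1 - 1/N) / (1 - (1/N)^(M/P)) when M/P is an integer. Function names are hypothetical.

```python
from fractions import Fraction


def single_message_capacity(n: int, m: int) -> Fraction:
    """Classical PIR capacity, (1 - 1/n) / (1 - (1/n)**m).

    Retrieving p files one after another at this rate yields the
    same sum rate, so it serves as the sequential baseline.
    """
    r = Fraction(1, n)
    return 1 / sum(r**i for i in range(m))


def multi_message_rate(n: int, m: int, p: int):
    """Assumed sum rate for jointly retrieving p of m files.

    Returns None when neither closed-form case applies.
    """
    if n < 2 or not (1 <= p <= m):
        raise ValueError("need n >= 2 and 1 <= p <= m")
    if 2 * p >= m:
        # Regime where the exact capacity is characterized.
        return 1 / (1 + Fraction(m - p, p * n))
    if m % p == 0:
        # Regime where the near-optimal scheme is optimal.
        r = Fraction(1, n)
        return 1 / sum(r**i for i in range(m // p))
    return None
```

For example, with N = 2, M = 4, and P = 2, the joint rate evaluates to 2/3, compared with 8/15 for retrieving the two files sequentially.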
Third, we consider PIR from colluding and Byzantine databases. In this problem, a subset of the databases, called Byzantine databases, can return arbitrarily corrupted answers. In addition, a subset of the databases can collude by exchanging user queries. The errors introduced by the Byzantine databases can be unintentional (if databases store an outdated message set), or even worse, can be intentional (as in the case of maliciously controlled databases). We propose a Byzantine and collusion resilient retrieval scheme, and determine the exact PIR capacity for this problem. The capacity expression reveals that the effect of the Byzantine databases is equivalent to removing twice the number of Byzantine databases from the system.
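The "removing twice the number of Byzantine databases" effect can be checked numerically. The sketch below assumes the capacity expression from the published literature for N databases, M messages, up to T colluding databases, and up to B Byzantine databases with 2B + T < N: C = ((N - 2B) / N) (1 - T/(N - 2B)) / (1 - (T/(N - 2B))^M). The formula is not stated in the abstract, and the function name is hypothetical.

```python
from fractions import Fraction


def byzantine_colluding_capacity(n: int, m: int, t: int, b: int) -> Fraction:
    """Assumed PIR capacity with t colluding and b Byzantine databases."""
    effective = n - 2 * b  # databases left after discounting 2b of them
    if m < 1 or t < 1 or b < 0 or effective <= t:
        raise ValueError("need m >= 1, t >= 1, b >= 0 and 2b + t < n")
    r = Fraction(t, effective)
    # Capacity of a system with `effective` databases and t colluders,
    # written as 1 / (1 + r + ... + r**(m - 1)).
    colluding_capacity = 1 / sum(r**i for i in range(m))
    # Answers are still downloaded from all n databases, hence the scaling.
    return Fraction(effective, n) * colluding_capacity
```

Setting B = 0 and T = 1 recovers the classical PIR capacity. Under the assumed formula, a system with N = 5 and B = 1 behaves like a three-database system whose rate is scaled by 3/5.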
Fourth, we consider PIR under asymmetric traffic constraints. A common property of the schemes constructed for the existing PIR settings is that they exhibit a symmetric structure across the databases. In practice, this may be infeasible, for instance when the links from the databases have different capacities. To that end, we develop a novel upper bound for the PIR capacity that incorporates the traffic asymmetry. We propose explicit achievability schemes for specific traffic ratios. For any other traffic ratio, we employ time-sharing. Our results show that asymmetry fundamentally hurts the retrieval rate.
Fifth, we consider noisy PIR, where the returned answers reach the user via noisy channel(s). This is motivated by practical applications such as random packet dropping, random packet corruption, and PIR over wireless networks. We consider two variations of the problem, namely: noisy PIR with orthogonal links, and PIR from a multiple access channel. For noisy PIR with orthogonal links, we show that the channel coding and the retrieval scheme are almost separable in the sense that the noisy channel affects only the traffic ratio. For the PIR problem from a multiple access channel, the output of the channel is a mixture of all the answers returned by the databases. In this case, we show explicit examples where the channel coding and the retrieval scheme are inseparable, and the privacy may be achieved for free.
Sixth, we consider PIR from wiretap channel II. In this problem, there is an external eavesdropper who wishes to learn the contents of the databases by observing portions of the traffic exchanged between the user and the databases during the PIR process. The databases must encrypt their responses such that the eavesdropper learns nothing from its observation. We design a retrieval code that satisfies the combined privacy and security constraints. We show the necessity of using asymmetric retrieval schemes which build on our work on PIR under asymmetric traffic constraints.
Next, we focus on the security problem in multi-user networks using physical layer techniques. Physical layer security enables secure transmission of information without a need for encryption keys. Hence, it mitigates the problems associated with exchanging encryption keys across open wireless networks. Existing work in physical layer security makes the following assumptions: All nodes are altruistic and follow a prescribed transmission policy to maximize the secure rate of the entire system; the channel inputs to Gaussian channels are constrained by a total transmitter-side power constraint; and in secure degrees of freedom studies for interference channels, users have a single antenna each. We address these issues by investigating the MIMO interference channel with confidential messages, security in networks with user misbehavior, and the MIMO wiretap channel under receiver-side power constraints. We characterize the optimal secure transmission strategies in terms of the secrecy capacity and its high-SNR approximation, the secure degrees of freedom (s.d.o.f.).
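As background for the secrecy capacity and s.d.o.f. terminology, the sketch below evaluates the well-known secrecy capacity of the real scalar Gaussian wiretap channel, the positive part of (1/2) log2(1 + SNR_main) - (1/2) log2(1 + SNR_eve), and normalizes it by (1/2) log2(P). This single-antenna baseline is an illustrative assumption and is not one of the multi-user MIMO models studied in the dissertation. The function and parameter names are hypothetical, and the gains are power gains.

```python
import math


def gaussian_secrecy_capacity(power: float, gain_main: float,
                              gain_eve: float, noise: float = 1.0) -> float:
    """Secrecy capacity in bits per channel use (real Gaussian wiretap)."""
    c_main = 0.5 * math.log2(1 + gain_main * power / noise)
    c_eve = 0.5 * math.log2(1 + gain_eve * power / noise)
    # No positive secure rate when the eavesdropper's channel is stronger.
    return max(0.0, c_main - c_eve)


def sdof_estimate(power: float, gain_main: float, gain_eve: float) -> float:
    """Finite-power proxy for the s.d.o.f.: C_s / ((1/2) log2 P), power > 1."""
    cs = gaussian_secrecy_capacity(power, gain_main, gain_eve)
    return cs / (0.5 * math.log2(power))


if __name__ == "__main__":
    for p in (1e2, 1e4, 1e8):
        print(p, gaussian_secrecy_capacity(p, 4.0, 1.0), sdof_estimate(p, 4.0, 1.0))
```

In this baseline the secrecy capacity saturates as the power grows, so the normalized value tends to zero.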
First, we determine the exact s.d.o.f. region of the two-user MIMO interference channel with confidential messages (ICCM). To that end, we propose a novel achievable scheme for the 2x2 ICCM system, which is a building block for any other antenna configuration. We show that the s.d.o.f. region starts as a square region, then takes the shape of an irregular polytope, and finally returns to a square region when the number of transmit antennas is at least twice the number of receiving antennas.
Second, we investigate the security problem in the presence of user misbehavior. We consider the following multi-user scenarios: the multiple access wiretap channel with deviating users who do not follow agreed-upon optimum protocols, where we quantify the effect of user deviations and propose counter-strategies for the honest users; the broadcast channel with confidential messages in the presence of combating helpers, where we show that the malicious intentions of the helpers are neutralized and the full s.d.o.f. is retained; and the interference channel with confidential messages when the users are selfish and have conflicting interests, where we show that selfishness precludes secure communication and no s.d.o.f. is achieved.
Third, we consider the MIMO wiretap channel with a receiver-side minimum power constraint in addition to the usual transmitter-side power constraint. This problem is motivated by energy harvesting communications with wireless energy transfer, where an added goal is to deliver a minimum amount of energy to a receiver in addition to delivering secure data to another receiver. We prove that the problem is equivalent to solving a secrecy capacity problem with a double-sided correlation matrix constraint on the channel input. We extend the channel enhancement technique to our setting. We propose two optimum schemes that achieve the optimum rate: Gaussian signaling with a fixed mean and Gaussian signaling with Gaussian artificial noise. We extend our techniques to other related multi-user settings.