Studies in Differential Privacy and Federated Learning
Abstract
In the late 20th century, Machine Learning underwent a paradigm shift from model-driven to data-driven design. Advances in sensors, data storage, and computing power enabled the collection of ever-larger amounts of data, and this abundance allowed researchers to fit flexible models directly to observations rather than rely on field-specific models. The influx of information made possible numerous advances, including the development of novel medicines, more efficient markets, and the proliferation of vast sensor networks.
However, not all data should be freely accessible. Sensitive medical records, personal finances, and private IDs are stored on digital devices across the world with the expectation that they remain private. At the same time, such data is frequently instrumental in the development of predictive models. Since the beginning of the 21st century, researchers have recognized that traditional methods of anonymizing data are inadequate for protecting client identities. The primary focus of this dissertation is the advancement of two fields of data privacy: Differential Privacy and Federated Learning.
Differential Privacy is one of the most successful modern privacy methods. By injecting carefully structured noise into a dataset, Differential Privacy obscures individual contributions while still allowing researchers to extract meaningful information from the aggregate. Within this methodology, the Gaussian mechanism is one of the most common privacy mechanisms because of favorable properties such as the ability of each client to apply noise locally before transmission to a server. However, this mechanism yields only an approximate form of Differential Privacy. This dissertation introduces the first in-depth analysis of the Symmetric alpha-Stable (SaS) privacy mechanism, demonstrating that it achieves pure Differential Privacy while retaining local applicability. Based on these findings, the dissertation advocates using the SaS privacy mechanism to protect the privacy of client data.
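As an illustration of the local noise injection described above, the following is a minimal sketch of the Gaussian mechanism only; it does not implement the SaS mechanism studied in the dissertation. It assumes the classical textbook calibration sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon, which holds for 0 < epsilon < 1. The function names `gaussian_sigma` and `privatize_locally`, and the example values, are hypothetical and chosen for illustration.

```python
import math

import numpy as np


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise scale for the classical Gaussian mechanism.

    Assumption: the textbook calibration, which is valid for 0 < epsilon < 1.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def privatize_locally(value, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise on the client, before anything is transmitted."""
    value = np.asarray(value, dtype=float)
    return value + rng.normal(0.0, sigma, size=value.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical setup: each client holds one value in [0, 1], so sensitivity is 1.
    client_values = rng.uniform(0.0, 1.0, size=10_000)
    sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
    reports = privatize_locally(client_values, sigma, rng)
    # The server sees only noisy reports, yet their mean still tracks the true mean.
    print("true mean:", client_values.mean())
    print("mean of noisy reports:", reports.mean())
```

Each individual report is heavily perturbed, but the noise averages out across many clients, which is the trade-off between individual obscurity and aggregate utility that the abstract describes. The delta parameter in this calibration is what makes the guarantee approximate rather than pure.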
Federated Learning is a sub-field of Machine Learning that trains models across a collection (federation) of client devices. This approach aims to protect client privacy by limiting the type of information that clients transmit to the server. However, this distributed environment poses challenges, such as non-uniform data distributions and inconsistent client update rates, which reduce the accuracy of trained models. To overcome these challenges, we introduce Federated Inference, a novel algorithm that we show is consistent in federated environments. That is, even when the data is unevenly distributed and the clients' responses to the server are staggered in time (asynchronous), the algorithm converges to the global optimum.
We also present a novel result in system identification, in which we extend a method known as Dynamic Mode Decomposition to accommodate input-delayed systems. This advancement enhances the accuracy of identifying and controlling systems relevant to privacy-sensitive applications such as smart grids and autonomous vehicles.
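For readers unfamiliar with the baseline method, the following is a minimal sketch of standard Dynamic Mode Decomposition, which fits a linear operator to consecutive state snapshots by least squares. It handles neither inputs nor delays, so it is not the extension presented in the dissertation. The function name `fit_dmd_operator` and the example system are hypothetical.

```python
import numpy as np


def fit_dmd_operator(snapshots: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x[k+1] ~= A @ x[k].

    `snapshots` has shape (state_dim, n_steps), one state vector per column.
    """
    if snapshots.ndim != 2 or snapshots.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two snapshots")
    past = snapshots[:, :-1]    # x[0], ..., x[n-2]
    future = snapshots[:, 1:]   # x[1], ..., x[n-1]
    # The pseudoinverse gives the minimum-norm least-squares solution.
    return future @ np.linalg.pinv(past)


if __name__ == "__main__":
    # Hypothetical stable linear system used only to generate example data.
    a_true = np.array([[0.9, 0.1],
                       [0.0, 0.8]])
    states = [np.array([1.0, 1.0])]
    for _ in range(19):
        states.append(a_true @ states[-1])
    snapshots = np.stack(states, axis=1)

    a_est = fit_dmd_operator(snapshots)
    print("estimated operator:\n", a_est)
    print("estimated eigenvalues (DMD eigenvalues):", np.linalg.eigvals(a_est))
```

The eigenvalues of the fitted operator describe the growth or decay rate and oscillation of each mode. An input-delayed system would need past inputs in the regression as well, which is the gap the abstract says the dissertation addresses.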
Privacy is increasingly pertinent: as investment in computing infrastructure continues to grow to serve larger client bases, privacy failures affect an ever-growing number of individuals. This dissertation reports on our efforts to expand the data privacy toolkit through novel methods and analysis while navigating the challenges of the field.