- ItemUnderstanding and Enriching the Algorithmic Reasoning Capabilities of Deep Learning Models(2023) Li, Jingling; Dickerson, John; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Learning to reason is an essential step to achieving general intelligence. My research has been focusing on empowering deep learning models with the abilities to generalize efficiently, extrapolate to out-of-distribution data, learn under noisy labels, and make better sequential decisions --- all of these require the models to have varying levels of reasoning capabilities. As the reasoning process can be described as a step-by-step algorithmic procedure, understanding and enriching the algorithmic reasoning capabilities has drawn increasing attention in deep learning communities. To bridge algorithms and neural networks, we propose a framework, algorithmic alignment, which connects neural networks with algorithms in a novel manner and advances our understanding of how these two fields can work together to solve complex reasoning tasks. Intuitively, the algorithmic alignment framework evaluates how well a neural network's computation structure aligns with the algorithmic structure in a reasoning process. Utilizing algorithmic alignment, we are able to understand the limitations of machine learning models in the context of reasoning (e.g., generalization, extrapolation, and robustness) and know where to make improvements. Beyond this framework, we also investigate how to make better sequential decisions, during which we introduce a class of efficient approaches, hindsight learning, which allows us to leverage the knowledge inside existing human-designed algorithms to make better sequential decisions under uncertainty. First, why and when does a network structure generalize better than others, despite having equal expressive power? In studying this, we unify seemingly different reasoning tasks, such as intuitive physics, visual question answering, and shortest paths, via the lens of a powerful algorithmic paradigm, dynamic programming (DP). We then show that Graph Neural Network (GNNs)---a popular reasoning model---algorithmically align well with DP, which explains their empirical success on many reasoning tasks. At the same time, using this framework, we also suggest the limitations of GNNs and demonstrate how we can use algorithmic alignment to design a structured network given a particular task. Second, what does a neural network learn outside the support of the training distribution? To answer this, we identify conditions under which MLPs and GNNs extrapolate well.We first quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, implying that ReLU MLPs do not extrapolate most nonlinear functions. Yet, we prove that MLPs can successfully extrapolate a linear target function given sufficient training support. These results suggest a hypothesis on GNNs for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Third, how does a neural network’s architecture impact its robustness to noisy labels? To figure this out, we use the proposed framework connecting a network's robustness to its architectural alignments with the target/noise functions. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than the noise, and we provide both theoretical and empirical evidence across various neural network architectures and different domains. Last but not least, how can we design sample-efficient approaches for sequential decision-making problems? Many decision-making problems especially the ones in resource management have a bunch of human-designed heuristics that work well in practice. These heuristics could provide more guidance to RL agents than just the environmental reward. At the same time, we observe that in many resource management problems requiring sequential decision-making under uncertainty, the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker in the system, and we can use machine learning to learn from historical patterns of compute demands to help deal with these uncertainties. Therefore, we design a class of data-efficient algorithms termed Hindsight Learning (HL), where we utilize existing human-designed algorithms with historical data to better guide the RL agents.
- ItemScalable Methods for Robust Machine Learning(2023) Levine, Alexander Jacob; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)In recent years, machine learning systems have been developed that demonstrate remarkable performance on many tasks. However, naive metrics of performance, such as the accuracy of a classifier on test samples drawn from the same distribution as the training set, can provide an overly optimistic view of the suitability of a model for real-world deployment. In this dissertation, we develop models that are robust, in addition to performing well on large-scale tasks. One notion of robustness is adversarial robustness, which characterizes the performance of models under adversarial attacks. Adversarial attacks are small, often imperceptible, distortions to the inputs of machine learning systems which are crafted to substantially change the output of the system. These attacks represent a real security threat, and are especially concerning when machine learning systems are used in safety-critical applications. To mitigate this threat, certifiably robust classification techniques have been developed. In a certifiably robust classifier, for each input sample, in addition to a classification, the classifier also produces a certificate, which is a guaranteed lower bound on the magnitude of any perturbation required to change the classification. Existing methods for certifiable robustness have significant limitations, which we address in Parts I and II of this dissertation: (i) Currently, randomized smoothing techniques are the only certification techniques that are viable for large-scale image classification (i.e. ImageNet). However, randomized smoothing techniques generally provide only high-probability, rather than exact, certificate results. To address this, we develop deterministic randomized smoothing-based algorithms, which produce exact certificates with finite computational costs. In particular, in Part I of this dissertation, we present to our knowledge the first deterministic, ImageNet-scale certification methods under the L_1, L_p (for p < 1), and "L_0" metrics. (ii) Certification results only apply to particular metrics of perturbation size. There is therefore a need to develop new techniques to provide provable robustness against different types of attacks. In Part II of this dissertation, we develop randomized smoothing-based algorithms for several new types of adversarial perturbation, including Wasserstein adversarial attacks, Patch adversarial attacks, and Data Poisoning attacks. The methods developed for Patch and Poisoning attacks are also deterministic, allowing for efficient exact certification. In Part III of this dissertation, we consider a different notion of robustness: test-time adaptability to new objectives in reinforcement learning. This is formalized as goal-conditioned reinforcement learning (GCRL), in which each episode is conditioned by a new "goal," which determines the episode's reward function. In this work, we explore a connection between off-policy GCRL and knowledge distillation, which leads us to apply Gradient-Based Attention Transfer, a knowledge distillation technique, to the Q-function update. We show, empirically and theoretically, that this can improve the performance of off-policy GCRL when the space of goals is high-dimensional.
- ItemThe Limitations of Deep Learning Methods in Realistic Adversarial Settings(2023) Kaya, Yigitcan; Dumitras, Tudor; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)The study of adversarial examples has evolved from a niche phenomenon to a well-established branch of machine learning (ML). In the conventional view of an adversarial attack, the adversary takes an input sample, e.g., an image of a dog, and applies a deliberate transformation to this input, e.g., a rotation. This then causes the victim model to abruptly change its prediction, e.g., the rotated image is classified as a cat. Most prior work has adapted this view across different applications and provided powerful attack algorithms as well as defensive strategies to improve robustness. The progress in this domain has been influential for both research and practice and it has produced a perception of better security. Yet, security literature tells us that adversaries often do not follow a specific threat model and adversarial pressure can exist in unprecedented ways. In this dissertation, I will start from the threats studied in security literature to highlight the limitations of the conventional view and extend it to capture realistic adversarial scenarios. First, I will discuss how adversaries can pursue goals other than hurting the predictive performance of the victim. In particular, an adversary can wield adversarial examples to perform denial-of-service against emerging ML systems that rely on input-adaptiveness for efficient predictions. Our attack algorithm, DeepSloth, can transform the inputs to offset the computational benefits of these systems. Moreover, an existing conventional defense is ineffective against DeepSloth and poses a trade-off between efficiency and security. Second, I will show how the conventional view leads to a false sense of security for anomalous input detection methods. These methods build modern statistical tools around deep neural networks and have shown to be successful in detecting conventional adversarial examples. As a general-purpose analogue of blending attacks in security literature, we introduce the Statistical Indistinguishability Attack (SIA). SIA bypasses a range of published detection methods by producing anomalous samples that are statistically similar to normal samples. Third, and finally, I will focus on malware detection with ML, a domain where adversaries gain leverage over ML naturally without deliberately perturbing inputs like in the conventional view. Security vendors often rely on ML for automating malware detection due to the large volume of new malware. A standard approach for detection is collecting runtime behaviors of programs in controlled environments (sandboxes) and feeding them to an ML model. I have first observed that a model trained using this approach performs poorly when it is deployed on program behaviors from realistic, uncontrolled environments, which gives malware authors an advantage in causing harm. We attribute this deterioration to distribution shift and investigate possible improvements by adapting modern ML techniques, such as distributionally robust optimization. Overall, my dissertation work has reinforced the importance of considering comprehensive threat models and applications with well-documented adversaries for properly assessing the security risks of ML.
- ItemRESILIENT AND EFFICIENT CONSENSUS UNDER UNKNOWN NETWORK CONDITIONS(2023) Blum, Erica 3645 SE Woodstock, Portland, OR, 97202; Katz, Jonathan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Large-scale distributed services need to provide high throughput and low latency without sacrificing resilience. In particular, even if some servers crash or malfunction, the system as a whole should remain consistent. This challenge has been studied extensively in distributed computing and cryptography in the form of consensus algorithms. A consensus algorithm is an interactive protocol that allows honest (non-faulty) nodes to agree on a shared output in the presence of Byzantine (faulty) nodes, who may behave arbitrarily. Consensus algorithms have a long history in distributed computing, and are now receiving even more attention in the context of blockchain systems.Consensus has frequently been studied in the context of two contrasting network models. In the synchronous network model, any message sent by an honest party will be delivered within a fixed bound; this bound is known to all parties and may be used as a protocol parameter. In the asynchronous network model, messages may be delayed for arbitrary lengths of time. For certain consensus problems and settings, the optimal fault tolerance is higher in the synchronous model than the asynchronous model (all else being equal). For example, assuming a public key infrastructure (PKI), the fundamental problem of Byzantine agreement (BA) for n parties is feasible for t < n/2 faults in the synchronous model, compared to only t < n/3 in the asynchronous model. On the other hand, synchronous consensus protocols can become stuck or even lose consistency if delays exceed the fixed bound. In this dissertation, we consider a novel network-agnostic notion of security. Our central contribution is a suite of consensus protocols that achieve precisely defined security guarantees when run in either a synchronous or asynchronous network model, even when the parties are unaware of the network’s true status. In addition, we provide matching impossibility results characterizing the best-possible security guarantees for this setting. We conclude by exploring a natural extension to network-agnostic security, in which protocols must remain secure in a setting where the underlying network status is not only unknown, but may switch between synchrony and asynchrony during a single protocol execution.
- ItemCOMPLEXITY CONTROLLED NATURAL LANGUAGE GENERATION(2023) Agrawal, Sweta; Carpuat, Marine; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Generating text at the right level of complexity for its target audience so it can be easily understood by its target audience has the potential to make information more accessible to a wider range of people, including non-native speakers, language learners, and people who suffer from language or cognitive impairments. For example, a native Hindi speaker learning English might prefer reading a U.S. news article in English or Hindi tailored to their vocabulary and language proficiency level. Natural Language Generation (NLG), the use of computational models to generate human-like text, has been used to empower countless applications – from automatically summarizing financial and weather reports to enabling communication between multilingual communities through automatic translation. Although NLG has met some level of success, current models ignore that there are many valid ways of conveying the same information in a text and that selecting the appropriate variation requires knowing who the text is written for and its intended purpose. To address this, in this thesis, we present tasks, datasets, models, and algorithms that are designed to let users specify how simple or complex the generated text should be in a given language. We introduce the Complexity Controlled Machine Translation task, where the goal is to translate text from one language to another at a specific complexity level defined by the U.S. reading grade level. While standard machine translation (MT) tools generate a single output for each input, the models we design for this task produce translation at various complexity levels to suit the needs of different users. In order to build such models, we ideally require rich annotation and resources for supervised training, i.e., examples of the same input text paired with several translations in the output language, which is not available in most datasets used in MT. Hence, we have also contributed datasets that can enable the generation and evaluation of complexity-controlled translations. Furthermore, recognizing that when humans simplify a complex text in a given language, they often revise parts of the complex text according to the intended audience, we present strategies to adopt general-purpose Edit-based Non-Autoregressive models for controllable text simplification (TS). In this framing, the simplified output for a desired grade level is generated through a sequence of edit operations like deletions and insertions applied to the complex input sequence. As the model needs to learn to perform a wide range of edit operations for different target grade levels, we introduce algorithms to inject additional guidance during training and inference, which results in improved output quality while also providing users with the specific changes made to the input text. Finally, we present approaches to adapt general-purpose controllable TS models that leverage unsupervised pre-training and low-level control tokens describing the nature of TS edit operations as side constraints for grade-specific TS. Having developed models that can enable complexity-controlled text generation, in the final part of the thesis, we introduce a reading comprehension-based human evaluation framework that is designed to assess the correctness of texts generated by these systems using multiple-choice question-answering. Furthermore, we evaluate whether the measure of correctness (via the ability of native speakers to answer the questions correctly using the simplified texts) is captured by existing automatic metrics that measure text complexity or meaning preservation.