Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
Item Developing and Measuring Latent Constructs in Text (2024)
Hoyle, Alexander Miserlis; Resnik, Philip; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Constructs (like inflation, populism, or paranoia) are of fundamental concern to social science. Constructs are the vocabulary over which theory operates, and so a central activity is the development and measurement of latent constructs from observable data. Although the social sciences comprise fields with different epistemological norms, they share a concern for valid operationalizations that transparently map between data and measure. Economists at the US Bureau of Labor Statistics, for example, follow a hundred-page handbook to sample the egg prices that constitute the Consumer Price Index; clinical psychologists rely on suites of psychometric tests to diagnose schizophrenia. In many fields, this observable data takes the form of language: as a social phenomenon, language data can encode many of the latent social constructs that people care about. Commensurate with both the increasing sophistication of language technologies and the amount of available data, there has thus emerged a "text-as-data" paradigm aimed at "amplifying and augmenting" the analyses that compose research. At the same time, Natural Language Processing (NLP), the field from which the analysis tools originate, has often remained separate from real-world problems and guiding theories, at least when it comes to social science. Instead, it focuses on atomized tasks under the assumption that progress on low-level language aspects will generalize to higher-level problems that involve overlapping elements. This dissertation focuses on NLP methods and evaluations that facilitate the development and measurement of latent constructs from natural language, while remaining sensitive to the social sciences' need for interpretability and validity.

Item The Limitations of Deep Learning Methods in Realistic Adversarial Settings (2023)
Kaya, Yigitcan; Dumitras, Tudor; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

The study of adversarial examples has evolved from a niche phenomenon to a well-established branch of machine learning (ML). In the conventional view of an adversarial attack, the adversary takes an input sample, e.g., an image of a dog, and applies a deliberate transformation to it, e.g., a rotation. This causes the victim model to abruptly change its prediction, e.g., the rotated image is classified as a cat. Most prior work has adapted this view across different applications and provided powerful attack algorithms as well as defensive strategies to improve robustness. The progress in this domain has been influential for both research and practice, and it has produced a perception of better security. Yet, the security literature tells us that adversaries often do not follow a specific threat model, and adversarial pressure can exist in unprecedented ways. In this dissertation, I will start from the threats studied in the security literature to highlight the limitations of the conventional view and extend it to capture realistic adversarial scenarios. First, I will discuss how adversaries can pursue goals other than hurting the predictive performance of the victim. In particular, an adversary can wield adversarial examples to perform denial-of-service against emerging ML systems that rely on input-adaptiveness for efficient predictions. Our attack algorithm, DeepSloth, can transform the inputs to offset the computational benefits of these systems. Moreover, an existing conventional defense is ineffective against DeepSloth and forces a trade-off between efficiency and security. Second, I will show how the conventional view leads to a false sense of security for anomalous input detection methods. These methods build modern statistical tools around deep neural networks and have been shown to be successful at detecting conventional adversarial examples. As a general-purpose analogue of blending attacks in the security literature, we introduce the Statistical Indistinguishability Attack (SIA). SIA bypasses a range of published detection methods by producing anomalous samples that are statistically similar to normal samples. Third, and finally, I will focus on malware detection with ML, a domain where adversaries gain leverage over ML naturally, without deliberately perturbing inputs as in the conventional view. Security vendors often rely on ML to automate malware detection due to the large volume of new malware. A standard approach for detection is collecting the runtime behaviors of programs in controlled environments (sandboxes) and feeding them to an ML model. I first observed that a model trained using this approach performs poorly when it is deployed on program behaviors from realistic, uncontrolled environments, which gives malware authors an advantage in causing harm. We attribute this deterioration to distribution shift and investigate possible improvements by adapting modern ML techniques, such as distributionally robust optimization. Overall, my dissertation work has reinforced the importance of considering comprehensive threat models and applications with well-documented adversaries for properly assessing the security risks of ML.
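To make the input-adaptiveness concrete, below is a minimal sketch of an early-exit classifier, the kind of ML system DeepSloth targets. This is an illustration under stated assumptions (PyTorch; the architecture, names, and threshold are invented here), not the dissertation's code: inference stops at the first internal classifier whose confidence clears a threshold, so an adversary who keeps confidence low forces every block to run.

```python
# Illustrative early-exit ("input-adaptive") classifier; not from the thesis.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU())
             for i in range(3)])
        # a lightweight classifier ("exit") after each block
        self.exits = nn.ModuleList([nn.Linear(hidden, n_classes) for _ in range(3)])
        self.threshold = threshold

    def forward(self, x):  # x: a single example, shape (1, in_dim)
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = block(x)
            probs = exit_head(x).softmax(dim=-1)
            if probs.max().item() >= self.threshold:
                return probs, i + 1  # confident: stop early, report blocks used
        return probs, len(self.blocks)  # no exit fired: full cost paid

model = EarlyExitNet()
probs, blocks_used = model(torch.randn(1, 32))
print(f"prediction used {blocks_used} of 3 blocks")
```

A DeepSloth-style slowdown input would be one crafted so that no exit's confidence ever clears the threshold, negating the efficiency benefit.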
Item Predicting Cancer Prognosis and Drug Response from the Tumor Microbiome (2021)
Hermida, Leandro Cruz; Ruppin, Eytan; Patro, Robert; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Tumor gene expression is predictive of patient prognosis in some cancers. However, RNA-seq and whole genome sequencing data contain not only reads from host tumor and normal tissue, but also reads from the tumor microbiome, which can be used to infer the microbial abundances in each tumor. Here, we show that tumor microbial abundances, alone or in combination with tumor gene expression data, can predict cancer prognosis and drug response to some extent. Microbial abundances are significantly less predictive of prognosis than gene expression although, remarkably, they are similarly predictive of drug response, albeit in mostly different cancer-drug combinations. Thus, it appears possible to leverage existing sequencing technology, or develop new protocols, to obtain more non-redundant information about prognosis and drug response from RNA-seq and whole genome sequencing experiments than could be obtained from gene expression or mutation data alone.
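As a rough illustration of this kind of analysis (not the thesis pipeline, whose data and models differ), the sketch below compares microbiome-only, expression-only, and combined features for predicting a binary drug response with scikit-learn; the data are synthetic stand-ins.

```python
# Hypothetical comparison of feature sets; data are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
microbes = rng.lognormal(size=(n, 50))    # inferred microbial abundances
expression = rng.normal(size=(n, 500))    # tumor gene expression
response = rng.integers(0, 2, size=n)     # responder / non-responder label

for name, X in [("microbiome", microbes),
                ("expression", expression),
                ("combined", np.hstack([microbes, expression]))]:
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, response,
                          cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean CV AUC = {auc:.2f}")
```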
Item Unblock: Interactive Perception for Decluttering (2021)
Govindaraj, Krithika; Aloimonos, Yiannis; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Novel segmentation algorithms can easily identify objects that are occluded or partially occluded; however, in highly cluttered scenes the degree of occlusion is so high that some objects may not be visible to a static camera. In these scenarios, humans use action to change the configuration of the environment, elicit more information through perception, and process that information before taking the next action. Reinforcement learning models this behavior; however, unlike humans, the phase in which perception data is understood is not included, since images are used directly as observations. The aim of this thesis is to establish a novel method that indirectly uses perception data for reinforcement learning to address the task of decluttering a scene with a static camera.

Item Closing the Gap Between Classification and Retrieval Models (2021)
Taha, Ahmed; Davis, Larry; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Retrieval networks learn a feature embedding where similar samples are close together and different samples are far apart. This feature embedding is essential for computer vision applications such as face/person recognition, zero-shot learning, and image retrieval. Despite these important applications, retrieval networks are less popular than classification networks for multiple reasons: (1) the cross-entropy loss used with classification networks is more stable and converges faster than the metric learning losses used with retrieval networks; (2) the cross-entropy loss has a huge toolbox of utilities and extensions. For instance, both AdaCos and self-knowledge distillation have been proposed to tackle low sample complexity in classification networks; also, both CAM and Grad-CAM have been proposed to visualize attention in classification networks. To promote retrieval networks, it is important to equip them with an equally powerful toolbox. Accordingly, we propose an evolution-inspired approach to tackle low sample complexity in feature embedding. Then, we propose SVMax to regularize the feature embedding and avoid model collapse. Furthermore, we propose L2-CAF to visualize attention in retrieval networks. To tackle low sample complexity, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance but also learns a slim (pruned) network with a smaller inference cost. KE reduces both overfitting and the burden of data collection. To regularize the feature embedding and avoid model collapse, we propose singular value maximization (SVMax) to promote a uniform feature embedding. Our formulation mitigates model collapse and enables larger learning rates. SVMax is oblivious to both the input class (labels) and the sampling strategy; thus it promotes a uniform feature embedding in both supervised and unsupervised learning. Furthermore, we present a mathematical analysis of the mean singular value's lower and upper bounds. This analysis makes tuning SVMax's balancing hyperparameter easier when the feature embedding is normalized to the unit circle. To support retrieval networks with a visualization tool, we formulate attention visualization as a constrained optimization problem. We leverage the unit L2-norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. This approach imposes no constraints on the network architecture besides having a convolution layer. The input can be a regular image or a pre-extracted convolutional feature. The network output can be logits trained with cross-entropy or a space embedding trained with a ranking loss. Furthermore, this approach neither changes the original network weights nor requires fine-tuning; thus, network performance remains intact. The visualization filter is applied only when an attention map is required, so it poses no computational overhead during inference. L2-CAF visualizes the attention of the last convolutional layer of GoogLeNet within 0.3 seconds. Finally, we propose a compromise between retrieval and classification networks: a simple, yet effective, two-head architecture, i.e., a network with both logits and feature-embedding heads. The embedding head, trained with a ranking loss, limits the overfitting capabilities of the cross-entropy loss by promoting a smooth embedding space. In our work, we leverage the semi-hard triplet loss to allow a dynamic number of modes per class, which is vital when working with imbalanced data. We also refute a common assumption that training with a ranking loss is computationally expensive: by moving both the triplet loss sampling and computation to the GPU, training time increases by just 2%.
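The abstract's description of SVMax suggests a simple reading: penalize a small mean singular value of the batch's (normalized) embedding matrix so the embedding spreads out instead of collapsing. Below is a minimal sketch under that reading; the weight `lam` and the stand-in ranking loss are illustrative, not the thesis's settings.

```python
# Sketch of an SVMax-style regularizer as described in the abstract.
import torch
import torch.nn.functional as F

def svmax_penalty(embeddings: torch.Tensor) -> torch.Tensor:
    z = F.normalize(embeddings, dim=1)   # rows on the unit sphere
    s = torch.linalg.svdvals(z)          # singular values of the batch matrix
    return -s.mean()                     # maximize the mean singular value

emb = torch.randn(128, 64, requires_grad=True)  # a batch of embeddings
ranking_loss = torch.tensor(0.0)  # stand-in for a triplet/contrastive loss
lam = 1.0                         # illustrative balancing hyperparameter
loss = ranking_loss + lam * svmax_penalty(emb)
loss.backward()
```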
Item Quantum algorithms for machine learning and optimization (2020)
Li, Tongyang; Childs, Andrew M.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

The theories of optimization and machine learning answer foundational questions in computer science and lead to new algorithms for practical applications. While these topics have been extensively studied in the context of classical computing, their quantum counterparts are far from well understood. In this thesis, we explore algorithms that bridge the gap between the fields of quantum computing and machine learning. First, we consider general optimization problems with only function evaluations. For two core problems, namely general convex optimization and volume estimation of convex bodies, we give quantum algorithms as well as quantum lower bounds which together establish that the quantum speedups of both problems are polynomial compared to their classical counterparts. We then consider machine learning and optimization problems with input data stored explicitly as matrices. We first look at semidefinite programs and provide quantum algorithms with polynomial speedup compared to the classical state of the art. We then move to machine learning and give optimal quantum algorithms for linear and kernel-based classifications. To complement our quantum algorithms, we also introduce a framework for quantum-inspired classical algorithms, showing that for low-rank matrix arithmetic there can be only a polynomial quantum speedup. Finally, we study statistical problems on quantum computers, with a focus on testing properties of probability distributions. We show that for testing various properties, including L1 distance, L2 distance, Shannon and Rényi entropies, etc., there are polynomial quantum speedups compared to the classical counterparts. We also extend these results to testing properties of quantum states.
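For context on the distribution-testing tasks, here is the classical, sample-based baseline for one of them: a plug-in estimate of the L1 distance between two distributions. The thesis's quantum testers obtain polynomial speedups over approaches of this general kind; the code is an illustration, not from the thesis.

```python
# Plug-in (empirical) L1 distance between two sampled distributions.
from collections import Counter

def empirical_l1(xs, ys):
    p, q = Counter(xs), Counter(ys)
    support = set(p) | set(q)
    return sum(abs(p[a] / len(xs) - q[a] / len(ys)) for a in support)

print(empirical_l1([0, 0, 1, 2], [0, 1, 1, 2]))  # 0.5
```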
Item Alternating Optimization: Constrained Problems, Adversarial Networks, and Robust Models (2019)
Xu, Zheng; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Data-driven machine learning methods have achieved impressive performance for many industrial applications and academic tasks. Machine learning methods usually have two stages: training a model from large-scale samples, and inference on new samples after the model is deployed. The training of modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints, which is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications. We focus on machine learning problems that can be formulated as minimax problems in training, and study alternating optimization methods that serve as fast, scalable, stable, and automated solvers. First, we focus on the alternating direction method of multipliers (ADMM) for constrained problems in classical convex and nonconvex optimization, with popular machine learning applications including sparse and low-rank models, regularized linear models, total variation image processing, semidefinite programming, and consensus distributed computing. We propose adaptive ADMM (AADMM), a fully automated solver that achieves fast practical convergence by adapting the only free parameter in ADMM. We further automate several variants of ADMM (relaxed ADMM, multi-block ADMM, and consensus ADMM), and prove convergence rate guarantees that are widely applicable to variants of ADMM with changing parameters. We release fast implementations for more than ten applications and validate their efficiency on several benchmark datasets for each application. Second, we focus on the minimax problem of generative adversarial networks (GANs). We apply prediction steps to stabilize stochastic alternating methods for the training of GANs, and demonstrate the advantages of GAN-based losses for image processing tasks. We also propose GAN-based knowledge distillation methods to train small neural networks for inference acceleration, and empirically study the trade-off between acceleration and accuracy. Third, we present preliminary results on adversarial training for robust models. We study fast algorithms for the attack and defense of universal perturbations, and then explore network architectures to boost robustness.
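To ground the ADMM discussion, the sketch below solves the lasso with ADMM, using a simple residual-balancing heuristic to adapt the penalty parameter. This heuristic merely stands in for AADMM's adaptive rule, which the thesis derives more carefully; all names and constants here are illustrative.

```python
# ADMM for the lasso: minimize 0.5*||Ax - b||^2 + lam*||x||_1,
# split as x - z = 0, with a residual-balancing penalty update.
import numpy as np

def lasso_admm(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = u = np.zeros(n)              # u is the scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * (z - u))
        z_old = z
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
        r = np.linalg.norm(x - z)               # primal residual
        s = rho * np.linalg.norm(z - z_old)     # dual residual
        if r > 10 * s:                          # rebalance the penalty
            rho, u = 2 * rho, u / 2             # rescale u when rho changes
        elif s > 10 * r:
            rho, u = rho / 2, 2 * u
    return z

A = np.random.default_rng(0).normal(size=(50, 20))
x_true = np.r_[np.ones(3), np.zeros(17)]
x_hat = lasso_admm(A, A @ x_true)
```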
Item Hinge-Loss Markov Random Fields and Probabilistic Soft Logic: A Scalable Approach to Structured Prediction (2015)
Bach, Stephen Hilliard; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

A fundamental challenge in developing impactful artificial intelligence technologies is balancing the ability to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this thesis I introduce two new formalisms for modeling structured data, distinguished from previous approaches by their ability to both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. I unite three views of inference, from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three views lead to the same inference objective. I then derive HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define, refine, and reuse for relational data. PSL uses a syntax based on first-order logic to compactly specify complex models. I next introduce an algorithm for inferring most-probable variable assignments (MAP inference) for HL-MRFs that is extremely scalable, much more so than commercially available software, because it uses message passing to leverage the sparse dependency structures common in inference tasks. I then show how to learn the parameters of HL-MRFs using a number of learning objectives. The learned HL-MRFs are as accurate as traditional, discrete models, but much more scalable. To enable HL-MRFs and PSL to capture even richer dependencies, I then extend learning to support latent variables, i.e., variables without training labels. To overcome the bottleneck of repeated inferences required during learning, I introduce paired-dual learning, which interleaves inference and parameter updates. Paired-dual learning learns accurate models and is also scalable, often completing before traditional methods make even one parameter update. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.

Item ANALYSIS OF A SEMI-SUPERVISED LEARNING APPROACH TO INTRUSION DETECTION (2014)
Klimkowski, Benjamin Harold; Cukier, Michel; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This thesis addresses the use of a semi-supervised learning (SSL) method in an intrusion detection setting. Specifically, this thesis illustrates the potential benefits and difficulties of using a cluster-then-label (CTL) SSL approach to classify stealth scanning in network flow metadata. A series of controlled tests were performed to show that, in certain situations, a CTL SSL approach can perform comparably to a supervised learner with a fraction of the development effort. This study also balances these findings with pragmatic issues like labeling, noise, and feature encoding. While CTL demonstrated accuracy, more research is needed before practical implementations are a reality. The contributions of this work are: 1) one of the first studies of the application of SSL to intrusion detection, illustrating the challenges of applying a CTL approach to a domain with imbalanced class distributions; 2) the creation of a new intrusion detection dataset; and 3) validation of previously established techniques.
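A minimal cluster-then-label sketch of the general approach the thesis evaluates: cluster labeled and unlabeled points together, then assign each cluster the majority label of its labeled members. scikit-learn and the synthetic stand-ins for network-flow features are assumptions made here for illustration.

```python
# Cluster-then-label (CTL) semi-supervised sketch on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 5))
y_labeled = rng.integers(0, 2, size=40)   # 0 = benign, 1 = stealth scan
X_unlabeled = rng.normal(size=(400, 5))

km = KMeans(n_clusters=8, n_init=10, random_state=0)
clusters = km.fit_predict(np.vstack([X_labeled, X_unlabeled]))

# majority vote among labeled points in each cluster (default to benign)
cluster_label = {}
for c in range(km.n_clusters):
    votes = y_labeled[clusters[: len(X_labeled)] == c]
    cluster_label[c] = int(np.bincount(votes).argmax()) if len(votes) else 0

y_pred_unlabeled = [cluster_label[c] for c in clusters[len(X_labeled):]]
```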