Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

Search Results

Now showing 1 - 10 of 21
  • Item
    Efficient Models and Learning Strategies for Resource-Constrained Systems
    (2024) Rabbani, Tahseen Wahed Karim; Huang, Furong; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The last decade has seen sharp improvements in the performance of machine learning (ML) models, but at the cost of vastly increased complexity and size of their underlying architectures. Advances in high-performance computing have enabled researchers to train and deploy models composed of hundreds of billions of parameters. However, harvesting the full utility of large models on smaller clients, such as Internet of Things (IoT) devices, without resorting to external hosting will require a significant reduction in parameters and faster, cheaper inference. In addition to augmenting IoT, efficient models and learning paradigms can reduce energy consumption, encourage technological equity, and are well-suited for deployment in real-world applications that require fast responses in low-resource settings. To address these challenges, we introduce multiple novel strategies for (1) reducing the scale of deep neural networks and (2) faster learning. For the size problem (1), we leverage tools such as tensorization, randomized projections, and locality-sensitive hashing to train on reduced representations of large models without sacrificing performance. For learning efficiency (2), we develop algorithms for cheaper forward passes, accelerated PCA, and asynchronous gradient descent. Several of these methods are tailored for federated learning (FL), a private, distributed learning paradigm in which data is decentralized among resource-constrained edge clients. We are exclusively concerned with improving efficiency during training -- our techniques do not process pre-trained models or require a device to train over an architecture in its entirety.
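One of the tools this abstract names, randomized projections, can be sketched as follows. This is a generic Johnson-Lindenstrauss-style compression of a weight matrix, not the thesis's actual method, and all matrix sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical large weight matrix (sizes are illustrative, not from the thesis).
d, n, k = 512, 1024, 64
W = rng.standard_normal((d, n))

# Random projection: train on and store the smaller k x n representation
# instead of the full d x n matrix.
P = rng.standard_normal((k, d)) / np.sqrt(k)
W_small = P @ W          # compressed representation

# Inner products and norms between columns are approximately preserved,
# which is what makes training on the reduced representation plausible.
approx = W_small[:, 0] @ W_small[:, 1]
exact = W[:, 0] @ W[:, 1]
```

The projection reduces storage by a factor of d/k (here 8x) while approximately preserving the geometry of the column space; the actual techniques in the dissertation (tensorization, locality-sensitive hashing) are more sophisticated refinements of this idea.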
  • Item
    ROBUSTNESS AND UNDERSTANDABILITY OF DEEP MODELS
    (2022) Ghiasi, Mohammad Amin; Goldstein, Thomas; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep learning has made a considerable leap in the past few decades, from promising models for solving various problems to becoming state-of-the-art. However, unlike classical machine learning models, it is sometimes difficult to explain why and how deep learning models make decisions. Strikingly, their performance can also drop under small amounts of noise. In short, deep learning models are well-performing, easily corrupted, hard-to-understand models that beat human beings in many tasks. Consequently, improving these deep models requires a deep understanding. While deep learning models usually generalize well on unseen data, adding negligible amounts of noise to their input can flip their decision. This phenomenon is known as an "adversarial attack." In this thesis, we study several defense methods against such adversarial attacks. More specifically, we focus on defense methods that, unlike traditional methods, use less computation or fewer training examples. We also show that despite the improvements in adversarial defenses, even provable certified defenses can be broken. Moreover, we revisit regularization to improve adversarial robustness. Over the past years, many techniques have been developed for understanding and explaining how deep neural networks make decisions. This thesis introduces a new method for studying the building blocks of neural networks' decisions. First, we introduce Plug-In Inversion, a new method for inverting and visualizing deep neural network architectures, including Vision Transformers. Then we study the features a ViT learns to make a decision. We compare these features when the network trains on labeled data versus when it uses a language model's supervision for training, such as in CLIP. Last, we introduce feature sonification, which borrows feature visualization techniques to study models trained for speech recognition (non-vision) tasks.
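The "negligible noise flips the decision" phenomenon can be illustrated with a minimal FGSM-style sketch on a linear classifier -- a toy stand-in for the deep models the abstract discusses, with invented weights and input:

```python
import numpy as np

# Toy linear model standing in for a deep classifier; values are illustrative.
w = np.array([1.0, -2.0, 0.5])   # hypothetical model weights
b = 0.1
x = np.array([0.3, -0.4, 0.8])   # input correctly classified as positive

def score(v):
    return float(w @ v + b)

# For a linear model the input-gradient of the score is just w, so the
# FGSM-style worst-case perturbation steps against sign(w).
eps = 0.6
x_adv = x - eps * np.sign(w)

clean = score(x)       # positive score: correct class
attacked = score(x_adv)  # flipped to negative by a bounded perturbation
```

Each coordinate of the input moved by at most 0.6, yet the decision flips; in high-dimensional deep networks the same effect appears at perturbation sizes imperceptible to humans.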
  • Item
    Transfer Learning in Natural Language Processing through Interactive Feedback
    (2022) Yuan, Michelle; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Machine learning models cannot easily adapt to new domains and applications. This drawback becomes detrimental for natural language processing (NLP) because language is perpetually changing. Across disciplines and languages, there are noticeable differences in content, grammar, and vocabulary. To overcome these shifts, recent NLP breakthroughs focus on transfer learning. Through clever optimization and engineering, a model can successfully adapt to a new domain or task. However, these modifications are still computationally inefficient or resource-intensive. Compared to machines, humans are more capable of generalizing knowledge across different situations, especially in low-resource ones. Therefore, research on transfer learning should carefully consider how the user interacts with the model. The goal of this dissertation is to investigate “human-in-the-loop” approaches for transfer learning in NLP. First, we design annotation frameworks for inductive transfer learning, which is the transfer of models across tasks. We create an interactive topic modeling system for users to find topics useful for classifying documents in multiple languages. The user-constructed topic model improves classification accuracy and bridges cross-lingual gaps in knowledge. Next, we look at popular language models, like BERT, that can be applied to various tasks. While these models are useful, they still require a large amount of labeled data to learn a new task. To reduce labeling, we develop an active learning strategy which samples documents that surprise the language model. Users only need to annotate a small subset of these unexpected documents to adapt the language model for text classification. Then, we transition to user interaction in transductive transfer learning, which is the transfer of models across domains. We focus our efforts on low-resource languages to develop an interactive system for word embeddings.
In this approach, the feedback from bilingual speakers refines the cross-lingual embedding space for classification tasks. Subsequently, we look at domain shift for tasks beyond text classification. Coreference resolution is fundamental for NLP applications, like question-answering and dialogue, but the models are typically trained and evaluated on one dataset. We use active learning to find spans of text in the new domain for users to label. Furthermore, we provide important insights on annotating spans for domain adaptation. Finally, we summarize the contributions of each chapter. We focus on aspects like the scope of applications and model complexity. We conclude with a discussion of future directions. Researchers may extend the ideas in our thesis to topics like user-centric active learning and proactive learning.
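The "documents that surprise the model" strategy above can be sketched, in simplified form, as entropy-based uncertainty sampling; the probability table is invented for illustration and is not the dissertation's actual scoring function:

```python
import numpy as np

# Hypothetical per-document class probabilities from a text classifier.
probs = np.array([
    [0.98, 0.02],   # confident -> low entropy, not surprising
    [0.55, 0.45],   # uncertain -> high entropy, worth labeling
    [0.80, 0.20],
    [0.50, 0.50],
])

# Predictive entropy as a proxy for how much a document "surprises" the model.
entropy = -(probs * np.log(probs)).sum(axis=1)

budget = 2
to_label = np.argsort(-entropy)[:budget]   # most surprising documents first
```

With a small annotation budget, only the highest-entropy documents (here indices 3 and 1) would be sent to the user, which is the core economy the abstract describes.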
  • Item
    Object Detection and Instance Segmentation for Real-world Applications
    (2022) Lan, Shiyi; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The modern visual recognition system has achieved great success in the past decade. Aided by this progress, instance localization and recognition have improved significantly, benefiting many applications, e.g., face recognition, autonomous driving, and smart cities. Three key factors play very important roles in the success of visual recognition: big computation, big data, and big models. Recent advances in hardware have increased available computation exponentially, making it feasible to train deep and large learning models on large-scale datasets. Meanwhile, large-scale visual datasets, e.g., ImageNet, the COCO dataset, and YouTube-VIS, provide accurate and rich information for deep learning models. Moreover, aided by advanced designs of deep neural networks (e.g., ResNet, ResNeXt, Swin Transformer, and ConvNeXt), the capacity of deep models has been greatly increased. At the same time, instance localization and recognition, as the core of the modern visual system, has many downstream applications, e.g., autonomous driving, augmented reality, virtual reality, and smart cities. Thanks to the successful advances of deep learning in the last decade, those applications have made great progress recently. In this thesis, we introduce a series of published works that improve the performance of instance localization and address issues in modeling instance localization and recognition using deep learning models. Moreover, we discuss future directions and some potential research projects.
  • Item
    Data-Driven Techniques For Vulnerability Assessments
    (2021) Suciu, Octavian; Dumitras, Tudor A; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Security vulnerabilities have been puzzling researchers and practitioners for decades. As highlighted by the recent WannaCry and NotPetya ransomware campaigns, which resulted in billions of dollars of losses, weaponized exploits against vulnerabilities remain one of the main tools for cybercrime. The upward trend in the number of vulnerabilities reported annually and technical challenges in the way of remediation lead to large exposure windows for the vulnerable populations. On the other hand, due to sustained efforts in application and operating system security, few vulnerabilities are exploited in real-world attacks. Existing metrics for severity assessments err on the side of caution and overestimate the risk posed by vulnerabilities, further affecting remediation efforts that rely on prioritization. In this dissertation we show that severity assessments can be improved by taking into account public information about vulnerabilities and exploits. The disclosure of vulnerabilities is followed by artifacts such as social media discussions, write-ups and proof-of-concepts, containing technical information related to the vulnerabilities and their exploitation. These artifacts can be mined to detect active exploits or predict their development. However, we first need to understand: What features are required for different tasks? What biases are present in public data and how are data-driven systems affected? What security threats do these systems face when deployed operationally? We explore the questions by first collecting vulnerability-related posts on social media and analyzing the community and the content of their discussions. This analysis reveals that victims of attacks often share their experience online, and we leverage this finding to build an early detector of exploits active in the wild.
Our detector significantly improves on the precision of existing severity metrics and can detect active exploits a median of 5 days earlier than a commercial intrusion prevention product. Next, we investigate the utility of various artifacts in predicting the development of functional exploits. We engineer features causally linked to the ease of exploitation, highlight trade-offs between timeliness and predictive utility of various artifacts, and characterize the biases that affect the ground truth for exploit prediction tasks. Using these insights, we propose a machine learning-based system that continuously collects artifacts and predicts the likelihood of exploits being developed against these vulnerabilities. We demonstrate our system's practical utility through its ability to highlight critical vulnerabilities and predict imminent exploits. Lastly, we explore the adversarial threats faced by data-driven security systems that rely on inputs of unknown provenance. We propose a framework for defining algorithmic threat models and for exploring adversaries with various degrees of knowledge and capabilities. Using this framework, we model realistic adversaries that could target our systems, design data poisoning attacks to measure their robustness, and highlight promising directions for future defenses against such attacks.
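The exploit-prediction idea above -- a classifier over artifact-derived features -- can be sketched with a toy logistic model. The feature names and data below are invented for illustration and are not from the dissertation:

```python
import numpy as np

# Hypothetical artifact features per vulnerability:
# columns = [bias, has_public_poc, log_tweet_volume, mentions_weaponization]
X = np.array([
    [1.0, 1.0, 2.3, 1.0],
    [1.0, 0.0, 0.1, 0.0],
    [1.0, 1.0, 1.7, 0.0],
    [1.0, 0.0, 0.4, 0.0],
])
y = np.array([1.0, 0.0, 1.0, 0.0])   # 1 = a functional exploit appeared

# Plain batch gradient descent on the logistic loss.
w = np.zeros(4)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - y) / len(y)

pred = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(int)
```

A real deployment would, as the dissertation argues, also have to confront biased ground truth and poisoned inputs; this sketch only shows the prediction core.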
  • Item
    Evaluating Machine Intelligence with Question Answering
(2021) Rodriguez, Pedro; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Humans ask questions to learn about the world and to test knowledge understanding. The ability to ask questions combines aspects of intelligence unique to humans: language understanding, knowledge representation, and reasoning. Thus, building systems capable of intelligent question answering (QA) is a grand goal of natural language processing (NLP). To measure progress in NLP, we create "exams" for computer systems and compare their effectiveness against a reference point---often based on humans. How precisely we measure progress depends on whether we are building computer systems that optimize human satisfaction in information-seeking tasks or that measure progress towards intelligent QA. In the first part of this dissertation, we explore each goal in turn, how they differ, and describe their relationship to QA formats. As an example of an information-seeking evaluation, we introduce a new dialog QA task paired with a new evaluation method. Afterward, we turn our attention to using QA to evaluate machine intelligence. A good evaluation should be able to discriminate between lesser and more capable QA models. This dissertation explores three ways to improve the discriminative power of QA evaluations: (1) dynamic weighting of test questions, (2) a format that by construction tests multiple levels of knowledge, and (3) evaluation data that is created through human-computer collaboration. By dynamically weighting test questions, we challenge a foundational assumption of the de facto standard in QA evaluation---the leaderboard. Namely, we contend that, contrary to nearly all QA and NLP evaluations which implicitly assign equal weights to examples by averaging scores, examples are not equally useful for estimating machine (or human) QA ability. As any student may tell you, not all questions on an exam are equally difficult, and in the worst case, questions are unsolvable.
Drawing on decades of research in educational testing, we propose adopting an alternative evaluation methodology---Item Response Theory---that is widely used to score human exams (e.g., the SAT). We show that dynamically weighting questions improves the reliability of leaderboards in discriminating between models of differing QA ability, while also being helpful in the construction of new evaluation datasets. Having improved the scoring of models, we next turn to improving the format and data in QA evaluations. Our idea is simple. In most QA tasks (e.g., Jeopardy!), each question tests a single level of knowledge; in our task (the trivia game Quizbowl), we test multiple levels of knowledge with each question. Since each question tests multiple levels of knowledge, this decreases the likelihood that we learn nothing about the difference between two models (i.e., that they are both correct or both wrong), which substantially increases discriminative power. Despite the improved format, we next show that while our QA models defeat accomplished trivia players, they are overly reliant on brittle pattern matching, which indicates a failure to intelligently answer questions. To mitigate this problem, we introduce a new framework for building evaluation data where humans and machines cooperatively craft trivia questions that are difficult to answer through clever pattern-matching tricks alone---while being no harder for humans. We conclude by sketching a broader vision for QA evaluation that combines the three components of evaluation we improve---scoring, format, and data---to create living evaluations and re-imagine the role of leaderboards.
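The Item Response Theory machinery the abstract invokes can be sketched with the standard two-parameter logistic (2PL) model; the parameter values below are illustrative, not from the dissertation:

```python
import math

# 2PL Item Response Theory: probability that a subject of ability `theta`
# answers an item correctly, given the item's discrimination `a` and
# difficulty `b`. Higher `a` means the item separates abilities more sharply.
def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An average subject (theta = 0) facing an easy and a hard item.
easy = p_correct(theta=0.0, a=1.5, b=-1.0)   # high probability of success
hard = p_correct(theta=0.0, a=1.5, b=2.0)    # low probability of success
```

Fitting `a` and `b` per question is what lets an IRT-scored leaderboard weight questions by how well they discriminate between models, rather than averaging all questions equally.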
  • Item
    Identifying Semantic Divergences Across Languages
    (2019) Vyas, Yogarshi; Carpuat, Marine; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Cross-lingual resources such as parallel corpora and bilingual dictionaries are cornerstones of multilingual natural language processing (NLP). They have been used to study the nature of translation, train automatic machine translation systems, as well as to transfer models across languages for an array of NLP tasks. However, the majority of work in cross-lingual and multilingual NLP assumes that translations recorded in these resources are semantically equivalent. This is often not the case---words and sentences that are considered to be translations of each other frequently diverge in meaning, often in systematic ways. In this thesis, we focus on such mismatches in meaning in text that we expect to be aligned across languages. We term such mismatches cross-lingual semantic divergences. The core claim of this thesis is that translation is not always meaning-preserving, which leads to cross-lingual semantic divergences that affect multilingual NLP tasks. Detecting such divergences requires ways of directly characterizing differences in meaning across languages through novel cross-lingual tasks, as well as models that account for translation ambiguity and do not rely on expensive, task-specific supervision. We support this claim through three main contributions. First, we show that a large fraction of data in multilingual resources (such as parallel corpora and bilingual dictionaries) is identified as semantically divergent by human annotators. Second, we introduce cross-lingual tasks that characterize differences in word meaning across languages by identifying the semantic relation between two words. We also develop methods to predict such semantic relations, as well as a model to predict whether sentences in different languages have the same meaning. Finally, we demonstrate the impact of divergences by applying the methods developed in the previous sections to two downstream tasks.
We first show that our model for identifying semantic relations between words helps in separating equivalent word translations from divergent translations in the context of bilingual dictionary induction, even when the two words are close in meaning. We also show that identifying and filtering semantic divergences in parallel data helps in training a neural machine translation system twice as fast without sacrificing quality.
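Divergence filtering of parallel data, as described above, can be sketched as thresholding a cross-lingual similarity score per sentence pair. The embeddings and threshold below are invented for illustration; the thesis's actual models are learned, not hand-set:

```python
import numpy as np

# Hypothetical cross-lingual sentence embeddings for three parallel pairs.
src = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.9, 0.1],
                [0.1, 0.0, 1.0]])
tgt = np.array([[0.88, 0.15, 0.05],   # near-equivalent translation
                [0.9, 0.1, 0.0],      # semantically divergent pair
                [0.05, 0.05, 0.95]])  # near-equivalent translation

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Keep pairs judged equivalent; drop likely divergences before MT training.
threshold = 0.8
kept = [i for i in range(len(src)) if cos(src[i], tgt[i]) >= threshold]
```

Dropping the divergent pair (index 1) leaves a smaller but cleaner corpus, which is the mechanism behind the reported faster training without quality loss.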
  • Item
    Rich and Scalable Models for Text
(2019) Nguyen, Thang Dai; Boyd-Graber, Jordan; Resnik, Philip; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Topic models have become essential tools for uncovering hidden structures in big data. However, the most popular topic model algorithm---Latent Dirichlet Allocation (LDA)---and its extensions suffer from sluggish performance on big datasets. Recently, the machine learning community has attacked this problem using spectral learning approaches such as the moment method with tensor decomposition or matrix factorization. The anchor word algorithm by Arora et al. [2013] has emerged as a more efficient approach to solve a large class of topic modeling problems. The anchor word algorithm is fast, and it has a provable theoretical guarantee: it will converge to a global solution given a sufficient number of documents. In this thesis, we present a series of spectral models based on the anchor word algorithm to serve a broader class of datasets and to provide richer and more flexible modeling capacity. First, we improve the anchor word algorithm by incorporating various rich priors in the form of appropriate regularization terms. Our new regularized anchor word algorithms produce higher topic quality and provide flexibility to incorporate informed priors, creating the ability to discover topics more suited to external knowledge. Second, we enrich the anchor word algorithm with metadata-based word representation for labeled datasets. Our new supervised anchor word algorithm runs very fast and predicts better than supervised topic models such as Supervised LDA on three sentiment datasets. Also, sentiment anchor words, which play a vital role in generating sentiment topics, provide cues to understand sentiment datasets better than unsupervised topic models. Lastly, we examine ALTO, an active learning framework with a static topic overview, and investigate the usability of supervised topic models for active learning.
We develop a new, dynamic, active learning framework that combines the concept of informativeness and representativeness of documents using dynamically updating topics from our fast supervised anchor word algorithm. Experiments using three multi-class datasets show that our new framework consistently improves classification accuracy over ALTO.
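The combination of informativeness and representativeness described above can be sketched as a simple product of two scores; all numbers are invented, and the dissertation's actual framework uses dynamically updated topics rather than a fixed similarity matrix:

```python
import numpy as np

# Hypothetical classifier probabilities and pairwise document similarities.
probs = np.array([[0.5, 0.5],    # very uncertain document
                  [0.9, 0.1],    # confident document
                  [0.6, 0.4]])
sims = np.array([[1.0, 0.2, 0.7],
                 [0.2, 1.0, 0.3],
                 [0.7, 0.3, 1.0]])

# Informativeness: predictive entropy of the current classifier.
informativeness = -(probs * np.log(probs)).sum(axis=1)

# Representativeness: average similarity to the other documents (self excluded).
representativeness = (sims.sum(axis=1) - 1.0) / (len(sims) - 1)

# Select the document that is both uncertain and central to the corpus.
score = informativeness * representativeness
pick = int(np.argmax(score))
```

Note the selected document (index 2) is not the most uncertain one (index 0); weighting by representativeness steers annotation toward documents whose labels generalize, which is the intuition behind the framework.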
  • Item
    Harmonic Analysis and Machine Learning
    (2018) Pekala, Michael; Czaja, Wojciech; Levy, Doron; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation considers data representations that lie at the intersection of harmonic analysis and neural networks. The unifying theme of this work is the goal of robust and reliable machine learning. Our specific contributions include a new variant of scattering transforms based on a Haar-type directional wavelet, a new study of deep neural network instability in the context of remote sensing problems, and new empirical studies of biomedical applications of neural networks.
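As a rough illustration of the Haar wavelets underlying the scattering variant mentioned above, here is one level of a 1-D Haar decomposition; the thesis works with directional 2-D constructions, and the signal below is made up:

```python
import numpy as np

# Toy 1-D signal.
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

# One level of the orthonormal Haar transform:
approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass: pairwise averages
detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass: pairwise differences

# The transform is orthonormal, so signal energy is preserved exactly.
energy_in = float((x ** 2).sum())
energy_out = float((approx ** 2).sum() + (detail ** 2).sum())
```

A scattering transform iterates such wavelet decompositions interleaved with modulus nonlinearities, yielding representations that are stable to small deformations -- the robustness property this dissertation pursues.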
  • Item
    Feature extraction in image processing and deep learning
    (2018) Li, Yiran; Czaja, Wojciech; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This thesis develops theoretical analysis of the approximation properties of neural networks, along with algorithms to extract useful features of images in the fields of deep learning, quantum energy regression, and cancer image analysis. The separate applications are connected by the use of representation systems from harmonic analysis; in this thesis, we focus on deriving proper representations of data using the Gabor transform. A novel neural network with proven approximation properties dependent on its size is developed using the Gabor system. In quantum energy regression, an invariant representation of chemical molecules using electron densities is obtained based on the Gabor transform. Additionally, we examine pooling functions, the feature extractors in deep neural networks, and develop a novel pooling strategy derived from the maximal function, with a provable stability property and stable empirical performance. The anisotropic representation of data using the Shearlet transform is also explored for its ability to detect regions of interest containing nuclei in cancer images.
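A minimal 2-D Gabor filter -- the generic building block behind the Gabor-based representations this abstract describes, not the thesis's specific construction -- can be sketched as a Gaussian envelope modulating an oriented cosine carrier; all parameter values are illustrative:

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, freq=0.25):
    """Real part of a 2-D Gabor filter: Gaussian envelope x oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * freq * xr)
    return envelope * carrier

k = gabor_kernel()
```

Convolving an image with a bank of such kernels at several orientations (`theta`) and frequencies (`freq`) extracts localized, oriented frequency content -- the kind of feature that the Gabor transform provides to the networks and regression models in this thesis.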