Computer Science

Permanent URI for this community: http://hdl.handle.net/1903/2224


Search Results

Now showing 1 - 10 of 20
  • Item
    Object-Attribute Compositionality for Visual Understanding
    (2024) Saini, Nirat; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Object appearances evolve over time, resulting in visually discernible changes in their colors, shapes, sizes, and materials. Humans are innately good at recognizing and understanding the evolution of object states, which is also crucial for visual understanding across images and videos. However, current vision models still struggle to capture and account for these subtle changes when recognizing objects and the underlying actions causing the changes. This thesis focuses on using compositional learning for the recognition and generation of attribute-object pairs. In the first part, we propose to disentangle visual features for objects and attributes, to generalize recognition to novel attribute-object pairs. Next, we extend this approach to learn entirely unseen attribute-object pairs by using semantic language priors, label smoothing, and propagation techniques. Further, we use object states for action recognition in videos, where subtle changes in object attributes and affordances help identify state-modifying and context-transforming actions. All of these methods for decomposing and composing objects and states generalize to unseen pairs and out-of-domain datasets for various compositional zero-shot learning and action recognition tasks. In the second part, we propose a new benchmark suite, Chop & Learn, for the novel task of Compositional Image Generation, and discuss the implications of these approaches for other compositional tasks in images, videos, and beyond. We further extend the insertion and editing of object attributes consistently across video frames using an off-the-shelf, training-free architecture, and discuss future challenges and opportunities of compositionality for visual understanding.
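    The compositional recognition idea in the first part, disentangling attribute and object representations so that unseen pairs can still be scored, can be sketched in a few lines. The one-hot embeddings and sum-based composition below are toy assumptions for illustration, not the thesis's actual models:

```python
import numpy as np

# Hypothetical vocabularies of primitives; any overlap with the thesis's
# datasets is illustrative only.
attributes = ["sliced", "ripe", "peeled"]
objects = ["apple", "orange", "potato"]

dim = len(attributes) + len(objects)
attr_emb = {a: np.eye(dim)[i] for i, a in enumerate(attributes)}
obj_emb = {o: np.eye(dim)[len(attributes) + j] for j, o in enumerate(objects)}

def compose(attr, obj):
    """Compose primitive embeddings into a pair embedding (here: a simple sum)."""
    return attr_emb[attr] + obj_emb[obj]

def classify(image_feat):
    """Score every attribute-object pair -- including pairs never observed
    together at training time -- and return the best match."""
    pairs = [(a, o) for a in attributes for o in objects]
    return max(pairs, key=lambda p: float(image_feat @ compose(*p)))

# A feature for a pair assumed unseen during training: "sliced potato".
print(classify(compose("sliced", "potato")))  # ('sliced', 'potato')
```

    Because attribute and object factors are represented separately, the pair space grows combinatorially while the learned parameters grow only linearly, which is what makes zero-shot composition possible.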
  • Item
    Machine Learning with Differentiable Physics Priors
    (2024) Qiao, Yiling; Lin, Ming; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Differentiable physics priors enable gradient-based learning systems to adhere to physical dynamics. By making physics simulations differentiable, we can backpropagate through the physical consequences of actions. This pipeline allows agents to quickly learn to achieve desired effects in the physical world and is an effective technique for solving inverse problems in physical or dynamical systems. This new programming paradigm bridges model-based and data-driven methods, mitigating data scarcity and model bias simultaneously. My research focuses on developing scalable, powerful, and efficient differentiable physics simulators. We have created state-of-the-art differentiable physics for rigid bodies, cloth, fluids, articulated bodies, and deformable solids, achieving performance orders of magnitude better than existing alternatives. These differentiable simulators are applied to solve inverse problems, train control policies, and enhance reinforcement learning algorithms.
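    The core mechanism described above, backpropagating through a physics rollout to solve an inverse problem, can be illustrated with a toy differentiable simulator. The scenario (a point mass under gravity) and every constant here are assumptions for illustration, not the dissertation's simulators:

```python
# A minimal sketch of a differentiable simulator: forward Euler integration of
# a point mass's height, plus gradient descent through the rollout to find the
# initial velocity that reaches a target height.

G = 9.8     # gravity (m/s^2), illustrative
T = 2.0     # total simulated time (s)
STEPS = 100

def simulate(v0):
    """Forward Euler rollout of height; every operation is differentiable."""
    dt = T / STEPS
    h, v = 0.0, v0
    for _ in range(STEPS):
        h += v * dt
        v -= G * dt
    return h

def dh_dv0():
    """Backpropagated sensitivity; this rollout is linear in v0, so dh/dv0 = T."""
    return T

# Inverse problem: gradient descent on the squared error, differentiating
# through the physical consequences of the chosen initial velocity.
target, v0, lr = 10.0, 0.0, 0.1
for _ in range(200):
    v0 -= lr * 2.0 * (simulate(v0) - target) * dh_dv0()

print(abs(simulate(v0) - target) < 1e-6)  # True: the inverse problem is solved
```

    Real differentiable simulators replace the hand-derived sensitivity with automatic differentiation through contacts, constraints, and deformation, but the pipeline (simulate, compare, backpropagate, update) is the same.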
  • Item
    Efficient Optimization Algorithms for Nonconvex Machine Learning Problems
    (2024) Xian, Wenhan; Huang, Heng; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In recent years, the success of the AI revolution has led to the training of larger neural networks on vast amounts of data to achieve superior performance. These powerful machine learning models have enabled the creation of remarkable AI products. Optimization, as the core of machine learning, becomes especially crucial because most machine learning problems can ultimately be formulated as optimization problems, which require minimizing a loss function with respect to model parameters based on training samples. To enhance the efficiency of optimization algorithms, distributed learning has emerged as a popular solution for addressing large-scale machine learning tasks. In distributed learning, multiple worker nodes collaborate to train a global model. However, a key challenge in distributed learning is the communication cost. This thesis introduces a novel adaptive gradient algorithm with gradient sparsification to address this issue. Another significant challenge in distributed learning is the communication overhead on the central parameter server. To mitigate this bottleneck, decentralized distributed (serverless) learning has been proposed, where each worker node only needs to communicate with its neighbors. This thesis investigates core nonconvex optimization problems in decentralized settings, including constrained optimization, minimax optimization, and second-order optimality. Efficient optimization algorithms are proposed to solve these problems. Additionally, the convergence analysis of minimax optimization under the generalized smooth condition is explored. A generalized algorithm is proposed, which can be applied to a broader range of applications.
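    Gradient sparsification of the kind the thesis builds on can be sketched as top-k compression plus error feedback. The functions below are a generic illustration of that compression step, not the proposed adaptive gradient algorithm:

```python
import numpy as np

def sparsify_top_k(grad, k):
    """Keep only the k largest-magnitude gradient entries; zero out the rest.
    Only k (index, value) pairs then need to be communicated."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]
    return out

def worker_step(grad, residual, k):
    """Error feedback: compress the gradient plus the carried residual, and
    keep the compression error locally so no gradient mass is lost over time."""
    compressed = sparsify_top_k(grad + residual, k)
    return compressed, grad + residual - compressed

g = np.array([0.1, -3.0, 0.05, 2.0, -0.2])
msg, res = worker_step(g, np.zeros_like(g), k=2)
print(msg)  # only the two largest-magnitude entries are transmitted
```

    The residual accumulator is what keeps aggressive compression from stalling convergence: every coordinate is eventually transmitted once its accumulated error grows large enough.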
  • Item
    Understanding and Enhancing Machine Learning Models with Theoretical Foundations
    (2024) Hu, Zhengmian; Huang, Heng; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Machine learning has become a key driver of many contemporary technological advancements. With its empirical success, there is an urgent need for theoretical research to explain and complement these practical achievements. This includes understanding the empirical success of machine learning, especially deep learning, and aiding the design of better algorithms in terms of performance, efficiency, and security. This dissertation aims to advance the understanding and practical development of machine learning through three interrelated research directions, while emphasizing reliable theoretical guarantees throughout. In the first part, we study deep learning theory under overparameterization. The core objects of study are the Conjugate Kernel and the Neural Tangent Kernel, which have deep connections to the training dynamics of deep learning. Based on the analysis of these kernels, we prove several new concentration results characterizing the trainability and generalization of infinitely wide neural networks. In the second part, we focus on training algorithms. On one hand, we propose new algorithms to improve learning efficiency. This includes a new underdamped Langevin MCMC method called ALUM, for which we prove that its complexity reaches the theoretical lower bound. On the other hand, we propose new theoretical tools to analyze existing algorithms and obtain tighter convergence results. For ProxSkip, our analysis shows that it still achieves an improvement in communication complexity, from sublinear to linear convergence, under a stochastic oracle. We also generalize the concept of Lipschitz smoothness for tighter non-convex optimization analysis. In the third part, we develop new Monte Carlo methods for large language models (LLMs) to improve their efficiency and security. We develop unbiased watermarking techniques to protect model outputs and propose an Accelerated Speculative Sampling method for faster inference. We also investigate the trade-off between watermark strength and inference sampling efficiency, pointing out the conflict between the two.
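    The speculative sampling scheme that the dissertation accelerates can be sketched for a single token. The accept/reject rule below is the generic scheme, and the distributions are invented for illustration; it is not the proposed Accelerated Speculative Sampling method:

```python
import numpy as np

def speculative_step(p, q, draft_token, rng):
    """One accept/reject step of speculative sampling: `p` is the target
    model's token distribution, `q` the cheap draft model's. Accept the draft
    token with probability min(1, p/q); otherwise resample from the residual.
    The returned token is distributed exactly according to p (unbiasedness)."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    residual = np.maximum(p - q, 0.0)
    return rng.choice(len(p), p=residual / residual.sum())

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])   # target distribution (illustrative)
q = np.array([0.2, 0.5, 0.3])   # draft distribution (illustrative)
samples = [speculative_step(p, q, rng.choice(3, p=q), rng)
           for _ in range(20000)]
freq = np.bincount(samples, minlength=3) / len(samples)
print(np.abs(freq - p).max() < 0.02)  # empirical frequencies match p
```

    The unbiasedness property is what makes the watermark-strength versus sampling-efficiency trade-off studied above nontrivial: the acceleration must not distort the output distribution the watermark is embedded in.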
  • Item
    Optimal Point-Spread-Function Engineering with Dynamic Optics and Event Cameras
    (2024) Shah, Sachin; Metzler, Christopher A; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computational imaging systems co-design optics and algorithms to observe phenomena beyond the reach of traditional cameras. Point-spread-function (PSF) engineering is a powerful technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at three-dimensional molecule localization, extended depth-of-field imaging, lensless imaging, and other tasks. Recent hardware breakthroughs are unlocking unprecedented ultrafast capabilities: micro-electromechanical-system-based spatial light modulators allow us to modulate light at kilohertz rates, and neuromorphic event cameras enable kilohertz, low-power, high-dynamic-range capture. Unfortunately, existing theories and algorithms are unable to fully harness these new capabilities. This work answers a natural question: Can one encode additional information and achieve superior performance by leveraging the ultrafast capabilities of spatial light modulators and event cameras? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks displayed on a spatial light modulator are fundamentally more expressive. We then derive theoretical limits on three-dimensional tracking with PSF-engineered event cameras. Using these bounds, we design new optimal phase masks and binary amplitude masks. We demonstrate the efficacy of our designs through extensive simulations and validate our method with a simple lab prototype.
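    The expressiveness claim rests on a simple observation: a dynamic mask that cycles through several phase patterns within one exposure produces the time average, a convex combination, of the per-frame PSFs, and convex combinations can leave the non-convex set of static-mask PSFs. The sketch below computes such a time-averaged PSF under a standard Fraunhofer scalar-diffraction model, which is an assumption for illustration, not the dissertation's exact forward model:

```python
import numpy as np

def psf_from_phase(phase):
    """Fraunhofer-model PSF of a pure phase mask: |FFT of the pupil field|^2,
    normalized to unit energy."""
    pupil = np.exp(1j * phase)
    field = np.fft.fftshift(np.fft.fft2(pupil))
    psf = np.abs(field) ** 2
    return psf / psf.sum()

rng = np.random.default_rng(0)
n = 32
phase_a = rng.uniform(0, 2 * np.pi, (n, n))  # illustrative masks
phase_b = rng.uniform(0, 2 * np.pi, (n, n))

# A dynamic mask alternating between the two phases within one exposure
# yields the time-averaged PSF: a convex combination of the per-frame PSFs.
avg_psf = 0.5 * psf_from_phase(phase_a) + 0.5 * psf_from_phase(phase_b)
print(avg_psf.shape)  # (32, 32), still a valid (nonnegative, unit-sum) PSF
```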
  • Item
    Learning-based Motion Planning for High-DoF Robot Systems
    (2023) Jia, Biao; Manocha, Dinesh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A high-degree-of-freedom (DoF) robot system refers to a type of robotic system that possesses many independently controllable mechanical degrees of freedom. This includes high-DoF robots or objects being manipulated, such as flexible robotic arms and flexible objects. Degrees of freedom in robotics represent the different ways a robot can move or manipulate its parts. High-DoF robot systems have a significant number of these independent motions, allowing them to exhibit complex and versatile movements and behaviors. These systems are employed in various applications, including manufacturing and healthcare, where precise and flexible control is essential. The main difficulty associated with high-DoF robot systems is the complexity arising from their numerous degrees of freedom. Calculating the optimal trajectories or control inputs for high-DoF systems can be computationally intensive. The sheer number of variables and the need for real-time responsiveness pose significant challenges in terms of computation and control. In some cases, high-DoF robot systems interact with deformable objects such as fabrics and foam. Modeling and controlling these objects add additional layers of complexity due to their dynamic and unpredictable behavior. To address these challenges, we delve into several key areas: Object Deformation Modeling, Controller Parameterization, System Identification, Control Policy Learning, and Sim-to-Real Transfer. We begin by using cloth manipulation as an example to illustrate how to model high-DoF objects and design mapping relationships. By leveraging computer vision and visual feedback-based controllers, we enhance the ability to model and control objects with substantial shape variations, which is particularly valuable in applications involving deformable materials. Next, we shift our focus to Controller Parameterization, aiming to define control parameters for high-DoF objects.
    We employ a random forest-based controller along with imitation learning, resulting in more robust and efficient controllers, which are essential for high-DoF robot systems. This method can be used for human-robot collaboration involving flexible objects and enables imitation learning to converge in as few as 4-5 iterations. Furthermore, we explore how to reduce the dimensionality of both high-DoF robot systems and objects simultaneously. Our system allows for the more effective use of computationally intensive methods like reinforcement learning (RL) or trajectory optimization. To this end, we design a system identification method to reduce the need for repeated rendering or experiments, significantly improving the efficiency of RL. This enables some algorithms with exponential computational complexity to be solved in linear time. In this part of the work, we adopt a real setup where humans and robots collaborate in real time to manipulate flexible objects. In the second part of our research, we focus on the task of natural media painting using reinforcement learning techniques. Painting itself can be considered a high-DoF robot system, as it entails a multitude of context-dependent actions to complete the task. Our objective is to replicate a reference image using brush strokes, with the goal encoded through observations. We focus on how to address the sparse reward distribution over a large continuous action space. Additionally, we investigate the practicality of transferring learned policies from simulated environments to real-world scenarios, with a specific focus on tasks like painting. This research bridges the gap between simulation and practical application, ensuring that the knowledge gained from our work can be effectively utilized in real-world settings. Ultimately, we demonstrate the use of RL-learned painting strategies in both virtual and real robot environments.
  • Item
    Scalable Methods for Robust Machine Learning
    (2023) Levine, Alexander Jacob; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In recent years, machine learning systems have been developed that demonstrate remarkable performance on many tasks. However, naive metrics of performance, such as the accuracy of a classifier on test samples drawn from the same distribution as the training set, can provide an overly optimistic view of the suitability of a model for real-world deployment. In this dissertation, we develop models that are robust, in addition to performing well on large-scale tasks. One notion of robustness is adversarial robustness, which characterizes the performance of models under adversarial attacks. Adversarial attacks are small, often imperceptible, distortions to the inputs of machine learning systems which are crafted to substantially change the output of the system. These attacks represent a real security threat, and are especially concerning when machine learning systems are used in safety-critical applications. To mitigate this threat, certifiably robust classification techniques have been developed. In a certifiably robust classifier, for each input sample, in addition to a classification, the classifier also produces a certificate, which is a guaranteed lower bound on the magnitude of any perturbation required to change the classification. Existing methods for certifiable robustness have significant limitations, which we address in Parts I and II of this dissertation: (i) Currently, randomized smoothing techniques are the only certification techniques that are viable for large-scale image classification (i.e. ImageNet). However, randomized smoothing techniques generally provide only high-probability, rather than exact, certificate results. To address this, we develop deterministic randomized smoothing-based algorithms, which produce exact certificates with finite computational costs. 
In particular, in Part I of this dissertation, we present, to our knowledge, the first deterministic, ImageNet-scale certification methods under the L_1, L_p (for p < 1), and "L_0" metrics. (ii) Certification results only apply to particular metrics of perturbation size. There is therefore a need to develop new techniques to provide provable robustness against different types of attacks. In Part II of this dissertation, we develop randomized smoothing-based algorithms for several new types of adversarial perturbation, including Wasserstein adversarial attacks, Patch adversarial attacks, and Data Poisoning attacks. The methods developed for Patch and Poisoning attacks are also deterministic, allowing for efficient exact certification. In Part III of this dissertation, we consider a different notion of robustness: test-time adaptability to new objectives in reinforcement learning. This is formalized as goal-conditioned reinforcement learning (GCRL), in which each episode is conditioned by a new "goal," which determines the episode's reward function. In this work, we explore a connection between off-policy GCRL and knowledge distillation, which leads us to apply Gradient-Based Attention Transfer, a knowledge distillation technique, to the Q-function update. We show, empirically and theoretically, that this can improve the performance of off-policy GCRL when the space of goals is high-dimensional.
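    For context, the classic high-probability certificate of Gaussian randomized smoothing has a closed form. The deterministic methods in Part I replace the Gaussian with other smoothing distributions and remove the sampling error, but the shape of the guarantee is similar. A sketch of the standard L2 certificate (Cohen et al., 2019), shown as background rather than as the dissertation's method:

```python
from statistics import NormalDist

def certified_radius(p_a, sigma):
    """L2 certified radius of Gaussian randomized smoothing:
    R = sigma * Phi^{-1}(p_a), where p_a is a lower bound on the probability
    that the base classifier returns the top class under noise N(0, sigma^2 I).
    Any perturbation smaller than R provably cannot change the smoothed
    classifier's prediction."""
    if p_a <= 0.5:
        return 0.0  # abstain: no certificate can be issued
    return sigma * NormalDist().inv_cdf(p_a)

print(round(certified_radius(0.99, sigma=0.5), 3))  # 1.163
```

    In randomized smoothing, p_a is itself estimated from noisy samples, which is exactly why the certificate holds only with high probability; the deterministic schemes above compute the analogous bound exactly with finite computation.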
  • Item
    Learning and Composing Primitives for the Visual World
    (2023) Gupta, Kamal; Shrivastava, Abhinav; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Compositionality is at the core of how humans understand and create visual data. In order for computational approaches to assist humans in creative tasks, it is crucial for them to understand and perform composition. Recent advances in deep generative models have enabled us to convert noise into highly realistic scenes. However, to harness these models for building real-world applications, I argue that we need to be able to represent and control the generation process with the composition of interpretable primitives. In the first half of this talk, I’ll discuss how deep models can discover such primitives from visual data. By playing a cooperative referential game between two neural network agents, we can represent images with discrete, meaningful concepts without supervision. I further extend this work to applications in image and video editing by learning a dense correspondence of primitives across images. In the second half, I’ll focus on learning how to compose primitives for both 2D and 3D visual data. By expressing scenes as assemblies of smaller parts, we can easily perform generation from scratch or from partial scenes as input. I’ll conclude the talk with a discussion of possible future directions and applications of generative models, and how we can better enable users to guide the creative process.
  • Item
    Optimization Problems in Quantum Machine Learning
    (2023) You, Xuchen; Wu, Xiaodi; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The variational algorithm is a paradigm for designing quantum procedures implementable on noisy intermediate-scale quantum (NISQ) machines. It is viewed as a promising candidate for demonstrating practical quantum advantage. In this dissertation, we look into the optimization aspects of variational quantum algorithms in an attempt to answer when and why a variational quantum algorithm works. We mainly focus on two instantiations of the family of variational algorithms, the Variational Quantum Eigensolver (VQE) and the Quantum Neural Network (QNN). We first establish that, for almost all QNN architecture designs, there exist hard problem instances leading to an optimization landscape swarmed by spurious local minima, provided that the QNN is under-parameterized. This observation rules out the possibility of a universal good QNN design achieving exponential advantage over classical neural networks on any dataset and calls for instance-dependent designs for variational circuits. We then show that VQE training converges linearly when the number of parameters exceeds an over-parameterization threshold. By tying the threshold to instance-dependent quantities, we develop variants of VQE algorithms that allow the training and testing of shallower variational circuits, as depth is usually the implementation bottleneck on NISQ machines. For QNNs, by looking into their convergence, we show that the dynamics of QNN training differ from the dynamics of any kernel regression, thereby ruling out the popular conjecture that over-parameterized QNNs are equivalent to certain versions of neural tangent kernels, like their classical counterparts. As a practical implication, our analysis showcases measurement design as a way to accelerate the convergence of QNNs. At the end of this dissertation, we consider the classical problem of optimization with partial information, the Multi-armed Bandit (MAB). We show that, when enhanced with quantum access to the arms, there is a quadratic speed-up over classical algorithms, which can serve as a building block for quantum reinforcement learning algorithms.
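    A one-qubit toy VQE makes the variational loop concrete: a parameterized circuit prepares a state, the Hamiltonian's expectation is measured, and a classical optimizer updates the parameters. The Hamiltonian H = Z, the Ry ansatz, and the hyperparameters below are assumptions for illustration, not the circuits studied in the dissertation:

```python
import numpy as np

# Toy VQE: minimize <psi(theta)| Z |psi(theta)> over a single-qubit ansatz
# |psi(theta)> = Ry(theta)|0>, whose energy is cos(theta).
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def ansatz(theta):
    """Ry(theta) applied to |0>: (cos(theta/2), sin(theta/2))."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def energy(theta):
    psi = ansatz(theta)
    return float(psi @ Z @ psi)

def grad(theta):
    """Parameter-shift rule: the exact gradient from two circuit evaluations,
    the standard way to differentiate variational circuits on hardware."""
    return 0.5 * (energy(theta + np.pi / 2) - energy(theta - np.pi / 2))

theta = 0.3
for _ in range(300):
    theta -= 0.2 * grad(theta)

print(round(energy(theta), 4))  # -1.0, the ground-state energy of Z
```

    With one parameter the landscape here is benign; the dissertation's point is that for under-parameterized multi-qubit QNNs the analogous landscape can be swarmed by spurious local minima, so this loop can stall far from the ground state.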
  • Item
    Stronger Inductive Biases for Sample-Efficient and Controllable Neural Machine Translation
    (2023) Xu, Weijia; Carpuat, Marine; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As one of the oldest applications of natural language processing, machine translation (MT) has a growing impact on human lives both as an end application and as a key component of cross-lingual information processing tasks such as cross-lingual information retrieval and dialogue generation. Although neural machine translation (NMT) models achieve impressive performance on some language pairs, they require large amounts of human translations for training. In addition, they are notorious for generating fluent outputs that do not faithfully reflect the meaning of the source sentence, and they make it difficult for users to control the outputs. To address these issues, this thesis contributes techniques to build more sample-efficient and controllable NMT models by incorporating stronger inductive biases that help correct undesirable biases, integrate prior knowledge, and introduce flexible ways to control the outputs in NMT. In our first line of research, we show that current NMT models are susceptible to undesirable biases that hinder sample-efficient training and lead to unfaithful translations. We further provide evidence that we can mitigate these undesirable biases by integrating stronger inductive biases through training algorithms. We start by introducing a new training objective to address the exposure bias problem, a common problem in sequence generation models that typically causes accumulated errors along the generated sequence at inference time, especially when the training data is limited. Next, we turn to a well-known but less studied problem in MT, the hallucination problem: translation outputs that are unrelated to the source text. To find the spurious biases that cause hallucination errors, we first identify model symptoms that are indicative of hallucinations at inference time.
We then show how these symptoms connect to spurious biases at training time, where the model learns to predict the ground-truth translation while ignoring a large part of the source sentence. These findings provide a path toward mitigating hallucinations by addressing these spurious biases. In our second line of research, we study how to integrate stronger inductive biases in NMT for the effective integration of language priors estimated from unsupervised data. We introduce a novel semi-supervised learning objective with a theoretical guarantee on its global optimum and show that it can be effectively approximated and leads to improved performance in practice. Finally, we study inductive biases in the form of NMT model architectures that allow end users to control the model outputs more easily. Controlling the outputs of standard NMT models is difficult and incurs high computational cost at training or inference time. We develop an edit-based NMT model with novel edit operations that can incorporate users' lexical constraints with low computational cost at both training and inference time. To allow users to provide lexical constraints in more flexible morphological forms, we further introduce a modular framework for inflecting and integrating lexical constraints in NMT.