Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
2 results
Search Results
Item: Feedback for Vision (2024)
Maynord, Michael; Aloimonos, Yiannis; Fermüller, Cornelia; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Feedback plays a prominent role in biological vision, where perception is modulated based on agents' evolving expectations and world model. This is the case both in visually understanding the static structure of the world and in modeling the dynamic structure of action. In this thesis we present first an approach to incorporating controlled feedback into image understanding, second an adaptation of this approach to action understanding, and lastly a notion of feedback in video monitoring. First, we introduce a novel mechanism which modulates perception based on high-level categorical expectations: Mid-Vision Feedback (MVF). MVF associates high-level contexts with linear transformations. When a context is "expected", its associated linear transformation is applied over feature vectors in a mid level of a network. The result is that mid-level network representations are biased towards conformance with high-level expectations, improving overall accuracy and contextual consistency. Additionally, during training, mid-level feature vectors are biased through the introduction of a loss term which increases the distance between feature vectors associated with different contexts. MVF is agnostic as to the source of contextual expectations, and can serve as a mechanism for top-down integration of symbolic systems with deep vision architectures. We demonstrate the utility of MVF for object classification across three popular datasets and multiple architectures, including both Convolutional Neural Network architectures and a Transformer architecture. We adapt MVF for action understanding with Sub-Action Modulation (SAM) for Video Networks. When humans interpret action, they bring high-level expectations of the context in which those actions are being performed.
Along these lines, we develop an approach to incorporating context into action understanding. Video segments are classified uniquely into a small set of action primitives (called Therbligs), which are grouped hierarchically into "Meta-Therbligs" as a context representation. SAM is an approach to first modeling Meta-Therbligs, and then incorporating expectation of Meta-Therbligs into mid-level processes through feedback. This allows the modulation of mid-level features in accordance with a temporally compositional representation of context. We show the superior performance of MVF over post-hoc filtering for incorporation of contextual knowledge, and show superior performance of configurations using predicted context (when no context is known a priori) over configurations with no context awareness. We demonstrate the utility of SAM over four popular video understanding architectures: I3D, MoViNet, TimeSFormer, and ViViT. Experiments over EPIC Kitchens and 50 Salads on the tasks of action recognition and anticipation demonstrate that SAM produces superior accuracies across all models, tasks, and datasets with minimal architectural alterations. Lastly, we consider a notion of “feedback” where high-level expectations, or specifications, are provided by human operators, allowing integration of humans into the perceptual loop. This is important for interfacing with humans, as perceptual tasks which are conventionally left entirely to human labor are increasingly (yet, thus, imperfectly) automated. We consider the task of surveillance. Security watchstanders who monitor multiple videos over long periods of time can be susceptible to information overload and fatigue. To address this, we present a configurable perception pipeline architecture, called the Image Surveillance Assistant (ISA), for assisting watchstanders with video surveillance tasks.
We also present ISA₁, an initial implementation that can be configured with a set of context specifications which watchstanders can select or provide to indicate what imagery should generate notifications. ISA₁'s inputs include (1) an image and (2) context specifications, which contain English sentences and a decision boundary defined over object detection vectors. ISA₁ assesses the match of the image with the contexts by comparing (1) detected versus specified objects and (2) automatically generated versus specified captions. We then present a study to assess the utility of using captions in ISA₁, and find that they substantially improve the performance of image context detection. Finally, notions of context and the contrast used to separate context for better manipulation in the above feedback work can be of benefit not only to feedback architectures, but within feed-forward architectures as well. We apply this intuition to the task of action understanding in video, where input is separated into motion and "context". Motivated by Goldman's Theory of Human Action, a framework in which action decomposes into (1) base physical movements and (2) the context in which they occur, we propose a novel learning formulation for motion and context, where context is derived as the complement to motion. More specifically, we model physical movement through the adoption of Therbligs, a set of elemental physical motions centered around object manipulation. Context is modeled through the use of a contrastive mutual information loss that formulates context information as the action information not contained within movement information. We empirically demonstrate the utility brought by this separation of representation, showing sizable improvements in action recognition and action anticipation accuracies for a variety of models.
We present results over two object manipulation datasets: EPIC Kitchens 100 and 50 Salads.

Item: Feedback-Directed Model-Based GUI Test Case Generation (2008-08-15)
Yuan, Xun; Memon, Atif M; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Most of today's software users interact with the software through a graphical user interface (GUI), a representative of the broader class of event-driven software (EDS). As the correctness of the GUI is necessary to ensure the correctness of the overall software, its quality assurance (QA) is becoming increasingly important. During software testing, an important QA technique, test cases are created and executed on the software. For GUIs, test cases are modeled as sequences of user input events. Because each possible sequence of user events may potentially be a test case and because today's GUIs offer enormous flexibility to end users, in principle, GUI testing requires a prohibitively large number of test cases. Any practical test case generation technique must sample the vast GUI input space. Existing techniques are either extremely resource intensive or do not adequately model complex GUI behaviors, thereby limiting fault detection. This research develops new models, algorithms, and metrics for automated GUI test case generation. A novel aspect of this work is its use of software runtime information, which is collected as feedback during GUI test case execution and used to generate additional test cases that model complex GUI behaviors. One set of empirical studies shows that the feedback-directed technique significantly improves upon existing techniques and helps to identify serious problems in fielded GUIs. Another set of studies, conducted on in-house software applications, shows that the test suites generated by the new technique outperform their coverage-equivalent counterparts in terms of fault detection.
Although the focus of this work is on the GUI domain, the techniques developed are general and applicable to the broader class of EDS. In fact, this work has already had an impact on the research and practice of testing other EDS. In particular, the work has been extended by other researchers to test web applications.
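
As a rough illustration of the idea in the abstract above (test cases as sequences of user events, with runtime information fed back to generate further test cases), the following Python sketch may help. It is not the thesis's algorithm: the toy application, the event names, and the helpers `run_test_case`, `effect`, `interacting_pairs`, and `extend` are all hypothetical, and the rule used here (treat two events as interacting when running the first changes what the second does to the state) is only one plausible way to use execution feedback.

```python
from itertools import permutations

# Hypothetical toy application: each GUI event mutates a runtime state dict.
def make_initial_state():
    return {"bold": False, "text": "", "saved": True}

def toggle_bold(state):
    state["bold"] = not state["bold"]

def type_text(state):
    state["text"] += "<b>x</b>" if state["bold"] else "x"
    state["saved"] = False

def save(state):
    state["saved"] = True

EVENTS = {"toggle_bold": toggle_bold, "type_text": type_text, "save": save}

def run_test_case(sequence):
    """Execute a sequence of event names and return the final runtime state."""
    state = make_initial_state()
    for name in sequence:
        EVENTS[name](state)
    return state

def effect(before, after):
    """Return the set of (key, new_value) pairs that differ between two states."""
    return {(k, v) for k, v in after.items() if before[k] != v}

def interacting_pairs(events):
    """Feedback step (assumed rule): keep (e1, e2) when running e1 first
    changes the effect that e2 has on the runtime state."""
    pairs = []
    for e1, e2 in permutations(events, 2):
        effect_alone = effect(make_initial_state(), run_test_case([e2]))
        effect_after_e1 = effect(run_test_case([e1]), run_test_case([e1, e2]))
        if effect_alone != effect_after_e1:
            pairs.append((e1, e2))
    return pairs

def extend(pairs, length):
    """Chain interacting pairs into longer event sequences of the given length."""
    sequences = [list(p) for p in pairs]
    for _ in range(length - 2):
        sequences = [s + [b] for s in sequences for (a, b) in pairs if a == s[-1]]
    return sequences

if __name__ == "__main__":
    pairs = interacting_pairs(list(EVENTS))
    for seq in extend(pairs, 3):
        print(seq, run_test_case(seq))
```

The design choice in this sketch is to spend the test budget only on sequences whose events were observed to influence one another, rather than enumerating every possible sequence of a given length; on a real GUI the state comparison would be replaced by whatever runtime information the test harness can actually observe.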