Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
Search Results (6 items)
Item Egocentric Vision in Assistive Technologies For and By the Blind (2022)
Lee, Kyungjun; Kacorri, Hernisa; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Visual information in our surroundings, such as everyday objects and passersby, is often inaccessible to people who are blind. Cameras that leverage egocentric vision, in an attempt to approximate the visual field of the camera wearer, hold great promise for making the visual world more accessible for this population. Typically, such applications rely on pre-trained computer vision models and are thus limited. Moreover, as with any AI system that augments sensory abilities, conversations around ethical implications and privacy concerns lie at the core of their design and regulation. However, early efforts tend to decouple perspectives, considering only those of either blind users or potential bystanders. In this dissertation, we revisit egocentric vision for the blind. Through a holistic approach, we examine the following dimensions: type of application (objects and passersby), camera form factor (handheld and wearable), user’s role (a passive consumer or an active director of technology), and privacy concerns (from both end users and bystanders). Specifically, we propose to design egocentric vision models that capture blind users’ intent and are fine-tuned by the user in the context of object recognition. We seek to explore societal issues that AI-powered cameras may raise, considering perspectives from both blind users and nearby people whose faces or objects might be captured by the cameras. Finally, we investigate interactions and perceptions across different camera form factors to reveal design implications for future work.

Item Transfer Learning in Natural Language Processing through Interactive Feedback (2022)
Yuan, Michelle; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Machine learning models cannot easily adapt to new domains and applications. This drawback becomes detrimental for natural language processing (NLP) because language is perpetually changing. Across disciplines and languages, there are noticeable differences in content, grammar, and vocabulary. To overcome these shifts, recent NLP breakthroughs focus on transfer learning. Through clever optimization and engineering, a model can successfully adapt to a new domain or task. However, these modifications are still computationally inefficient or resource-intensive. Compared to machines, humans are more capable of generalizing knowledge across different situations, especially in low-resource ones. Therefore, research on transfer learning should carefully consider how the user interacts with the model. The goal of this dissertation is to investigate “human-in-the-loop” approaches for transfer learning in NLP. First, we design annotation frameworks for inductive transfer learning, which is the transfer of models across tasks. We create an interactive topic modeling system for users to find topics useful for classifying documents in multiple languages. The user-constructed topic model improves classification accuracy and bridges cross-lingual gaps in knowledge. Next, we look at popular language models, like BERT, that can be applied to various tasks. While these models are useful, they still require a large amount of labeled data to learn a new task. To reduce labeling, we develop an active learning strategy that samples documents that surprise the language model; users only need to annotate a small subset of these unexpected documents to adapt the language model for text classification (a minimal sketch of this sampling strategy follows this entry). Then, we transition to user interaction in transductive transfer learning, which is the transfer of models across domains. We focus our efforts on low-resource languages and develop an interactive system for word embeddings, in which feedback from bilingual speakers refines the cross-lingual embedding space for classification tasks. Subsequently, we look at domain shift for tasks beyond text classification. Coreference resolution is fundamental for NLP applications, like question answering and dialogue, but the models are typically trained and evaluated on one dataset. We use active learning to find spans of text in the new domain for users to label, and we provide important insights on annotating spans for domain adaptation. Finally, we summarize the contributions of each chapter, focusing on aspects like the scope of applications and model complexity, and conclude with a discussion of future directions. Researchers may extend the ideas in this thesis to topics like user-centric active learning and proactive learning.
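
A minimal sketch of the "sample what surprises the model" idea, assuming a generic probabilistic text classifier with a scikit-learn-style predict_proba interface; the function names and thresholds are illustrative assumptions, not the dissertation's implementation.

```python
# Minimal sketch of surprisal-based active learning (illustrative only; the
# classifier interface and names are assumptions, not the dissertation's code).
import math

def surprisal(classifier, document) -> float:
    """Negative log-probability of the model's top prediction; higher = more surprising."""
    probs = classifier.predict_proba([document])[0]
    return -math.log(max(probs))

def select_for_annotation(classifier, unlabeled_docs, budget=10):
    """Return the `budget` unlabeled documents the current model finds most surprising."""
    ranked = sorted(unlabeled_docs, key=lambda d: surprisal(classifier, d), reverse=True)
    return ranked[:budget]

# Usage sketch: fit a classifier on a small seed set, ask users to label only the
# selected documents, refit, and repeat until the annotation budget is spent.
```
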
Item Time-Situated Metacognitive Agency and Other Aspects of Commonsense Reasoning (2022)
Goldberg, Matthew David; Perlis, Donald; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Much research in commonsense reasoning (CSR) involves the use of external representations of an agent's reasoning, based on compelling features of classical logic. However, these features come at severe costs, including omniscience, consistency, static semantics, frozen deadlines, lack of self-knowledge, and lack of expressive power to represent the reasoning of others. Active logic was developed to address many of these, but work to date still leaves serious gaps. The present work focuses on major extensions of active logic to deal with self-knowledge, and on their implementation in a newly developed automated reasoner for commonsense active logic. Self-knowledge is handled in the reasoner via a new treatment of quotation as a form of nesting (a toy illustration follows this entry). More sophisticated varieties of nesting, particularly quasi-quotation mechanisms, have also been developed to extend the basic form of quotation. Active logic and the reasoner are applied to classical issues in CSR, including a treatment of one agent having the knowledge and inferential mechanisms to reason about another's time-situated reasoning.
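
To make "quotation as a form of nesting" concrete, here is a toy illustration under made-up names (Atom, Quote, Believes); it is not the dissertation's reasoner. The point is that a quoted formula is an ordinary term, so one formula can mention another formula, or another agent's belief, without asserting it.

```python
# Illustrative only: a toy representation of quotation as nesting.
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """A simple ground formula such as At(bob, library)."""
    predicate: str
    args: tuple

@dataclass(frozen=True)
class Quote:
    """Wraps a formula so it can appear as a *term* inside another formula."""
    formula: object

@dataclass(frozen=True)
class Believes:
    """An agent's belief about a quoted formula (one level of nesting)."""
    agent: str
    content: Quote

# One agent reasoning about another agent's belief, via nested quotation:
inner = Atom("At", ("bob", "library"))
nested = Believes("alice", Quote(Believes("bob", Quote(inner))))

# Because quoted formulas are ordinary terms, the reasoner can inspect them
# without asserting them; unquoting recovers the inner formula for inference.
assert nested.content.formula.content.formula == inner
```
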
Item Evaluating Machine Intelligence with Question Answering (2021)
Rodriguez, Pedro; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Humans ask questions to learn about the world and to test knowledge and understanding. The ability to ask questions combines aspects of intelligence unique to humans: language understanding, knowledge representation, and reasoning. Thus, building systems capable of intelligent question answering (QA) is a grand goal of natural language processing (NLP). To measure progress in NLP, we create "exams" for computer systems and compare their effectiveness against a reference point, often based on humans. How precisely we measure progress depends on whether we are building computer systems that optimize human satisfaction in information-seeking tasks or systems that make progress toward intelligent QA. In the first part of this dissertation, we explore each goal in turn, describe how the goals differ, and relate them to QA formats. As an example of an information-seeking evaluation, we introduce a new dialog QA task paired with a new evaluation method. Afterward, we turn our attention to using QA to evaluate machine intelligence. A good evaluation should be able to discriminate between less and more capable QA models. This dissertation explores three ways to improve the discriminative power of QA evaluations: (1) dynamic weighting of test questions, (2) a format that by construction tests multiple levels of knowledge, and (3) evaluation data created through human-computer collaboration. By dynamically weighting test questions, we challenge a foundational assumption of the de facto standard in QA evaluation, the leaderboard. Namely, contrary to nearly all QA and NLP evaluations, which implicitly assign equal weight to every example by averaging scores, we contend that examples are not equally useful for estimating machine (or human) QA ability. As any student can tell you, not all questions on an exam are equally difficult, and in the worst case some questions are unsolvable. Drawing on decades of research in educational testing, we propose adopting an alternative evaluation methodology, Item Response Theory (IRT), which is widely used to score human exams (e.g., the SAT). We show that dynamically weighting questions improves the reliability of leaderboards in discriminating between models of differing QA ability and also helps in constructing new evaluation datasets (a small numeric sketch of the IRT model follows this entry). Having improved the scoring of models, we next turn to improving the format and data of QA evaluations. Our idea is simple: in most QA tasks (e.g., Jeopardy!), each question tests a single level of knowledge; in our task (the trivia game Quizbowl), each question tests multiple levels of knowledge. Because each question tests multiple levels of knowledge, the likelihood that we learn nothing about the difference between two models (i.e., both are correct or both are wrong) decreases, which substantially increases discriminative power. Despite the improved format, we next show that while our QA models defeat accomplished trivia players, they are overly reliant on brittle pattern matching, which indicates a failure to answer questions intelligently. To mitigate this problem, we introduce a new framework for building evaluation data in which humans and machines cooperatively craft trivia questions that are difficult to answer through clever pattern-matching tricks alone, while being no harder for humans. We conclude by sketching a broader vision for QA evaluation that combines the three components of evaluation we improve (scoring, format, and data) to create living evaluations and re-imagine the role of leaderboards.
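
A small numeric sketch of the Item Response Theory model referenced above, using the standard two-parameter logistic (2PL) form; the item parameters below are invented for illustration. It shows why test questions are not equally useful: a discriminating question near a model's ability level carries far more information than an easy, weakly discriminating one.

```python
# Standard 2PL IRT model (the parameter values here are made up for the example).
import math

def p_correct(theta: float, discrimination: float, difficulty: float) -> float:
    """Probability that a subject of ability `theta` answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def item_information(theta: float, discrimination: float, difficulty: float) -> float:
    """Fisher information: how much the item helps discriminate abilities near theta."""
    p = p_correct(theta, discrimination, difficulty)
    return (discrimination ** 2) * p * (1.0 - p)

# Two hypothetical questions: an easy, weakly discriminating one and a harder,
# highly discriminating one. For a strong model (theta = 1.0), the second
# carries far more information, so a simple average over questions under-weights it.
easy = dict(discrimination=0.5, difficulty=-1.0)
hard = dict(discrimination=2.0, difficulty=0.8)
for name, item in [("easy", easy), ("hard", hard)]:
    print(name, round(p_correct(1.0, **item), 3), round(item_information(1.0, **item), 3))
```
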
Item Discourse-Level Language Understanding with Deep Learning (2017)
Iyyer, Mohit Nagaraja; Boyd-Graber, Jordan; Daumé, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Designing computational models that can understand language at a human level is a foundational goal in the field of natural language processing (NLP). Given a sentence, machines are capable of translating it into many different languages, generating a corresponding syntactic parse tree, marking words that refer to people or places, and much more. These tasks are solved by statistical machine learning algorithms, which leverage patterns in large datasets to build predictive models. Many recent advances in NLP are due to deep learning models (parameterized as neural networks), which bypass user-specified features in favor of building representations of language directly from the text. Despite many deep-learning-fueled advances at the word and sentence level, however, computers still struggle to understand high-level discourse structure in language, or the way in which authors combine and order different units of text (e.g., sentences, paragraphs, chapters) to express a coherent message or narrative. Part of the reason is data-related: there are no existing datasets for many contextual language-based problems, and some tasks are too complex to be framed as supervised learning problems; for the latter, we must either resort to unsupervised learning or devise training objectives that simulate the supervised setting. Another reason is architectural: neural networks designed for sentence-level tasks require additional functionality, interpretability, and efficiency to operate at the discourse level. In this thesis, I design deep learning architectures for three NLP tasks that require integrating information across high-level linguistic context: question answering, fictional relationship understanding, and comic book narrative modeling. While these tasks are very different from each other on the surface, I show that similar neural network modules can be used in each case to form contextual representations (a generic sketch of such composition follows this entry).
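
A generic sketch of forming a discourse-level representation by composing sentence representations; this is a simplified stand-in, not one of the architectures from the thesis, and the vocabulary, dimensions, and weights are invented for the example.

```python
# Illustrative only: compose word vectors into sentence vectors, then sentence
# vectors into one document-level representation.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 50
vocab = {"the": 0, "castle": 1, "fell": 2, "years": 3, "later": 4, "it": 5, "was": 6, "rebuilt": 7}
embeddings = rng.normal(size=(len(vocab), EMB_DIM))  # stand-in word vectors

def sentence_vector(tokens):
    """Averaging-style sentence encoder: mean of the word embeddings."""
    return embeddings[[vocab[t] for t in tokens]].mean(axis=0)

def discourse_vector(sentences, W, b):
    """Average the sentence vectors, then apply one nonlinear layer.
    (A recurrent or attention layer could replace the average to respect order.)"""
    sent_vecs = np.stack([sentence_vector(s) for s in sentences])
    return np.tanh(sent_vecs.mean(axis=0) @ W + b)

W = rng.normal(scale=0.1, size=(EMB_DIM, 32))
b = np.zeros(32)
doc = [["the", "castle", "fell"], ["years", "later", "it", "was", "rebuilt"]]
print(discourse_vector(doc, W, b).shape)  # (32,)
```
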
Item Model-Predictive Strategy Generation for Multi-Agent Pursuit-Evasion Games (2015)
Raboin, Eric James; Nau, Dana S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Multi-agent pursuit-evasion games can be used to model a variety of real-world problems, including surveillance, search-and-rescue, and defense-related scenarios. However, many pursuit-evasion problems are computationally difficult, which can be problematic for domains with complex geometry or large numbers of agents. To compound matters further, practical applications often require planning methods to operate under high levels of uncertainty or to meet strict running-time requirements. These challenges strongly suggest that heuristic methods are needed to address pursuit-evasion problems in the real world. In this dissertation, I present heuristic planning techniques for three related problem domains: visibility-based pursuit-evasion, target following with differential motion constraints, and distributed asset guarding with unmanned sea-surface vehicles. For these domains, I demonstrate that heuristic techniques based on problem relaxation and model-predictive simulation can be used to efficiently perform low-level control action selection, motion goal selection, and high-level task allocation. In particular, I introduce a polynomial-time algorithm for control action selection in visibility-based pursuit-evasion games, where a team of pursuers must minimize uncertainty about the location of an evader. The algorithm uses problem relaxation to estimate future states of the game. I also show how to incorporate a probabilistic opponent model, learned from interaction traces of prior games, into the algorithm. I verify experimentally that, by performing Monte Carlo sampling over the learned model to estimate the location of the evader, the algorithm outperforms existing planning approaches based on worst-case analysis (a toy sketch of this sampling idea follows this entry). Next, I introduce an algorithm for motion goal selection in pursuit-evasion scenarios with unmanned boats. I show how a probabilistic model that accounts for differential motion constraints can be used to project the future positions of the target boat; motion goals for the pursuer boat can then be selected based on those projections. I verify experimentally that motion goals selected with this technique are better optimized for travel time and proximity to the target boat than motion goals selected based on the target boat's current position. Finally, I introduce a task-allocation technique for a team of unmanned sea-surface vehicles (USVs) responsible for guarding a high-value asset. The team of USVs must intercept and block a set of hostile intruder boats before they reach the asset. The algorithm uses model-predictive simulation to estimate the value of high-level task assignments, which are then realized by a set of learned low-level behaviors. I show experimentally that using model-predictive simulations based on Monte Carlo sampling is more effective than hand-coded evaluation heuristics.
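
A toy sketch of the Monte Carlo idea from the first domain above: roll a learned probabilistic opponent model forward many times to build a belief over where the evader might be, rather than planning against the worst case. The grid-world move set, the "learned" probabilities, and the horizon are all invented for the example.

```python
# Illustrative only: Monte Carlo sampling over a learned opponent model to
# estimate an evader's position on a grid.
import random
from collections import Counter

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

def sample_opponent_move(pos, move_probs):
    """Draw one move from the (learned) distribution over evader moves."""
    dx, dy = random.choices(MOVES, weights=[move_probs[m] for m in MOVES])[0]
    return (pos[0] + dx, pos[1] + dy)

def estimate_evader_distribution(start, move_probs, horizon=5, samples=2000):
    """Roll the opponent model forward many times; count where the evader ends up."""
    counts = Counter()
    for _ in range(samples):
        pos = start
        for _ in range(horizon):
            pos = sample_opponent_move(pos, move_probs)
        counts[pos] += 1
    return {cell: n / samples for cell, n in counts.items()}

# A hypothetical learned model: this evader tends to flee east.
learned_probs = {(0, 1): 0.1, (0, -1): 0.1, (1, 0): 0.6, (-1, 0): 0.1, (0, 0): 0.1}
belief = estimate_evader_distribution(start=(0, 0), move_probs=learned_probs)
# A pursuer strategy can now be scored against this belief rather than the worst case.
print(max(belief, key=belief.get))
```
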