Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so a given thesis/dissertation may take up to four months to appear in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results
9 results
Item: Egocentric Vision in Assistive Technologies For and By the Blind (2022)
Lee, Kyungjun; Kacorri, Hernisa; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Visual information in our surroundings, such as everyday objects and passersby, is often inaccessible to people who are blind. Cameras that leverage egocentric vision, in an attempt to approximate the visual field of the camera wearer, hold great promise for making the visual world more accessible to this population. Typically, such applications rely on pre-trained computer vision models and are thus limited to what those models were trained to recognize. Moreover, as with any AI system that augments sensory abilities, conversations around ethical implications and privacy concerns lie at the core of their design and regulation. However, early efforts tend to decouple perspectives, considering only those of either the blind users or potential bystanders. In this dissertation, we revisit egocentric vision for the blind. Through a holistic approach, we examine the following dimensions: type of application (objects and passersby), camera form factor (handheld and wearable), user's role (a passive consumer or an active director of technology), and privacy concerns (from both end-users and bystanders). Specifically, we propose to design egocentric vision models that capture blind users' intent and are fine-tuned by the user in the context of object recognition. We explore the societal issues that AI-powered cameras may raise, considering perspectives from both blind users and nearby people whose faces or objects might be captured by the cameras. Lastly, we investigate interactions and perceptions across different camera form factors to reveal design implications for future work.
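The user-driven fine-tuning step described in this abstract can be pictured with a short sketch. The following is a minimal, hypothetical illustration (not the dissertation's actual system) of adapting only the classification head of a pretrained vision model to a handful of user-captured object photos; the synthetic tensors standing in for the user's photos, the class count, and the hyperparameters are all assumptions.

```python
# Sketch: personalizing a pretrained recognizer on a user's own object photos.
# Assumes PyTorch and torchvision >= 0.13; data here are synthetic stand-ins.
import torch
import torch.nn as nn
from torchvision import models

NUM_PERSONAL_OBJECTS = 5          # e.g., the user's mug, keys, pill bottle, ...

# Start from a generic pretrained backbone and freeze it.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a small head for the user's own objects.
model.fc = nn.Linear(model.fc.in_features, NUM_PERSONAL_OBJECTS)

# Stand-ins for a few user-captured, user-labeled training photos.
photos = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, NUM_PERSONAL_OBJECTS, (16,))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):           # a few passes suffice for a tiny head
    optimizer.zero_grad()
    loss = loss_fn(model(photos), labels)
    loss.backward()
    optimizer.step()
```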
Item: Transfer Learning in Natural Language Processing through Interactive Feedback (2022)
Yuan, Michelle; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Machine learning models cannot easily adapt to new domains and applications. This drawback is especially detrimental for natural language processing (NLP) because language is perpetually changing: across disciplines and languages, there are noticeable differences in content, grammar, and vocabulary. To overcome these shifts, recent NLP breakthroughs focus on transfer learning. Through clever optimization and engineering, a model can successfully adapt to a new domain or task. However, these modifications are still computationally inefficient or resource-intensive. Compared to machines, humans are more capable of generalizing knowledge across different situations, especially low-resource ones. Therefore, research on transfer learning should carefully consider how the user interacts with the model. The goal of this dissertation is to investigate "human-in-the-loop" approaches for transfer learning in NLP. First, we design annotation frameworks for inductive transfer learning, the transfer of models across tasks. We create an interactive topic modeling system for users to find topics useful for classifying documents in multiple languages. The user-constructed topic model improves classification accuracy and bridges cross-lingual gaps in knowledge. Next, we look at popular language models, like BERT, that can be applied to various tasks. While these models are useful, they still require a large amount of labeled data to learn a new task. To reduce labeling, we develop an active learning strategy that samples documents that surprise the language model. Users only need to annotate a small subset of these unexpected documents to adapt the language model for text classification. Then, we transition to user interaction in transductive transfer learning, the transfer of models across domains. We focus our efforts on low-resource languages, developing an interactive system for word embeddings in which feedback from bilingual speakers refines the cross-lingual embedding space for classification tasks. Subsequently, we look at domain shift for tasks beyond text classification. Coreference resolution is fundamental for NLP applications, like question answering and dialogue, but the models are typically trained and evaluated on one dataset. We use active learning to find spans of text in the new domain for users to label, and we provide important insights on annotating spans for domain adaptation. Finally, we summarize the contributions of each chapter, focusing on aspects like the scope of applications and model complexity, and conclude with a discussion of future directions. Researchers may extend the ideas in this thesis to topics like user-centric active learning and proactive learning.
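The "documents that surprise the language model" idea admits a compact illustration. Below is a generic sketch, not the dissertation's exact sampling strategy, that ranks unlabeled documents by language-model surprisal, here approximated by per-token negative log-likelihood under GPT-2; the model choice, truncation length, and toy documents are assumptions.

```python
# Sketch: pick the unlabeled documents that most "surprise" a language model.
# GPT-2 mean per-token loss serves as a generic surprisal proxy.
# Requires the Hugging Face transformers library.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(doc: str) -> float:
    """Mean per-token negative log-likelihood of the document under GPT-2."""
    enc = tokenizer(doc, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

unlabeled = [
    "A routine news paragraph much like the model's training data.",
    "Jargon-heavy domain text drawn from an unfamiliar distribution.",
]
# Annotate only the top-k most surprising documents.
ranked = sorted(unlabeled, key=surprisal, reverse=True)
print(ranked[:1])
```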
Item: Time-Situated Metacognitive Agency and Other Aspects of Commonsense Reasoning (2022)
Goldberg, Matthew David; Perlis, Donald; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Much research in commonsense reasoning (CSR) involves external representations of an agent's reasoning, based on compelling features of classical logic. However, these advantages come with severe costs, including omniscience, consistency, static semantics, frozen deadlines, lack of self-knowledge, and lack of expressive power to represent the reasoning of others. Active logic was developed to address many of these, but work to date still leaves serious gaps. The present work focuses on major extensions of active logic to deal with self-knowledge, and on their implementation in a newly developed automated reasoner for commonsense active logic. Self-knowledge is handled in the reasoner via a new treatment of quotation as a form of nesting. More sophisticated varieties of nesting, particularly quasi-quotation mechanisms, have also been developed to extend the basic form of quotation. Active logic and the reasoner are applied to classical issues in CSR, including a treatment of one agent having the knowledge and inferential mechanisms to reason about another's time-situated reasoning.

Item: CODE ME A GOOD REASON: JOSEPH WEIZENBAUM AND A RHETORIC OF ETHICAL AI (2021)
Yang, Misti Hewatt; Pfister, Damien S; Communication; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Joseph Weizenbaum was a Massachusetts Institute of Technology professor often credited with creating the first chatbot, or automated computer conversationalist, in 1966. He named it ELIZA. Ten years later, however, he wrote Computer Power and Human Reason: From Judgment to Calculation, a book questioning the ethics of natural language processing, AI, and instrumental reason. This dissertation presents Weizenbaum as a twentieth-century rhetorical theorist of computation. With an understanding of rhetoric as the material means for generating good reasons for living together, I articulate how Weizenbaum's rhetorical interventions around the early development of computational culture can inform the ethics of engineering broadly and the development of AI specifically. The first chapter provides an overview of my historical and theoretical framework. The second chapter starts with Weizenbaum's childhood and ends with the release of ELIZA. The third chapter chronicles his growing disillusionment with computers in society in the context of the Vietnam War. The final two chapters are dedicated to the book and to reactions from a prominent figure in the history of AI, John McCarthy. Informed by Weizenbaum, I recuperate rhetoric as a practice of reason composed of technē that requires phronēsis in order to be realized in its full ethical potential. I argue that recognizing the practice of rhetoric inherent in engineering and ethics can better equip engineers and the public to manage scientific and technological uncertainty with the care necessary for a humane future.
Item: Evaluating Machine Intelligence with Question Answering (2021)
Rodriguez, Pedro; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Humans ask questions to learn about the world and to test understanding. The ability to ask questions combines aspects of intelligence unique to humans: language understanding, knowledge representation, and reasoning. Thus, building systems capable of intelligent question answering (QA) is a grand goal of natural language processing (NLP). To measure progress in NLP, we create "exams" for computer systems and compare their effectiveness against a reference point, often based on humans. How precisely we measure progress depends on whether we are building computer systems that optimize human satisfaction in information-seeking tasks or measuring progress toward intelligent QA. In the first part of this dissertation, we explore each goal in turn, how they differ, and their relationship to QA formats. As an example of an information-seeking evaluation, we introduce a new dialog QA task paired with a new evaluation method. Afterward, we turn our attention to using QA to evaluate machine intelligence. A good evaluation should be able to discriminate between lesser and more capable QA models. This dissertation explores three ways to improve the discriminative power of QA evaluations: (1) dynamic weighting of test questions, (2) a format that by construction tests multiple levels of knowledge, and (3) evaluation data created through human-computer collaboration. By dynamically weighting test questions, we challenge a foundational assumption of the de facto standard in QA evaluation: the leaderboard. Namely, we contend that, contrary to nearly all QA and NLP evaluations, which implicitly assign equal weight to examples by averaging scores, examples are not equally useful for estimating machine (or human) QA ability. As any student may tell you, not all questions on an exam are equally difficult, and in the worst case some questions are unsolvable. Drawing on decades of research in educational testing, we propose adopting an alternative evaluation methodology, Item Response Theory, that is widely used to score human exams (e.g., the SAT). We show that dynamically weighting questions improves the reliability of leaderboards in discriminating between models of differing QA ability, while also being helpful in the construction of new evaluation datasets. Having improved the scoring of models, we next turn to improving the format and data in QA evaluations. Our idea is simple. In most QA tasks (e.g., Jeopardy!), each question tests a single level of knowledge; in our task (the trivia game Quizbowl), each question tests multiple levels of knowledge. Since each question tests multiple levels of knowledge, this decreases the likelihood that we learn nothing about the difference between two models (i.e., that they are both correct or both wrong), which substantially increases discriminative power. Despite the improved format, we next show that while our QA models defeat accomplished trivia players, they are overly reliant on brittle pattern matching, which indicates a failure to intelligently answer questions. To mitigate this problem, we introduce a new framework for building evaluation data in which humans and machines cooperatively craft trivia questions that are difficult to answer through clever pattern-matching tricks alone, while being no harder for humans. We conclude by sketching a broader vision for QA evaluation that combines the three components of evaluation we improve (scoring, format, and data) to create living evaluations and re-imagine the role of leaderboards.
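Since the dissertation adopts Item Response Theory to weight questions, a compact sketch of the idea may help. Below is a minimal, hypothetical fit of the two-parameter logistic (2PL) IRT model, where the probability that subject j answers item i correctly is sigmoid(a_i(θ_j − b_i)), on a synthetic subject-by-question response matrix; the data, dimensions, and optimizer settings are assumptions, and this is not the dissertation's actual estimation code.

```python
# Sketch: fitting a 2PL Item Response Theory model by maximum likelihood.
# P(subject j answers item i correctly) = sigmoid(a_i * (theta_j - b_i)).
# Synthetic responses; assumes PyTorch.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_subjects, n_items = 100, 30
responses = torch.randint(0, 2, (n_subjects, n_items)).float()  # 1 = correct

theta = torch.zeros(n_subjects, requires_grad=True)  # subject ability
a = torch.ones(n_items, requires_grad=True)          # item discrimination
b = torch.zeros(n_items, requires_grad=True)         # item difficulty

optimizer = torch.optim.Adam([theta, a, b], lr=0.05)
for step in range(500):
    optimizer.zero_grad()
    logits = a * (theta.unsqueeze(1) - b)            # [subjects, items]
    loss = F.binary_cross_entropy_with_logits(logits, responses)
    loss.backward()
    optimizer.step()

# Items with high discrimination (a) separate strong from weak subjects and
# therefore carry more weight when ranking models on a leaderboard.
print(a.detach().topk(3).indices)
```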
Item: Security Enhancement and Bias Mitigation for Emerging Sensing and Learning Systems (2021)
Chen, Mingliang; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Artificial intelligence (AI) has been applied to a variety of practical tasks in recent years, facilitating many aspects of our daily life. With AI-based sensing and learning systems, we can enjoy the services of automated decision making, computer-assisted medical diagnosis, and health monitoring. Since these algorithms have entered human society and are influencing our daily life, important issues such as intellectual property protection, access control, privacy protection, and fairness/equity should be considered when we develop the algorithms, in addition to their performance. In this dissertation, we improve the design of emerging AI-based sensing and learning systems from security and fairness perspectives. The first part concerns the security protection of deep neural networks (DNNs). DNNs are becoming an emerging form of intellectual property for model owners and should be protected from unauthorized access and piracy to encourage healthy business investment and competition. Taking advantage of DNNs' intrinsic mechanisms, we propose a novel framework that provides access control to trained DNNs so that only authorized users can utilize them properly, preventing piracy and illicit usage. The second part is privacy protection in facial videos. Remote photoplethysmography (rPPG) can be used to collect a person's physiological signal when his/her face is captured by a video camera, which raises privacy issues from two aspects. First, individual health conditions may be revealed unintentionally from a facial recording without the person's explicit consent. To avoid this physiological privacy issue, we develop PulseEdit, a novel and efficient algorithm that can edit the physiological signals in facial videos without affecting visual appearance, protecting the person's physiological signal from disclosure. On the other hand, R&D of rPPG technology also risks leaking identity privacy: public benchmark facial datasets are usually required to develop rPPG algorithms, but facial videos are very sensitive and carry a high risk of identity leakage. We develop an anonymization transform that removes sensitive visual information identifying an individual while preserving the physiological information needed for rPPG analysis. In the last part, we investigate fairness in machine learning inference. Various fairness definitions have been proposed in prior art to ensure that decisions guided by machine learning models are equitable. Unfortunately, a "fair" model trained under these definitions is sensitive to the decision threshold: the fairness condition no longer holds when the threshold is tuned. To this end, we introduce the notion of threshold-invariant fairness, which enforces equitable performance across different groups independent of the decision threshold.
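To make the threshold-sensitivity problem concrete, here is a small, hypothetical diagnostic (not the dissertation's method): it sweeps the decision threshold and reports the largest gap in true-positive rate between two demographic groups. A model that looks fair at one threshold can show a large gap at another; threshold-invariant fairness asks that the gap stay small everywhere. The synthetic scores and group labels are assumptions.

```python
# Sketch: diagnosing threshold sensitivity of group fairness.
# Sweeps the decision threshold and measures the worst-case gap in
# true-positive rate (TPR) between two groups. Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                  # group membership: 0 or 1
labels = rng.integers(0, 2, n)                 # ground-truth outcomes
# Model scores whose distribution differs slightly by group.
scores = np.clip(0.5 * labels + 0.1 * group + rng.normal(0, 0.25, n), 0, 1)

def tpr(mask: np.ndarray, threshold: float) -> float:
    """True-positive rate within a group at a given decision threshold."""
    pos = (labels == 1) & mask
    return ((scores >= threshold) & pos).sum() / max(pos.sum(), 1)

thresholds = np.linspace(0.05, 0.95, 19)
gaps = [abs(tpr(group == 0, t) - tpr(group == 1, t)) for t in thresholds]

print(f"max TPR gap across thresholds: {max(gaps):.3f} "
      f"at t={thresholds[int(np.argmax(gaps))]:.2f}")
```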
Item: Discourse-Level Language Understanding with Deep Learning (2017)
Iyyer, Mohit Nagaraja; Boyd-Graber, Jordan; Daumé, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Designing computational models that can understand language at a human level is a foundational goal in the field of natural language processing (NLP). Given a sentence, machines are capable of translating it into many different languages, generating a corresponding syntactic parse tree, marking words that refer to people or places, and much more. These tasks are solved by statistical machine learning algorithms, which leverage patterns in large datasets to build predictive models. Many recent advances in NLP are due to deep learning models (parameterized as neural networks), which bypass user-specified features in favor of building representations of language directly from the text. Despite many deep learning-fueled advances at the word and sentence level, however, computers still struggle to understand high-level discourse structure in language, or the way in which authors combine and order different units of text (e.g., sentences, paragraphs, chapters) to express a coherent message or narrative. Part of the reason is data-related: there are no existing datasets for many contextual language-based problems, and some tasks are too complex to be framed as supervised learning problems; for the latter type, we must either resort to unsupervised learning or devise training objectives that simulate the supervised setting. Another reason is architectural: neural networks designed for sentence-level tasks require additional functionality, interpretability, and efficiency to operate at the discourse level. In this thesis, I design deep learning architectures for three NLP tasks that require integrating information across high-level linguistic context: question answering, fictional relationship understanding, and comic book narrative modeling. While these tasks are very different from each other on the surface, I show that similar neural network modules can be used in each case to form contextual representations.

Item: On agent-based modeling: Multidimensional travel behavioral theory, procedural models and simulation-based applications (2015)
Xiong, Chenfeng; Zhang, Lei; Civil Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation proposes a theoretical framework for modeling multidimensional travel behavior based on artificially intelligent agents, search theory, procedural (dynamic) models, and bounded rationality. For decades, despite a number of heuristic explanations for different results, the fact that "almost no mathematical theory exists which explains the results of the simulations" has remained one of the large drawbacks of the agent-based computational process approach. This is partly a side effect of its special feature that "no analytical functions are required". Amid the rapidly growing literature devoted to departures from rational-behavior assumptions, this dissertation works to provide a sound theoretical foundation for the computational process approach and for agent-based microsimulation in transportation system modeling and analysis. The theoretical contribution is three-fold: (1) it theorizes multidimensional knowledge updating, search start/stop criteria, and search/decision heuristics, which are formulated or empirically modeled and integrated in a unified and coherent approach; (2) it models procedural and dynamic agent-based decision-making, within which agents not only make decisions but also decide how and when to make those decisions; and (3) it replaces conventional user equilibrium with a dynamic behavioral user equilibrium (BUE). Search start/stop criteria are defined such that the modeling process eventually leads to a steady state that is structurally different from user equilibrium (UE) or dynamic user equilibrium (DUE). The theory is supported by empirical observations, and the derived quantitative models are tested by agent-based simulation on a demonstration network. The model in its current form incorporates short-term behavioral dimensions: travel mode, departure time, pre-trip routing, and en-route diversion. Based on research needs and data availability, other dimensions can be added to the framework. The proposed model is successfully integrated with a dynamic traffic simulator (DTALite, a light-weight dynamic traffic assignment and simulation engine) and applied to a mid-size study area in White Flint, Maryland. Results obtained from the integration corroborate the behavioral richness, computational efficiency, and convergence properties of the proposed theoretical framework. The model is then applied to a number of applications in transportation planning, operations, and optimization, which highlight the capabilities of the proposed theory in estimating rich behavioral dynamics and its potential for large-scale implementation. Future research should experiment with integration with activity-based models, land-use development, energy consumption estimators, etc., to fully develop the potential of the agent-based model.
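The search/stop criteria and the behavioral steady state described above can be illustrated with a toy day-to-day simulation. The sketch below is a hypothetical illustration, not the dissertation's model: boundedly rational agents on a two-route network reconsider their choice only when the perceived saving exceeds an indifference band, and the system is declared settled once the share of switching agents falls below a tolerance. The BPR-style latency function, band width, reconsideration rate, and tolerance are all assumptions.

```python
# Sketch: day-to-day route adjustment with boundedly rational agents.
# Agents switch routes only if the perceived saving exceeds an indifference
# band; the system settles into a behavioral steady state. Toy example only.
import random

random.seed(0)
N = 1000                       # agents
FREE_FLOW = [10.0, 12.0]       # free-flow minutes on routes 0 and 1
CAPACITY = [500.0, 600.0]
BAND = 1.0                     # indifference band (minutes)
TOL = 0.01                     # stop when < 1% of agents switch per day

def travel_time(route: int, flow: int) -> float:
    """BPR-style congestion function."""
    return FREE_FLOW[route] * (1.0 + 0.15 * (flow / CAPACITY[route]) ** 4)

choice = [random.randint(0, 1) for _ in range(N)]
for day in range(100):
    flows = [choice.count(0), choice.count(1)]
    times = [travel_time(r, flows[r]) for r in (0, 1)]
    switches = 0
    for i in range(N):
        other = 1 - choice[i]
        # Only a fraction of agents reconsider each day (damps oscillation),
        # and they switch only for a clear perceived gain.
        if times[choice[i]] - times[other] > BAND and random.random() < 0.2:
            choice[i] = other
            switches += 1
    if switches / N < TOL:     # stopping criterion: behavioral steady state
        print(f"settled on day {day}: flows={flows}, times={times}")
        break
```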
Item: Model-Predictive Strategy Generation for Multi-Agent Pursuit-Evasion Games (2015)
Raboin, Eric James; Nau, Dana S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Multi-agent pursuit-evasion games can be used to model a variety of real-world problems, including surveillance, search-and-rescue, and defense-related scenarios. However, many pursuit-evasion problems are computationally difficult, which can be problematic for domains with complex geometry or large numbers of agents. To compound matters further, practical applications often require planning methods to operate under high levels of uncertainty or meet strict running-time requirements. These challenges strongly suggest that heuristic methods are needed to address pursuit-evasion problems in the real world. In this dissertation I present heuristic planning techniques for three related problem domains: visibility-based pursuit-evasion, target following with differential motion constraints, and distributed asset guarding with unmanned sea-surface vehicles. For these domains, I demonstrate that heuristic techniques based on problem relaxation and model-predictive simulation can be used to efficiently perform low-level control action selection, motion goal selection, and high-level task allocation. In particular, I introduce a polynomial-time algorithm for control action selection in visibility-based pursuit-evasion games, where a team of pursuers must minimize uncertainty about the location of an evader. The algorithm uses problem relaxation to estimate future states of the game. I also show how to incorporate into the algorithm a probabilistic opponent model learned from interaction traces of prior games. I verify experimentally that by performing Monte Carlo sampling over the learned model to estimate the location of the evader, the algorithm performs better than existing planning approaches based on worst-case analysis. Next, I introduce an algorithm for motion goal selection in pursuit-evasion scenarios with unmanned boats. I show how a probabilistic model accounting for differential motion constraints can be used to project the future positions of the target boat, and how motion goals for the pursuer boat can then be selected based on those projections. I verify experimentally that motion goals selected with this technique are better optimized for travel time and proximity to the target boat than motion goals selected based on the target boat's current position. Finally, I introduce a task-allocation technique for a team of unmanned sea-surface vehicles (USVs) responsible for guarding a high-value asset. The team of USVs must intercept and block a set of hostile intruder boats before they reach the asset. The algorithm uses model-predictive simulation to estimate the value of high-level task assignments, which are then realized by a set of learned low-level behaviors. I show experimentally that using model-predictive simulations based on Monte Carlo sampling is more effective than hand-coded evaluation heuristics.
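The Monte Carlo estimation step can be pictured with a small sketch. The following hypothetical example (not the dissertation's algorithm) samples evader trajectories on a grid from a simple stand-in for a learned movement model, discards samples inconsistent with what the pursuers have observed, and treats the surviving samples as a belief over the evader's location; the grid size, horizon, visibility region, and movement model are all assumptions.

```python
# Sketch: Monte Carlo sampling over a learned opponent model to estimate
# where an unseen evader might be on a grid. The movement model is a toy
# stand-in; visibility is a fixed set of cells the pursuers observed empty.
import random
from collections import Counter

random.seed(0)
GRID = 10
HORIZON = 6
START = (0, 0)                       # evader's last known position
SEEN_EMPTY = {(x, y) for x in range(5) for y in range(5)}  # pursuers' view

def learned_step(pos):
    """Stand-in for a movement model learned from prior games:
    this evader tends to flee toward larger coordinates."""
    x, y = pos
    dx, dy = random.choices([(1, 0), (0, 1), (-1, 0), (0, -1)],
                            weights=[3, 3, 1, 1])[0]
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def sample_trajectory():
    pos = START
    for _ in range(HORIZON):
        pos = learned_step(pos)
    return pos

# Keep only samples consistent with the pursuers' observations.
samples = [p for p in (sample_trajectory() for _ in range(5000))
           if p not in SEEN_EMPTY]
belief = Counter(samples)
print(belief.most_common(3))         # most likely evader locations
```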