UMD Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4 month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Browse

Search Results

Now showing 1 - 6 of 6
  • Thumbnail Image
    Item
    Learning Temporal Properties from Data Streams
    (2020) Huang, Samuel; Cleaveland, Rance; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Verifying that systems behave as expected is a cornerstone of computing. In formal verification approaches, engineers capture their intentions, or specifications, mathematically, often using logic. The verification task is then to confirm that the system satisfies its specifications. In a formal setting this endeavor typically involves the construction of mathematical proofs, which are either constructed automatically, as in the case of so-called model-checking techniques, or by humans with machine assistance, as in the case of theorem-proving-based methodologies. In practice, formal verification faces a number of obstacles. One involves the construction of formal specifications in the first place. Another is the lack of availability of analyzable system artifacts (source code, executables) about which proofs can be constructed. In this dissertation we propose techniques for inferring formal specifications, in the form of Linear Temporal Logic formulas, from system executions, and models when these are available, as a means of addressing these concerns. We first illustrate how guided generation of test cases (i.e. guided exploration of the system's input/output space) can be leveraged to develop and then refine the hypothesis set of invariants that a system satisfies. This approach was successful in revealing a system property required in code specification of an automotive application provided to us but missing in the implementation. An unexpected (undocumented) specification was also discovered through our analysis. The approach has since been applied by other researchers to several other automotive applications. Second, we develop techniques to mine properties from models of systems and/or their executions. In some cases compact, finite state representations of a system is available. In this scenario we employ a novel automaton-based approach to mine properties matching a user-specified template. In other cases such white-box knowledge is not available and we must work over executions of the system rather than the system itself. In this scenario we apply a novel variant of Linear Temporal Logic (LTL) using finite-sequence semantics to again mine properties based on a specified template. Lastly, we consider situations where standard formal properties are insufficient due to uncertainty or being overly simplified. Oftentimes properties that are not satisfied 100\% of the time can be very interesting during a system inspection or system redesign. For example, natural noise manifesting in data streams from system executions such as missing evidence and changing system behavior could lead to significant properties being overlooked if a strict ``exactitude'' were enforced. We explore the notion of ``incomplete'' properties which we term \emph{partial invariants} by formulating a new ``Noisy Linear Temporal Logic'' which is an extension of standard LTL. We consider several representations of such noise and show decidability of the language emptiness problem for some of the variants.
  • Thumbnail Image
    Item
    SEA-SURFACE TEMPERATURE BASED STATISTICAL PREDICTION OF THE SOUTH ASIAN SUMMER MONSOON RAINFALL DISTRIBUTION
    (2019) Sengupta, Agniv; Nigam, Sumant; Atmospheric and Oceanic Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The South Asian summer monsoon brings copious amounts of rainfall accounting for over 70% of the annual rainfall over India. Summer monsoon predictions have drawn considerable public/policy attention lately as South Asia has become a resource-stressed and densely populated region. This environmental backdrop and the livelihood concerns of a billion-plus people generate the demand for more accurate monsoon predictions. The prediction skill, however, has remained marginal and stagnant for several decades despite advances in the representation of physical processes, numerical model resolution, and data assimilation techniques, leading to the following key question: what is the potential predictability of summer monsoon rainfall at lead times of one month to a season? This dissertation examines the role of influential climate system components with large thermal inertia and reliable long-term observational records, like sea-surface temperature (SST) in forecasting the seasonal distribution of South Asian monsoon rainfall. First, an evolution-centric SST analysis is conducted in the global oceans using the extended-Empirical Orthogonal Function technique to uncover the recurrent modes of spatiotemporal variability and their potential inter-basin linkages. A statistical forecast model is next developed using these extracted modes of SST variability as predictors. Assessment of the forecasting system’s long-term performance from reconstruction and hindcasting over an independent verification period demonstrates high forecast skill over core monsoon regions – the Indo-Gangetic Plain and southern peninsular India, indicating prospects for improved seasonal predictions. The influence of SSTs on the northeast winter monsoon is subsequently investigated, especially, its evolution, interannual variability and the El Niño–Southern Oscillation (ENSO) influence. Key findings from this study include evidence of increased rainfall over southeastern peninsular India and Sri Lanka (generated by an off-equatorial anticyclonic circulation centered over the Bay of Bengal) during El Niño winters. This dissertation provides the first quantitative assessment of the potential predictability of summer monsoon rainfall anomalies – the maximum predictable summer rainfall signal (amount, distribution) over South Asia from prior SST information – at various seasonal leads, and notably, at SST-mode resolution. The improved skill of the SST-based statistical forecast establishes the bar – an evaluative benchmark – for the dynamical prediction of summer monsoon rainfall.
  • Thumbnail Image
    Item
    Model-Based Testing of Off-Nominal Behaviors
    (2017) Schulze, Christoph; Cleaveland, Rance; Lindvall, Mikael; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Off-nominal behaviors (ONBs) are unexpected or unintended behaviors that may be exhibited by a system. They can be caused by implementation and documentation errors and are often triggered by unanticipated external stimuli, such as unforeseen sequences of events, out of range data values, or environmental issues. System specifications typically focus on nominal behaviors (NBs), and do not refer to ONBs or their causes or explain how the system should respond to them. In addition, untested occurrences of ONBs can compromise the safety and reliability of a system. This can be very dangerous in mission- and safety-critical systems, like spacecraft, where software issues can lead to expensive mission failures, injuries, or even loss of life. In order to ensure the safety of the system, potential causes for ONBs need to be identified and their handling in the implementation has to be verified and documented. This thesis describes the development and evaluation of model-based techniques for the identification and documentation of ONBs. Model-Based Testing (MBT) techniques have been used to provide automated support for thorough evaluation of software behavior. In MBT, models are used to describe the system under test (SUT) and to derive test cases for that SUT. The thesis is divided into two parts. The first part develops and evaluates an approach for the automated generation of MBT models and their associated test infrastructure. The test infrastructure is responsible for executing the generated test cases of the models. The models and the test infrastructure are generated from manual test cases for web-based systems, using a set of heuristic transformation rules and leveraging the structured nature of the SUT. This improvement to the MBT process was motivated by three case studies of MBT that we conducted that evaluate MBT in terms of its effectiveness and efficiency for identifying ONBs. Our experience led us to develop automated approaches to model and test-infrastructure creation, since these were some of the most time-consuming tasks associated with MBT. The second part of the thesis presents a framework and associated tooling for the extraction and analysis of specifications for identifying and documenting ONBs. The framework infers behavioral specifications in the form of system invariants from automatically generated test data using data-mining techniques (e.g. association-rule mining). The framework follows an iterative test -> infer -> instrument -> retest paradigm, where the initial invariants are refined with additional test data. This work shows how the scalability and accuracy of the resulting invariants can be improved with the help of static data- and control-flow analysis. Other improvements include an algorithm that leverages the iterative process to accurately infer invariants from variables with continuous values. Our evaluations of the framework have shown the utility of such automatically generated invariants as a means for updating and completing system specifications; they also are useful as a means of understanding system behavior including ONBs.
  • Thumbnail Image
    Item
    The Dynamics of Multi-Modal Networks
    (2012) Sharara, Hossam; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The widespread study of networks in diverse domains, including social, technological, and scientific settings, has increased the interest in statistical and machine learning techniques for network analysis. Many of these networks are complex, involving more than one kind of entity, and multiple relationship types, both changing over time. While there have been many network analysis methods proposed for problems such as network evolution, community detection, information diffusion and opinion leader identification, the majority of these methods assume a single entity type, a single edge type and often no temporal dynamics. One of the main shortcomings of these traditional techniques is their inadequacy for capturing higher-order dependencies often present in real, complex networks. To address these shortcomings, I focus on analysis and inference in dynamic, multi-modal, multi-relational networks, containing multiple entity types (such as people, social groups, organizations, locations, etc.), and different relationship types (such as friendship, membership, affiliation, etc.). An example from social network theory is a network describing users, organizations and interest groups, where users have different types of ties among each other, such as friendship, family ties, etc., as well as affiliation and membership links with organizations and interest groups. By considering the complex structure of these networks rather than limiting the analysis to a single entity or relationship type, I show how we can build richer predictive models that provide better understanding of the network dynamics, and thus result in better quality predictions. In the first part of my dissertation, I address the problems of network evolution and clustering. For network evolution, I describe methods for modeling the interactions between different modalities, and propose a co-evolution model for social and affiliation networks. I then move to the problem of network clustering, where I propose a novel algorithm for clustering multi-modal, multi-relational data. The second part of my dissertation focuses on the temporal dynamics of interactions in complex networks, from both user-level and network-level perspectives. For the user-centric approach, I analyze the dynamics of user relationships with other entity types, proposing a measure of the "loyalty" a user shows for a given group or topic, based on her temporal interaction pattern. I then move to macroscopic-level approaches for analyzing the dynamic processes that occur on a network scale. I propose a new differential adaptive diffusion model for incorporating diversity and trust in the process of information diffusion on multi-modal, multi-relational networks. I also discuss the implications of the proposed diffusion model on designing new strategies for viral marketing and influential detection. I validate all the proposed methods on several real-world networks from multiple domains.
  • Thumbnail Image
    Item
    KNOWLEDGE DISCOVERY FROM GENE EXPRESSION DATA: NOVEL METHODS FOR SIMILARITY SEARCH, SIGNATURE DETECTION, AND CONFOUNDER CORRECTION
    (2012) Licamele, Louis; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Gene expression microarray data is used to answer a variety of scientific questions. For example, it can be used for gaining a better understanding of a drug, segmenting a disease, and predicting an optimal therapeutic response. The amount of gene expression data publicly available is extremely large and continues to grow at an increasing rate. However, this rapid growth of gene expression data from laboratories across the world has not fully achieved its potential impact on the scientific community. This shortcoming is due to the fact that the majority of the data has been gathered under varying conditions, and there is no principled way for combining and fully utilizing related data. Even within a closely controlled gene expression experiment, there are confounding factors that may mask the true signatures when analyzed with current methods. Therefore, we are interested in three core tasks that we believe are important for improving the utilization of gene array data: similarity search, signature detection, and confounder correction. We have developed novel methods that address each of these tasks. In this work, we first address the similarity search problem. More specifically, we propose methods which overcome experimental barriers in pariwise gene expression similarity calculations. We introduce a method, which we refer to as indirect similarity, which, unlike previous approaches, uses all of the information in a database to better inform the similarity calculation of a pair of gene expression profiles. We demonstrate that our method is more robust and better able to cope with experimental barriers such as vehicle and batch effects. We evaluate the ability of our method to retrieve compounds with similar therapeutic effects in two independent datasets. We evaluate the recall ability of our approach and show that our method results in an improvement of 97.03% and 49.44% respectively over existing state of the art approaches. The second problem we focus on is signature detection. Gene expression experiments are performed to test a specific hypothesis. Generally, this hypothesis is that there is some genetic signature common in a group of samples. Current methods try to find the differentially expressed genes within a group of samples using a variety of methods, however, they all are parametric. We introduce a nonparametric approach to group profile creation which we refer to as the Weighted Influence Model - Rank of Ranks method. For every probe on the microarray, the average rank is calculated across all members of a group. These average ranks are then re-ranked to form the group profile. We demonstrate the ability of our group profile method to better understand a disease and the underlying mechanism common to its treatments. Additionally, we demonstrate the predictive power of this group profile to detect novel drugs that could treat a particular disease. This method leads the detection of robust group signatures even with unknown confounding effects. The final problem that we address is the challenge of removing known (annotated) confounding effects from gene expression profiles. We propose an extension to our non-parametric gene expression profile method to correct for observed confounding effects. This correction is performed on ranked lists directly, and it provides a robust alternative to parametric batch profile correction methods. We evaluate our novel profile subtraction method on two real world datasets, comparing against several state-of-the-art parametric methods. We demonstrate an improvement in group signature detection using our method to remove confounding effects. Additionally, we show that in a dataset with the true group assignments removed and only the confounding effects labelled, our profile subtraction method allows for the discovery of the true groups. We evaluate the robustness of our methods using a gene expression profile generator that we developed.
  • Thumbnail Image
    Item
    AUTOMATED KEYWORD EXTRACTION FROM BIO-MEDICAL LITERATURE WITH CONCENTRATION ON ANTIBIOTIC RESISTANCE
    (2009) Zuhl, Maya; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The explosive growth of bio-medical literature makes it increasingly difficult and time consuming to keep up with newly discovered and published information. The extraction of knowledge from papers is critical in enabling computational analysis of biological data. In the last decade, tremendous effort has been put into development of automated and semi-automated tools for knowledge discovery and extraction from text, as an alternative to monotonous and time-consuming manual processing. This thesis research was focused on determining whether minor human supervision can improve the process of automated bio-medical text annotation. One of the main outcomes of this study is a tool that requires minimal effort and time from scientists to reach high precision in semi-automated annotation. The task we targeted is the extraction of keywords related to antibiotic resistance in bacteria. The tool is based on a machine learning algorithm that is retrained several times to achieve the best accuracy.