Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756


Search Results

  • A Comparative Study Of Outlier Detection Methods And Their Downstream Effects
    (2024) Adipudi, Vikram; Herrmann, Jeffrey W.; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    When machine learning models are fitted to datasets, outliers in the data can cause mistakes such as overfitting. These mistakes can lead to incorrect predictions and diminish the usefulness of the model. Outlier detection is conducted as a preprocessing step to avoid such errors and to improve model performance. This study compares how different outlier detection methods affect regression, classification, and clustering methods, in order to identify which outlier detection method performs best in conjunction with each task. To conduct the study, multiple outlier detection algorithms were used to clean datasets, and the cleaned data were fed into the models. The performance of each model with and without cleaning was compared to identify trends. The study found that outlier detection of any kind has little impact on supervised tasks such as regression and classification. For the unsupervised task, the outlier detection and removal algorithm that made the most positive impact differed between clustering models; most commonly, IForest and PCA had the greatest impact on the clustering methods.
  • ANALYZING SEMI-LOCAL LINK COHESION TO DETECT COMMUNITIES AND ANOMALIES IN COMPLEX NETWORKS
    (2021) Schwartz, Catherine; Czaja, Wojciech; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Link cohesion is a new type of metric used to assess how well supported an edge is relative to other edges, accounting for nearby alternate paths and associated vertex degrees. A deterministic, scalable, and parallelizable link cohesion metric was shown to be useful in supporting edge scoring and in simplifying highly connected networks, making key cohesive subgraphs easier to detect. In this dissertation, the link cohesion metric and a modified version of it are analyzed to determine whether they improve the communities detected in different types of networks when used as a pre-weighting step for traditional algorithms such as the Louvain method. Additional observations are made on how analyzing the modified metric can give insight into whether a network has community structure. The two link cohesion metrics are also used to create vertex-level features that could be useful for detecting fake accounts in online social networks. These features are used in conjunction with a new interpretable anomaly detection method that performs well with a small amount of training data, opening the potential for human-in-the-loop interactions that allow users to tailor which types of anomalies to prioritize.
  • Anti-Profiles for Anomaly Classification and Regression
    (2015) Dinalankara, Wikum; Bravo, Héctor C; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Anomaly detection is a classical problem in statistical learning with wide-reaching applications in security, networks, genomics, and other fields. In this work, we formulate the anomaly classification problem as an extension of the detection problem: how to distinguish between samples from multiple heterogeneous classes that are anomalies relative to a well-defined, homogeneous, normal class. Our formulation of this learning setting arises from studies in cancer genomics, where the problem follows from prognosis and diagnosis applications. Standard binary and multi-class classification schemes are not well suited to the anomaly classification task, since they attempt to model these highly unstable, heterogeneous classes directly. We show that robust classifiers can be obtained by modeling the degree of deviation from the normal class as a stable characteristic of each anomaly class. To do so, we formalize the anomaly classification problem, characterize it statistically and computationally via kernel methods, and propose a class of robust learning methods, anti-profiles, designed specifically for this task. We focus on an open area of research in cancer genomics that motivates this project: the classification of tumors for prognosis and diagnosis. We provide experimental results obtained by applying the anti-profile method to gene expression data. We also extend the anti-profile approach to use kernel functions and develop a support vector machine (SVM) based method for classifying anomalies by their deviation from a stable normal class. We provide experimental results from applying this method to genetic data to classify different stages of tumor progression, and show that it yields much more stable classifiers than regular classifiers do. In addition, we show that this approach can be applied to anomaly classification problems in other application domains. We conclude by developing an SVM for censored survival information and demonstrate that the anti-profile method can produce stable classifiers for modeling clinical outcomes in clinical studies of cancer.
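The outlier detection study (Adipudi, 2024) compares downstream models trained with and without a cleaning step. The Python sketch below shows the shape of that comparison under stated assumptions. It swaps in a simple interquartile-range (Tukey fence) rule on the target values as a stand-in for the IForest and PCA detectors named in the abstract, and it uses a one-variable least-squares fit as the downstream model. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
import statistics


def iqr_inlier_mask(values, k=1.5):
    """Return True for values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compare_with_and_without_cleaning(xs, ys):
    """Fit on raw data and on outlier-cleaned data; return both fits and the number removed."""
    raw_fit = fit_line(xs, ys)
    keep = iqr_inlier_mask(ys)  # stand-in detector, applied to the target only
    clean_xs = [x for x, kept in zip(xs, keep) if kept]
    clean_ys = [y for y, kept in zip(ys, keep) if kept]
    return raw_fit, fit_line(clean_xs, clean_ys), keep.count(False)


if __name__ == "__main__":
    # Hypothetical toy data: y = 2x + 1 with two gross outliers appended.
    xs = list(range(20)) + [5, 10]
    ys = [2 * x + 1 for x in range(20)] + [200, -150]
    raw, cleaned, removed = compare_with_and_without_cleaning(xs, ys)
    print("raw slope/intercept:", raw)
    print("cleaned slope/intercept:", cleaned, "removed:", removed)
```

On this toy data the two injected points are removed and the cleaned fit recovers the underlying line, while the raw fit is pulled away from it. This only illustrates the workflow; it says nothing about the study's real datasets, where the author reports little impact on supervised tasks. A rule on the target alone also ignores the input features, unlike multivariate detectors such as IForest or PCA-based scoring.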
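The link cohesion dissertation (Schwartz, 2021) describes scoring each edge by nearby alternate paths and vertex degrees, then using those scores to pre-weight a community detection algorithm such as the Louvain method. The abstract does not define the metric, so the Python sketch below substitutes a simpler neighborhood-overlap score (common neighbors of the two endpoints, normalized by the smaller endpoint degree minus one) purely to illustrate the idea. It is not the dissertation's link cohesion metric, and `overlap_edge_weights` and the toy graph are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def overlap_edge_weights(edges):
    """Score each edge by alternate two-step paths between its endpoints.

    Stand-in score, NOT the dissertation's link cohesion metric:
    (number of common neighbors) / (min(deg(u), deg(v)) - 1).
    """
    adjacency = build_adjacency(edges)
    weights = {}
    for u, v in edges:
        common = len(adjacency[u] & adjacency[v])  # paths u-w-v through a shared neighbor w
        possible = min(len(adjacency[u]), len(adjacency[v])) - 1
        weights[(u, v)] = common / possible if possible > 0 else 0.0
    return weights


if __name__ == "__main__":
    # Hypothetical toy network: two 4-cliques joined by a single bridge edge.
    edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2)) + [("d", "e")]
    for edge, weight in overlap_edge_weights(edges).items():
        print(edge, round(weight, 2))
```

Edges inside each clique score 1.0 and the bridge scores 0.0, so passing these scores as edge weights to a Louvain implementation (for example NetworkX's `louvain_communities`, whose `weight` parameter names the edge attribute to use) would de-emphasize the bridge between the two groups.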
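The anti-profile thesis (Dinalankara, 2015) characterizes each anomaly class by how far its samples deviate from a stable normal class, rather than modeling the heterogeneous anomaly classes directly. The Python sketch below illustrates that idea under explicit assumptions: deviation is measured by counting features that fall outside a robust per-feature normal range (median ± k·MAD, estimated from normal samples only), and a sample is assigned to the anomaly class whose median count is closest. The range rule, the value of k, the nearest-median classifier, and all function names are illustrative assumptions; the thesis's kernel and SVM-based methods are not reproduced here.

```python
import statistics


def normal_ranges(normal_samples, k=5.0):
    """Per-feature (low, high) range from normal samples: median +/- k * MAD (assumed rule)."""
    ranges = []
    for values in zip(*normal_samples):  # iterate feature by feature
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        ranges.append((med - k * mad, med + k * mad))
    return ranges


def deviation_score(sample, ranges):
    """Number of features whose value falls outside its normal range."""
    return sum(not (low <= v <= high) for v, (low, high) in zip(sample, ranges))


def fit_class_centers(labelled_samples, ranges):
    """Median deviation score for each anomaly class (hypothetical classifier)."""
    scores_by_class = {}
    for sample, label in labelled_samples:
        scores_by_class.setdefault(label, []).append(deviation_score(sample, ranges))
    return {label: statistics.median(s) for label, s in scores_by_class.items()}


def predict(sample, ranges, centers):
    """Assign the anomaly class whose median deviation score is closest."""
    score = deviation_score(sample, ranges)
    return min(centers, key=lambda label: abs(centers[label] - score))


if __name__ == "__main__":
    # Hypothetical toy data with three features per sample.
    normals = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
               [1.0, 2.0, 3.0], [1.05, 1.95, 3.05]]
    ranges = normal_ranges(normals)
    training = [([5.0, 2.0, 3.0], "mild"), ([1.0, 9.0, 3.0], "mild"),
                ([5.0, 9.0, 9.0], "severe"), ([0.0, 0.0, 0.0], "severe")]
    centers = fit_class_centers(training, ranges)
    print(predict([1.0, 2.0, 8.0], ranges, centers))
```

Because the normal ranges are estimated from normal samples only, the deviation score does not depend on how the anomaly classes themselves vary, which mirrors the abstract's point about modeling deviation from a stable normal class.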