UMD Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/3
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay before a given thesis/dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
4 results
Search Results
Item A Comparative Study Of Outlier Detection Methods And Their Downstream Effects (2024)
Adipudi, Vikram; Herrmann, Jeffrey W.; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
When fitting machine learning models to datasets, outliers in the data can lead to mistakes such as overfitting. These mistakes can lead to incorrect predictions from the model and could diminish its usefulness. Outlier detection is conducted as a precursor step to avoid such errors and to improve the performance of the model. This study compares how different outlier detection methods impact regression, classification, and clustering methods, in order to identify which outlier detection method performs best in conjunction with each task. To conduct this study, multiple outlier detection algorithms were used to clean datasets, and the cleaned data was fed into the models. The performance of each model with and without cleaning was compared to identify trends. This study found that using outlier detection of any kind has little impact on supervised tasks such as regression and classification. For the unsupervised task, the outlier detection and removal algorithm with the most positive impact varied by clustering model. Most commonly, IForest and PCA had the greatest impact on clustering methods.

Item PROBLEMS ORIGINATING FROM THE PLANNING OF AIR TRAFFIC MANAGEMENT INITIATIVES (2018)
Estes, Alexander; Ball, Michael O; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
When weather affects the ability of an airport to accommodate flights, a ground delay program is used to control the rate at which flights arrive at the airport. This prevents excessive congestion at the airport. In this thesis, we discuss several problems arising from the planning of these programs.
Each of these problems provides insight that can be applied in a broader setting, and in each case we develop generalizations of these results in a wider context. We show that a certain type of greedy policy is optimal for planning a ground delay program when no air delays are allowed. More generally, we characterize the conditions under which policies are optimal for a dynamic stochastic transportation problem. We also provide results that ensure that certain assignments are optimal, and we apply these results to the problem of matching drivers to riders in an on-demand ride service. When flights are allowed to take air delays, then a greedy policy is no longer optimal, but flight assignments can be produced by solving an integer program. We establish the strength of an existing formulation of this problem, and we provide a new, more scalable formulation that has the same strength properties. We show that both of these methods satisfy a type of equity property. These formulations are a special case of a dynamic stochastic network flow problem, which can be modeled as a deterministic flow problem on a hypergraph. We provide strong formulations for this general class of hypergraph flow problems. Finally, we provide a method for summarizing a dataset of ground delay programs. This summarization consists of a small subset of the original data set, whose elements are referred to as "representative" ground delay programs. More generally, we define a new class of data exploration methods, called "representative region selection" methods. 
We provide a framework for evaluating the quality of these methods, and we demonstrate statistical properties of these methods.

Item A Systematic and Minimalist Approach to Lower Barriers in Visual Data Exploration (2016)
Yalcin, Mehmet Adil; Bederson, Benjamin B; Elmqvist, Niklas E; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
With the increasing availability and impact of data in our lives, we need to make quicker, more accurate, and intricate data-driven decisions. We can see and interact with data, and identify relevant features, trends, and outliers through visual data representations. In addition, the outcomes of data analysis reflect our cognitive processes, which are strongly influenced by the design of tools. To support visual and interactive data exploration, this thesis presents a systematic and minimalist approach. First, I present the Cognitive Exploration Framework, which identifies six distinct cognitive stages and provides a high-level structure for design guidelines and for the evaluation of analysis tools. Next, in order to reduce decision-making complexities in creating effective interactive data visualizations, I present a minimal, yet expressive, model for tabular data using aggregated data summaries and linked selections. I demonstrate its application to common categorical, numerical, temporal, spatial, and set data types. Based on this model, I developed Keshif as an out-of-the-box, web-based tool to bootstrap the data exploration process. Then, I applied it to 160+ datasets across many domains, aiming to serve journalists, researchers, policy makers, businesses, and those tracking personal data. Using tools with novel designs and capabilities requires learning and help-seeking for both novices and experts. To provide self-service help for visual data interfaces, I present a data-driven contextual in-situ help system, HelpIn, which contrasts with separated and static videos and manuals.
Lastly, I present an evaluation of design and graphical perception for dense visualization of sorted numeric data. I contrast the non-hierarchical treemaps against two multi-column chart designs, wrapped bars and piled bars. The results support that multi-column charts are perceptually more accurate than treemaps, and the unconventional piled bars may require more training to read effectively. This thesis contributes to our understanding of how to create effective data interfaces by systematically focusing on human-facing challenges through minimalist solutions. Future work to extend the power of data analysis to a broader public should continue to evaluate and improve design approaches to address many remaining cognitive, social, educational, and technical challenges.

Item Search for Pair Production of Top Squarks in Proton-Proton Collisions at $\sqrt{s} = 8$ TeV (2015)
Calvert, Brian Michael; Hadley, Nicholas J; Physics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Supersymmetric extensions to the standard model can solve a number of current, unresolved issues in particle physics. In most of these models, the top-squark, the supersymmetric partner to the top quark, plays an integral role in fixing some of these issues. Although the existence of many supersymmetric particles has been strongly constrained by experiments, currently the existence of the top-squark remains largely unconstrained. This dissertation presents several searches for top-squark pair-production in $R$-parity conserving supersymmetry where the lightest neutralino is assumed to be stable. The data utilized in this search corresponds to 19.66 fb$^{-1}$ of proton-proton collision data collected by the CMS experiment at $\sqrt{s} = 8$ TeV during the 2012 LHC run. The main focus of the dissertation is a search in the dileptonic final state, where the experimental final state is two leptons, two bottom quarks, and missing transverse momentum.
Using a cut-based approach, no excess of events above the nominal background expectations is observed. This result is combined with a top-squark search in the semi-leptonic final state to exclude top-squark pair-production at the 95\% confidence level for top-squark masses up to 700 GeV when the lightest neutralino's mass is below 260 GeV. This dissertation also presents a powerful new approach to the dileptonic top-squark search. Shape-based comparisons, using three complementary discriminating variables, between the observed data and the nominal background expectations achieve much better statistical sensitivity to top-squark pair-production in comparison with the cut-based search. Notably, the shape analysis excludes the existence of top-squarks that are nearly mass-degenerate with the top quark. Currently, no other direct top-squark search can achieve this exclusion. In addition, there are a number of observed excesses in the shape analysis. The statistical significances of these excesses are tested against top-squark pair-production models. The subset of models in which the top-squark decays to a top quark and the lightest neutralino, and in which the mass-splitting between the top-squark and the lightest neutralino is $(150\pm12.5)$ GeV, is found to fit with a statistical significance of $\sim 3.5$--$4\sigma$. The global significance of these excesses is quantified by correcting for the look-elsewhere effect; the highest post-correction significances are found to be $\sim 2.5$--$3\sigma$.
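
The abstract above reports that correcting for the look-elsewhere effect lowers the quoted significances. As a rough illustration of why such a correction reduces significance, the sketch below converts a local significance to a global one using a simple trials-factor approximation. This is an assumption made for illustration only: the abstract does not say how the dissertation performs the correction, and the function names and the number of independent trials (`n_trials`) are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def sigma_to_p(z: float) -> float:
    """One-sided tail probability corresponding to a z-sigma excess."""
    return 1.0 - _STD_NORMAL.cdf(z)


def p_to_sigma(p: float) -> float:
    """Inverse of sigma_to_p: significance (in sigma) for a one-sided p-value."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def global_significance(local_sigma: float, n_trials: int) -> float:
    """Trials-factor approximation to the look-elsewhere correction.

    Assumes n_trials independent places where an excess could have
    appeared (a hypothetical input, not a value from the dissertation).
    """
    p_local = sigma_to_p(local_sigma)
    # Probability of at least one excess this large among n_trials tries.
    p_global = 1.0 - (1.0 - p_local) ** n_trials
    return p_to_sigma(p_global)


if __name__ == "__main__":
    for n in (1, 10, 30, 100):
        print(n, round(global_significance(3.5, n), 2))
```

With a single trial the global and local significances coincide, and the global value falls as the assumed number of trials grows. Real analyses typically estimate the correction from pseudo-experiments, since the tested models are rarely independent.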