Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
7 results
Search Results
Item: A Comparative Study Of Outlier Detection Methods And Their Downstream Effects (2024)
Adipudi, Vikram; Herrmann, Jeffrey W.; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
When fitting machine learning models to datasets, outliers in the data can cause overfitting and related mistakes. Such mistakes can lead to incorrect predictions and diminish the usefulness of the model. Outlier detection is conducted as a precursor step to avoid these errors and to improve the performance of the model. This study compares how different outlier detection methods impact regression, classification, and clustering methods, in order to identify which outlier detection method performs best in conjunction with each task. To conduct this study, multiple outlier detection algorithms were used to clean datasets, and the cleaned data was fed into the models. The performance of each model with and without cleaning was compared to identify trends. This study found that outlier detection of any kind has little impact on supervised tasks such as regression and classification. For the unsupervised task, different clustering models each had outlier detection and removal algorithms that made the most positive impact on the clustering. Most commonly, IForest and PCA had the greatest impact on clustering methods.

Item: COMPUTATIONAL METHODS IN MACHINE LEARNING: TRANSPORT MODEL, HAAR WAVELET, DNA CLASSIFICATION, AND MRI (2018)
Njeunje, Franck Olivier Ndjakou; Czaja, Wojciech K; Benedetto, John J; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
With the increasing amount of raw data generated every day, it has become pertinent to develop new techniques for data representation, analysis, and interpretation.
Motivated by real-world applications, there is a trending interest in techniques such as dimensionality reduction, wavelet decomposition, and classification methods that allow for better understanding of data. This thesis details the development of a new non-linear dimension reduction technique based on a transport model by advection. We provide a series of computational experiments and practical applications to hyperspectral images to illustrate the strength of our algorithm. In wavelet decomposition, we construct a novel Haar approximation technique for functions f in the Lp-space, 0 < p < 1, such that the approximants have support contained in the support of f. Furthermore, a classification algorithm to study tissue-specific deoxyribonucleic acids (DNA) is constructed using the support vector machine. In magnetic resonance imaging, we provide an extension of the T2-store-T2 magnetic resonance relaxometry experiment used in the analysis of magnetization signal from 2 to N exchanging sites, where N >= 2.

Item: SPARSE REPRESENTATION, DISCRIMINATIVE DICTIONARIES AND PROJECTIONS FOR VISUAL CLASSIFICATION (2015)
Shrivastava, Ashish; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Developments in sensing and communication technologies have led to an explosion in the availability of visual data from multiple sources and modalities. Millions of cameras have been installed in buildings, streets, and airports around the world that are capable of capturing multimodal information such as light, depth, heat, etc. These data are potentially a tremendous resource for building robust visual detectors and classifiers. However, the data are often large, mostly unlabeled, and increasingly of mixed modality. To extract useful information from these heterogeneous data, one needs to exploit the underlying physical, geometrical, or statistical structure across data modalities.
For instance, in computer vision, the number of pixels in an image can be rather large, but most inference or representation models use only a few parameters to describe the appearance, geometry, and dynamics of a scene. This has motivated researchers to develop a number of techniques for finding a low-dimensional representation of a high-dimensional dataset. The dominant methodology for modeling and exploiting the low-dimensional structure in high-dimensional data is sparse dictionary-based modeling. While discriminative dictionary learning methods have demonstrated tremendous success in computer vision applications, their performance is often limited by the amount and type of labeled data available for training. In this dissertation, we extend the sparse dictionary learning framework to weakly supervised learning problems such as semi-supervised learning, ambiguously labeled learning, and Multiple Instance Learning (MIL). Furthermore, we present nonlinear extensions of these methods using the kernel trick. We also address the problem of choosing the optimal kernel for sparse representation-based classification using Multiple Kernel Learning (MKL) methods. Finally, in order to deal with heterogeneous multimodal data, we present a feature-level fusion method based on quadratic programming. The dissertation has been divided into the following four parts:
1) In the first part, we develop a discriminative non-linear dictionary learning technique which utilizes both labeled and unlabeled data for learning dictionaries. We compute a probability distribution over class labels for all the unlabeled samples, which is updated together with the dictionary and sparse coefficients. The algorithm is also extended to ambiguously labeled data, where part of the data contains multiple labels for a training sample.
2) Using non-linear dictionaries, we present a multi-class Multiple Instance Learning (MIL) algorithm where the data is given in the form of bags.
Each bag contains multiple samples, called instances, out of which at least one belongs to the class of the bag. We propose a noisy-OR model and a generalized mean-based optimization framework for learning the dictionaries in the feature space. The proposed method can be viewed as a generalized dictionary learning algorithm since it reduces to a novel discriminative dictionary learning framework when there is only one instance in each bag.
3) We propose a Multiple Kernel Learning (MKL) algorithm that is based on the Sparse Representation-based Classification (SRC) method. Taking advantage of the non-linear kernel SRC in efficiently representing the non-linearities in the high-dimensional feature space, we propose an MKL method based on the kernel alignment criterion. Our method uses a two-step training method to learn the kernel weights and the sparse codes. At each iteration, the sparse codes are updated first while fixing the kernel mixing coefficients, and then the kernel mixing coefficients are updated while fixing the sparse codes. These two steps are repeated until a stopping criterion is met.
4) Finally, using a linear classification model, we study the problem of fusing information from multiple modalities. Many current recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time. We describe an algorithm that perturbs test features so that all modalities predict the same class. We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features.
To efficiently solve the MIP, we provide a greedy algorithm and empirically show that its solution is very close to that of a state-of-the-art MIP solver.

Item: FAULT DETECTION AND PROGNOSTICS OF INSULATED GATE BIPOLAR TRANSISTOR (IGBT) USING A K-NEAREST NEIGHBOR CLASSIFICATION ALGORITHM (2013)
Sutrisno, Edwin; Pecht, Michael; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The Insulated Gate Bipolar Transistor (IGBT) is a power semiconductor device commonly used in medium- to high-power applications such as household appliances, automotive systems, and renewable energy. Health assessment of IGBTs under field use is of interest due to the costly system downtime that may be associated with IGBT failures. Conventional reliability approaches were shown by experimental data to suffer from large uncertainties when predicting IGBT lifetimes, partly due to their inability to adapt to varying loading conditions and part-to-part differences. This study developed a data-driven prognostic method to individually assess IGBT health based on operating data obtained from run-to-failure experiments. IGBT health was classified as healthy or faulty using a K-Nearest Neighbor Centroid Distance classification algorithm. A feature weight optimization method was developed to determine the influence of each feature on classifying the IGBT's health states.

Item: The Future of Freedom of Information: An Analysis of the Impact of Executive Orders on the Freedom of Information Act National Security Exemptions (2010)
Kaminer, Joan Gibson; Jaeger, Paul; Library & Information Services; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The Freedom of Information Act ("FOIA") was enacted in 1966 to provide access to government information while balancing the interests of privacy and national security.
A constant theme in court interpretations has been the extent of FOIA's national security exemptions in preventing disclosure. These interpretations are based on both FOIA and current Presidential Executive Orders addressing the classification of national security information. This paper analyzes the changes between President Bush's and President Obama's Executive Orders. Furthermore, this paper examines the relevant case law regarding FOIA national security exemptions and possible impacts from the changes in Executive Orders. This paper also makes recommendations on how to better implement the policy presented in the Executive Order. This paper concludes that President Obama's Executive Order, while clearly stating the intended policy of open access and addressing prior problems in internal agency procedures, fails to provide adequate changes that will impact FOIA litigation.

Item: Tackling Uncertainties and Errors in the Satellite Monitoring of Forest Cover Change (2010)
Song, Kuan; Townshend, John R. G.; Geography; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This study aims to improve the reliability of automatic forest change detection. Forest change detection is of vital importance for understanding global land cover as well as the carbon cycle. Remote sensing and machine learning have been widely adopted for such studies with increasing degrees of success. However, contemporary global studies still suffer from lower-than-satisfactory accuracies and robustness problems whose causes were largely unknown. Global geographical observations are complex, as a result of the hidden interweaving geographical processes. Is it possible that some geographical complexities were not expected in contemporary machine learning? Could they cause uncertainties and errors when contemporary machine learning theories are applied to remote sensing? This dissertation adopts the philosophy of error elimination.
We start by explaining the mathematical origins of possible geographic uncertainties and errors in chapter two. Uncertainties are unavoidable but might be mitigated. Errors are hidden but might be found and corrected. Then in chapter three, experiments are specifically designed to assess whether or not contemporary machine learning theories can handle these geographic uncertainties and errors. In chapter four, we identify an unreported systemic error source: the proportion distribution of classes in the training set. A subsequent Bayesian Optimal solution is designed to combine Support Vector Machine and Maximum Likelihood. Finally, in chapter five, we demonstrate how this type of error is widespread not just in classification algorithms, but also embedded in the conceptual definition of geographic classes before the classification. In chapter six, the sources of errors and uncertainties and their solutions are summarized, with theoretical implications for future studies. The most important finding is that how we design a classification largely pre-determines what we eventually get out of it. This applies to many contemporary popular classifiers, including various types of neural nets, decision trees, and support vector machines. This is a cause of the so-called overfitting problem in contemporary machine learning. Therefore, we propose that the emphasis of classification work be shifted to the planning stage before the actual classification. Geography should not just be the analysis of collected observations, but also about the planning of observation collection. This is where geography, machine learning, and survey statistics meet.

Item: Human Activity Classification Based on Gait and Support Vector Machines (2008)
Ducao II, Amon Brigoli; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Presented is a method to characterize human gait and to classify human activities using gait.
Slices along the x-t dimension of a spatio-temporal sequence are extracted to construct a gait double helical signature (gait DHS). A DHS pattern is a compact description that encodes the parameters of human gait and shows inherent symmetry in natural walking (without encumbered limb movement). The symmetry takes the form of Frieze groups, and differences in DHS symmetry can classify different activities. This thesis presents a method for extracting gait DHS and shows how the DHS is separable by activity. Then, a Support Vector Machine (SVM) n-class classifier is constructed using the Radial Basis Function (RBF) kernel, and the performance is measured on a set of data. The SVM is a classification tool based on learning from a training set, and fitting decision boundaries based on an output function. This thesis examines the effect of slicing at different heights of the body and shows the robustness of DHS to view angle, size, and direction of motion. Experiments using real video sequences are presented.
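
Illustrative note on the RBF kernel named in the abstract above: the sketch below shows the standard kernel definition K(x, y) = exp(-gamma * ||x - y||^2) and the Gram matrix an SVM would consume. It is not taken from the thesis; the function names, the gamma value, and the toy feature vectors are assumptions chosen only for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Hypothetical 2-D feature vectors (not gait DHS features from the thesis).
toy_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
toy_gram = gram_matrix(toy_samples, gamma=0.5)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; gamma controls how quickly that decay happens.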