A Comparative Study Of Outlier Detection Methods And Their Downstream Effects

dc.contributor.advisor: Herrmann, Jeffrey W. [en_US]
dc.contributor.author: Adipudi, Vikram [en_US]
dc.contributor.department: Systems Engineering [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2024-07-02T05:37:53Z
dc.date.available: 2024-07-02T05:37:53Z
dc.date.issued: 2024 [en_US]
dc.description.abstract: When machine learning models are fitted to a dataset, outliers in the data can cause the model to overfit. The resulting errors can lead to incorrect predictions and diminish the usefulness of the model. Outlier detection is therefore conducted as a precursor step to avoid these errors and to improve model performance. This study compares how different outlier detection methods affect regression, classification, and clustering methods, in order to identify which outlier detection method performs best in conjunction with each task. To conduct the study, multiple outlier detection algorithms were used to clean datasets, and the cleaned data was fed into the models. The performance of each model with and without cleaning was compared to identify trends. The study found that outlier detection of any kind has little impact on supervised tasks such as regression and classification. For the unsupervised task, the outlier detection and removal algorithm that made the most positive impact differed between clustering models. Most commonly, IForest and PCA had the greatest impact on clustering methods. [en_US]
dc.identifier: https://doi.org/10.13016/gkhj-8xej
dc.identifier.uri: http://hdl.handle.net/1903/33034
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: Anomaly Detection [en_US]
dc.subject.pquncontrolled: Classification [en_US]
dc.subject.pquncontrolled: Clustering [en_US]
dc.subject.pquncontrolled: Data Analysis [en_US]
dc.subject.pquncontrolled: Outlier Detection [en_US]
dc.subject.pquncontrolled: Regression [en_US]
dc.title: A Comparative Study Of Outlier Detection Methods And Their Downstream Effects [en_US]
dc.type: Thesis [en_US]
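
The abstract describes a protocol of cleaning a dataset with an outlier detector, fitting a model on the raw and on the cleaned data, and comparing performance. The following is a minimal sketch of that comparison for a regression task, not the thesis's implementation. Everything in it is an assumption made for illustration: the data is synthetic with deliberately extreme injected outliers, a simple IQR rule stands in for the detectors named in the abstract (IForest, PCA), and the helper names (`iqr_inlier_mask`, `fit_line`, `make_data`) are hypothetical. Its output says nothing about the study's findings.

```python
import random
import statistics


def iqr_inlier_mask(values, k=1.5):
    """Flag values inside [Q1 - k*IQR, Q3 + k*IQR] as inliers (True).

    Assumption: a univariate IQR rule, used here only as a stand-in detector.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= v <= high for v in values]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Mean squared prediction error of a (slope, intercept) model."""
    slope, intercept = model
    return statistics.fmean(
        [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    )


def make_data(rng, n):
    """Synthetic points around y = 2x + 1 with unit Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 100)

# Corrupt the first 10 training targets to act as injected outliers.
for i in range(10):
    train_y[i] = 80.0 + rng.gauss(0.0, 1.0)

# Cleaning step: keep only rows whose target passes the IQR rule.
mask = iqr_inlier_mask(train_y)
clean_x = [x for x, keep in zip(train_x, mask) if keep]
clean_y = [y for y, keep in zip(train_y, mask) if keep]

# Same model, fitted with and without cleaning, scored on the same test set.
mse_raw = mean_squared_error(fit_line(train_x, train_y), test_x, test_y)
mse_clean = mean_squared_error(fit_line(clean_x, clean_y), test_x, test_y)

print(f"rows kept: {len(clean_x)} of {len(train_x)}")
print(f"test MSE without cleaning: {mse_raw:.3f}")
print(f"test MSE with cleaning:    {mse_clean:.3f}")
```

Scoring both fits on the same untouched test set keeps the comparison fair: only the training data differs between the two runs. Swapping in another detector would only require replacing `iqr_inlier_mask` with a function that returns the same kind of boolean mask.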

Files

Original bundle

Name: Adipudi_umd_0117N_24315.pdf
Size: 2.34 MB
Format: Adobe Portable Document Format