IDENTIFYING AND MITIGATING BIAS IN MACHINE LEARNING FOR HEALTHCARE

dc.contributor.advisor: Bjarnadóttir, Margrét V
dc.contributor.advisor: Frías-Martínez, Vanessa
dc.contributor.author: Smolyak, Daniel
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-09-15T05:46:58Z
dc.date.issued: 2025
dc.description.abstract:
The use of machine learning in healthcare settings has become increasingly common, from prediction of individual patient outcomes to supporting policy decision-making for public health officials. However, these machine learning models often replicate or exacerbate human biases and discrimination. In this dissertation, we seek to address this problem both through identification of bias in existing healthcare modeling settings and through the development of approaches to mitigate bias, focusing on several complementary problems. We audit predictive models of county-level COVID-19 cases, identifying whether models perform equally well across counties with different demographic compositions when a) human mobility data is included as a model feature and b) various approaches are used to correct case underreporting. We also investigate approaches to improve model performance specifically for small subgroups. We develop a regression model for joint estimation of multiple groups that uses sample weighting and separate sparsity penalties to boost model performance for smaller groups. Then we outline an easy-to-implement LLM-based synthetic data generation method to augment smaller, underrepresented groups in health datasets, conducting a comprehensive evaluation of two prompt templates and three LLMs across two health datasets. Lastly, we present a novel use of causal machine learning methods to investigate sociodemographic subgroups with heterogeneous racial health disparities. Given structural inequities in the allocation of health resources to marginalized communities and current disparities in a wide range of health outcomes, it is important that we both prevent machine learning systems from causing further harm through the perpetuation of allocation inequities and leverage machine learning approaches to actively correct these harms.
dc.identifier: https://doi.org/10.13016/dmaw-n1vr
dc.identifier.uri: http://hdl.handle.net/1903/34709
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Bias
dc.subject.pquncontrolled: Equity
dc.subject.pquncontrolled: Fairness
dc.subject.pquncontrolled: Healthcare
dc.subject.pquncontrolled: LLM
dc.subject.pquncontrolled: Machine Learning
dc.title: IDENTIFYING AND MITIGATING BIAS IN MACHINE LEARNING FOR HEALTHCARE
dc.type: Dissertation

Files

Original bundle

Name: Smolyak_umd_0117E_25545.pdf
Size: 9.83 MB
Format: Adobe Portable Document Format
(RESTRICTED ACCESS)