IDENTIFYING AND MITIGATING BIAS IN MACHINE LEARNING FOR HEALTHCARE

dc.contributor.advisor: Bjarnadóttir, Margrét V
dc.contributor.advisor: Frías-Martínez, Vanessa
dc.contributor.author: Smolyak, Daniel
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-09-15T05:46:58Z
dc.date.issued: 2025
dc.description.abstract:
The use of machine learning in healthcare settings has become increasingly common, from prediction of individual patient outcomes to supporting policy decision-making for public health officials. However, these machine learning models often replicate or exacerbate human biases and discrimination. In this dissertation, we seek to address this problem both through identification of bias in existing healthcare modeling settings and through the development of approaches to mitigate bias, focusing on several complementary problems. We audit predictive models of county-level COVID-19 cases, identifying whether models perform equally well across counties with different demographic compositions when a) human mobility data is included as a model feature and b) various approaches are used to correct case underreporting. We also investigate approaches to improve model performance specifically for small subgroups. We develop a regression model for joint estimation of multiple groups that uses sample weighting and separate sparsity penalties to boost model performance for smaller groups. Then we outline an easy-to-implement LLM-based synthetic data generation method to augment smaller, underrepresented groups in health datasets, conducting a comprehensive evaluation of two prompt templates and three LLMs across two health datasets. Lastly, we present a novel use of causal machine learning methods to investigate sociodemographic subgroups with heterogeneous racial health disparities. Given structural inequities in the allocation of health resources to marginalized communities and current disparities in a wide range of health outcomes, it is important that we both prevent machine learning systems from causing further harm through the perpetuation of allocation inequities and leverage machine learning approaches to actively correct these harms.
dc.identifier: https://doi.org/10.13016/dmaw-n1vr
dc.identifier.uri: http://hdl.handle.net/1903/34709
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Bias
dc.subject.pquncontrolled: Equity
dc.subject.pquncontrolled: Fairness
dc.subject.pquncontrolled: Healthcare
dc.subject.pquncontrolled: LLM
dc.subject.pquncontrolled: Machine Learning
dc.title: IDENTIFYING AND MITIGATING BIAS IN MACHINE LEARNING FOR HEALTHCARE
dc.type: Dissertation

Files

Original bundle

Name: Smolyak_umd_0117E_25545.pdf
Size: 9.83 MB
Format: Adobe Portable Document Format
(RESTRICTED ACCESS)