Comparing the Effectiveness of Standard vs. Multilevel Machine Learning Algorithms on Hierarchical Data

dc.contributor.advisor: Sweet, Tracy (en_US)
dc.contributor.author: Register, Brennan (en_US)
dc.contributor.department: Measurement, Statistics and Evaluation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2025-08-08T11:41:25Z
dc.date.issued: 2025 (en_US)
dc.description.abstract: This dissertation explored the performance of standard and multilevel machine learning classification algorithms on hierarchical datasets, which are prevalent in educational research due to multistage sampling techniques (e.g., students nested within schools). Hierarchical data pose unique analytical challenges (e.g., nonindependence of data), often requiring specialized approaches. While multilevel modeling is well-established in inferential contexts, its potential in predictive scenarios had been largely underexplored. Through a Monte Carlo simulation and empirical analyses using data from the Maryland Longitudinal Data System (MLDS), this study evaluated the suitability of standard and multilevel algorithms. Results revealed that multilevel models offered slight advantages in high residual level-2 variance settings, effectively capturing cluster-level dependencies and providing stable predictions. Standard models performed well in low residual level-2 variance contexts, while standard models incorporating cluster IDs as fixed effects performed comparably to multilevel models under many conditions, including high residual level-2 variance scenarios. However, this approach was only feasible when training and testing clusters overlapped, highlighting limitations in generalizing predictions to unseen clusters. In an empirical analysis addressing class imbalance, Logistic Regression and GLMM exhibited the highest sensitivity for identifying STEM completers when training and testing clusters overlapped while Neural Nets and XGBoost demonstrated better performance in identifying the minority class when training and testing clusters were distinct. These findings highlighted the complexity of predictive modeling for hierarchical data and provided insights for prediction tasks in educational research. (en_US)
dc.identifier: https://doi.org/10.13016/mv4w-liso
dc.identifier.uri: http://hdl.handle.net/1903/34077
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Statistics (en_US)
dc.subject.pqcontrolled: Educational psychology (en_US)
dc.title: Comparing the Effectiveness of Standard vs. Multilevel Machine Learning Algorithms on Hierarchical Data (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: Register_umd_0117E_24921.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format