THE USE OF RANDOM FORESTS IN PROPENSITY SCORE WEIGHTING
Abstract
An important problem in social science research is the estimation of causal effects in observational studies. Propensity score methods are an effective way to remove selection bias and have been widely used for this purpose. An important step in these methods is estimating the propensity score itself. Recently, a machine learning method, random forests, has been proposed as an alternative to the conventional logistic regression for estimating the propensity score, because it requires less stringent assumptions and provides less biased and more reliable estimates of the treatment effect. However, previous studies covered only limited conditions, with a small number of covariates and medium sample sizes, leaving the generalizability of their results in doubt. In addition, previous studies have seldom explored how to choose the hyperparameters of random forests in the context of propensity score methods. This dissertation, via a simulation study, aims to 1) compare random forests and logistic regression more comprehensively to determine which model performs better under which conditions, and 2) explore the effects of the hyperparameters on the estimate of the treatment effect in propensity score weighting. An empirical study also illustrates how to choose the hyperparameters of random forests for propensity score weighting in practical settings.
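The weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which estimand or weighting formula the dissertation uses, so this example assumes the average treatment effect with normalized inverse probability weights. The function name `ipw_ate` and the clipping parameter `eps` are hypothetical. The propensity scores are taken as given; they could come from logistic regression, a random forest, or any other model.

```python
def ipw_ate(treated, outcome, propensity, eps=1e-6):
    """Normalized inverse-probability-weighted estimate of the average
    treatment effect (an assumed estimand, not taken from the source).

    treated:    iterable of 0/1 treatment indicators
    outcome:    iterable of observed outcomes
    propensity: iterable of estimated P(treated = 1 | covariates)
    eps:        hypothetical clipping bound that keeps weights finite
    """
    num1 = den1 = num0 = den0 = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        # Clip the score away from 0 and 1 to avoid division by zero.
        e = min(max(e, eps), 1.0 - eps)
        if t:
            w = 1.0 / e            # treated units are weighted by 1 / e
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)    # control units are weighted by 1 / (1 - e)
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("need at least one treated and one control unit")
    # Difference of the two weighted outcome means.
    return num1 / den1 - num0 / den0


if __name__ == "__main__":
    # Toy data: with constant propensity 0.5 the estimate reduces to the
    # plain difference in group means.
    print(ipw_ate([1, 1, 0, 0], [3.0, 5.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5]))
```

Under this sketch, the choice of propensity model affects only the `propensity` input, which is why the quality of the estimated scores drives the bias of the treatment effect estimate.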