THE USE OF RANDOM FORESTS IN PROPENSITY SCORE WEIGHTING

Date

2023

Abstract

An important problem in social science research is the estimation of causal effects in observational studies. Propensity score methods, which are effective ways to remove selection bias, have been widely used for this purpose. An important step in these methods is estimating the propensity score. Recently, random forests, a machine learning method, have been proposed as an alternative to the conventional logistic regression for estimating the propensity score, because they require less stringent assumptions and provide less biased and more reliable estimates of the treatment effect. However, previous studies covered only limited conditions, with a small number of covariates and medium sample sizes, leaving the generalizability of the results in doubt. In addition, previous studies have seldom explored how to choose the hyperparameters of random forests in the context of propensity score methods. This dissertation, via a simulation study, aims to 1) make a more comprehensive comparison between random forests and logistic regression to determine which model performs better under which conditions, and 2) explore the effects of the hyperparameters on the estimate of the treatment effect in propensity score weighting. An empirical study is also used to illustrate how to choose the hyperparameters of random forests for propensity score weighting in practical settings.
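
As a rough illustration of the comparison described above, the following is a minimal sketch and not the dissertation's actual simulation design. It assumes scikit-learn and NumPy are available. The data-generating process, the function names (simulate, ipw_ate, rf_propensity, logit_propensity), the hyperparameter values, the clipping threshold, and the use of out-of-bag predictions as propensity scores are all assumptions made for this example only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def simulate(n=2000, p=5, true_effect=2.0, seed=0):
    """Toy observational data (hypothetical design): treatment assignment
    depends nonlinearly on the covariates, so a main-effects logistic
    model is misspecified."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    logit = 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 2]
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    y = true_effect * t + X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)
    return X, t, y


def ipw_ate(y, t, ps, clip=0.01):
    """Normalized inverse-probability-weighted estimate of the average
    treatment effect. Scores are clipped (an assumption of this sketch)
    to avoid extreme weights."""
    ps = np.clip(ps, clip, 1.0 - clip)
    w1 = t / ps                  # weights for treated units
    w0 = (1 - t) / (1.0 - ps)    # weights for control units
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def rf_propensity(X, t, n_estimators=500, min_samples_leaf=20,
                  max_features="sqrt", seed=0):
    """Propensity scores from a random forest. Out-of-bag predictions are
    used here so that each unit's score comes from trees that did not see
    that unit; this is one possible choice, not the only one."""
    rf = RandomForestClassifier(
        n_estimators=n_estimators,          # number of trees
        min_samples_leaf=min_samples_leaf,  # minimum terminal node size
        max_features=max_features,          # covariates tried per split
        oob_score=True,
        random_state=seed,
    )
    rf.fit(X, t)
    return rf.oob_decision_function_[:, 1]


def logit_propensity(X, t):
    """Propensity scores from a main-effects logistic regression."""
    model = LogisticRegression(max_iter=1000).fit(X, t)
    return model.predict_proba(X)[:, 1]


if __name__ == "__main__":
    X, t, y = simulate()
    print("logistic regression:", ipw_ate(y, t, logit_propensity(X, t)))
    # Vary one hyperparameter to see how the weighted estimate responds.
    for leaf in (1, 20, 100):
        ps = rf_propensity(X, t, min_samples_leaf=leaf)
        print(f"random forest, min_samples_leaf={leaf}:", ipw_ate(y, t, ps))
```

Looping over other settings, such as the number of trees or the number of covariates tried at each split, in the same way gives a simple view of how sensitive the weighted estimate is to each hyperparameter on a given data set.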
