STATISTICAL LEARNING WITH APPLICATIONS IN HIGH DIMENSIONAL DATA AND HEALTH CARE ANALYTICS
Statistical learning has been applied in business and health care analytics. Predictive models are often fit to hierarchically structured data: common characteristics of products and customers are represented as categorical variables, and each category can be split into multiple subcategories at a lower level of the hierarchy. Modeling the full hierarchy may require hundreds of thousands of binary variables, so variable selection is needed to screen out large numbers of irrelevant or insignificant features. We propose a new dynamic screening method for hierarchical binary data, based on the distance correlation criterion. Our method can screen out large parts of the hierarchy at the higher levels, which avoids exploring many lower-level features and greatly reduces the computational cost of screening. We demonstrate the practical potential of the method in a case application involving a large volume of business-to-business (B2B) transaction data.

Statistical inference has been widely used for decision and policy making in health care; here we focus specifically on how providers are paid for some common procedures. We explored several rich datasets and found large variations among providers in how much payers/insurers paid, i.e., the allowed payment. We then proposed incorporating available provider attributes into a regression model to explain possible reasons for these payment variations.
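The idea of screening a hierarchy top-down by distance correlation can be illustrated with a small sketch. It is not the dissertation's algorithm, whose details are not given here. It assumes a hypothetical `Node` structure in which each node holds the binary indicator of a (sub)category, with a parent's indicator being the union of its disjoint children. It also assumes a simple, hypothetical rule: a node whose sample distance correlation with the response falls below a fixed `threshold` is dropped together with its whole subtree, so none of its descendants are ever scored. The distance correlation itself follows the standard double-centered sample estimator of Székely, Rizzo and Bakirov (2007).

```python
from dataclasses import dataclass, field

import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute distances within each sample.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                      # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()     # product of squared distance variances
    if dvar2 <= 0:
        return 0.0  # a constant variable carries no information about y
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


@dataclass
class Node:
    """Hypothetical hierarchy node: one binary (sub)category indicator."""
    name: str
    x: np.ndarray
    children: list = field(default_factory=list)


def hierarchical_screen(roots, y, threshold):
    """Top-down screening (assumed rule): prune a node's subtree if its score is low."""
    kept, evaluated = [], 0
    stack = list(roots)
    while stack:
        node = stack.pop()
        evaluated += 1
        score = distance_correlation(node.x, y)
        if score < threshold:
            continue  # prune: descendants of this node are never examined
        kept.append((node.name, score))
        stack.extend(node.children)
    return kept, evaluated


if __name__ == "__main__":
    # Synthetic illustration: 3 hypothetical categories, each with 2 subcategories.
    rng = np.random.default_rng(0)
    n = 500
    leaf_id = rng.integers(0, 6, size=n)
    leaves = [(leaf_id == k).astype(float) for k in range(6)]
    y = 2.0 * leaves[0] + rng.normal(size=n)  # only subcategory A1 affects y
    tree = [
        Node("A", leaves[0] + leaves[1], [Node("A1", leaves[0]), Node("A2", leaves[1])]),
        Node("B", leaves[2] + leaves[3], [Node("B1", leaves[2]), Node("B2", leaves[3])]),
        Node("C", leaves[4] + leaves[5], [Node("C1", leaves[4]), Node("C2", leaves[5])]),
    ]
    kept, evaluated = hierarchical_screen(tree, y, threshold=0.2)
    print("kept:", kept)
    print("nodes evaluated:", evaluated, "of", 9)
```

The `evaluated` counter makes the claimed saving visible: every subtree pruned at a higher level contributes none of its lower-level features to the count. A real implementation over hundreds of thousands of indicators would avoid the O(n²) distance matrices used here for clarity.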