UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that a given thesis/dissertation may take up to four months to appear in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

  • STATISTICAL LEARNING WITH APPLICATIONS IN HIGH DIMENSIONAL DATA AND HEALTH CARE ANALYTICS
    (2017) Fan, Yimei; Ryzhov, Ilya; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Statistical learning has been applied in business and health care analytics. Predictive models are fit using hierarchically structured data: common characteristics of products and customers are represented as categorical variables, and each category can be split into multiple subcategories at a lower level of the hierarchy. Hundreds of thousands of binary variables may be required to model the hierarchy, necessitating variable selection to screen out large numbers of irrelevant or insignificant features. We propose a new dynamic screening method, based on the distance correlation criterion, designed for hierarchical binary data. Our method can screen out large parts of the hierarchy at the higher levels, avoiding the need to explore many lower-level features and greatly reducing the computational cost of screening. The practical potential of the method is demonstrated in a case study involving a large volume of B2B transaction data. While statistical inference has been widely used for decision and policy making in health care, we focus in particular on how providers are paid for some common procedures. We explore several rich datasets and find large variations among providers in how much payers/insurers have paid (the allowed payment). We then propose incorporating available provider attributes into a regression model to explain possible reasons for those payment variations.
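
The idea of screening a hierarchy from the top down with distance correlation can be illustrated with a minimal sketch. The abstract does not specify the thesis's dynamic rule, so this sketch substitutes a fixed threshold, and pruning a whole subtree on a weak parent score is used here only as a heuristic. The function names, the data layout (a dict of leaf indicators plus a dict of children), and the rule that a parent's indicator is the element-wise maximum of its children are all assumptions made for illustration.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # d is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between two 1-D samples."""
    n = len(x)
    A, B = _centered_distances(x), _centered_distances(y)

    def mean_prod(P, Q):
        return sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n)) / n ** 2

    denom = math.sqrt(mean_prod(A, A) * mean_prod(B, B))
    if denom == 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(mean_prod(A, B), 0.0) / denom)


def screen_hierarchy(X, y, children, roots, threshold):
    """Top-down screening sketch (hypothetical helper, not the thesis's algorithm).

    X: dict mapping leaf name -> list of 0/1 values.
    children: dict mapping internal node -> list of child names.
    Returns (selected_leaves, evaluated_nodes).
    """
    def indicator(node):
        kids = children.get(node)
        if not kids:
            return X[node]
        # Assumption: a parent is "on" whenever any of its children is "on".
        return [max(vals) for vals in zip(*(indicator(k) for k in kids))]

    selected, evaluated = [], []
    stack = list(roots)
    while stack:
        node = stack.pop()
        evaluated.append(node)
        if distance_correlation(indicator(node), y) < threshold:
            continue  # weak parent: skip its entire subtree without scoring it
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            selected.append(node)
    return selected, evaluated
```

Under these assumptions, a category whose indicator shows little dependence on the response is discarded before any of its subcategories are scored, which is where the computational saving would come from.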