Semiparametric Regression and Mortality Rate Prediction
This dissertation is divided into two parts. In the first part we consider the general multivariate, multiple-sample semiparametric density ratio model. In this model one distribution serves as a reference or baseline, and all other distributions are weighted tilts of the reference. The weights are considered known up to a parameter. All the parameters in the model, as well as the reference distribution, are estimated from the combined data from all samples. A kernel-based density estimator can be constructed based on the semiparametric model. In this dissertation we discuss the asymptotic theory and convergence properties of the semiparametric kernel density estimator. The estimator is shown to be not only consistent, but also more efficient than the general kernel density estimator. Several methods for selecting the bandwidth are also discussed. This opens the door to regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. Accordingly, each multivariate distribution and a corresponding conditional expectation (or regression) of interest are then estimated from the combined data from all sources. Graphical and quantitative diagnostic tools are suggested to assess model validity. The method is applied to real and simulated data. Comparisons are made with multiple regression, generalized additive models (GAM) and nonparametric kernel regression.
In the second part we study mortality rate prediction. The National Center for Health Statistics (NCHS) uses observed mortality data to publish race-gender specific life tables for individual states decennially. At ages over 85 years, the reliability of death rates based on these data is compromised to some extent by age misreporting. The eight-parameter Heligman-Pollard parametric model is then used to smooth the data and obtain estimates and extrapolations of mortality rates for advanced ages.
In states with small sub-populations the observed mortality rates are often zero, particularly at young ages. The presence of zero death rates makes fitting the Heligman-Pollard model difficult and at times outright impossible. In addition, since death rates are reported on a log scale, zero mortality rates are problematic. To overcome observed zero death rates, appropriate probability models are used. Using these models, observed zero mortality rates are replaced by the corresponding expected values. This makes it possible to apply logarithmic transformations and to fit the Heligman-Pollard model, producing mortality estimates for ages 0 to 130 years.
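To make the eight-parameter curve concrete, the following is a minimal Python sketch of the Heligman-Pollard law in its commonly cited form, where the odds of death q_x / (1 - q_x) are the sum of a childhood term, an accident hump and a senescent term. The abstract does not state the formula, so this parameterization is an assumption about the variant used, and the parameter values below are purely illustrative, not estimates from the dissertation. The sketch also shows why zeros are a problem: the model yields strictly positive rates with finite logarithms, whereas an observed rate of zero has no logarithm.

```python
import math


def heligman_pollard_qx(x, A, B, C, D, E, F, G, H):
    """Death probability q_x at age x under the commonly cited
    Heligman-Pollard form (assumed here, not taken from the abstract):

        q_x / (1 - q_x) = A**((x + B)**C)
                          + D * exp(-E * (ln x - ln F)**2)
                          + G * H**x
    """
    childhood = A ** ((x + B) ** C)
    # The accident hump involves ln(x); it tends to 0 as x -> 0, so use 0 at age 0.
    if x > 0:
        accident = D * math.exp(-E * (math.log(x) - math.log(F)) ** 2)
    else:
        accident = 0.0
    senescent = G * H ** x
    odds = childhood + accident + senescent
    return odds / (1.0 + odds)


# Hypothetical parameter values, chosen only to give a plausible-looking curve.
ILLUSTRATIVE_PARAMS = dict(A=0.0005, B=0.01, C=0.1, D=0.001,
                           E=10.0, F=20.0, G=0.00005, H=1.1)

ages = list(range(0, 131))  # ages 0 to 130, as in the abstract
qx = [heligman_pollard_qx(x, **ILLUSTRATIVE_PARAMS) for x in ages]
log_qx = [math.log(q) for q in qx]  # finite because every modelled q_x > 0

for age in (0, 20, 40, 85, 100, 130):
    print(f"age {age:3d}: q_x = {qx[age]:.6f}, log q_x = {log_qx[age]:.3f}")
```

In practice the eight parameters would be estimated by fitting the curve to observed log death rates, which is the step that fails when some observed rates are zero.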