Semiparametric Regression and Mortality Rate Prediction

dc.contributor.advisor: Kedem, Benjamin [en_US]
dc.contributor.author: Voulgaraki, Anastasia [en_US]
dc.contributor.department: Mathematics [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2011-10-08T05:38:00Z
dc.date.available: 2011-10-08T05:38:00Z
dc.date.issued: 2011 [en_US]
dc.description.abstract: This dissertation is divided into two parts. In the first part we consider the general multivariate, multiple-sample semiparametric density ratio model. In this model one distribution serves as a reference (baseline), and every other distribution is a weighted tilt of the reference, with weights known up to a parameter. All model parameters, as well as the reference distribution, are estimated from the combined data of all samples. The semiparametric model yields a kernel-based density estimator, and we study its asymptotic theory and convergence properties. The estimator is shown to be not only consistent but also more efficient than the usual kernel density estimator. Several bandwidth selection methods are also discussed. This opens the door to regression analysis with random covariates from a semiparametric perspective, in which information is combined from multiple multivariate sources: each multivariate distribution, and the corresponding conditional expectation (regression) of interest, is estimated from the combined data of all sources. Graphical and quantitative diagnostic tools are proposed for assessing model validity. The method is applied to real and simulated data and compared with multiple regression, generalized additive models (GAM), and nonparametric kernel regression.

In the second part we study mortality rate prediction. The National Center for Health Statistics (NCHS) uses observed mortality data to publish race- and gender-specific life tables for individual states every ten years. At ages over 85, age misreporting compromises, to some extent, the reliability of death rates computed from these data. The eight-parameter Heligman-Pollard model is then used to smooth the data and to obtain estimates, by extrapolation, of mortality rates at advanced ages.
In states with small subpopulations, observed mortality rates are often zero, particularly at young ages. Zero death rates make fitting the Heligman-Pollard model difficult and at times impossible. Moreover, because death rates are reported on a log scale and the logarithm of zero is undefined, zero rates are problematic in their own right. To address this, appropriate probability models are used, and observed zero mortality rates are replaced by the corresponding expected values under these models. This makes the logarithmic transformation possible and allows the Heligman-Pollard model to be fitted, producing mortality estimates for ages 0 to 130 years. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/11890
dc.subject.pqcontrolled: Statistics [en_US]
dc.subject.pquncontrolled: bandwidth selection [en_US]
dc.subject.pquncontrolled: heligman-pollard [en_US]
dc.subject.pquncontrolled: kernel [en_US]
dc.subject.pquncontrolled: mortality rate prediction [en_US]
dc.subject.pquncontrolled: semiparametric density ratio model [en_US]
dc.title: Semiparametric Regression and Mortality Rate Prediction [en_US]
dc.type: Dissertation [en_US]
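The first part of the abstract describes a reference distribution, other distributions expressed as weighted tilts of it, and a kernel density estimator built from the combined samples. As a rough illustration only, the Python sketch below takes the simplest case under assumptions that are not stated in this record: two univariate samples; the exponential tilt g1(x) = exp(alpha + beta*x) g0(x); parameters estimated through the standard equivalence between this model and logistic regression; point masses p_i = 1 / (n0 (1 + rho exp(alpha + beta t_i))), with rho = n1/n0, on the pooled observations t_i; a Gaussian kernel; and Silverman's rule-of-thumb bandwidth in place of the bandwidth selectors the dissertation discusses. The names fit_tilt, reference_weights and sp_kde are hypothetical.

```python
import math
import random
import statistics


def fit_tilt(x0, x1, iters=50):
    """Estimate (alpha, beta) in the ASSUMED tilt g1(x) = exp(alpha + beta*x) * g0(x).

    Uses the logistic-regression form of the two-sample density ratio model:
    P(obs. from sample 1 | t) = expit(alpha_star + beta*t), where
    alpha_star = alpha + log(n1/n0). Fitted by Newton-Raphson.
    """
    t = list(x0) + list(x1)                 # pooled (combined) data
    y = [0] * len(x0) + [1] * len(x1)       # sample labels
    a, b = 0.0, 0.0                         # alpha_star, beta
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ti, yi in zip(t, y):
            pr = 1.0 / (1.0 + math.exp(-(a + b * ti)))
            g0 += yi - pr                   # score for alpha_star
            g1 += (yi - pr) * ti            # score for beta
            w = pr * (1.0 - pr)             # entries of the information matrix
            h00 += w
            h01 += w * ti
            h11 += w * ti * ti
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det    # Newton step: information^-1 @ score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    alpha = a - math.log(len(x1) / len(x0))
    return alpha, b, t


def reference_weights(alpha, beta, t, n0, n1):
    """Point masses p_i on every pooled observation t_i for the reference g0."""
    rho = n1 / n0
    return [1.0 / (n0 * (1.0 + rho * math.exp(alpha + beta * ti))) for ti in t]


def sp_kde(x, t, p, h, alpha=0.0, beta=0.0):
    """Semiparametric Gaussian-kernel estimate: sum_i p_i * w(t_i) * K_h(x - t_i).

    alpha = beta = 0 gives the reference density g0; the fitted (alpha, beta) gives g1.
    """
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(pi * math.exp(alpha + beta * ti) * c * math.exp(-0.5 * ((x - ti) / h) ** 2)
               for ti, pi in zip(t, p))


# Demo on simulated data: N(0,1) vs N(1,1), for which the exponential tilt is
# exact (g1/g0 = exp(x - 1/2), i.e. alpha = -0.5, beta = 1).
random.seed(0)
x0 = [random.gauss(0.0, 1.0) for _ in range(200)]
x1 = [random.gauss(1.0, 1.0) for _ in range(150)]
alpha, beta, t = fit_tilt(x0, x1)
p = reference_weights(alpha, beta, t, len(x0), len(x1))
# Silverman's rule of thumb on the pooled sample (a simple stand-in bandwidth).
h = 1.06 * statistics.stdev(t) * len(t) ** (-0.2)
print(alpha, beta, sp_kde(0.0, t, p, h), sp_kde(1.0, t, p, h, alpha, beta))
```

Every pooled observation carries a weight p_i in both density estimates, which is one concrete sense in which each distribution is estimated from the combined data. At the fitted values the weights p_i, and the tilted weights p_i exp(alpha + beta t_i), each sum to one.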

Files

Original bundle

Name: Voulgaraki_umd_0117E_12436.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
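The Heligman-Pollard law named in the abstract has the eight-parameter form q_x / (1 - q_x) = A^((x+B)^C) + D exp(-E (ln x - ln F)^2) + G H^x. The Python sketch below evaluates it over ages 0 to 130 and shows why zero rates block a log-scale fit, since math.log(0) raises ValueError. The record does not say which probability models replace the zero rates, so fill_zero_rates is a hypothetical stand-in (a Poisson-gamma posterior mean with a prior centered on a reference rate) and not the dissertation's method. The crude rate deaths/population is treated as an approximate q_x for simplicity, and the parameter values and death counts are made up for illustration.

```python
import math


def hp_q(x, A, B, C, D, E, F, G, H):
    """Heligman-Pollard (1980) law for the probability of death q_x at age x:
    q_x / (1 - q_x) = A^((x+B)^C) + D*exp(-E*(ln x - ln F)^2) + G*H^x.
    """
    child = A ** ((x + B) ** C)                       # childhood mortality
    # ln(0) is undefined, so the accident-hump term is taken as 0 at age 0
    hump = 0.0 if x == 0 else D * math.exp(-E * (math.log(x) - math.log(F)) ** 2)
    senescent = G * H ** x                            # adult and old-age mortality
    odds = child + hump + senescent
    return odds / (1.0 + odds)                        # odds -> probability


def fill_zero_rates(deaths, population, ref_rates, prior_weight=1000.0):
    """HYPOTHETICAL zero-rate replacement, not the dissertation's model.

    Assumes deaths ~ Poisson(population * rate) with a gamma prior on the rate
    whose mean is a reference rate and whose weight is `prior_weight` persons;
    a zero observed rate is replaced by the posterior mean of the rate.
    """
    filled = []
    for d, n, r in zip(deaths, population, ref_rates):
        if d > 0:
            filled.append(d / n)                      # keep nonzero crude rates
        else:
            filled.append((prior_weight * r + d) / (prior_weight + n))
    return filled


def log_sse(rates, ages, params):
    """Log-scale least-squares criterion for fitting; math.log(0) raises ValueError."""
    return sum((math.log(m) - math.log(hp_q(x, *params))) ** 2
               for m, x in zip(rates, ages))


# Illustrative parameter values (A, B, C, D, E, F, G, H), not fitted to any data.
params = (0.0005, 0.01, 0.1, 0.001, 10.0, 20.0, 0.00005, 1.1)
curve = [hp_q(x, *params) for x in range(131)]        # ages 0 to 130

# Made-up toy counts for a small subpopulation at ages 5, 10 and 15.
ages = [5, 10, 15]
deaths = [0, 0, 1]
population = [800.0, 750.0, 700.0]
ref_rates = [hp_q(x, *params) for x in ages]           # stand-in reference rates
rates = fill_zero_rates(deaths, population, ref_rates)
print(rates, log_sse(rates, ages, params))
```

The replaced values are strictly positive, so the log-scale criterion is finite; passing an unreplaced zero rate to log_sse raises ValueError instead. In this stand-in, prior_weight controls how strongly a zero cell is pulled toward its reference rate.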