Latent Class Logistic Regression with Complex Sample Survey Data
Blahut, Steven Albert
Dayton, C. Mitchell
Latent class regression has been reported previously in the literature. Often, however, data are collected from a survey that uses unequal selection probabilities, which results in complex sample survey data. Techniques for latent class logistic regression with complex survey data have not previously been reported, and no software is available to perform these analyses. A model was chosen for investigation based on an existing survey, the Indiana Youth Tobacco Survey. A variety of scenarios were investigated using systematically manipulated conditions to simulate complex sample survey data. Specifically, the effect of ignoring sample weights was investigated by comparing bias in parameter estimates from simulations that incorporated weights with those that ignored them. Additionally, several competing approaches for estimating standard errors were compared in terms of bias and confidence interval coverage. The techniques investigated were the unadjusted approach assuming simple random sampling, the jackknife, the bootstrap, and the design effect adjustment. Two design effects were compared, one based on jackknife estimates and one based on bootstrap estimates. The results indicated that weights must be incorporated in the estimation via pseudo-maximum likelihood to ensure that parameter estimates are not biased. These estimates were less biased than jackknife, bootstrap, and unweighted parameter estimates. In terms of variance estimation, the bootstrap estimates were preferred. Standard errors estimated under the assumption of simple random sampling were consistently too small and therefore undesirable. Jackknife and design effect adjusted standard errors were better, but bootstrap standard errors were consistently best. Finally, the best technique was applied to the Indiana Youth Tobacco Survey data to identify latent classes that differed in their susceptibility to initiating tobacco use and abuse.
The results indicated that a two-class model fit the data better than a one-class model. These classes differed in their susceptibility to peer pressure. Latent class one comprised 82% of the population and was more susceptible to peer pressure than latent class two. In both classes, the risk of initiating tobacco use increased with age.
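
The standard error approaches compared in the abstract can be illustrated on a much simpler estimator than a latent class logistic regression. The sketch below is not the dissertation's method or software. It assumes a single stratum, a weighted proportion as a stand-in for a pseudo-maximum likelihood estimate, and synthetic data. All function names, the cluster structure, and the weights are hypothetical choices made only for illustration.

```python
import numpy as np

def weighted_proportion(y, w):
    """Weighted mean of a 0/1 outcome (the pseudo-ML estimate of a proportion)."""
    return np.sum(w * y) / np.sum(w)

def srs_se(y, w):
    """Unadjusted standard error that treats the data as a simple random sample."""
    p = weighted_proportion(y, w)
    return np.sqrt(p * (1 - p) / len(y))

def jackknife_se(y, w, psu):
    """Delete-one-cluster jackknife standard error (single stratum assumed)."""
    clusters = np.unique(psu)
    k = len(clusters)
    full = weighted_proportion(y, w)
    reps = np.array([weighted_proportion(y[psu != c], w[psu != c])
                     for c in clusters])
    return np.sqrt((k - 1) / k * np.sum((reps - full) ** 2))

def bootstrap_se(y, w, psu, n_reps=500, seed=0):
    """Bootstrap standard error that resamples whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(psu)
    index_by_psu = [np.flatnonzero(psu == c) for c in clusters]
    reps = np.empty(n_reps)
    for b in range(n_reps):
        drawn = rng.integers(0, len(clusters), size=len(clusters))
        idx = np.concatenate([index_by_psu[j] for j in drawn])
        reps[b] = weighted_proportion(y[idx], w[idx])
    return reps.std(ddof=1)

# Synthetic clustered data with unequal weights (illustrative only).
rng = np.random.default_rng(42)
n_psu, m = 30, 20
psu = np.repeat(np.arange(n_psu), m)
cluster_effect = rng.normal(0, 0.8, n_psu)[psu]
prob = 1 / (1 + np.exp(-(-0.5 + cluster_effect)))
y = rng.binomial(1, prob).astype(float)
# Weights depend on the outcome, so ignoring them biases the estimate.
w = np.where(y == 1, 1.0, 3.0) * rng.uniform(0.8, 1.2, y.size)

est = weighted_proportion(y, w)
se_srs = srs_se(y, w)
se_jk = jackknife_se(y, w, psu)
se_boot = bootstrap_se(y, w, psu)
deff_jk = (se_jk / se_srs) ** 2  # design effect based on the jackknife

print("unweighted estimate:", y.mean())
print("weighted estimate:  ", est)
print("SE (SRS, jackknife, bootstrap):", se_srs, se_jk, se_boot)
print("jackknife design effect:", deff_jk)
```

In this sketch the design effect is the ratio of the jackknife variance to the simple random sampling variance. A design effect adjustment would multiply the unadjusted variance by that ratio. A bootstrap-based design effect could be formed the same way from `se_boot`.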