## Selected Problems in Multi-Sample Statistical Inference

##### Abstract

In Chapter 1, a natural semiparametric model for case-control study data is discussed, and the asymptotic properties of two simple methods of estimation are explored. The probability element of the model can be factored into a known positive function h involving the finite-dimensional structural parameter, an infinite-dimensional nuisance parameter in the form of the probability element dP of a distribution, and a normalizing constant. In the setup of interest, a sample of size n is available from a population with a distribution from the aforementioned model. A second, independent sample of size m provides information only about the infinite-dimensional nuisance parameter P. The methods of estimation involve replacing the infinite-dimensional parameter with the empirical distribution function based on the second sample, and constructing semiparametric analogs of the maximum likelihood estimator and the method of moments estimator. The simplicity of these semiparametric estimators permits analysis of their asymptotic distribution even when n and m grow at different rates, yielding very natural and interpretable asymptotic results. In the case where n=o(m), the analog of the maximum likelihood estimator is asymptotically efficient.
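
The Chapter 1 setup can be written out as a short sketch. The notation below (θ for the structural parameter, c(θ, P) for the normalizing constant, X_i for the first sample, Y_j for the second sample) is assumed here for illustration and is not taken from the thesis. Only the analog of the maximum likelihood estimator is shown.

```latex
% Model: known positive h, unknown theta (finite-dimensional) and P (infinite-dimensional)
dQ_{\theta,P}(x) = \frac{h(x,\theta)\, dP(x)}{c(\theta,P)},
\qquad
c(\theta,P) = \int h(x,\theta)\, dP(x).

% Replace P by the empirical distribution \hat{P}_m of the second sample Y_1,\dots,Y_m:
c(\theta,\hat{P}_m) = \frac{1}{m}\sum_{j=1}^{m} h(Y_j,\theta).

% Semiparametric analog of the MLE based on X_1,\dots,X_n
% (the factor dP(X_i) does not involve theta and drops out):
\hat{\theta}_{n,m}
= \arg\max_{\theta}
\left\{ \sum_{i=1}^{n} \log h(X_i,\theta)
 - n \log\!\left( \frac{1}{m}\sum_{j=1}^{m} h(Y_j,\theta) \right) \right\}.
```
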
Chapter 2 explores a related parametric asymptotic statistics problem. Suppose a sample of size m is available from a population with density f<sub>Y</sub>(y; λ), and an independent sample of size n is available from a population with density f<sub>X</sub>(x; λ, α). Here λ is regarded as a nuisance parameter and α is the structural parameter, where λ and α are scalars. One approach to estimation of α would be to compute the maximum likelihood estimator based on both samples. A second approach would be to first find the maximum likelihood estimator of λ from the first sample, and to then treat it as the true parameter when using maximum likelihood estimation based on the second sample. Chapter 2 compares the asymptotic behavior of these two estimators under different assumptions about the rate of growth of m relative to n.
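
The two approaches described for Chapter 2 can be illustrated with a toy model. The model used here is an assumption chosen only for illustration and does not come from the thesis: Y is normal with mean λ and variance 1, and X is normal with mean λ and variance α. The function names `plug_in_estimate` and `joint_mle` are hypothetical. In this toy model the joint score equations have a simple alternating solution, so no optimizer is needed.

```python
import math
import random
from statistics import fmean


def plug_in_estimate(y, x):
    """Two-step approach: estimate lambda from y alone, then treat it as known."""
    lam_hat = fmean(y)                                    # MLE of lambda from the first sample
    alpha_hat = fmean([(xi - lam_hat) ** 2 for xi in x])  # MLE of alpha given lambda = lam_hat
    return lam_hat, alpha_hat


def joint_mle(y, x, iters=500):
    """Joint approach: maximize the likelihood of both samples together.

    Alternates between the two score equations of the assumed toy model:
      lambda = (sum(y) + sum(x)/alpha) / (m + n/alpha)
      alpha  = mean((x - lambda)^2)
    """
    m, n = len(y), len(x)
    sum_y, sum_x = sum(y), sum(x)
    lam, alpha = plug_in_estimate(y, x)                   # start from the two-step estimate
    for _ in range(iters):
        lam = (sum_y + sum_x / alpha) / (m + n / alpha)
        alpha = fmean([(xi - lam) ** 2 for xi in x])
    return lam, alpha


if __name__ == "__main__":
    random.seed(0)
    true_lam, true_alpha = 1.0, 4.0                       # illustrative values only
    for m, n in [(50, 2000), (2000, 50)]:
        y = [random.gauss(true_lam, 1.0) for _ in range(m)]
        x = [random.gauss(true_lam, math.sqrt(true_alpha)) for _ in range(n)]
        print(m, n, plug_in_estimate(y, x)[1], joint_mle(y, x)[1])
```

Running the sketch with m much smaller than n, and then with m much larger than n, gives a rough feel for how the relative growth of the two sample sizes affects the gap between the two estimators of α.
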
In Chapter 3 we consider interval estimation for small area proportions based on data collected under stratified random sampling. We focus on the case where the stratum sample sizes and the true proportions are small for all strata, and for simplicity we assume equal stratum sample sizes. The objective is to construct a confidence interval for each of the true stratum proportions, P<sub>i</sub>. A commonly used small area empirical Bayes model for a single stratum's true proportion P<sub>i</sub> assumes that the distributions of the sampled stratum proportions and the prior distribution of the true stratum proportions are normal. The well-documented problems of the normal approximation to the binomial, particularly when the sample size is small and the probability of success is close to 0 or 1, raise questions about the adequacy of such a model when the P<sub>i</sub> and the stratum sample sizes are small. We argue that a more reasonable model in this setting is to assume that the sampled stratum counts have binomial distributions and that the prior distribution of the true stratum proportions follows a beta distribution. We propose a new empirical Bayes confidence interval based on this model, and examine related simulation results.
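
As a rough illustration of the beta-binomial model in Chapter 3, the sketch below fits the beta prior by the method of moments and reports an equal-tailed posterior interval for one stratum. This is a naive empirical Bayes interval that ignores the uncertainty in the estimated prior; it is not the interval proposed in the thesis. The function names, the example counts, and the Monte Carlo approximation of the beta quantiles are all assumptions made for this example.

```python
import random
from statistics import fmean, variance


def fit_beta_prior(counts, n):
    """Method-of-moments fit of Beta(a, b) from binomial counts with common size n.

    Uses Var(X_i / n) = mu (1 - mu) [1 + (n - 1) rho] / n, with rho = 1 / (a + b + 1).
    """
    props = [c / n for c in counts]
    mu = fmean(props)
    if not 0 < mu < 1:
        raise ValueError("all counts are 0 or all are n; the prior cannot be fitted")
    s2 = variance(props)                          # sample variance across strata
    rho = (n * s2 / (mu * (1 - mu)) - 1) / (n - 1)
    if not 0 < rho < 1:
        raise ValueError("no overdispersion detected; moment estimate is invalid")
    total = 1 / rho - 1                           # estimate of a + b
    return mu * total, (1 - mu) * total


def posterior_interval(count, n, a, b, level=0.95, draws=20000, seed=0):
    """Equal-tailed interval from the Beta(a + count, b + n - count) posterior.

    Quantiles are approximated by Monte Carlo draws rather than computed exactly.
    """
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a + count, b + n - count) for _ in range(draws))
    tail = (1 - level) / 2
    return sample[int(tail * draws)], sample[int((1 - tail) * draws) - 1]


if __name__ == "__main__":
    n = 20                                        # equal stratum sample size (illustrative)
    counts = [0, 1, 0, 3, 2, 0, 5, 1, 0, 2, 4, 0] # made-up stratum counts
    a, b = fit_beta_prior(counts, n)
    for c in sorted(set(counts)):
        print(c, posterior_interval(c, n, a, b))
```

Unlike an interval built on the normal approximation, this interval always stays inside [0, 1], even for a stratum with a zero count.
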

University of Maryland, College Park, MD 20742-7011 (301)314-1328.
