Data Fusion based on the Density Ratio Model

Much of the statistical literature deals with a single sample from one distribution, where the problem is to make inferences about that distribution through estimation and testing procedures. Data fusion is the process of integrating multiple data sources in the hope of obtaining more accurate inference than a single data source provides, the expectation being that fused data are more informative than the individual original inputs. This requires statistical methods that can take multiple data sources as input. The Density Ratio Model is a model that allows semiparametric inference about probability distributions from fused data. In this dissertation, we discuss three types of problems based on the Density Ratio Model. First, we consider a system of sensors, each producing data according to some probability distribution. The parametric connection between the distributions allows various hypothesis tests, including a test of equidistribution, which are very helpful in detecting abnormalities in mechanical systems. Another example of a data fusion problem is small area estimation, where strength is borrowed by using the data from all areas in which information is available. Real data can be fused with other real data, or even with artificial data. Thus, a given sample can be fused with computer-generated data, giving rise to the concept of out of sample fusion (OSF). We will see that this approach is very helpful for estimating a small threshold exceedance probability when the sample is not large and consists of values below the threshold.