Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets
Abstract
Interpolating scattered data points is a problem of wide ranging
interest. A number of approaches for interpolation have been proposed
both from theoretical domains such as computational geometry and in
applications' fields such as geostatistics. Our motivation arises from
geological and mining applications. In many instances data can be
costly to compute and are available only at nonuniformly scattered
positions. Because of the high cost of collecting measurements, high
accuracy is required in the interpolants. One of the most popular
interpolation methods in this field is called ordinary kriging.
It is popular because it is a best linear unbiased estimator. The price
for its statistical optimality is that the estimator is computationally
very expensive. This is because the value of each interpolant is given
by the solution of a large dense linear system. In practice, kriging
problems have been solved approximately by restricting the domain to a
small local neighborhood of points that lie near the query point.
Determining the proper size for this neighborhood is a solved by
ad hoc methods, and it has been shown that this approach leads
to undesirable discontinuities in the interpolant.
Recently a more principled approach to approximating kriging has
been proposed based on a technique called covariance
tapering. This process achieves its efficiency by replacing the
large dense kriging system with a much sparser linear system. This
technique has been applied to a restriction of our problem, called
simple kriging, which is not unbiased for general data sets.
In this paper we generalize these results by showing how to apply
covariance tapering to the more general problem of ordinary kriging.
Through experimentation we demonstrate the space and time efficiency
and accuracy of approximating ordinary kriging through the use of
covariance tapering combined with iterative methods for solving
large sparse systems. We demonstrate our approach on large data
sizes arising both from synthetic sources and from real applications.