K R I G I N G FOR E S T I M A T I N G CONTAMINANTS:

S P A T I A L P A T T E R N OF

POTENTIAL

AND PROBLEMS*

R I C H A R D O. G I L B E R T and JEANNE C. SIMPSON Statistics Section. Pacific Northwest Laboratory, Richland, Washington 99352, Operatedfor the U.S. Department of Energy by Battelle Memorial Institute

(Received 23 November, 1983)

Abstract.This paper discusses and illustrates the use of kriging techniques for estimating the spatial pattern of contaminants in environmental media, particularly soil. The assumptions underlyingkriging are reviewed as are some advantages and disadvantages of the method. Lognormal kriging(kriging applied to logarithmictransformed data) is illustrated using a set of radionuclide soil concentrations at a nuclear testing area on the Nevada Test Site. This example shows how lognormal kriging can be used to estimate average concentrations at points or for blocks of land, concentration contours over space, confidence bands about these contours, and radionuclide inventory in soil. The validity ofkriging estimates depends on the accurate estimation and modeling of the spatial correlation structure of the phenomenon. Accuracy is especially important when lognormal krigingis used and estimates of means and their standard deviations are required in the original, untransformed scale. This paper illustrates the bias that can result when a changing correlation structure over space is ignored.

1. Introduction Evaluating potential health hazards to man from pollutants in the environment involves estimating the spatial and/or temporal distribution of the pollutant, i.e., in trying to determine where it is, how much is present, and how it changes over time. The optimal estimation of spatial and temporal distributions is frequently a formidable task due to problems such as the occurrence o f local and global trends, pockets of high or low pollution levels, correlated data, non-homogeneity of variance over space and time, highly skewed data sets and insufficient data. In this paper we estimate the spatial distribution o f radionuclides in soil at a nuclear weapons test area on the N e v a d a Test Site using an interpolation procedure called simple kriging. All of the above problems are present in this example except that the data set was relatively large (712 data points). In this section we give a brief overview of kriglng and several other spatial estimation procedures. Some of the basic concepts and assumptions underlying kriging are reviewed in Section 2. Section 3 is a very brief summary o f the kriglng equations that are used to obtain the kriging weights and kriging variance. An example of kriglng at a nuclear testing area on the N e v a d a Test Site is given in Section 4. Discussion of some design and analysis problems is offered in Section 5. The basic theory and methods of kriging were developed primarily by Matheron and his coworkers at the Centre de Morphologic Mathematique in Fontainebleau, France during the 1960's and 1970's. Another early worker in this area was Mat6rn (1960). * Work supported by Nevada Applied Ecology Group, U.S. Department of Energy, Nevada Operations Office under Contract DE-AC06-76RLO 1830. Environmental Monitoring and Assessment 5 (1985) 113-135. 0167-6369/85.15. @ 1985 by D. Reidel Publishing Company.

114

R. O. GILBERT AND J. C. SIMPSON

Kriging is named after Dr D. G. Krige, a geologist specializing in mining reserve estimating. Kriging has been applied in many areas, including mining (David, 1977; Journel and Huijbregts, 1978; Rendu, 1978; Mousset-Jones, 1980; Henley, 1981; and Clark, 1979), oil exploration (Haas and Jousselin, 1976; Harbaugh et al., 1977), ground water studies (Delhomme, 1978, 1979; Sophocleous et al., 1982; Doctor, 1979), bathymetry (Chiles and Chauvet, 1974), gene frequency mapping (Piazza et al., 1981), acid rain deposition (Eynon and Switzer, 1982), air quality data (Grivet, 1980; Faith and Sheshinski, 1979) and radionuclide distribution over space (Barnes, 1980; Delfiner and Gilbert, 1978; White et al., 1980). AUdredge and Alldredge (1978), Pauncz (1978), and Bell and Reeves (1979) give bibliographic reviews. Kriging can be applied globally or locally. A global application would be to use all the data over the study site to estimate the average for the site. In this paper we use kriging as a local estimator, i.e., as a weighted moving average estimator of the average value of a spatial (regionalized) variable at points in space or over blocks (areas) or volumes. For example, the kriging estimator of the pollution level at a point Xo in geographical space is ~(Xo) = ~ 2;z(x,), ;=l

(1)

where z(x;) is the observed datum at the point x; within a local neighborhood about the point Xo, and 2; is the weight attached to that datum as obtained using simple kriging. If the assumptions underlying simple kriging (see Section 2) are fulfilled then the kriging estimator is a best linear unbiased estimator (B.L.U.E.). By using local estimation we avoid the need to assume the same mean and covariance function over the entire site. An important feature of kriging is that it provides a measure of the estimation error (kriging variance) associated with kriging estimates. Furthermore, if the kriging errors (i.e., the differences between the true values and the kriging estimates) are normally distributed, this kriging variance can be used to put confidence bands about concentration isopleths (lines of constant concentration) and confidence limits on average concentrations for individual blocks of land. The kriging variance can also be used to help select optimal locations in the field to take future measurements (Delfiner and Delhomme, 1975) or to decide on the optimal spacing between grid lines of a sampling grid network (McBratney and Webster, 1981). Kriging is also an exact interpolator, which means that if the point to be estimated coincides with the location where a sample was collected, the kriging estimate will equal the observed datum. Kriging requires that the underlying correlation structure of the data be estimated and modeled. This structure, embodied in the semi-variogram function, is used to obtain the weights 2; for use in Equation 1 (see Section 3). Hence, estimating and modeling the semi-variogram is a very important aspect of kriging and can be a difficult task, especially if data are sparse or data outliers are present. Krige and Magri (1982) study the effects of outliers and data transformations on semi-variogram estimates. Other methods for spatial interpolation (two-dimensional smoothing) include trend

K R I G I N G : POTENTIAL A N D PROBLEMS

1 15

surface analysis, weighted moving average methods that use subjectively determined weighting functions, and bicubic splines. Trend surface analysis can work well when the spatial surface is simple in form and the goal is to show broad features of the data. Ripley (1981, p. 35) suggests that its main use may be in removing or modeling these broad features so that another technique can be used. Delfiner and Delhomme (1975), Whitten (1975), and Davis (1973, p. 349) discuss precautions and problems with trend surface analysis. Weighted moving average methods are based on the intuitive notion that data values closest to the point or block where the average is desired should receive more weight than more distant points. A problem with these methods is determining the appropriate weighting function. Inverse distance or square inverse distance weighting might be used, but it is difficult in practice to know which function is best. Although kriging is a weighted moving average, its weights 2; are determined using the semi-variogram, which assures (assuming the correct semi-variogram) that the kriging estimator is B.L.U.E. Also, clustered data does not introduce a bias if kriging is used since kriging takes into account the spatial arrangement of the data. Another method of estimating the spatial distribution of contaminants is to use splines (Brodlie, 1980). Foley (1981a) describes an iterative two-dimensional interpolation procedure and computer program (DELTASH) that uses bicubic splines. The method is constructed to be efficient on large data sets, stable when large variations in concentrations are present, and to give smooth interpolations in areas where data are lacking. Foley (198 l b) compares DELTAS H with kriging for radionuclide soil data at a test area on the Nevada Test Site. Results were found to be similar for the two methods. On other data sets, DELTASH may perform better or worse than kriging, depending in part on whether the data are on a regular grid. DELTASH is easier to use than kriging but does not give an estimate of the error in the estimated point or area (block) averages.

2. Kriging Concepts and Assumptions Kriging methods are based on the theory of regionalized variables which is the conceptual foundation of geostatistics as developed by Matheron and his coworkers. A regionalized variable (ReV) is a variable distributed in space, i.e., a function that takes a value at every point x in space. A ReV has two characteristics (Journel and Huijbregts, 1978, p. 29): (i) a local, random component, and (ii) a structured aspect involving the notion of correlated random variables. These two aspects are incorporated into the concept of a random function (RF). At each point x~ in two-dimensional space (a geographical location) the observed datum is considered to be a realization of a certain random variable Z(xi) defined at point x i. This is the local random component mentioned above. The set of all random variables {Z(x); x ~ space domain} is a random function (RF). For a given pair of points x t and x i + h, the corresponding RVs Z(x~) and Z ( x t + h) are not, in general, independent. This

116

R. O. GILBERT AND J. C. SIMPSON

is the structured aspect of the ReV. The quantity h is the distance between the two points. The set of observations z(xi) at locations xi over the domain (study site) is considered a realization of the RF. Since only a single realization is usually available in practice, it is necessary to make some stationarity assumptions so that data at different locations can be viewed as multiple realizations of the same RV. This permits inferences to be made. The assumptions needed for simple kriging are given by the intrinsic hypothesis. A RF {Z(x); x ~ domain} is said to be intrinsic, i.e., to follow the intrinsic hypothesis if (1)

E [ Z ( x + h) - Z(x)] --- 0 for all points x ,

(2)

i.e., all random variables Z ( x ) in the set making up the RF have the same expected value, and (2) the variance of the difference between the levels at two points xi and x I + h in the study area depends only on the distance h, not on their geographical location or orientation. This may be written as var[Z(x

+ h) - Z ( x ) ] = 2~(h),

(3)

where 27(h) is termed the variogram and 7(h) the semi-variogram. If Equations (2) and (3) apply globally over the entire study site, then 7(h) is estimated using data from all portions of the site, and indeed, kriging could be done on a global basis. The intrinsic hypothesis makes no demand that an a priori variance or covariance exists. Hence, the intrinsic hypothesis is more generally applicable than second order stationarity, which requires that the covariance C(h) = E [ Z ( x + h)" Z(x)] - { E [ Z ( x ) I } z

exist and depend only on the distance h. If the RF is second order stationary then the covariance C(h) and the semi-variogram 7(h) are equivalent tools for characterizing the correlation between two variables Z ( x ) and Z ( x + h). In this case the correlation p(h) can be written as

p(h)= 1

70) c(o)

C(h) c(o)

where C(0) is Var [Z(x)], i.e., the covariance when h = 0 (Journel and Huijbregts, 1978, p. 33). The conventional estimator of 7(h) is

G(h)

=

1 ~hh

Nh ~ [z(x i + h) - z(x,)] 2

i=1

(4)

where Nh is the number of data pairs a distance h apart. A plot of G(h) versus h is called the experimental or raw semi-variogram. G(h) is very sensitive to unusually large or 'wild' data (outliers) since it is the mean of squared differences (divided by two). Cressie

KRIGING: POTENTIAL AND PROBLEMS

1 17

and Hawkins (I 980) and Hawkins and Cressie (1984) studied several estimators of 7(h) that are less affected by outliers than G(h). However, research that is specifically applicable to skewed environmental data sets is needed in this area. If Equations (2) and (3) do not apply globally, i.e., if the mean and variance of the differences z(xi + h) - z(xi) are stationary (homogeneous) only within subregions, then the semi-variogram estimated using the data from all portions of the site will be biased for the different subregions. In this case, simple kriging on a global basis is inappropriate and even local kriging will give biased estimates of mean concentrations. This latter problem is present in the example data set discussed here (Section 4). Our approach to the problem is to divide the study site into subregions and to estimate and model the raw semi-variogram separately for each subregion using only data from within each subregion. These separate semi-variograms we term regional semi-variograms. The semi-variogram for a given subregion is assumed to apply to all local neighborhoods within the subregion. Frequently, the raw semi-variogram computed for pairs orientated in one direction, say N-S, will be quite different from that computed between pairs in another direction. In that case the variable is said to be anisotropic, and a correction for this anisotropy must be made. This is illustrated in our example below. Once the raw semi-variogram has been obtained, the next step is to fit it to one of the standard semi-variogram models. Some of the commonly used models are shown in Figure 1. Figures la and lb are simple linear semi-variograms. Figures lb and lc show a discontinuity at the origin called the nugget effect. This is caused by variability in repeat measurements at the same location and/or variability between sampling points at distances less than that actually used or available. Figure ld shows a bounded semi-variogram. The upper bound or maximum value is called the sill. The distance at which the sill is reached is the range, or spatial zone of influence of the phenomenon. When there is no correlation for any distance h, the semi-variogram is a horizontal line, i.e., all nugget. White et al. (1980) fit a semi-variogram of the type in Figure lb to the logarithms of soil uranium concentrations. This type is also used for the example in Section 4 below. Delfiner and Gilbert (1978) used a spherical variogram as in Figure ld for in-situ gamma readings. The shape of the semi-variogranl and the size of the nugget effect is affected by the (geometrical) support of the measurement. The support is the volume, shape, and orientation of individual samples or measurements. If the support changes the semivariogram may also change. This concept is illustrated by Doctor and Gilbert (1978, p. 14) who plot raw semi-variograms of adjacent in-situ 241Am gamma measurements taken at both ground level and one-foot height. The raw semi-variogram for the one-foot height data is smoother and has a smaller nugget than that for the surface data. The raw semi-variogram may be fit to a semi-variogram model using least squares as was done for the data in Section 4 below. However, Journel and Huijbregts (1978, p. 232) warn against the blind use of least squares or other fitting routines. Considerable information about the spatial variable is frequently available and should not be ignored. Such information might include abrupt changes in concentration levels due to geologic

118

R. O. G I L B E R T A N D J. C. S I M P S O N

(a)

'(b) A

A

Ihl

Ih[

'(c)

'(d) "~ "-RANGE-'i

-] SILL

,_]

P

lhl

Ihl

Fig. 1. Commonsemi-variogramfunctions. (a) Linear: 7(h)= a. Ihl; (b) Linear with nugget: ?(h) = a' [hi + b, for ]hi > 0, (b is the 'nugget effect') =0 for /h[ =0; (c) Power functionwith nugget:7(h)=a-ihlk+b, for Ih l > 0 , (1 ill

0.046

9

9

[] []

0.038

0.030 400

800

1200

1600

2000

D I S T A N C E IN FEET Fig. 6.

Raw and modeled semi-variograms for Zone 3.

18

15 •

-- 1 . 4 0 + 0 . 0 3 5 5 h EAST-WEST

12 no O m a> o

uJ o0

I

/

o



1.40 + 0.0138h NORTH-SOUTH

i,

0 I

0 Fig. 7.

I

i 200

I

I 400

I

i 600

I 800

DISTANCE IN FEET, h Raw and modeled semi-variograms computed using data in all zones combined.

124

R. O. GILBERT AND J. C. SIMPSON

N

110 °

W

80o

o0'o ....... 2600 T

3000

S Fig. 8.

Direction zones within which data pairs were used to compute the E - W and N - S raw semivariograms for Zone 1, Zone 2 and Zones 1, 2, 3 combined. TABLE I Raw semi-variogram for the N - S direction class for Zone 1 data. Values of G(h) are plotted in Figure 4. Distance class (feet) < 25 2 5 - 49.9 50- 74.9 7 5 - 99.9 100-124.9 1 2 5 - 149.9 150-174.9 175-199.9 200-224.9 225-249.9 250-274.9 275-299.9 300-324.9 325-349.9 350-374.9 375-399.9 400-424.9 425 -449.9 450-474.9 475-499.9 500-524.9 525-549.9

Number of data pairs in the distance class Nh 3 92 15 193 71 154 178 232 133 241 149 238 273 206 224 249 198 231 237 278 206 238

Raw semi-variogram G(h) 1.46 2.82 3.07 3.55 4.27 4.68 5.12 4.45 5.21 5.34 4.57 7.03 6.73 7.54 6.18 10.25 8.06 10.10 9.10 12.23 11.01 12.74

KRIGING: POTENTIAL AND PROBLEMS

125

distance class. The results for the N - S direction in Zone 1 are tabulated in Table I and plotted in Figure 4. Straight lines were fit to the N - S , E - W directional semi-variograms using linear least squares under the constraint that the two lines have the same intercept with the y-axis (Draper and Smith, 1981, problem 2, p. 58, give the slope and intercept formula). This model (2 linear semi-variograms with a common intercept) allows for a simple adjustment for anisotropy (described by David, 1977, p. 138). The fitted linear semi-variogram for Zones 1, 2, and 3 combined (Figure 7) have slopes intermediate between semi-variograms for Zone 1 and Zone 2. Hence, if the semi-variograms in Figure 7 are used for local estimation throughout the site, the kriging variances will be biased in Zones 1 and 2. The magnitude of these effects on kriging means and variances is discussed below. 4. 2 .

E S T I M A T I N G SPATIAL D I S T R I B U T I O N A N D IN V EN TO R Y

Kriging was used to estimate average concentrations at points (grid nodes) on a square 50 • 50 ft grid as well as for block areas formed by the grid lines. The grid was laid over the site so that the grid nodes did not coincide with the grid sampling points. This was done because kriging is an exact interpolator and hence would return the observed data at those grid nodes that coincide with sampling locations. The computer code BLUEPACK-3D (Delfiner et al., 1980) was used to perform the lodging calculations. The model semi-variograms shown in Figures 4, 5, 6, and 7 on pCi/g data were used to estimate the values at grid nodes taking into account the anisotropy present in the underlying phenomenon. Block averages in units of gCi/block were estimated using semi-variograms computed on the t~Ci/core data. These semi-variograms are almost identical to those in Figures 4-7 and hence are not shown. These block totals were summed to estimate the inventory of Am for the entire site. Each kriging average was obtained using the eight nearest data points within a 100 • 250 ft (30.5 • 76.2 m) priority neighborhood centered on the point or on the 50 x 50 ft (15.2 x 15.2 m) block center (Figure 9). If eight data points were not present within this neighborhood, the additional data were found using an octant search. If more than eight were present, only the eight nearest were used. The weight assigned to each data point was a function of that point's distance and direction from x o as determined using the modeled variogram and solving the kriging equations (Equation (5)). In this example the shape of the priority neighborhood was chosen to be rectangular with the short axis in the east-west direction zone. This was done because the data in that direction zone are more variable than in the north-south direction zone. However, since anisotropy was accounted for by using directional semi-variograms, a square priority region may have worked equally well in this example. In Zone 3 the kriging average at each point and block was computed using the 8 nearest points (octant search) with each datum getting the same weight 2 = 1/8. The weights are equal since the semi-variogram for Zone 3 is pure nugget. Accordingly, the kriging variance for each such estimated mean in Zone 3 is s~/8 where s z was computed as the average of G(h) values, some of which are plotted in Figure 6. Averaging G(h)

126

R.O. GILBERTANDJ. C. SIMPSON

\

I I I I I \

/

I I

\

/

", I//I / /

/

/ 7

PRIORITY NEIGHBORHOOD'

t

/

I I I I I I

\\

/

I ESTIMATEBD E

\ \ "

~ 50FEET

I

Fig. 9. Priority neighborhood(100 • 250 if) within which the eight data points nearest to xo are used to estimate the average at xo using Equation(1).

values is appropriate since when the semi-variogram is pure nugget (no correlation) and no drift is present, G(h)is an estimate of the variance of the phenomenon, 241Am in this case. The isopleths of median Am concentrations obtained using kriging are given in Figure 10. These were obtained by drawing a line through the grid of estimated values 2~ = exp(2~r) for several selected concentration values (10, 100, and 1000 pCi/g in Figure 10) where 2~r is the kriging average in logarithmic scale at a grid point. Results within Zone 1 were obtained by kriging the entire site (using all 712 data points) using the semi-variogram for Zone 1, then discarding all estimates for points outside Zone 1. Results for Zone 2 were obtained in the same manner using the semi-variogram for Zone 2. Estimates along the boundary between Zone 1 and Zone 2 were obtained by visually interpolating the estimates for Zones 1 and 2 in that region. However, almost no interpolation was required because the results from the two zones matched very well at the boundary. Figure 11 gives isopleths of median concentrations obtained using the semi-variogram for Zones 1, 2, 3 combined (Figure 7). In general the isopleths in Figures 10 and 11 agree

:ig. I0. Estimated concentration contours of average (median) 241Am pCi/g oncentrations at NS-201 obtained using simple kriging and semi-variograms for Zone 1 and Zone 2 (Figures 4 and 5).

:ig. 11. Estimated concentration contours of average (median) 241Am pCi/g oncentrations at NS-201 obtained using simple kriging and the semi-variogram (Figure 7) obtained using data in Zones 1, 2, and 3 combined.

128

R. O. GILBERT AND J. C. SIMPSON

quite well. Hence, for these data, the use of regional semi-variograms did not have an important effect on the estimated median concentration contours, even though the combined semi-variogram is not a good model for the correlation structure in Zone 2. However, in Zone 1 the block means obtained using regional semi-variograms (Figure 12) are larger than those obtained using the combined semi-variogram (Figure 13). In Zones 2 and 3 the opposite occurs, i.e., smaller block means are obtained using regional semi-variograms. In Figure 13 we also note the large number of blocks in Zone 3 that have average concentrations estimated to be between 50 and 500 ~tCi/block. These block estimates are spurious, being caused by using the wrong (combined) semi-variogram. The change in block means that occurs when the wrong semi-variogram model is used can be better understood by examining the equation used to estimate 2, the block mean in original scale. This equation is 2" = exp(Zr) exp

2;~(x;, V) ki= 1

a2 + ~(V, V) 2

,

(8)

where 2 r and a 2 are the kriging block mean and the kriging variance (of that mean), both of which are computed on the logarithms of the data. The second exp function in Equation (8) is the lognormal correction factor (LCF) that was derived by Rendu (1979, Equation (59)). The estimated median concentration (exp(Zr)) is multiplied by this factor in order to estimate the mean for the block. For Equation (8) we see that if the wrong 7(h) model is used, this will change the value of the multiplicative LCF and hence the estimated mean value. The LCF is not used when estimating the median exp(Zr) in the original scale. Hence, the medians are less affected by misspecification of 7(h). We note that the LCF given above reduces to exp [s2(n - 1)/2n] for the case of uncorrelated data, where s 2 is the Variance of the n data. This simple LCF, or the simpler exp (sa/2), are approximations to the optimum LCF (Equation 7.86, p.235 in Agterberg, 1974) that is appropriate for uncorrelated data. We surmise that the LCF given in Equation (8) is also an approximation to the optimal LCF for correlated data. See Journel (1980) and Dowd (1982) for what is presently known concerning the use of lognormal correction factors. Figure 14 gives the estimated 10 pCi/g isopleth and associated 68~ confidence band when regional semi-variograms for Zones 1 and 2 are used. The band was obtained by first computing lower and upper 68~o confidence limits on the median concentration at each grid node assuming the logarithmic-transformed data are normally distributed, i.e., by computing exp (Z r + ~T). The quantity ~T is the estimated kriging standard deviation of the kriging average 27- at a grid node (both computed on logarithms). Then the 10 pCi/g line was drawn through the grid of upper limits over the site, then through the grid of lower limits. Both lines on a single plot define the band. The isopleth and band within Zone 1 were obtained using the semi-variograms for Zone 1, and those within Zone 2 using the semi-variogram for Zone 2. Figure 15 gives the 10 pCi/g isopleth and band obtained using the semi-variograms for Zones 1, 2, and 3 combined (Figure 7).

Fig. 12, Kriging estimates of 24~Am mean laCi/block concentrations for 50' by 50' blocks at NS-20I using regional semi-variograms for Zone 1 and Zone 2 (Figures 4 and 5).

~

.

N

o

0

0

0

O

0

0

Fig, 13, Kriging estimates of24~Am mean laCi/block concentrations for 50' by 50' blocks using the semi-variogram (Figure 7) for Zones I, 2, and 3 combined,

e

B ~

Q

3 3

O

;Z ~7

Z

rig. I4. Kriging estimates of the median 241Am 10 pCi/g contour and assoiated 68~o confidence band obtained using semi-variograms for Zone 1 and Zone 2 (Figures 4 and 5).

e/~

'

I

. .

.

'

L-i

:.,

9

,

,

':-

~

~"~ " "

~?:v

'..-'Y

i

N

._": ""

;.

""

.~.

~,%

< lOpCi/g

I

I

I

I

,GROUND

BAND

~, .FEET '~' "200

1

i

i

ZERO

i[] > I OpCi/g [__ 68% VT/JCONFIDENCI

[]

7ig. 15. Kriging estimates of the me, lian 241Am 10 pCi/g contour and asso:iated 68 ~o confidence band obtainer using the semi-variogram for Zones 1, 2, and 3 combined (Figure 7).

I

1

~

,~:::~::

I

,:.':.'.'.~

..... -: ::-::-:,__._. ~I'" ;:

I

:::

I

I

,

zONE

,

I

;

..;::._;

":-::;

2 ~

:

,,!

9 ,;:

-'::"

' . -. - . . . . . .

9

';'

ZONE

I

1

:: ::

9"

. . . .

::-: :

"L'."

.-:.:

....... ""

,i

-

I/I~

.~_;.::

~

.......

=

//

ut,,,~.. ,//# "T..'

--.

.,,

!

i

/

-.-:.

....

9

i

i

I ~ ~ UND ZERO

I

Fig. 16. Kriging estimates of the median 100 pCi/g contour and associated 68~o confidence band obtained using semi-variograms for Zone 1 and Zone 2 (Figures 4 and 5).

t

I

)

ZONE 1

'

Fig. 17. Kriging estimates of the median 100 pCi/g contour and associated 68}; confidence band obtained using the semi-variogram for Zones 1, 2, and 3 combined (Figure 7).

U~

m

@

?,

>

132

R. O. GILBERT AND J. C, SIMPSON

In Zone 2 the width of the band is slightly narrower in Figure 14 than in Figure 15, whereas in Zone 1 the bands are almost identical. More obvious differences in Zone 2 are observed by comparing Figures 16 and 17 which show the estimated 100 pCi/g isopleth and its confidence band obtained using regional and combined semi-variograms, respectively. Since the combined semi-variogram (Figure 7) is known to be biased for Zone 2, we believe that Figure 16 is the more accurate representation. For this example the combined semi-variogram reflects primarily the large variability in the data near GZ. As a result, the confidence bands in outlying areas of lower variability are too wide. The inventory of 241Am in the top 5 cm of soil within each zone and over all three zones were obtained by simply summing the block means (~tCi/block). In Table II we TABLE II Estimates of inventory(mCi) of 241Amobtained using regional semi-variograms for Zones 1 and 2, and the semi-variogramfor Zones 1, 2, and 3 combined Zone

No. of blocks

Estimate

Regional semi-variograms

1 2 3 Total

426 1261 3017 4704

3088.4 1051.8 12.6 4152.8

( 74.4~o) ( 25.3~o) .( 0.3 ~o) (100.0~o)

2379.4 2579.8 124.6 5083.8

( 46.8~o) (50.7~/o) _(2.5~) ( 100.0~o)

One semi-variogram

1 2 3 Total

426 ~ 1261 3017 4704

compare inventory estimates obtained using regional and combined semi-variograms. Column 2 shows that the total inventory obtained using the combined variogram (Figure 7) is about 22)'0 larger than that obtained using the regional variograms. Also, the proportion of the total estimated inventory in Zone 1 (i.e., near GZ) increases from 47 % for the combined semi-variogram to 74 % when regional semi-variograms are used. If the inventory estimates based on the combined semi-vadogram are used to estimate cleanup costs, those costs might be overstated since a greater than necessary effort might be made to clean up the outer zones. 5. Discussions The 241Am soil data examined here illustrates the importance of not ignoring changes in the semi-variogram model over space. The effects of using a combined semi-variogram

KRIGING: POTENTIAL AND PROBLEMS

133

as opposed to the less biased regional semi-variograms have been studied for this data set. However, there are perhaps more fundamental concerns associated with the use of kriging on environmental radionuclides and other pollutants. The kriging variances are obtained from the semi-variogram and the distances and orientations of the data values. The data values do not enter directly into computing the kriging variances except through the estimation and modeling of the semi-variogram. Hence, methods for estimating and modeling the semi-variogram become extremely important. We have already mentioned (Section 2) that the raw semi-variogram computed using Equation 4 is very sensitive to outliers and that methods for obtaining robust estimates of ?(h) are needed. The papers by Krige and Magri (1982) and Hawkins and Cressie (1984) are of interest in this regard. Another problem is trying to decide whether the variable under study has a stationary mean within the neighborhood from which data are used to estimate the local mean. This is in part a subjective evaluation based on the magnitude of changes within the local neighborhood relative to those over the entire region being studied. The computer code BLUEPACK-3D has an automatic structure identification procedure that evaluates whether a linear, quadratic or no drift in the mean is present. This is helpful, but again, blind obedience to the indicated result is not always the best approach. Even when a drift is present, it may not greatly affect local estimation, i.e., the semi-variogram may not be seriously affected by it over the distances for which data will be drawn to estimate local means. Others matters of interest are the selection of the sampling pattern (random, grid, etc.), the density of sample points, and the size of blocks. Design considerations can be approached using the fact that the kriging variance depends only on the semivariogram and the locations where data are obtained, i.e., it does not depend on the data values themselves. Hence, if one specifies a semi-variogram model, the kriging variance can be computed for various field sample designs and block sizes. McBratney and Webster (1981) provide a Fortran computer code for this purpose. Mat6rn (1960) and McBratney et at. (1981) show that the equilateral triangular grid is the best design for estimating means within a homogeneous area, i.e., in an area with no anisotropies. The square grid is only slightly less efficient and may be preferable due to simpler field procedures. Simple types of anisotropies can be handled by using rectangular grids (see McBratney and Webster, 1981 for details). There is a need to develop kriging methods for space-time phenomenon such as air pollutants. Faith and Sheshinski (1979) found it very difficult to find a reasonably smooth function for the space-time correlation function for daily mean oxidant data from 1971 to 1973 at 16 stations in the San Francisco Bay area. They note that the state of the art at that time (1979) was very crude. More experience and study is needed for these complex situations. It should perhaps be said, that while there is still much to learn about kriging environmental data, the method has considerable potential for wide use. If the following three conditions are met, the method is worthy of serious consideration: (1)It is important to obtain optimal (B.L.U.E.) estimates of areal averages, (2)estimation

134

R. O. GILBERTAND J. C. SIMPSON

variances of those averages are required (perhaps to conduct statistical tests or construct confidence bands), and (3) enough data at appropriate spacings can be collected to allow for estimating the spatial correlation structure (the semi-variogram).

Acknowledgement

This work was funded by the Nevada Applied Ecology Group, U.S. Department of Energy, Nevada Operations Office. References Agterberg, F. P.: 1974, Geomathematics, Elsevier Scientific Publishing Company, New York. Alldredge, J. R. and Alldredge, N. G.: 1978, 'Geostatistics: A Bibliography', International Statistical Review 46, 77-88. Armstrong, M.: 1984, 'Problems with Universal Kriging', Mathematical Geology 16, 101-108. Barnes, M. G.: 1978, Statistical Design and Analysis in the Cleanup of Environmental Radionuclide Contamination, Desert Research Institute, University of Nevada System. NVO-1253-12. Barnes, M. G.: 1980, 'The Use of Kriging for Estimating the Spatial Distribution of Radionuclides and other Spatial Phenomena', TRAN-STAT: Statistics for Environmental Studies, No. 13. Pacific Northwest Laboratory, Richland, WA. PNL-SA-9051. Bell, G. D. and Reeves, M.: 1979, 'Kriging and Geostatistics: A Review of the Literature Available in English', Proc. Austratas. Inst. Min. Metall., No. 269, pp. 17-27. Brodlie, K. W.: 1980, Mathematical Methods in Computer Graphics and Design, Academic Press, New York. Chiles, J. P. and Chauvet, P.: 1974, 'Kriging: A Method for Cartography of the Sea Floor', International Hydrographic Review 52, 25-41. *Clark, I.: 1979, Practical Geostatistics, Applied Science Publishers, London. Cressie, N. and Hawkins, D. M.: 1980, 'Robust Estimation of the Variogram: I', Mathematical Geology 12, 115-125. Dalenius, T., Hajek, J., and Zubrzycki, S.: 1961, 'On Plane Sampling and Related Geometrical Problems', Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 125-150. **David, M.: 1977, Geostatistical Ore Reserve Estimation, Elsevier Scientific Publishing Company, New York. Davis, J. C.: 1973, Statistics and Data Analysis in Geology, John Wiley and Sons, New York. Delfiner, P. and Delhomme, J. P.: 1975, 'Optimal Interpolation by Kriging', in J.C. Davis and M.J. McCullagh (eds.), Display and Analysis of Spatial Data, John Wiley and Sons, New York, pp. 96-114. Delfiner, P. and Gilbert, R. O.: 1978, 'Combining Two Types of Survey Data for Estimating Geographical Distribution of Plutonium in Area 13', in M. G. White and P. B. Dunaway (eds.), Selected Environmental Plutonium Research Reports of the NAEG, U.S. Department of Energy, Nevada Operations Office, Las Vegas, NV. NVO-192. pp. 361-404. Delfiner, P., Delhomme, J. P., Chiles, J. P., Renard, D., and Irigoin, F.: 1980, BLUEPACK-3D, Ecole Nationale Superieure des mines de Paris, Centre de geostatistique et de morphologle mathematique, Fontainebleau, France. Delhomme, J. P.: 1978, 'Kriging in the Hydrosciences', Advances in Water Resources 1,251-266. Delhomme, J. P.: 1979, 'Spatial Variability and Uncertainty in Groundwater Flow Parameters: A Geostatistical Approach', Water Resources Research 15, 269-280. Doctor, P. G.: 1979, An Evaluation of Kriging Techniques for High Level Radioactive Waste Repository Site Characterization, Pacific Northwest Laboratory, Richland, WA. PNL-2903. Doctor, P. G. and Gilbert, R. O.: 1978, 'Two Studies in Variability for Soil Concentrations: With Aliquot Size and With Distance', in M. G. White and P.B. Dunaway (eds.), Selected Environmental Plutonium Research Reports of the NAEG, U.S. Department of Energy, Nevada Operations Office, Las Vegas. NVO-192. pp. 405-449. ***Dowd, P. A.: 1982, 'Lognormal Kriging - The General Case', Mathematical Geology 14, 475-499. Draper, N. and Smith, H. S.: 1981, Applied Regression Analysis, 2nd edition. John Wiley and Sons, New York.

KRIGING: POTENTIAL AND PROBLEMS

135

Eynon, B. and Switzer, P.: 1982, 'The Variability of Rainfall Acidity', SIMS Technical Report No. 58. Department of Statistics, Stanford University, Stanford, CA. Faith, R. and Sheshinski, R.: 1979, 'Misspecification of Trend in Spatial Random-Function Interpolation with Application to Oxidant Mapping', SIMS TechnicalReport No. 28, Department of Statistics, Stanford University, Stanford, CA. Foley, T. A., Jr.: 1981a, Off-Site RadiologicalExposure Review Project:A Computer-Aided Surface Interpolation and Graphical Display, Desert Research Institute, University of Nevada System, Las Vegas. DOE/DP/01253-45021. Foley, T. A., Jr.: 1981b, NTS RadiologicalAssessment Project: Comparison of Delta Surface Interpolation with Krigingfor the Frenchman Lake Region of Area 5, Desert Research Institute, University of Nevada System, Las Vegas. DOE/DP/01253-45022. Grivet, C. D.: 1980, 'Modeling and Analysis of Air Quality Data', SIMS TechnicalReportNo. 43, Department of Statistics, Stanford University, Stanford, CA. Haas, A. and Jousselin, C.: 1976, 'Geostatistics in Petroleum Industry', Advanced Geostatistics in the Mining Industry, D. Reidel Publ. Co., Holland, Dordrecht, pp. 49-68. Harbaugh, J. W., Doveton, J. H., and Davis, J. C.: 1977, ProbabilityMethods in OilExploration, John Wiley and Sons, New York. Hawkins, D. M. and Cressie, N.: 1984, 'Robust Kriging - A Proposal', Mathematical Geology 16, 3-18. Henley, S.: 1981, Nonparametric Geostatistics, Halsted Press, John Wiley and Sons, New York. § § + Journel, A. G.: 1980, 'The Lognormal Approach to Predicting Local Distributions of Selective Mining Unit Grades', Mathematical Geology 12, 285-303. ***Journel, A. G. and Huijbregts, CH. J,: 1978, Mining Geostatistics, Academic Press, New York. Krige, D. G. and Magri, E. J.: 1982, 'Studies of the Effects of Outliers and Data Transformation on Variogram Estimates for a Base Metal and a Gold Ore Body', Mathematical Geology 14, 557-564. Mat6rn, B.: 1960, Spatial Variation, Meddelander fran Statens Skogforskningsinstitut, Vol. 49, No. 5, 144 pp. + McBratney, A. B. and Webster, R.: 1981, 'The Design of Optimal Sampling Schemes for Local Estimation and Mapping of Regionalized Variables. Part II. Program and Examples', Computers and Geosciences 7, 335-365. + McBratney, A. B., Webster, R., and Burgess, T. M.: 1981, 'The Design of Optimal Sampling Schemes for Local Estimation and Mapping of Regionalized Variables. Part I. Theory and Method', Computers and Geosciences 7, 331-334. § Mousset-Jones, P. F.: 1980, Geostatistics, McGraw-Hill, New York. Pauncz, I.: 1978, 'English Language Publications on the French School of Geostatistics', Mathematical Geology 10, 253-260. Piazza, A., Menozzi, P., and Cavalli-Sforza, L.: 1981, 'The Making and Testing of Geographic GeneFrequency Maps', Biometrics 37, 635-659. § + Rendu, J. M.: 1978,Anlntroduction to GeostatisticalMethodsofMineralEvaluation, South African Institute of Mining and Metallurgy, Johannesburg. § § § Rendu, J. M.: 1979, 'Normal and Lognormal Estimation', Mathematical Geology, Vol. II, pp. 407-422. Ripley, B. D.: 1981, Spatial Statistics, John Wiley and Sons, New York. Sophocteous, M., Paschetto, J. E., and Olea, R. A.: 1982, 'Ground-water Network Design for Northwest Kansas, Using the Theory of Regionalized Variables', Ground Water 20, 48-58. White, G. C., Simpson, J. C., and Bostick, K. V.: 1980, Studies of Long-Term EcologicalEffects of Exposure to Uranium V, Los Alarnos National Laboratory, Los Alamos, NM. LA-8221. Whitten, E. H. T.: 1975, 'The Practical Use of Trend-Surface Analyses in the Geological Sciences', in J. C. Davis and M. J. McCullagh (eds.), Display and Analysis of Spatial Data, John Wiley and Sons, New York, pp. 282-297. * ** *** + §+ +§ +

Well written, introductory discussion. Well written, intermediate discussion. Most complete discussion available on kriging, intermediate to difficult level. Computer program listed. Intermediate to difficult level discussion. Gives correction factors when transforming logarithmic results back to original scale if data are lognormal.

Kriging for estimating spatial pattern of contaminants: Potential and problems.

This paper discusses and illustrates the use of kriging techniques for estimating the spatial pattern of contaminants in environmental media, particul...
1MB Sizes 0 Downloads 0 Views