SIMULATING GEOLOGICAL UNCERTAINTY WITH IMPRECISE DATA FOR GROUNDWATER FLOW AND ADVECTIVE TRANSPORT MODELING

Sean A McKenna
Eileen P. Poeter
Dept. of Geology and Geological Engineering
Colorado School of Mines
Golden, Colorado, U.S.A.

ABSTRACT

A known cross section composed of a heterogeneous mixture of three hydrofacies is sampled along vertical lines representing ten wells. These wells provide conditioning data for stochastic geostatistical simulations. Steady-state flow and advective transport are modeled across each resulting realization to evaluate effective hydraulic conductivity, first arrival time, and duration of particle arrivals. Further ensembles of realizations are produced with both the original ten wells of conditioning data and additional soft data in the form of imprecise estimates of the hydrofacies at every location within the simulated domain. Use of soft data decreases the ensemble variances with respect to the first arrival times, bulk hydraulic conductivity of the domain, and duration of the particle arrivals relative to the ensemble variance resulting from only ten wells of hard conditioning data. The accuracy of the resulting distributions is influenced by biases in the conditioning data.

INTRODUCTION

For water supply problems, the use of relatively simple models of subsurface geology and bulk values of hydrologic parameters is sufficient. As the focus of the groundwater industry has shifted to contaminant transport problems, more complex models of the subsurface are necessary. Determination of the continuity of high hydraulic conductivity facies across a site is critical to predicting contaminant transport and designing effective remediation systems. A small but continuous geologic feature having only one order of magnitude higher hydraulic conductivity than the surrounding medium can have a dramatic effect on the behavior of a contaminant plume (Figure 1). If these continuous high hydraulic conductivity pathways go undetected, it is possible that contaminants will migrate offsite at unexpected locations and remediation efforts will be thwarted.

Figure 1. Computer simulated examples of the influence of aquifer heterogeneity with respect to hydraulic conductivity on contaminant transport. The contaminant is introduced for 5 yr at 1360 L/min. The contaminant is conservative, with a concentration of 1000 ppm. The hydraulic gradient is 4 x 10^-3, effective porosity is 0.20, longitudinal dispersivity is 30 m, transverse dispersivity is 3 m, and the plumes are pictured 35 yr after the injection began (after Poeter and Gaylord, 1990).

The need to define these features presents a new challenge in site characterization. It is not possible to drill enough holes to confidently state that such features do not exist at a site. Consequently, soft data and stochastic simulation can be used to improve the subsurface characterization.

Stochastic modeling of heterogeneities within aquifers has emerged as a means of producing distributions on contaminant traveltimes or other parameters of interest (Rautman and Treadway, 1991; Nichols and Freshley, 1993). Generally, when more subsurface data are available, the response of the transfer function exhibits a narrower distribution (Figure 2). Supplementary data are usually obtained by further drilling into the subsurface at the site.

Figure 2. Conceptualization of stochastic subsurface imaging coupled with hydrologic modeling. The dashed curve demonstrates the reduction in uncertainty of traveltimes across a site due to a reduction in subsurface uncertainty through the use of soft information.

Another source of subsurface information at many sites is soft data in the form of geophysical information, uncertain drillers' logs, and knowledge concerning the depositional environment of the subsurface. Imprecise data are defined as a type of soft data that give an uncertain estimate of the parameter under study at a location. The amount of imprecision must be quantified through a calibration procedure. This paper examines the utility of one type of soft data, imprecise information, in reducing the uncertainty associated with generating stochastic realizations of a subsurface domain. This reduction in subsurface uncertainty should result in reduced uncertainty associated with the information used to address flow and transport questions (Figure 2).

The deterministic image of the section used in this paper is a discrete interpretation of the geology of a Yorkshire cliff face described by Matheron et al. (1987). The cross sectional domain is 600 m long x 30 m high. Each cell is 2 m long and I m high, yielding a total of 9000 cells. Each cell is assigned a single lithology, shale, shaly-sand, or sandstone (Figure 3A), corresponding to the lithologies described in this fluvio-deltaic deposit (Ravenne et al., 1987). Each lithology represents an indicator class in the geostatistical simulations. Lithologies are converted to hydrofacies by assigning each lithology a hydraulic conductivity and porosity value based on appropriate values reported in the literature (Freeze and Cherry, 1979) for those lithologies (Table 1).

Figure 3. (A) The deterministic (known) cross section. (B) An imprecise reproduction of the deterministic image used as imprecise conditioning data. In both images, shale is black, shaly sand is gray, and sandstone is white.

Table 1. Hydraulic properties assigned to facies

Facies	Hydraulic Conductivity (m/yr)	Effective Porosity (fraction)
Shale	0.32	0.05
Shaly-Sandstone	3.2	0.10
Sandstone	32.0	0.15

Steady-state flow and adjective transport are modeled across the two-dimensional cross sectional domain using the U.S. Geological Survey flow model MODFLOW (MacDonald and Harbaugh, 1988) and the particle tracking postprocessor MODPATH (Pollock, 1989). The top and bottom of the domain are modeled as impermeable boundaries and the ends of the domain are fixed head boundaries imposing a gradient of 0.01 across the section. The particle tracking is accomplished by examining the paths and velocities of 30 particles across the domain. The flow modeling and particle tracking provide information to analyze the bulk hydraulic conductivity of the domain, the fastest flowpath across the domain (first arrival time), and the length of time during which the majority of the particles arrive. The goal of this study is to examine the utility of imprecise data in reducing the uncertainty associated with these measures of flow and transport parameters relative to the uncertainty associated with those same measures when realizations are produced with limited hard data. Use of a synthetic, known section allows us to assess accuracy as well as precision.

THEORY OF IMPRECISE DATA

The type of soft data employed herein is imprecise data as discussed by Alabert (1987). Because the geology of the actual cross section is known, it can be reproduced with a given amount of uncertainty. In this study, imprecise reproduction of the geologic facies is accomplished by adding noise to the actual section. The uncertain reproduction of the cross section (Figure 3B) is used as imprecise conditioning data in indicator geostatistical simulations. Indicator geostatistical simulation is well suited to simulating facies types because it does not require the variable being simulated to have a numerical value. In addition, indicator simulation preserves the occurrence of thin but continuous units that can be critical to groundwater transport of contaminants.

The generated imprecise data in this study are considered an analog for the field situation in which a geophysical technique yields an uncertain image of facies in the subsurface. The amount of uncertainty in the imprecise data is quantified through a calibration process that compares the actual facies type to the estimated facies type at locations where both types of data exist. Results of the calibration process are a pair of misclassification probabilities (p₁ and p₂) for each indicator threshold. For nonnumeric data, such as facies type, the indicator thresholds are abstract boundaries that divide the data into indicator classes (e.g., when mapping this section, the geologists used a mental threshold to divide sand from shaly-sand). By definition, p₁ is the probability that the imprecise estimate of the facies is less than or equal to the indicator threshold given that the actual facies is less than or equal to the indicator threshold at that location (equation 1). Similarly, p₂ is the probability that the imprecise estimate of facies type is less than or equal to the indicator threshold given that the actual facies is greater than the threshold (equation 1).

The variable (x) is the soft data estimate of the facies. (x) is the actual facies (hard data) at that location. The indicator threshold is denoted by z_c. Note that the probabilities do not necessarily add up to 1.0. A simple measure of the quality of the imprecise data for a given threshold is the quantity (p₁ - p₂). If this quantity equals 1.0, the data are considered to be hard data (i.e., the imprecision in the soft data is negligible). If (p₁ - p₂) equals 0, the imprecise data provide no information on the actual facies. This quantity is equivalent to the "B" parameter in the Markov-Bayes formulation of soft data (Deutsch and Journel, 1992).

Two types of information are essential in creating geostatistical simulations: (1) an estimate of the covariance function for each threshold in the indicator study and (2) conditioning data. Imprecise data can be used to improve estimates (over use of hard data alone) of the covariance functions modeled as variograms and to provide conditioning data for the simulations. In both uses, the uncertainty of the imprecise data, as quantified by p₁ and p₂ must be taken into account. Software that uses imprecise data in both covariance modeling and in simulation of the cross sectional domain has been developed for this project.

VARIOGRAPHY AND SIMULATION

Two data sets are used in this study. The first data set consists of ten wells of hard data with a well spacing of 60 m. The other data set includes the original ten wells of hard data and soft data in the form of imprecise measurements of the facies at all other points in the domain as shown in Figure 3B. The probabilities, p₁ and p₂, have been determined for the imprecise data and are shown in Table 2.

Table 2. Imprecision of soft data

Threshold	p₁	p₂	(p₁-p₂)
1	0.8500	0.1513	0.6987
2	0.9206	0.1702	0.7504

Horizontal variograms built with the ten wells of hard data show a strong nugget effect (Figure 4A). Nugget effect variograms are common in site investigations where the well spacing is larger than the length of correlation for the parameter of interest. For each threshold, the nugget value determined for the vertical variogram was used in fitting a model to the horizontal variogram. All variograms were fit with two nested spherical models.

Figure 4. Variograms calculated with (A) ten wells of hard data, (B) ten wells of hard data and exhaustive imprecise data, and (C) the exhaustive hard data set. All variograms were fit with two nested spherical models.

To build variograms with a combination of hard and imprecise data, three estimates of the true variogram gamma value are combined: an estimate calculated from the hard data at lags where hard data are present, an estimate from a hard-imprecise data crosscovariance, and an estimate calculated from the imprecise data covariance at intermediate lags (equation 2). The final estimate of spatial covariance derived from equation 2 is then converted to a variogram using the nonergodic relationship developed by Isaaks and Srivastava (1988).

The variables _h and _s, refer to the number of hard and soft conditioning data in the search neighborhood. The three components of the covariance equation are weighted by the 's, which sum to 1. The 's are determined to account for the relative numbers of hard and soft data. One simple method to calculate is

The hard-soft cross-covariances and the soft data estimates are scaled by the quantity (p₁ - p₂) to provide an estimate of the true hard data gamma value at intermediate lag spacings (Alabert, 1987).

Improvement of the variograms with spatially dense imprecise data can be substantial as demonstrated in Figure 4B. A field example of spatially dense imprecise data is an estimate of facies from a surface or cross-borehole geophysical survey.

The geostatistical simulation software developed for this project is a soft data extension of the program ISIM3-D (Gomez-Hernandez and Srivastava, 1990). Both use the sequential indicator simulation (SIS) algorithm. The heart of the SIS algorithm is the estimation of a posterior cumulative distribution function (pcdf). An estimate of the pcdf is calculated for every indicator threshold at each location being simulated. The pcdf estimate is based on the proximity and values of conditioning data located within a search neighborhood. In the modified algorithm, the pcdf value at each threshold level is a linear combination of the hard data points within the search neighborhood weighted by their respective kriging weights and the imprecise data points weighted by a spatial declustering weight and scaled by the p₁ and p₂ values (Alabert, 1987):

where ₁ and ₂, are weights that account for the relative quantities of the hard and imprecise data within the search neighborhood and are calculated as in equation 2. The i's are the indicator value (0 or 1) of each conditioning point at the current threshold. The kriging and declustering weights are represented by and , respectively. The kriging weights are derived from solving the kriging matrix, and the declustering weights are determined by the cell declustering technique Journel, 1983). Once the pcdf is constructed, a random number between 0 and 1 is drawn, and by examining the values of the pcdf at the indicator thresholds, the indicator class to which the random number belongs is determined. The point being simulated is assigned to that indicator class and is now considered a hard conditioning datum for the simulation of subsequent locations within the domain.

The simulator built for this project has several options that can be used to control the amount of hard and imprecise data found in the search neighborhoods and the type of kriging process (simple or ordinary) to be used. Imprecise data of the level illustrated in Table 1 are used to build 100 realizations with ordinary kriging (OK) and a maximum of 12 (4 hard and 8 imprecise) points in the search neighborhood. An example realization is shown in Figure 5. Realizations produced using simple kriging (SK) are not useful for this situation as discussed in the following section. One hundred realizations were also generated using only the hard data.

Figure 5. An example realization of the cross section produced with both hard and imprecise conditioning data. Compare this figure with Figure 3.

GROUNDWATER FLOW MODELING AND PARTICLE TRACKING

Steady-state flow and advective transport are simulated across each of the two ensembles of 100 realizations. Thirty particles are tracked across the domain and the results of this particle tracking give a distribution of first arrival time and duration of arrivals, which indicates the amount of spreading developed within the particle cloud. Duration time is defined as the traveltime difference between the 10th percentile arrival and the 90th percentile arrival (i.e., the 3rd and 27th particles). An effective bulk hydraulic conductivity of each realization is calculated from the volumetric discharge estimated by the flow model and knowledge of the boundary conditions. Arrival times and duration times are measured in years and bulk hydraulic conductivity is measured in meters/year (Figure 6).

Figure 6. Histograms showing results of flow and transport modeling. For comparison, the actual first arrival time is 805 yr, the actual duration time is 5440 yr, and the actual bulk hydraulic conductivity is 2.5 m/yr. Black dots denote actual values.

The resulting distributions of bulk hydraulic conductivity, first arrival time, and duration time are shown as histograms in Figure 6. The values determined by conducting flow and transport modeling in the deterministic section are provided for comparison. The reduction in variance of each resulting distribution due to the incorporation of imprecise data is provided in Table 3.

Table 3. Reduction in variance due to incorporation of imprecise data*

Parameters	Hard	Soft
	Coefficient of Variation


First Arrivals	0.23	0.10
Duration Time	0.19	0.10
Bulk Hydraulic Conductivity	0.11	0.05

^* Coefficient of variation = standard deviation/mean.

DISCUSSION

The overall effect of supplementing the hard data with imprecise data is to tighten the distributions of the selected measures of flow and advective transport (Figure 6). This reduction in variance is an increase in precision (Deutsch and Journel, 1993). All of the distributions are accurate; that is, all distributions include the true values of flow and transport measures in the actual (deterministic) section, except for the distribution of bulk hydraulic conductivity as calculated from the hard data ensemble (Figure 6C). Ideally, the resulting distributions from a field site will be both precise and accurate.

The actual bulk hydraulic conductivity does not occur within the distribution of bulk hydraulic conductivity produced using only the hard data. The hard data include a slightly higher percentage of sandstone than the actual section (17 vs. 14%) and a lower percentage of shale (58 vs. 61%). The resulting increased frequency of occurrence of high hydraulic conductivity facies in the realizations causes the overestimation of bulk hydraulic conductivity. It is possible that if more realizations are generated, the tail of the distribution may extend beyond the actual value.

Several attempts were made to employ simple kriging (SK) in the simulation process. SK forces a stationary mean value into the simulation process. Ordinary kriging (OK) does not force an a priori stationary value into the simulation, but rather corrects for local departures from the overall mean. For a number of experiments, SK was used for all points in the search neighborhood, both throughout the simulations and up to specified levels of simulation completion at which time the process was switched to OK. The simulation program is written to allow the original hard data to be searched separately from the imprecise data and previously simulated data. This option is available to facilitate experimenting with different combinations of obtaining an SK estimate of the pcdf from the original hard data and an OK estimate from the soft and previously simulated data. All attempts at incorporating SK failed and the resulting realizations were geologically unrealistic such that they were not used for flow and transport modeling.

The data set used in this study is somewhat artificial in that every point in the domain is a conditioning point. It may be realistic to have a two-dimensional section entirely composed of conditioning data from a geophysical survey; however, this would most likely be used in simulating a three-dimensional subsurface. SK could be more effectively employed in the early stages of a simulation with sparse conditioning data to impose an ergodic mean on the realizations. As the simulation process progresses, a switch to OK allows the mean to vary smoothly through the domain as the remaining locations are simulated. The abundance of conditioning data in this study negates the effective use of SK.

CONCLUSIONS

In this study, imprecise data are used to reduce uncertainty associated with groundwater flow and advective transport. Imprecise data of moderate uncertainty but extensive coverage improved the calculation of experimental variograms leading to more accurate covariance models. The improved covariance models coupled with the increase in conditioning data provided by the imprecise information reduces subsurface uncertainty in geostatistical simulations. This reduction in subsurface uncertainty leads to a reduction in uncertainty associated with flow and transport results. In a simulation study, the resulting flow and transport distributions should be both accurate and precise. The incorporation of soft data into the simulation process helps achieve this goal. The least desirable result would be an inaccurate but precise distribution. The incorporation of soft conditioning data safeguards against this result.

A thorough testing of the effects of the simulation process on the flow/transport results using a known data set is necessary before applying a simulator to a field problem. Even simulators that function properly on a test data set may not capture all the desired features of a site if the conditioning data are significantly biased. As this study has shown, it is possible to include the true first arrival time in the distribution of arrival times from an ensemble of realizations, but completely miss the true value of bulk effective hydraulic conductivity in that same ensemble. This is an apparent conflict because the traveltime across the fastest path is not directly correlated to the bulk hydraulic conductivity of the section; however, it is disappointing that the simulation ensemble did not include the true bulk hydraulic conductivity.

When limited or biased data are available, the user of the simulator must consider a broader range of possible outcomes through the conceptual development of the simulation model. This range of outcomes can be constrained by including soft information in the form of geophysical data, hydrologic measurements, conceptual models of depositional environment, and other information correlated to the parameters being simulated.

ACKNOWLEDGMENTS

This work is supported by U.S. Bureau of Reclamation cooperative agreement I-FC-8117500. This manuscript has benefitted from thoughtful reviews by Kadri Dagdelen and Mary C. Hill.

REFERENCES

Alabert, F. G., 1987, Stochastic imaging of spatial distributions using hard and soft data: M.S. Thesis, Stanford University, Stanford, California, 197 p.

Deutsch, C. V., and A. G. Journel, 1992, GSLIB: geostatistical software library and user's guide: New York, Oxford University Press, 340 p.

Deutsch, C. V., and A. G. Journel, 1993, Entropy and spatial disorder: Mathematical Geology, v. 25, n. 3, p. 329-355.

Freeze, R. A., and J. A. Cherry, 1979, Groundwater: Englewood Cliffs, New Jersey, Prentice-Hall, 604 p. Gomez-Hernandez, J. J., and R. M. Srivastava, 1990, ISIM3-D: an ANSI-C three-dimensional multiple indicator conditional simulation program: Computers in Geosciences, v. 16, n. 4, p. 395-414.

Isaaks, E., and R. M. Srivastava, 1988, Spatial continuity measures for probabilistic and deterministic geostatistics: Mathematical Geology, v. 20, n. 4, p. 313-341.

Journel, A. G., 1983, Nonparametric estimation of spatial distributions: Mathematical Geology, v. 14, n. 3, p. 445-468.

MacDonald, M. G., and A. W. Harbaugh, 1988, A modular three-dimensional finite difference ground-water flow model: U.S. Geological Survey Techniques of Water Resources Investigations, Book 6.

Matheron, G., H. Beucher, C. de Fouquet, A. Galli, D. Guerillot, and C. Ravenne, 1987, Conditional simulation of the geometry of fluvio-deltaic reservoirs: SPE 16753, SPE Annual Technical Conference and Exhibition, Proceedings, p. 591-599.

Nichols, W. E., and M. D. Freshley, 1993, Uncertainty analysis of unsaturated zone travel time at Yucca Mountain: Ground Water, v. 31, n. 2, p. 293-301.

Poeter, E. P., and D. R. Gaylord, 1990, Influence of aquifer heterogeneity on contaminant transport at the Hanford Site: Ground Water, v. 28, n. 6, p. 900-909.

Pollock, D. W., 1989, Documentation of computer programs to compute and display pathlines using results from the U.S. Geological Survey modular three-dimensional finite-difference ground-water flow model: U.S. Geological Survey OFR 89-381,188 p.

Rautman, C. A., and A. H. Treadway, 1991, Geologic uncertainty in a regulatory environment: an example from the potential Yucca Mountain nuclear waste repository site: Environmental Geology and Water Sciences, p. 171-184.

Ravenne, C., R. Eschard, A. Galli, Y. Mathieu, L. Montadert, and J-L. Rudkiewicz, 1987, Heterogeneities and geometry of sedimentary bodies in a fluvio-deltaic reservoir: SPE 16752, SPE Annual Technical Conference and Exhibition, Proceedings, p. 115-124.