Sean A McKenna
Eileen P. Poeter
Dept. of Geology and Geological Engineering
Colorado School of Mines
Golden, Colorado, U.S.A.
The need to define these features presents a new challenge in site characterization. It is not possible to drill enough holes to confidently state that such features do not exist at a site. Consequently, soft data and stochastic simulation can be used to improve the subsurface characterization.
Stochastic modeling of heterogeneities within aquifers has emerged as a means of producing distributions on contaminant traveltimes or other parameters of interest (Rautman and Treadway, 1991; Nichols and Freshley, 1993). Generally, when more subsurface data are available, the response of the transfer function exhibits a narrower distribution (Figure 2). Supplementary data are usually obtained by further drilling into the subsurface at the site.
Another source of subsurface information at many sites is soft data in the form of geophysical information, uncertain drillers' logs, and knowledge concerning the depositional environment of the subsurface. Imprecise data are defined as a type of soft data that give an uncertain estimate of the parameter under study at a location. The amount of imprecision must be quantified through a calibration procedure. This paper examines the utility of one type of soft data, imprecise information, in reducing the uncertainty associated with generating stochastic realizations of a subsurface domain. This reduction in subsurface uncertainty should result in reduced uncertainty associated with the information used to address flow and transport questions (Figure 2).
The deterministic image of the section used in this paper is a discrete interpretation of the geology of a Yorkshire cliff face described by Matheron et al. (1987). The cross sectional domain is 600 m long x 30 m high. Each cell is 2 m long and I m high, yielding a total of 9000 cells. Each cell is assigned a single lithology, shale, shaly-sand, or sandstone (Figure 3A), corresponding to the lithologies described in this fluvio-deltaic deposit (Ravenne et al., 1987). Each lithology represents an indicator class in the geostatistical simulations. Lithologies are converted to hydrofacies by assigning each lithology a hydraulic conductivity and porosity value based on appropriate values reported in the literature (Freeze and Cherry, 1979) for those lithologies (Table 1).
Facies | Hydraulic Conductivity (m/yr) | Effective Porosity (fraction) |
---|---|---|
Shale | ||
Shaly-Sandstone | ||
Sandstone |
Steady-state flow and adjective transport are modeled across the two-dimensional cross sectional domain using the U.S. Geological Survey flow model MODFLOW (MacDonald and Harbaugh, 1988) and the particle tracking postprocessor MODPATH (Pollock, 1989). The top and bottom of the domain are modeled as impermeable boundaries and the ends of the domain are fixed head boundaries imposing a gradient of 0.01 across the section. The particle tracking is accomplished by examining the paths and velocities of 30 particles across the domain. The flow modeling and particle tracking provide information to analyze the bulk hydraulic conductivity of the domain, the fastest flowpath across the domain (first arrival time), and the length of time during which the majority of the particles arrive. The goal of this study is to examine the utility of imprecise data in reducing the uncertainty associated with these measures of flow and transport parameters relative to the uncertainty associated with those same measures when realizations are produced with limited hard data. Use of a synthetic, known section allows us to assess accuracy as well as precision.
The generated imprecise data in this study are considered an analog for the field situation in which a geophysical technique yields an uncertain image of facies in the subsurface. The amount of uncertainty in the imprecise data is quantified through a calibration process that compares the actual facies type to the estimated facies type at locations where both types of data exist. Results of the calibration process are a pair of misclassification probabilities (p1 and p2) for each indicator threshold. For nonnumeric data, such as facies type, the indicator thresholds are abstract boundaries that divide the data into indicator classes (e.g., when mapping this section, the geologists used a mental threshold to divide sand from shaly-sand). By definition, p1 is the probability that the imprecise estimate of the facies is less than or equal to the indicator threshold given that the actual facies is less than or equal to the indicator threshold at that location (equation 1). Similarly, p2 is the probability that the imprecise estimate of facies type is less than or equal to the indicator threshold given that the actual facies is greater than the threshold (equation 1).
The variable (x) is the soft data estimate of the facies. (x) is the actual facies (hard data) at that location. The indicator threshold is denoted by zc. Note that the probabilities do not necessarily add up to 1.0. A simple measure of the quality of the imprecise data for a given threshold is the quantity (p1 - p2). If this quantity equals 1.0, the data are considered to be hard data (i.e., the imprecision in the soft data is negligible). If (p1 - p2) equals 0, the imprecise data provide no information on the actual facies. This quantity is equivalent to the "B" parameter in the Markov-Bayes formulation of soft data (Deutsch and Journel, 1992).
Two types of information are essential in creating geostatistical simulations: (1) an estimate of the covariance function for each threshold in the indicator study and (2) conditioning data. Imprecise data can be used to improve estimates (over use of hard data alone) of the covariance functions modeled as variograms and to provide conditioning data for the simulations. In both uses, the uncertainty of the imprecise data, as quantified by p1 and p2 must be taken into account. Software that uses imprecise data in both covariance modeling and in simulation of the cross sectional domain has been developed for this project.
Threshold | |||
---|---|---|---|
Horizontal variograms built with the ten wells of hard data show a strong nugget effect (Figure 4A). Nugget effect variograms are common in site investigations where the well spacing is larger than the length of correlation for the parameter of interest. For each threshold, the nugget value determined for the vertical variogram was used in fitting a model to the horizontal variogram. All variograms were fit with two nested spherical models.
To build variograms with a combination of hard and imprecise data, three estimates of the true variogram gamma value are combined: an estimate calculated from the hard data at lags where hard data are present, an estimate from a hard-imprecise data crosscovariance, and an estimate calculated from the imprecise data covariance at intermediate lags (equation 2). The final estimate of spatial covariance derived from equation 2 is then converted to a variogram using the nonergodic relationship developed by Isaaks and Srivastava (1988).
The variables h and s, refer to the number of hard and soft conditioning data in the search neighborhood. The three components of the covariance equation are weighted by the 's, which sum to 1. The 's are determined to account for the relative numbers of hard and soft data. One simple method to calculate is
Improvement of the variograms with spatially dense imprecise data can be substantial as demonstrated in Figure 4B. A field example of spatially dense imprecise data is an estimate of facies from a surface or cross-borehole geophysical survey.
The geostatistical simulation software developed for this project is a soft data extension of the program ISIM3-D (Gomez-Hernandez and Srivastava, 1990). Both use the sequential indicator simulation (SIS) algorithm. The heart of the SIS algorithm is the estimation of a posterior cumulative distribution function (pcdf). An estimate of the pcdf is calculated for every indicator threshold at each location being simulated. The pcdf estimate is based on the proximity and values of conditioning data located within a search neighborhood. In the modified algorithm, the pcdf value at each threshold level is a linear combination of the hard data points within the search neighborhood weighted by their respective kriging weights and the imprecise data points weighted by a spatial declustering weight and scaled by the p1 and p2 values (Alabert, 1987):
where 1 and 2, are weights that account for the relative quantities of the hard and imprecise data within the search neighborhood and are calculated as in equation 2. The i's are the indicator value (0 or 1) of each conditioning point at the current threshold. The kriging and declustering weights are represented by and , respectively. The kriging weights are derived from solving the kriging matrix, and the declustering weights are determined by the cell declustering technique Journel, 1983). Once the pcdf is constructed, a random number between 0 and 1 is drawn, and by examining the values of the pcdf at the indicator thresholds, the indicator class to which the random number belongs is determined. The point being simulated is assigned to that indicator class and is now considered a hard conditioning datum for the simulation of subsequent locations within the domain.
The simulator built for this project has several options that can be used to control the amount of hard and imprecise data found in the search neighborhoods and the type of kriging process (simple or ordinary) to be used. Imprecise data of the level illustrated in Table 1 are used to build 100 realizations with ordinary kriging (OK) and a maximum of 12 (4 hard and 8 imprecise) points in the search neighborhood. An example realization is shown in Figure 5. Realizations produced using simple kriging (SK) are not useful for this situation as discussed in the following section. One hundred realizations were also generated using only the hard data.
The resulting distributions of bulk hydraulic conductivity, first arrival time, and duration time are shown as histograms in Figure 6. The values determined by conducting flow and transport modeling in the deterministic section are provided for comparison. The reduction in variance of each resulting distribution due to the incorporation of imprecise data is provided in Table 3.
Coefficient of Variation | ||
---|---|---|
Parameters | Hard | Soft |
First Arrivals | 0.23 | 0.10 |
Duration Time | 0.19 | 0.10 |
Bulk Hydraulic Conductivity | 0.11 | 0.05 |
The actual bulk hydraulic conductivity does not occur within the distribution of bulk hydraulic conductivity produced using only the hard data. The hard data include a slightly higher percentage of sandstone than the actual section (17 vs. 14%) and a lower percentage of shale (58 vs. 61%). The resulting increased frequency of occurrence of high hydraulic conductivity facies in the realizations causes the overestimation of bulk hydraulic conductivity. It is possible that if more realizations are generated, the tail of the distribution may extend beyond the actual value.
Several attempts were made to employ simple kriging (SK) in the simulation process. SK forces a stationary mean value into the simulation process. Ordinary kriging (OK) does not force an a priori stationary value into the simulation, but rather corrects for local departures from the overall mean. For a number of experiments, SK was used for all points in the search neighborhood, both throughout the simulations and up to specified levels of simulation completion at which time the process was switched to OK. The simulation program is written to allow the original hard data to be searched separately from the imprecise data and previously simulated data. This option is available to facilitate experimenting with different combinations of obtaining an SK estimate of the pcdf from the original hard data and an OK estimate from the soft and previously simulated data. All attempts at incorporating SK failed and the resulting realizations were geologically unrealistic such that they were not used for flow and transport modeling.
The data set used in this study is somewhat artificial in that every point in the domain is a conditioning point. It may be realistic to have a two-dimensional section entirely composed of conditioning data from a geophysical survey; however, this would most likely be used in simulating a three-dimensional subsurface. SK could be more effectively employed in the early stages of a simulation with sparse conditioning data to impose an ergodic mean on the realizations. As the simulation process progresses, a switch to OK allows the mean to vary smoothly through the domain as the remaining locations are simulated. The abundance of conditioning data in this study negates the effective use of SK.
A thorough testing of the effects of the simulation process on the flow/transport results using a known data set is necessary before applying a simulator to a field problem. Even simulators that function properly on a test data set may not capture all the desired features of a site if the conditioning data are significantly biased. As this study has shown, it is possible to include the true first arrival time in the distribution of arrival times from an ensemble of realizations, but completely miss the true value of bulk effective hydraulic conductivity in that same ensemble. This is an apparent conflict because the traveltime across the fastest path is not directly correlated to the bulk hydraulic conductivity of the section; however, it is disappointing that the simulation ensemble did not include the true bulk hydraulic conductivity.
When limited or biased data are available, the user of the simulator must consider a broader range of possible outcomes through the conceptual development of the simulation model. This range of outcomes can be constrained by including soft information in the form of geophysical data, hydrologic measurements, conceptual models of depositional environment, and other information correlated to the parameters being simulated.
Deutsch, C. V., and A. G. Journel, 1992, GSLIB: geostatistical software library and user's guide: New York, Oxford University Press, 340 p.
Deutsch, C. V., and A. G. Journel, 1993, Entropy and spatial disorder: Mathematical Geology, v. 25, n. 3, p. 329-355.
Freeze, R. A., and J. A. Cherry, 1979, Groundwater: Englewood Cliffs, New Jersey, Prentice-Hall, 604 p. Gomez-Hernandez, J. J., and R. M. Srivastava, 1990, ISIM3-D: an ANSI-C three-dimensional multiple indicator conditional simulation program: Computers in Geosciences, v. 16, n. 4, p. 395-414.
Isaaks, E., and R. M. Srivastava, 1988, Spatial continuity measures for probabilistic and deterministic geostatistics: Mathematical Geology, v. 20, n. 4, p. 313-341.
Journel, A. G., 1983, Nonparametric estimation of spatial distributions: Mathematical Geology, v. 14, n. 3, p. 445-468.
MacDonald, M. G., and A. W. Harbaugh, 1988, A modular three-dimensional finite difference ground-water flow model: U.S. Geological Survey Techniques of Water Resources Investigations, Book 6.
Matheron, G., H. Beucher, C. de Fouquet, A. Galli, D. Guerillot, and C. Ravenne, 1987, Conditional simulation of the geometry of fluvio-deltaic reservoirs: SPE 16753, SPE Annual Technical Conference and Exhibition, Proceedings, p. 591-599.
Nichols, W. E., and M. D. Freshley, 1993, Uncertainty analysis of unsaturated zone travel time at Yucca Mountain: Ground Water, v. 31, n. 2, p. 293-301.
Poeter, E. P., and D. R. Gaylord, 1990, Influence of aquifer heterogeneity on contaminant transport at the Hanford Site: Ground Water, v. 28, n. 6, p. 900-909.
Pollock, D. W., 1989, Documentation of computer programs to compute and display pathlines using results from the U.S. Geological Survey modular three-dimensional finite-difference ground-water flow model: U.S. Geological Survey OFR 89-381,188 p.
Rautman, C. A., and A. H. Treadway, 1991, Geologic uncertainty in a regulatory environment: an example from the potential Yucca Mountain nuclear waste repository site: Environmental Geology and Water Sciences, p. 171-184.
Ravenne, C., R. Eschard, A. Galli, Y. Mathieu, L. Montadert, and J-L. Rudkiewicz, 1987, Heterogeneities and geometry of sedimentary bodies in a fluvio-deltaic reservoir: SPE 16752, SPE Annual Technical Conference and Exhibition, Proceedings, p. 115-124.