The main form of nitrogen found in natural waters is nitrate, as it is soluble in water and can leach through soil into the groundwater table. The extensive application of fertilizer to agricultural lands is a significant source of nitrate in water, which can pollute groundwater and threaten drinking water supplies (Tutmez & Hatipoglu, 2010). Nitrate consumption at elevated levels has been linked to adverse effects such as methemoglobinemia and non-Hodgkin's lymphoma (Hudak, 2000). Although nitrate concentrations in groundwater vary over time based on the rate of natural and anthropogenic inputs, the hydrologic cycle, and the surrounding geologic properties, determining spatial patterns in groundwater quality can still aid in predicting potential consequences to environmental and human health (Chaudhuri & Ale, 2014). In the absence of groundwater wells being located uniformly in space or sampled at regular intervals, this study aims to apply different spatial interpolation techniques to predict nitrate concentrations across the Ogallala aquifer within Texas.
The Ogallala aquifer is found below the states of Wyoming, South Dakota, Nebraska, Colorado, Kansas, New Mexico, Oklahoma, and Texas and has the highest total groundwater withdrawals in the nation, with Texas being the second largest consumer. The Ogallala is an unconfined aquifer, which results in recharge being highly dependent on infiltration from precipitation and irrigation return flow (Chaudhuri & Ale, 2014). Statewide in Texas, a large majority of withdrawn groundwater is used for irrigation, followed by municipal demands and lastly livestock watering (Hudak, 2000). The Ogallala aquifer spreads across much of the northern portion of the state in the High Plains region, where the economy is highly dependent on irrigated agriculture such as corn, cotton, and sorghum. Groundwater flow generally trends to the southeast, as do the major rivers of Texas towards the coast (Chaudhuri & Ale, 2014).
Data for this project include the location of public and private groundwater wells within the estimated limits of the Ogallala aquifer in Texas that were sampled for nitrate concentrations in 2021 (Figure 1). Water levels and water quality information were collected by either the U.S. Geological Survey or the Texas Water Development Board from April through August of 2021. The primary water uses of the wells include irrigation, stock watering, and public water supply. Descriptions of all data sources used can be found in Table 1.
Figure 1: All groundwater wells within the Ogallala aquifer in Texas that were sampled for nitrate concentrations in 2021. Measured values are displayed for training wells while the testing wells will be used to validate the interpolation results.
Table 1: Metadata of all sources of data for this analysis.
The projected coordinate system used for this analysis is the NAD 1983 Texas Centric Lambert Conformal Conic (meters). As Texas is the second largest state in the U.S. in terms of area and encompasses five State Plane zones or three UTM zones, a statewide system was needed to more accurately portray the extent of the Ogallala aquifer. For the interpolation distances in this analysis, a conformal projection is advantageous in that it results in approximately equal distortions in all directions from each well point.
Within the Ogallala aquifer, 107 groundwater well points were selected for this analysis based on their latest recorded sampling date of 2021 for nitrate concentrations. To validate each interpolation method, a testing subset of 21 randomly selected well points (~20%) were initially removed from the dataset to later measure the predictive quality of each interpolation method (Figure 1). The remaining 86 well points were thus used to train each interpolation prediction. Spatial interpolation is the process of using discrete, measured values of an attribute to create a continuous surface of predictions at all locations in a study area. The two primary categories of interpolation techniques are deterministic, which utilize the actual measured values, and geostatistical, which fit statistical models based on the overall spatial structure of the values.
IDW is a deterministic interpolator in which measured values further away from a predicted value have less proportional weight applied based on the inverse of the distance away they are. Closer measured points are thus given larger weights when predicting values. IDW is an exact interpolator in that it utilizes the value of every measured point, which also results in predictions that cannot be lower than the minimum or higher than the maximum value of the data. The distance between points is calculated as a power function, so that increasing the exponent introduces a faster rate of distance decay to increase the influence of closer points (Smith et al. 2018). The inputs to the two IDW interpolation models created are given in Figure 2, in which the power exponent was increased from a value of two to three for model comparison.
The Spline method is another exact, deterministic interpolator that is part of a family known as radial basis interpolation. Spline applies mathematical functions to fit a predictive surface that minimizes the overall curvature and passes exactly through the measured points, but it does not necessarily preserve the minimum or maximum values (Smith et al. 2018). In this analysis two Spline types, regularized and tension, are explored (Figure 3). The regularized type ensures a smoother surface but with values that may be outside of the measured range. The tension type controls the overall stiffness of the surface, resulting in less smoothness but with values more constrained to the measured range.
Kriging is a geostatistical method that operates under the assumption that the measured values represent one sample from a true population. It incorporates spatial autocorrelation present by plotting the squared differences between measured values (semivariances) with distance to create a semivariogram cloud. The average semivariance can be found over a distance range, or lag, to create an empirical semivariogram. Numerous semivariogram models can then be selected from to be fit to the empirical semivariogram based on the spatial structure. The predicted values are calculated by a linear weighted average of the surrounding measured points, with optimal weights applied from the fitted semivariogram model (Smith et al. 2018). One relatively simple but widely used method is called Ordinary Kriging, in which it is assumed that the surface has a constant, unknown mean without an underlying trend (O’Sullivan & Unwin, 2014). The two Ordinary Kriging semivariogram models used in this analysis are the spherical and exponential models (Figure 4).
Assumptions
Percentage values found in references of how many points to initially remove as a testing subset seem to vary to up to 50% of the original points. For this analysis, 20% of the points were used for validation to ideally leave enough training points that were within a reasonable distance from each other for the interpolation calculations. For all three interpolation methods, a cell size of 100 meters was chosen. This resulted in each prediction cell having an area equal to one hectare, which is a commonly used metric unit of land measurement that seemed appropriate for the spatial scale of this region. Each interpolation method was calculated several times using a different number of neighboring cells as the search radius, from 8 to 12 to 16. Each method performed better (based on lower values of the root mean square error described further in Results) as the number of neighbors increased to 16, so this value was used for the remainder of the analysis.
For IDW, the power function was first set to the default value of squared distance. For comparison, it was then increased to cubed distance in order to ascertain any effects of making closer points more influential. For the Spline method, both the regularized and tension types were performed. The weights for both methods were kept at the default value of 0.1 to make more direct comparisons. For both Ordinary Kriging predictions, the lag size used to create the semiovariogram was kept at the output cell size of 100 meters. The spherical model was first used as it is the default model applied in ArcGIS Pro. It was then compared with predictions from the exponential model, which is a similarly shaped function but is useful when there are likely more measurement errors present.
Summary statistics from each of the six interpolation models compared with the original training and testing subsets are given in Table 2. While the minimum and maximum values are not the same for the training and testing subsets, this just means the wells with the minimum and maximum nitrate concentrations were not randomly selected for the testing subset. The means and standard deviations are still similar. Both IDW types align with the minimum and maximum values of the training subset as expected, and the standard deviations are somewhat lower potentially due to prediction values not exceeding the measured values. The means of the Spline types are comparable to the training points, but the ranges and the standard deviations are well outside of the measured values. The range is somewhat more constrained for the tension type, as expected. The means of both Kriging models are very similar to the mean of the training values, but the ranges are much more constrained. This can make sense with how the prediction rasters are calculated. The training point with the highest (or lowest) value is unlikely to be at the exact center of the output cell, so the predicted value of that cell will be lower (or higher) based on the influence of the surrounding cells. This results in the Kriging models also having lower standard deviations as the predicted values are clustered more around the means.
To assess model accuracy, validation was performed using the testing subset of groundwater wells. The predicted value was found at each of the 21 testing wells for each of the six interpolation methods and compared with the measured nitrate concentration at that well (Table 3). As it is readily apparent that the predicted values can widely vary from the measured values, some measurement of the degree of errors is needed in order to more directly compare model performance. The root mean square error (RMSE) is the square root of the average squared distance between the observed value and the fitted interpolation line (Gong et al. 2014). While there are numerous other performance indicators that could be calculated, in general models with smaller values of RSME are considered to be better. The calculation of RMSE is shown by Equation 1, and the value for each interpolation model is given in the bottom of Table 3.
For the IDW method, the squared distance produced a somewhat better model than the cubed distance parameter as evidenced by the lower RMSE value. This indicates that increasing the weight of closer points did not improve the IDW model. Neither of the Spline models performed particularly well, as they had the two highest RMSE values. The tension type did perform somewhat better due to being more constrained to the range of measured values as described earlier. While the Spline models do pass through the measured points, they both result in negative prediction values that are clearly not possible for actual nitrate concentrations. Based purely on the RMSE calculation, the spherical semivariogram model led to better Ordinary Kriging results. Finally, the best performing models from each interpolation method are visualized together in Figure 5.
Table 2: Summary statistics of training points, testing points, interpolation methods used in this analysis.
Table 3: Performance of each interpolation method through validation of the testing subset (RMSE = root mean square error).
Equation 1: Root mean square error (n = number of test points, p = predicted value, o = observed value).
Figure 5: Visual comparison of the top performing interpolation methods of nitrate concentrations in the Ogallala aquifer.
In terms of visual comparison, the IDW method produced a surface with sharp peaks that are often attributed to the exponent of the distance weighting. Instead of increasing the power function to three, decreasing it might have resulted in smoother results. Along with the moderate RMSE values, the IDW model is thus not the best of those tested here. The Spline tension type model could appear to be realistic in that the nitrate concentrations gently vary across the surface. However, the prediction of negative nitrate concentrations as well as the highest RMSE value leads this model to also not be selected as the best. The Ordinary Kriging spherical model, with the lowest RMSE value, can thus be chosen as the best predictor of nitrate concentrations. While the edges of each prediction band are not as visually smooth as the other models, it matches the measured values well that were higher in the southern and eastern portions of the aquifer.
It is important to note that the Ordinary Kriging spherical model does have a narrower range, as described in Results, and only has values in five of the eight prediction bands in Figure 5. Over half of the measured nitrate concentrations from the original 107 wells were higher than the EPA drinking water standard of 10 mg/L. But whether this could be a result of groundwater sampling and laboratory errors or that the measurements were taken in less than one year and might not be indicative of long-term concentrations cannot be determined from this analysis. It is also difficult to conclude that the spherical model was the best choice of semivariogram. Both the spherical and exponential models could be good predictors for this dataset, but as they were the only ones tested it does not mean that there could not be a more suitable model. A smaller cell size could also have been used to potentially improve the Kriging results. The overall distribution of nitrate concentrations from all 107 wells were right-skewed. Applying a logarithmic transformation to the data could have resulted in a more normal distribution that could have led to a Kriging model with better predictive power. However, logarithmic values of nitrate concentrations would not be as interpretable of results. Finally, other model performance indicators, such as average standard error or standardized mean prediction error, could have been determined to make more effective model comparisons.
This study is potentially affected by other limitations as well, the first of which is the two-dimensional prediction surfaces. Underground aquifers are not in a fixed plane but instead vary in space due to geologic and topographic properties. Nitrate concentrations are also not bound by state lines, and edge effects are certainly present from not including any measurements from the surrounding states. The well depths and elevations were also not considered along with the nitrate measurements. As being relatively static measurements in time, the models therefore cannot take into account variations from land use changes, dynamic hydrological cycles, or nutrient inputs.
Despite their constraints and assumptions, spatial interpolation methods can still be used to help identify potential causes of variability in nitrate concentrations in the Ogallala aquifer in order to inform water quality planning and management.
Chaudhuri, S., & Ale, S. (2014). Long term (1960–2010) trends in groundwater contamination and salinization in the Ogallala aquifer in Texas. Journal of Hydrology, 513, 376–390. https://doi.org/10.1016/j.jhydrol.2014.03.033
Gong, G., Mattevada, S., & O’Bryant, S. E. (2014). Comparison of the accuracy of kriging and IDW interpolations in estimating groundwater arsenic concentrations in Texas. Environmental Research, 130, 59–69. https://doi.org/10.1016/j.envres.2013.12.005
Hudak, P. F. (2000). Regional trends in nitrate content of Texas groundwater. Journal of Hydrology, 228(1), 37–47. https://doi.org/10.1016/S0022-1694(99)00206-1
O’Sullivan, D., & Unwin, D. J. (2014). Geographic information analysis (2nd ed.). John Wiley & Sons. https://doi.org/10.1002/9780470549094
Smith, M. J. D., Goodchild, M. F., & Longley, P. A. (2018). Geospatial analysis: A comprehensive guide to principles, techniques and software tools. Winchelsea Press.
Tutmez, B., & Hatipoglu, Z. (2010). Comparing two data driven interpolation methods for modeling nitrate distribution in aquifer. Ecological Informatics, 5(4), 311–315. https://doi.org/10.1016/j.ecoinf.2009.08.001