Abstract
An automatic cluster number selection algorithm is proposed for multi-point geostatistical simulation. The multi-point simulation is performed by extracting patterns from a training image. The computational time of the pattern-based simulation is significantly reduced by reducing the dimension of the patterns with principal component analysis (PCA). Traditional PCA is used for its simplicity and computational ease. The patterns are classified by applying the k-means clustering algorithm to their principal components (PCs). The number of clusters is selected automatically by calculating the gap statistic. The conditional cumulative distribution function (ccdf) for each class is generated from the frequency of the central node value of the template. For sequential simulation, the similarity of the conditioning data to the class prototypes is measured using the L2-norm. The ccdf of the best-matched class is then used to draw a pattern from that class. The algorithm is validated with examples of conditional and unconditional simulation. The results show that spatial continuity, in terms of curvilinear structures, is well reproduced in all examples. First- and second-order statistics are also reproduced well in all examples. A comparative study with the wavesim and filtersim techniques shows that the proposed algorithm performs better than filtersim and similarly to wavesim; its computational time, however, is similar to that of filtersim and lower than that of wavesim. The sensitivity of the algorithm to the number of PCs and the number of clusters has also been tested. The results reveal that automatic cluster selection helps to improve the performance of the proposed method.
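The cluster-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pca_scores`, `kmeans`, `log_dispersion`, `select_k_by_gap`), the synthetic 3x3 patterns, the uniform reference distribution over the bounding box of the PC scores, and the selection rule (the smallest k with Gap(k) ≥ Gap(k+1) − s_{k+1}, as commonly attributed to Tibshirani, Walther, and Hastie) are all assumptions made for illustration. Only NumPy is used.

```python
import numpy as np


def pca_scores(patterns, n_components):
    """Project flattened patterns onto their leading principal components."""
    centered = patterns - patterns.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign(x, centroids):
    """Index of the nearest centroid (squared L2 distance) for each row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x, k, rng, n_iter=50):
    """Plain Lloyd's k-means from a random start; returns labels and centroids."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(x, centroids)
        updated = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(x, centroids), centroids


def log_dispersion(x, k, rng, n_init=10):
    """Log of the smallest within-cluster sum of squares over several restarts."""
    best = np.inf
    for _ in range(n_init):
        labels, centroids = kmeans(x, k, rng)
        best = min(best, ((x - centroids[labels]) ** 2).sum())
    return np.log(best)


def select_k_by_gap(x, k_max, n_ref=10, seed=0):
    """Return (chosen k, list of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = x.min(axis=0), x.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        # Reference data (assumption): uniform draws over the bounding box of x.
        ref = np.array([
            log_dispersion(rng.uniform(lo, hi, size=x.shape), k, rng)
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_dispersion(x, k, rng))
        errs.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    for k in range(1, k_max):
        # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


# Synthetic stand-in for training-image patterns (hypothetical data): noisy
# copies of three flattened 3x3 prototypes (horizontal, vertical, diagonal).
prototypes = np.array([
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
], dtype=float)
demo_rng = np.random.default_rng(42)
patterns = np.repeat(prototypes, 40, axis=0) + demo_rng.normal(0.0, 0.05, (120, 9))
scores = pca_scores(patterns, n_components=2)
best_k, gaps = select_k_by_gap(scores, k_max=5)
print("selected number of clusters:", best_k)
```

Because the three prototypes are well separated relative to the noise, one would expect the rule to favor three clusters here. In a real workflow the rows of `patterns` would be template-sized patches scanned from the training image, and the number of retained PCs would be a tuning choice.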
Cite this article
Chatterjee, S., Mohanty, M.M. Automatic cluster selection using gap statistics for pattern-based multi-point geostatistical simulation. Arab J Geosci 8, 7691–7704 (2015). https://doi.org/10.1007/s12517-014-1724-0
DOI: https://doi.org/10.1007/s12517-014-1724-0