Data-driven prospectivity modelling of greenfields terrains is challenging because very few deposits are available and the training data are overwhelmingly dominated by non-deposit samples. This could lead to biased estimates of model parameters. In the present study involving Random Forest (RF)-based gold prospectivity modelling of the Tanami region, a greenfields terrain in Western Australia, we apply the Synthetic Minority Over-sampling Technique to modify the initial dataset and bring the deposit-to-non-deposit ratio closer to 50:50. An optimal threshold range is determined objectively using statistical measures such as the data sensitivity, specificity, kappa and per cent correctly classified. The RF regression modelling with the modified dataset of close to 50:50 sample ratio of deposit to non-deposit delineates 4.67% of the study area as high prospectivity areas as compared to only 1.06% by the original dataset, implying that the original “sparse” dataset underestimates prospectivity.
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.