Recent technological advances in atom probe tomography (APT) have led to unprecedented data acquisition capabilities that routinely generate data sets containing hundreds of millions of atoms. Detecting nanoscale clusters of different atom types in these enormous data sets and analyzing their spatial correlations with one another are fundamental to understanding the structural properties of the material from which the data are derived. Extant algorithms for nanoscale cluster detection do not scale to large data sets. Here, a scalable, CUDA-based implementation of an autocorrelation algorithm is presented. It isolates spatial correlations among atomic clusters present in massive APT data sets in linear time using a linear amount of storage. Correctness of the algorithm is demonstrated using large, synthetically generated data sets with known spatial distributions. Benefits and limitations of using GPU acceleration for autocorrelation-based APT data analyses are presented with supporting performance results on data sets with up to billions of atoms. To our knowledge, this is the first nanoscale cluster detection algorithm that scales to massive APT data sets and executes on commodity hardware.