Detecting strongly connected components (SCCs) in a directed graph is a fundamental graph analysis task used in many science and engineering domains. Traditional approaches to parallel SCC detection, however, show limited performance and poor scaling behavior when applied to large real-world graph instances. In this paper, we investigate the shortcomings of the conventional approach and propose a series of extensions that exploit fundamental properties of real-world graphs, such as the small-world property. Our scalable implementation offers excellent performance on diverse small-world graphs, achieving a 5.01× to 29.41× parallel speedup over the optimal sequential algorithm when using 16 cores and 32 hardware threads.