This paper analyzes the performance bottlenecks of the Hough transform on multi-core processors and proposes a new Hough transform implementation that addresses them. Microprocessor performance has improved significantly with the introduction of multiple cores, but harnessing the computing power of such processors requires executing many threads effectively at the same time. We first study a coarse-grain and a fine-grain parallelization of a straightforward Hough transform implementation on an 8-core machine. Because of parallelization overheads and memory requirements, neither scheme fully utilizes the available computing power. We then propose a new Hough transform implementation designed for parallelization. Experimental data show that the new implementation exposes a significant amount of concurrency and has good data locality. On the 8-core machine, it performs 25% better than the earlier implementations.
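
To make the coarse-grain scheme mentioned above concrete, the following is a minimal sketch under stated assumptions, not the implementation evaluated in this paper. It assumes the standard line parameterization rho = x cos(theta) + y sin(theta), splits the edge points among workers, lets each worker vote into a private accumulator, and merges the accumulators at the end. The function names (`hough_votes`, `hough_coarse_grain`) and all parameters are hypothetical. The private accumulators illustrate where the extra memory of such a scheme can come from. Python threads are used only to show the decomposition; because of the interpreter's global lock, this sketch is not expected to show a multi-core speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hough_votes(points, n_theta, n_rho, rho_max):
    """Serial voting: every point votes once per theta bin.

    Assumes rho = x*cos(theta) + y*sin(theta) and |rho| <= rho_max.
    """
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Map rho in [-rho_max, rho_max] to a bin index in [0, n_rho - 1].
            r = int(round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)))
            acc[t][r] += 1
    return acc


def hough_coarse_grain(points, n_theta, n_rho, rho_max, n_workers=8):
    """Coarse-grain sketch: one chunk of points and one private accumulator per worker."""
    # Strided split so that each worker receives roughly the same number of points.
    chunks = [points[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(
            pool.map(lambda chunk: hough_votes(chunk, n_theta, n_rho, rho_max), chunks)
        )
    # Merge step: sum the private accumulators into the final one.
    merged = [[0] * n_rho for _ in range(n_theta)]
    for part in partials:
        for t in range(n_theta):
            for r in range(n_rho):
                merged[t][r] += part[t][r]
    return merged
```

In this sketch the workers never write to shared data, so no locking is needed during voting, but memory grows with the number of workers and the merge adds work. A fine-grain variant would instead share a single accumulator and synchronize the individual updates.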