As Reconfigurable Computing (RC) closes its sixth decade, significant improvements have made this technology a competitor to application-specific integrated circuits (ASICs). Because a field programmable gate array (FPGA) operates at a significantly lower speed than a general purpose processor (GPP), the developer must exploit every available avenue to attain a speedup on a heterogeneous computer. Achieving a significant speedup is what makes the RC application development process worthwhile: the developer gains better computational power at a lower cost than with a traditional ASIC. This speedup comes primarily from efforts to pipeline and parallelize processes on an FPGA. In addition to the traditional “three P's,” this paper highlights another avenue for speedup: true multilevel parallelism. In particular, it demonstrates this concept with a threaded programming model that allows the GPP and the FPGA to run simultaneously. The method is realized through a threaded dot product on a heterogeneous computer.