Efficient algorithms for computing the discrete cosine transform (DCT) and the discrete sine transform (DST), based on prime factor decomposition, are presented. The proposed algorithms extend previously published algorithms and correct inaccuracies found in them. Two hardware architectures based on the proposed algorithms are also presented and shown to be superior to existing hardware implementations in terms of complexity, throughput, and latency.
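
For readers unfamiliar with prime factor decomposition, the general idea can be illustrated on the discrete Fourier transform (DFT) rather than on the DCT or DST. The sketch below is not the algorithm proposed here; it is the classical Good-Thomas index mapping, under the assumption that the transform length factors as N = n1 * n2 with n1 and n2 coprime. The function names `dft` and `pfa_dft` are hypothetical and chosen only for this illustration. It shows how a length-N transform splits into short transforms of lengths n1 and n2 with no twiddle-factor multiplications between the stages.

```python
import cmath
from math import gcd


def dft(x):
    """Direct O(N^2) DFT, used as the short-length building block and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def pfa_dft(x, n1, n2):
    """Length n1*n2 DFT via the Good-Thomas mapping (illustrative; n1, n2 coprime)."""
    n = n1 * n2
    assert len(x) == n and gcd(n1, n2) == 1

    # Input index map: t = (n2*a + n1*b) mod n, for a in [0, n1), b in [0, n2).
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]

    # Length-n2 DFTs along each row, then length-n1 DFTs along each column.
    grid = [dft(row) for row in grid]
    cols = [dft([grid[a][k2] for a in range(n1)]) for k2 in range(n2)]

    # Output index map (Chinese remainder theorem):
    # k is the unique index with k = k1 (mod n1) and k = k2 (mod n2).
    inv_n2 = pow(n2, -1, n1)  # modular inverse, Python 3.8+
    inv_n1 = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * inv_n2 + k2 * n1 * inv_n1) % n
            out[k] = cols[k2][k1]
    return out
```

For example, `pfa_dft(x, 3, 5)` on a length-15 input should agree with `dft(x)` up to floating-point rounding. The absence of inter-stage twiddle factors is what makes this kind of decomposition attractive for hardware, although the index mappings suited to the DCT and DST differ from the DFT mapping shown.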