Matrix-Matrix multiplication (MMM) is widely used algorithm in today's computations and researches. Many techniques exist to speed up its execution. In this paper, we analyze the performance of MMM varying matrix size in order to determine its behavior and the region where it provides the best performance. We also determine the best speedup and efficiency in parallel implementation for different CPU architectures since cache architecture and organization is very important for MMM performance. Intel i7 and AMD Opteron CPUs are used as an environment. Several achieved results are expected, but there are also many unexpected. Superlinear speedup (speedup greater than the number of used threads) and the efficiency greater than 100% are achieved for each parallel implementation only on AMD Opteron. We observe regions with performance discrepancy for speed, speedup, and efficiency for both CPUs.