The cache-coherent non-uniform memory access (CC-NUMA) architecture is a promising design for multiprocessor computer systems because of its scalability and ease of programming. However, little is known about the collective performance of the cache coherence protocol, the interconnection network, and the I/O subsystem of a CC-NUMA multiprocessor. In this paper, we analyze the collective performance of various CC-NUMA multiprocessor configurations using a commercial workload representative of popular online transaction processing (OLTP) applications. The simulation results show that bottlenecks on the ring, the switch-based tree, and the I/O subsystem can be identified and effectively removed by changing the system configuration.