This paper proposes an efficient architecture that optimizes the system performance, power consumption, and reliability of stacked mesh 3D NoCs. Stacked mesh is a feasible architecture that takes advantage of short inter-layer wiring delays but suffers from inefficient intermediate buffers. To address this, an inter-layer communication mechanism is developed to improve buffer utilization, load balancing, and system fault tolerance. The mechanism relies on a congestion-aware, bus-failure-tolerant routing algorithm for vertical communication. To evaluate the proposed architecture, the system has been simulated under uniform, hotspot 10%, and Negative Exponential Distribution (NED) traffic patterns. In addition, a video conference encoder has been used as a real application for system analysis. Our extensive experiments show significant power and performance improvements over a typical stacked mesh 3D NoC.