The 3D rendering performance of the Graphics Processing Unit (GPU), which converts a stream of 3D vectors into 2D frames with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Because of its high texture throughput requirement, main memory bandwidth becomes a critical obstacle to improving overall rendering performance. 3D-stacked memory systems such as the Hybrid Memory Cube (HMC) offer an opportunity to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Although recent works have shown promising performance improvements by using HMC to accelerate special-purpose applications, the critical challenge of how to effectively leverage its high internal bandwidth and computing capability in the GPU for 3D rendering remains unresolved. Based on the observation that texel fetches greatly impact off-chip memory traffic, we propose two architectural designs that enable a Processing-In-Memory (PIM) based GPU for efficient 3D rendering. Additionally, we use the camera angles of pixels to control the performance-quality tradeoff of 3D rendering. Extensive evaluation across several real-world games demonstrates that our design can significantly improve the performance of texture filtering and 3D rendering by an average of 3.97X (up to 6.4X) and 43% (up to 65%), respectively, over the baseline GPU. Meanwhile, our design provides considerable memory traffic and energy reduction without sacrificing rendering quality.
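The abstract does not detail how camera angles are used, so the following is only a hedged illustration of why a pixel's viewing angle matters for texel traffic. It is a minimal sketch assuming a common textbook model of anisotropic filtering: the pixel footprint's anisotropy ratio is approximated as 1/cos(θ), the number of trilinear probes is that ratio rounded up and capped at `max_aniso` (assumed 16), and each trilinear probe reads 8 texels (2×2 in each of two mip levels). The `angle_threshold_deg` policy, which falls back to a single trilinear probe for shallow angles, is a hypothetical knob for trading quality for fewer fetches, not the design proposed in this paper.

```python
import math
from typing import Optional

# Assumption: one trilinear probe = 2x2 bilinear footprint in each of two mip levels.
TEXELS_PER_TRILINEAR_PROBE = 8


def anisotropy_ratio(view_angle_deg: float) -> float:
    """Approximate major/minor axis ratio of a pixel footprint on a planar
    surface viewed at `view_angle_deg` from the surface normal.
    Simplified 1/cos(theta) model (an assumption, not the paper's model)."""
    cos_t = math.cos(math.radians(view_angle_deg))
    return 1.0 / max(cos_t, 1e-6)  # guard against division by ~0 near 90 degrees


def texels_per_pixel(view_angle_deg: float,
                     max_aniso: int = 16,
                     angle_threshold_deg: Optional[float] = None) -> int:
    """Texels fetched for one pixel under the assumed anisotropic model.

    `angle_threshold_deg` is a hypothetical quality knob: pixels viewed at
    less than this angle use a single trilinear probe instead of the full
    anisotropic probe count, reducing texel fetches at some quality cost.
    """
    ratio = anisotropy_ratio(view_angle_deg)
    # Subtract a tiny epsilon so exact integer ratios are not bumped up by float error.
    probes = min(math.ceil(ratio - 1e-9), max_aniso)
    if angle_threshold_deg is not None and view_angle_deg < angle_threshold_deg:
        probes = 1
    return probes * TEXELS_PER_TRILINEAR_PROBE


if __name__ == "__main__":
    for angle in (0, 45, 80, 89):
        full = texels_per_pixel(angle)
        reduced = texels_per_pixel(angle, angle_threshold_deg=60)
        print(f"angle={angle:2d} deg  full={full:3d} texels  thresholded={reduced:3d} texels")
```

Under these assumptions, grazing-angle pixels dominate texel traffic (up to 16 probes, or 128 texels, per pixel), which is why a per-pixel angle is a natural lever for a performance-quality tradeoff.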