In this paper, we segment images from an RGB-D sensor (e.g., a Microsoft Kinect camera) into 3D planar surfaces. We initialize a set of plane equations based solely on the depth (point cloud) information. We then iteratively refine the pixel-to-plane assignment and the plane equations. During this process, the number of planes is also reduced by merging adjacent local planes with similar orientations. For the pixel-to-plane assignment, we treat the image as a Markov Random Field (MRF) and solve the association problem using graph-based global energy minimization. We design the energy terms to encapsulate both appearance cues from the RGB (color) channels and shape cues from the D (depth) channel. Experiments show that using both appearance and geometry information significantly improves the segmentation quality, especially at genuine plane edges and plane intersections. As a byproduct, the framework also automatically fills in missing depth information.
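To make the pixel-to-plane assignment concrete, the following is a minimal sketch of one possible energy, not the formulation used in this paper. It assumes a unary term equal to the point-to-plane distance (zero where depth is missing) and a pairwise Potts term that is weakened across strong color edges. For brevity it uses iterated conditional modes (ICM), a simple greedy local minimizer, as a stand-in for the graph-based global minimization. All function names and the parameters `lam`, `sigma`, and `sweeps` are hypothetical.

```python
import math

def point_plane_distance(point, plane):
    """Distance from (x, y, z) to plane a*x + b*y + c*z + d = 0 (unit normal assumed)."""
    x, y, z = point
    a, b, c, d = plane
    return abs(a * x + b * y + c * z + d)

def smoothness_weight(color_p, color_q, sigma=10.0):
    """Assumed contrast-sensitive weight: close to 1 for similar colors, near 0 across color edges."""
    diff2 = sum((u - v) ** 2 for u, v in zip(color_p, color_q))
    return math.exp(-diff2 / (2.0 * sigma ** 2))

def assign_labels(points, colors, planes, neighbors, lam=0.05, sweeps=5):
    """Greedy ICM labeling; points[i] is None where the sensor returned no depth."""
    def unary(i, k):
        # Missing depth gives no geometric evidence, so every plane costs the same.
        return 0.0 if points[i] is None else point_plane_distance(points[i], planes[k])

    n, num_planes = len(points), len(planes)
    # Initialize from the depth term alone.
    labels = [min(range(num_planes), key=lambda k: unary(i, k)) for i in range(n)]
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            def local_energy(k):
                # Penalize disagreeing with neighbors, less so across color edges.
                pair = sum(smoothness_weight(colors[i], colors[j])
                           for j in neighbors[i] if labels[j] != k)
                return unary(i, k) + lam * pair
            best = min(range(num_planes), key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy example: a row of six pixels spanning two fronto-parallel planes, z = 1 and z = 2.
planes = [(0.0, 0.0, 1.0, -1.0), (0.0, 0.0, 1.0, -2.0)]
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0),
          None,  # pixel 3 has no depth reading
          (4.0, 0.0, 2.0), (5.0, 0.0, 2.0)]
colors = [(200, 0, 0)] * 3 + [(0, 0, 200)] * 3  # color edge between pixels 2 and 3
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
labels = assign_labels(points, colors, planes, neighbors)
```

In this toy example, pixel 3 has no depth, so the depth term alone cannot place it. The color term attaches it to the plane of its similarly colored neighbors, which illustrates how appearance cues can resolve plane boundaries and how a plane assignment implies a depth value for pixels with missing measurements.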