We present a new method for estimating 3D scene layers from light field data obtained by plenoptic cameras or camera arrays. The proposed method is based on a novel sparse generative model for light fields, which uses a dictionary of ray-like functions and combines them in a non-linear way via a set of occlusion masks. By estimating the sparse coefficients and masks, our method divides the light field into layers corresponding to different depths in a 3D scene, while taking occlusions into account. The proposed method can thus be used for 3D scene segmentation, as well as for a variety of inverse problems in light field imaging, such as view interpolation, denoising, inpainting, and super-resolution.
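To make the idea of a non-linear, mask-based combination of layers concrete, here is a minimal NumPy sketch. It is an illustrative toy, not the paper's actual model: the layer intensities, the binary occlusion masks, and the front-to-back compositing rule below are all assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 depth layers of an 8x8 light-field slice,
# each paired with a binary occlusion mask (shapes are illustrative).
H, W, n_layers = 8, 8, 3
layers = rng.random((n_layers, H, W))        # per-layer intensities
masks = rng.random((n_layers, H, W)) > 0.5   # per-layer occlusion masks

def composite(layers, masks):
    """Non-linear combination of layers via occlusion masks:
    composite front (index 0) to back, so each pixel takes the value
    of the nearest layer whose mask covers it."""
    out = np.zeros_like(layers[0])
    covered = np.zeros(layers.shape[1:], dtype=bool)
    for layer, mask in zip(layers, masks):
        visible = mask & ~covered   # occluded pixels stay hidden
        out[visible] = layer[visible]
        covered |= mask
    return out

lf = composite(layers, masks)
```

The masking step is what makes the model non-linear: unlike a plain sum of dictionary atoms, a rear layer contributes nothing wherever a nearer layer's mask occludes it.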