To efficiently encode depth map images in a multi-view video coding scenario, two basic properties of these images can be exploited: first, errors in pixels located near object edges have a greater perceptual impact on the synthesized view; second, depth maps can be approximated as piecewise-planar signals. We leverage these properties to define a lifting-based discrete wavelet transform that avoids filtering across edges, with filters designed to fit the planar shape of the signal. This yields an efficient representation of the image while preserving edge information, which in turn improves the quality of the synthesized views compared to existing methods.
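To illustrate the idea in one dimension, the following is a minimal sketch (not the paper's exact filters) of a lifting predict step that never crosses a marked edge. Away from edges it uses the standard 5/3-style predictor, which is exact for linear (the 1-D analogue of planar) segments; next to an edge it switches to one-sided linear extrapolation from samples on the same side, so a piecewise-linear signal with a sharp discontinuity still produces zero detail coefficients. The function name, the `edges` indicator array, and the fallback rules are assumptions made for this sketch.

```python
import numpy as np

def edge_avoiding_lift(x, edges):
    """One predict step of an edge-avoiding lifting wavelet (illustrative
    sketch).  edges[i] is True when an object boundary lies between
    samples i and i+1.  The predictor for each odd sample uses only
    neighbours on the same side of any edge, and is exact for linear
    (piecewise-planar in 1-D) segments."""
    x = np.asarray(x, dtype=float)
    even = x[0::2].copy()      # approximation samples
    detail = x[1::2].copy()    # will hold detail coefficients
    n = len(x)
    for k in range(len(detail)):
        i = 2 * k + 1
        left_ok = not edges[i - 1]                 # no edge between i-1 and i
        right_ok = i + 1 < n and not edges[i]      # no edge between i and i+1
        if left_ok and right_ok:
            pred = 0.5 * (x[i - 1] + x[i + 1])     # centred: exact on ramps
        elif left_ok and i - 3 >= 0 and not edges[i - 2] and not edges[i - 3]:
            pred = 1.5 * x[i - 1] - 0.5 * x[i - 3]  # extrapolate from the left
        elif right_ok and i + 3 < n and not edges[i + 1] and not edges[i + 2]:
            pred = 1.5 * x[i + 1] - 0.5 * x[i + 3]  # extrapolate from the right
        elif left_ok:
            pred = x[i - 1]    # only one same-side neighbour available
        elif right_ok:
            pred = x[i + 1]
        else:
            pred = 0.0         # isolated sample: leave it unpredicted
        detail[k] -= pred
    return even, detail
```

On a piecewise-linear test signal with a large jump, the detail coefficients vanish everywhere, including at the discontinuity; a conventional 5/3 predict step applied across the same edge would instead leak a large detail coefficient there.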