An important requirement for in-loop filters in video codecs is low-delay capability for both encoding and decoding. In this paper, we insert a non-local means (NLM) filter between the sample adaptive offset (SAO) and adaptive loop filter (ALF) stages of HM7.0, the High Efficiency Video Coding (HEVC) reference software. We also propose a largest coding unit (LCU)-based framework for the NLM filter that allows both the encoder and the decoder to reconstruct a decoded picture in LCU order. Compared with the HM7.0 anchor, picture-based rate-distortion (RD) optimization yields average BD-rate improvements of 0.36 to 1.52% for luma and 0.04 to 1.37% for chroma. Likewise, the LCU-based variant improves BD-rate by 0.20 to 1.27% for luma and 0.67 to 1.91% for chroma. The maximum gain is observed for the “Kimono” sequence under the low-delay P configuration: 3.50% (Y), 2.89% (U), and 1.84% (V). Subjective quality improvements are also observed.
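The abstract does not specify the filter's details, so the following Python sketch illustrates only the general idea, under stated assumptions. It applies a textbook non-local means estimate (Gaussian weights on squared patch distances, with a hypothetical search radius `search`, patch radius `patch`, and smoothing parameter `h`) to the SAO output, one LCU at a time in raster order. Border samples are clamped, all reads come from the unfiltered input, and each LCU's output is final once it has been written. The sketch assumes the whole SAO-output picture is available. A truly low-delay implementation would also need to restrict the search window to samples that have already been reconstructed, which this sketch does not model. The names `nlm_sample` and `nlm_filter_lcu_order`, the LCU size, and the float output (real codecs use integer samples) are all hypothetical.

```python
import math


def nlm_sample(img, y, x, search=3, patch=1, h=10.0):
    """Textbook non-local means estimate for one sample (hypothetical parameters)."""
    height, width = len(img), len(img[0])

    def at(r, c):
        # Clamp coordinates to the picture border (assumed padding rule).
        return img[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]

    num = 0.0
    den = 0.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            qy, qx = y + dy, x + dx
            # Squared distance between the patch around (y, x) and the patch around (qy, qx).
            dist = 0.0
            for py in range(-patch, patch + 1):
                for px in range(-patch, patch + 1):
                    d = at(y + py, x + px) - at(qy + py, qx + px)
                    dist += d * d
            # Similar patches get weights close to 1, dissimilar ones close to 0.
            w = math.exp(-dist / (h * h))
            num += w * at(qy, qx)
            den += w
    # den >= 1 because the candidate at (y, x) itself always has weight exp(0) = 1.
    return num / den


def nlm_filter_lcu_order(img, lcu=4, **params):
    """Filter the picture LCU by LCU in raster order (hypothetical LCU size)."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for ly in range(0, height, lcu):
        for lx in range(0, width, lcu):
            # All reads come from the unfiltered input `img`, so this LCU's
            # output is complete and final once the two loops below finish.
            for y in range(ly, min(ly + lcu, height)):
                for x in range(lx, min(lx + lcu, width)):
                    out[y][x] = nlm_sample(img, y, x, **params)
    return out
```

Because every LCU reads only the unfiltered input, the result does not depend on the LCU size. For example, `nlm_filter_lcu_order(img, lcu=2)` and `nlm_filter_lcu_order(img, lcu=8)` produce identical pictures.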