This study, following on from previous review papers, determined test reliability and validity within a clinical context. The chosen topic was the assessment of intra- and inter-tester reliability, as well as criterion validity (against X-ray), for the ‘Tape’ (TP) and ‘Block’ (BK) methods of leg length discrepancy (LLD) estimation. Twenty-five subjects were assessed by four testers, each using both the TP and BK methods on two occasions within the same working day. Two testers were designated as experienced (EX) and two as non-experienced (NEX). Intra-tester reliability was perfect for the BK method but varied for the TP method, as assessed by typical error (range 0·17–0·32 cm). One EX tester was more reliable than the other EX tester and both NEX testers. Inter-tester variability was assessed by the ability of different pairs of testers to categorize any left or right LLD as greater or less than 0·5 cm. Kappa coefficients were only moderate throughout but were generally larger for the BK method. Criterion validity for the most reliable tester (EX1) was assessed for both methods on a sub-sample of subjects using regression analysis; the results suggested a closer match between TP and X-ray measures than between BK and X-ray measures. Whilst the intra-tester reliability data for the BK method are better than those for the TP method, both approaches may be sensitive enough to differentiate ‘large’, clinically significant LLD with some confidence. The inter-tester reliability data suggest that the same tester should perform serial LLD estimations. The criterion validity data must be viewed cautiously because of the small sample size, but they suggest that BK estimation produced a greater degree of error. Tester experience may affect the reliability of LLD estimation; thus, careful attention must be paid to training staff appropriately.
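The two reliability statistics named in this abstract can be sketched as follows. This is a minimal Python illustration, not the study's analysis code. It assumes the common definition of typical error as the standard deviation of trial-to-trial differences divided by √2, and Cohen's kappa for two raters. The three-way left/right/none categorization, the sign convention (positive = left leg longer), the treatment of exactly 0·5 cm as "none", and all function names are hypothetical assumptions rather than details taken from the study's methods.

```python
import math
import statistics
from collections import Counter


def typical_error(trial1, trial2):
    """Typical error: SD of between-occasion differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def categorize(lld_cm, threshold=0.5):
    """Hypothetical convention: positive = left leg longer, negative = right.

    Values whose magnitude does not exceed the threshold are labelled "none".
    """
    if lld_cm > threshold:
        return "left"
    if lld_cm < -threshold:
        return "right"
    return "none"


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning categorical labels to the same subjects."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented values for illustration only; these are NOT data from the study.
    occasion1 = [0.4, 1.2, -0.8, 0.1, 0.9]
    occasion2 = [0.6, 1.0, -0.7, 0.3, 1.1]
    print("typical error (cm):", round(typical_error(occasion1, occasion2), 3))

    tester_a = [categorize(x) for x in occasion1]
    tester_b = [categorize(x) for x in occasion2]
    print("kappa:", round(cohens_kappa(tester_a, tester_b), 3))
```

Because kappa corrects for chance agreement using each tester's own category frequencies, two testers can agree on most subjects and still obtain only a moderate coefficient when one category (for example "none") dominates.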