This paper uses two “samples-of-opportunity” datasets to examine whether principals' evaluations of teachers differ systematically by teacher gender after controlling for an arguably gender-unbiased measure of teacher productivity: value-added measures based on student test scores, calculated relative to other teachers in the same grade/school (where teachers are randomly allocated to classrooms within the same grade/school). Although the two datasets appear quite similar in nature, both are samples of opportunity in that neither is representative of any particular population. Our findings differ substantially across the two datasets. This exercise illustrates how results in the education and discrimination literatures may be sensitive to the sample used.