Classroom observations are the largest component of evaluation ratings given to teachers in the multiple‐measure evaluation systems states have implemented in the last decade. Using data from the first eight years of Tennessee's teacher evaluation system, we document race and gender gaps in observation ratings and ask whether these gaps reflect true differences in instructional effectiveness. On average, White teachers receive observation ratings 0.15 standard deviations (SD) higher than their Black colleagues, and female teachers receive ratings 0.30 SD higher than their male colleagues. Gaps persist even conditional on other measures of teachers’ effectiveness, such as value‐added to student test scores or student attendance, consistent with potential bias. The Black–White gap is largest in schools where Black teachers are racially isolated and is partly explained by Black teachers’ propensity to be assigned less advantaged students within their schools. Teachers receive somewhat higher ratings from raters of the same race. We find no same‐gender rater effects and, beyond score differences associated with grade and subject taught, uncover few explanations for the large advantage women see in observation ratings. Our results suggest the need for steps to address bias in evaluation processes to ensure the accuracy of evaluation feedback and fair, equitable treatment of teachers in evaluation and staffing actions that rely on it.