The problem of testing the reliability of ensemble forecasting systems is revisited. A popular tool to assess the reliability of ensemble forecasting systems (for scalar verifications) is the rank histogram; this histogram is expected to be more or less flat, since, for a reliable ensemble, the ranks are uniformly distributed among their possible outcomes. Quantitative tests for flatness (e.g. Pearson's goodness‐of‐fit test) have been suggested; without exception, however, these tests assume the ranks to be a sequence of independent random variables, which is not the case in general, as can be demonstrated with simple toy examples. In this article, tests are developed that take the temporal correlations between the ranks into account. A refined analysis exploiting the reliability property shows that the ranks still exhibit strong decay of correlations. This property is key to the analysis, and the proposed tests are valid for general ensemble forecasting systems with minimal extraneous assumptions.