Evaluating competing multifactor asset pricing models involves comparing the statistical significance of their mean pricing errors (alphas). Unfortunately, this comparison favors imprecisely estimated models: noisier alpha estimates make the null of zero mean pricing errors harder to reject, so p-values tend to be higher in noisier models. To avoid false impressions of relative success in tests for zero mean pricing errors, we develop a notion of comparative p-values and suggest comparing these instead of the raw p-values. This comparison gives more precisely estimated models a fairer chance or, equivalently, quantifies how much easier it is for imprecisely estimated models to pass the test.
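
The bias in raw p-values can be illustrated with a minimal sketch. It is not the comparative p-value developed here; it only shows, for a single alpha under a normal approximation, how the same estimated pricing error yields a higher p-value when it is estimated less precisely. The function name `two_sided_p_value` and all numbers are hypothetical and chosen for illustration.

```python
import math


def two_sided_p_value(alpha_hat: float, std_error: float) -> float:
    """Two-sided p-value for H0: alpha = 0 under a normal approximation."""
    z = abs(alpha_hat) / std_error
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical numbers: both models have the same estimated alpha,
# but the noisy model's standard error is three times larger.
alpha_hat = 0.30
p_precise = two_sided_p_value(alpha_hat, std_error=0.10)
p_noisy = two_sided_p_value(alpha_hat, std_error=0.30)

print(f"precise model: p = {p_precise:.4f}")
print(f"noisy model:   p = {p_noisy:.4f}")
```

With these assumed inputs, the precisely estimated model is rejected at the 5% level while the noisy one is not, even though their estimated alphas are identical.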