Multi-category scales are increasingly used to monitor IEP goals, classroom and school rules, and Behavior Improvement Plans (BIPs). Although these scales require greater rater inference than traditional frequency counts, little is known about their inter-rater reliability. This simulation study examined the performance of nine reliability indices applied to six multi-category scales of different gradations (2, 3, 5, 7, 10, and 15 points), all derived from the same quasi-continuous (1–30) data. The results show that each index behaves differently and requires its own interpretation; no single reliability index is best, because most indices are scale-dependent. Furthermore, index values do not remain constant when a scale with more categories is collapsed into one with fewer. New guidelines are needed on how best to assess reliability with ordinal scales.
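The core of the design, deriving coarser scales from one quasi-continuous rating and recomputing reliability on each, can be illustrated with a short sketch. Several details are assumptions, not the study's stated procedure. The collapsing rule (equal-width bins via ceiling division), the noise model for the two raters, and the helper names `collapse`, `simulate`, `percent_agreement`, and `cohens_kappa` are all hypothetical. Only two common indices (exact percent agreement and unweighted Cohen's kappa) are shown; the abstract does not name its nine indices, so these may or may not be among them.

```python
import random
from collections import Counter

MAX_SCORE = 30  # the quasi-continuous 1-30 range described above


def collapse(score: int, k: int, max_score: int = MAX_SCORE) -> int:
    """Map a 1..max_score rating onto a k-point scale.

    Assumption: equal-width bins, computed as ceil(score * k / max_score),
    which yields a category in 1..k.
    """
    return -(-score * k // max_score)  # integer ceiling division


def simulate(n: int = 500, noise: int = 2, seed: int = 1):
    """Hypothetical two-rater data: each rater sees the same true score
    plus independent uniform integer noise in [-noise, noise], clipped to 1..30."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        true = rng.randint(1, MAX_SCORE)
        a.append(min(MAX_SCORE, max(1, true + rng.randint(-noise, noise))))
        b.append(min(MAX_SCORE, max(1, true + rng.randint(-noise, noise))))
    return a, b


def percent_agreement(x, y) -> float:
    """Proportion of items on which both raters chose the same category."""
    return sum(p == q for p, q in zip(x, y)) / len(x)


def cohens_kappa(x, y) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(x)
    p_o = percent_agreement(x, y)
    cx, cy = Counter(x), Counter(y)
    # Chance agreement from each rater's marginal category proportions.
    p_e = sum((cx[c] / n) * (cy[c] / n) for c in set(cx) | set(cy))
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a, b = simulate()
    # Recompute both indices on each derived scale from the same underlying ratings.
    for k in (2, 3, 5, 7, 10, 15, MAX_SCORE):
        ca = [collapse(s, k) for s in a]
        cb = [collapse(s, k) for s in b]
        print(f"{k:>2}-point: agreement={percent_agreement(ca, cb):.3f}, "
              f"kappa={cohens_kappa(ca, cb):.3f}")
```

Because every scale is derived from the same ratings, any change in the printed values across rows reflects the number of categories rather than a change in rater behavior, which is the kind of scale dependence the abstract reports. Percent agreement tends to rise as categories are merged, while kappa also adjusts for the higher chance agreement on coarse scales, so the two indices need not move together.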