Unsupervised learning may be modeled by an extreme version of the universal channel coding problem. Suppose a channel decoder knows only that a randomly generated block code is being employed at the other end of a discrete memoryless channel. The channel statistics, the codebook, the code distribution, the rate, the blocklength, and even the input alphabet are unknown. Using a novel decoding measure, it is shown that the channel outputs may be correctly clustered with vanishing error probability at all rates under capacity. Results provide theoretical motivation for certain heuristic clustering techniques and, more widely, suggest that there are gains to be had via non-pairwise similarity measures.