Many applications such as telecommunication and commercial video broadcasting streams, computer systems logs, and web clicks are categorical or mixed-value data streams that exhibit context-dependency. Models that try to capture this context-dependency tend not to be scalable. This paper offers a solution to the scalability problem of these models by providing a method for generating them around relevant aggregates of these data streams rather than the individual samples. The approach expands existing clustering techniques for static categorical data sets to predictive models of data streams based on Variable Length Markov models of clusters. The paper includes theoretical and experimental evaluations of the technique as well as comparison with other prominent clustering techniques for categorical data streams.