Musical signals are highly structured: even untrained listeners can pick out certain musical events in an audio signal. Uncovering this structure and detecting such events would benefit musical content analysis, yet the problem remains unsolved. In this paper, an unsupervised learning approach is proposed to automatically infer musical structure from segments generated by beat and onset analysis. A top-down clustering procedure groups these segments into musical events with similar characteristics, and a Bayesian information criterion (BIC) is then used to regularize the complexity of the model structure. Experimental results show that this unsupervised learning approach effectively groups similar segments together and automatically determines the number of such musical events in a given music piece.
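The core idea, clustering feature vectors while letting a BIC penalty choose the number of clusters, can be illustrated in isolation. The sketch below is not the paper's procedure: it stands in a Gaussian mixture fit for the top-down clustering, and the per-segment features are synthetic placeholders for whatever descriptors beat and onset analysis would yield. It only shows how minimizing BIC over candidate model orders recovers the number of distinct event types.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical per-segment audio features (e.g. averaged spectral
# descriptors): three synthetic "event types", 40 segments each,
# in a 4-dimensional feature space.
segments = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(40, 4))
    for center in ([0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 3])
])

# Fit mixture models of increasing order and score each with BIC;
# the penalty term discourages needlessly complex models.
candidates = range(1, 8)
bics = [
    GaussianMixture(n_components=k, n_init=5, random_state=0)
    .fit(segments)
    .bic(segments)
    for k in candidates
]
best_k = candidates[int(np.argmin(bics))]
print(best_k)
```

On data this well separated, the BIC curve drops sharply up to the true number of event types and rises afterwards, so `best_k` lands on 3 without any supervision, mirroring how the paper's criterion determines the number of musical events automatically.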