This paper studies the disjointness of the time-frequency representations of simultaneously playing musical instruments. As a measure of disjointness, we use the approximate W-disjoint orthogonality proposed by Yilmaz and Rickard [1], which, loosely speaking, measures the degree to which different sources overlap in the time-frequency domain. The motivation for this study is to find a maximally disjoint representation in order to facilitate the separation and recognition of musical instruments in mixture signals. The transforms investigated include the short-time Fourier transform (STFT), the constant-Q transform, the modified discrete cosine transform (MDCT), and pitch-synchronous lapped orthogonal transforms. Simulation results are reported for a database of polyphonic music for which the multitrack data (instrument signals before mixing) were available. Absolute performance varies with the instrument source in question, but on average the MDCT with a 93 ms frame size performed best.
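To make the disjointness measure concrete, the following is a minimal sketch of how an approximate W-disjoint orthogonality value could be computed for one source against the sum of all other sources. It assumes the commonly cited form of the measure from [1], WDO = (||M S||^2 - ||M Y||^2) / ||S||^2, where S holds the time-frequency coefficients of the target source, Y those of the interference, and M is a binary mask that keeps the points where the target exceeds the interference by a threshold. The function name `approximate_wdo`, the parameter `threshold_db`, and the flattened-sequence input format are illustrative assumptions, not the implementation used in this paper.

```python
def approximate_wdo(target, interference, threshold_db=0.0):
    """Approximate W-disjoint orthogonality of one source against the rest.

    target, interference: equal-length sequences of (possibly complex)
    time-frequency coefficients, flattened over time and frequency.
    threshold_db: hypothetical mask threshold; 0 dB keeps every point
    where the target magnitude strictly exceeds the interference.
    """
    if len(target) != len(interference):
        raise ValueError("target and interference must have the same length")

    gain = 10.0 ** (threshold_db / 20.0)  # dB threshold as an amplitude ratio
    kept_target = 0.0        # energy of the target inside the mask, ||M S||^2
    kept_interference = 0.0  # energy of the interference inside the mask, ||M Y||^2
    total_target = 0.0       # total energy of the target, ||S||^2

    for s, y in zip(target, interference):
        s_pow = abs(s) ** 2
        y_pow = abs(y) ** 2
        total_target += s_pow
        if abs(s) > gain * abs(y):  # binary mask M equals 1 at this point
            kept_target += s_pow
            kept_interference += y_pow

    if total_target == 0.0:
        raise ValueError("target has zero energy")
    return (kept_target - kept_interference) / total_target


# Toy coefficients (made up for illustration, not data from the paper).
disjoint = approximate_wdo([1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0])
overlapping = approximate_wdo([1.0, 1.0], [1.0, 1.0])
```

Under this definition, a value near 1 indicates that the target occupies time-frequency points that the other sources leave nearly empty, while a value near 0 indicates heavy overlap. In practice the coefficient sequences would come from whichever transform is being evaluated (STFT, constant-Q, MDCT, and so on), applied to the individual multitrack signals.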