Summary
Speech/music discrimination of audio recordings is the problem of segmenting an audio stream and labeling each segment as either speech or music. This chapter gives an overview of the methods proposed in the field during the past decade and presents in more detail a methodology that treats the problem as a posterior probability maximization task. Because feature extraction is of primary importance to all methods, a study of feature extraction schemes is provided first. The existing methods are then broadly classified into categories according to their underlying design philosophy. Finally, a performance study is given, presenting the datasets and accompanying assumptions that each method has adopted.