Abstract
This paper discusses modified group delay based algorithms for estimating melodic pitch sequences from heterophonic/polyphonic music. Two variants of the modified group delay function are proposed: (a) the system-based MODGD (Direct) and (b) the source-based MODGD (Source). In (a), the standard modified group delay function (MODGDF) is used to estimate the prominent melodic pitch (\(f_0\)), which appears as a low-frequency formant in the MODGDF spectrum. In (b), the power spectrum of the signal is first flattened to emphasise the source. The flattened power spectrum behaves like a sinusoid in noise, with the frequency of the sinusoid related to the pitch frequency. The modified group delay function of this signal produces peaks at \(T_0\), \(2T_0, \ldots ,\) where \(T_0=\frac{1}{f_0}\). Continuity constraints are imposed across frames in a dynamic programming framework to reduce octave errors. Sudden changes in pitch are accommodated by changing the frame size dynamically within a multi-resolution framework. The proposed systems were evaluated on four datasets: ADC-2004, LabROSA, MIREX-2008 and a Carnatic music dataset. The results demonstrate the potential of group delay based methods for melody extraction.
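The role of the continuity constraint can be illustrated with a minimal sketch. It is not the cost function used in the paper, which the abstract does not specify. The function name `track_pitch`, the salience scores, the `jump_weight` parameter and the transition cost \(|\log_2(f_{cur}/f_{prev})|\) are all assumptions made for illustration; the sketch only shows how a dynamic program over per-frame pitch candidates can suppress an isolated octave error.

```python
import math


def track_pitch(candidates, saliences, jump_weight=1.0):
    """Pick one pitch candidate per frame with a Viterbi-style dynamic program.

    candidates[t][i] is the i-th pitch candidate (Hz) in frame t and
    saliences[t][i] is its local score (higher is better). The transition
    cost |log2(f_cur / f_prev)| is an assumed continuity penalty: it is
    1.0 for an octave jump and 0.0 for a repeated pitch.
    """
    # best[i]: lowest accumulated cost of any path ending at candidate i.
    best = [-s for s in saliences[0]]
    backptrs = []
    for t in range(1, len(candidates)):
        new_best, ptr = [], []
        for f_cur, s_cur in zip(candidates[t], saliences[t]):
            costs = [
                best[j] + jump_weight * abs(math.log2(f_cur / f_prev))
                for j, f_prev in enumerate(candidates[t - 1])
            ]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] - s_cur)
            ptr.append(j_min)
        best = new_best
        backptrs.append(ptr)
    # Trace the cheapest path back from the last frame.
    i = min(range(len(best)), key=best.__getitem__)
    path = [i]
    for ptr in reversed(backptrs):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]


# Toy input (hypothetical values): frame 2 locally prefers the octave error.
candidates = [[220.0, 440.0]] * 5
saliences = [[1.0, 0.2], [1.0, 0.2], [0.5, 0.8], [1.0, 0.2], [1.0, 0.2]]
smoothed = track_pitch(candidates, saliences)
framewise = [c[s.index(max(s))] for c, s in zip(candidates, saliences)]
```

In this toy input, picking the most salient candidate in each frame independently jumps to 440 Hz in the third frame, whereas the dynamic program keeps the 220 Hz track because two octave jumps cost more than the small salience gain.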
Notes
A visual representation of modified group delay functions, with time on the vertical axis and frame index on the horizontal axis. A third dimension, the amplitude of the group delay function at a particular time, is represented by the intensity or color of each point in the image.
An ālāpana is a melodic improvisation within the constraints of a melody.
A quarter tone, one step of the quarter tone scale, corresponds to 50 cents.
In the quarter tone scale, an octave is divided into 24 equal steps (equal temperament). In this scale, the quarter tone is the smallest step.
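The arithmetic behind these two notes can be sketched as follows. An octave spans 1200 cents, so 24 equal steps give 50 cents per step. The helper names `hz_to_cents` and `within_quarter_tone` are hypothetical, and using the quarter tone as a pitch-matching tolerance is an assumption made for illustration rather than something stated in the notes.

```python
import math

STEPS_PER_OCTAVE = 24    # quarter tone scale (equal temperament)
CENTS_PER_OCTAVE = 1200  # standard definition of the cent
CENTS_PER_STEP = CENTS_PER_OCTAVE / STEPS_PER_OCTAVE  # 50 cents per quarter tone


def hz_to_cents(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz, in cents."""
    return CENTS_PER_OCTAVE * math.log2(f_hz / ref_hz)


def within_quarter_tone(est_hz, ref_hz):
    """True if est_hz lies within one quarter tone of ref_hz (assumed tolerance)."""
    return abs(hz_to_cents(est_hz, ref_hz)) <= CENTS_PER_STEP
```

For example, 225 Hz lies about 39 cents above 220 Hz and falls within a quarter tone, while 228 Hz lies about 62 cents above and does not.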
Cite this article
Rajan, R., Misra, M. & Murthy, H.A. Melody extraction from music using modified group delay functions. Int J Speech Technol 20, 185–204 (2017). https://doi.org/10.1007/s10772-017-9397-1
DOI: https://doi.org/10.1007/s10772-017-9397-1