Melody extraction from music using modified group delay functions

International Journal of Speech Technology

Abstract

This paper discusses modified group delay based algorithms for estimating melodic pitch sequences from heterophonic/polyphonic music. Two variants of the modified group delay function are proposed: (a) system based, MODGD (Direct), and (b) source based, MODGD (Source). In (a), the standard modified group delay function (MODGDF) is used to estimate the prominent melodic pitch (\(f_0\)), which appears as a low-frequency formant in the MODGDF spectrum. In (b), the power spectrum of the signal is first flattened to emphasise the source. The flattened power spectrum behaves like a sinusoid in noise, with the frequency of the sinusoid related to the pitch frequency. The modified group delay function of this signal produces peaks at \(T_0\), \(2T_0, \ldots ,\) where \(T_0=\frac{1}{f_0}\). Continuity constraints are imposed across frames in a dynamic programming framework to reduce octave errors. Sudden changes in pitch are accommodated by changing the frame size dynamically within a multi-resolution framework. The proposed systems were evaluated on four datasets: ADC-2004, LabROSA, MIREX-2008 and a Carnatic music dataset. The results demonstrate the potential of group delay based methods for melody extraction.
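
The abstract does not state the formula for the modified group delay function, so the following is only a minimal sketch of the commonly published formulation: the group delay numerator \(X_R Y_R + X_I Y_I\) is divided by a cepstrally smoothed spectrum rather than \(|X|^2\), and the result is compressed with an exponent. The function name `modified_group_delay` and the values of `alpha`, `gamma` and `lifter` are assumptions chosen for illustration, not settings taken from the paper.

```python
import numpy as np

def modified_group_delay(frame, n_fft=1024, alpha=0.4, gamma=0.9, lifter=8):
    """Sketch of a modified group delay function for one signal frame.

    alpha, gamma and lifter are illustrative assumptions, not the
    paper's parameter values.
    """
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))

    X = np.fft.rfft(x, n_fft)      # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)  # spectrum of n * x[n]

    # Cepstrally smoothed magnitude spectrum S(w): keep only the
    # low-quefrency part of the real cepstrum (symmetric lifter).
    log_mag = np.log(np.abs(X) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n_fft)
    keep = np.zeros(n_fft)
    keep[:lifter] = 1.0
    if lifter > 1:
        keep[-(lifter - 1):] = 1.0
    S = np.exp(np.fft.rfft(cepstrum * keep).real)

    # Group delay numerator divided by the smoothed spectrum
    # instead of |X|^2, which avoids spikes near spectral zeros.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))

    # Compress the dynamic range while preserving the sign.
    return np.sign(tau) * np.abs(tau) ** alpha


# Usage: a decaying cosine (a single resonance) at 0.1 cycles/sample.
n = np.arange(400)
x = 0.95 ** n * np.cos(2 * np.pi * 0.1 * n)
mgd = modified_group_delay(x)
peak_bin = int(np.argmax(mgd))
print("peak at normalised frequency", peak_bin / 1024)
```

In this toy case the function peaks close to the resonance frequency, which is the behaviour that variant (a) relies on when the melodic pitch shows up like a low-frequency formant.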

Notes

  1. https://en.wikipedia.org/wiki/Heterophony.

  2. https://en.wikipedia.org/wiki/Polyphony.

  3. Visual representation of modified group delay functions with time in vertical axis and frame index in horizontal axis. A third dimension, indicating the amplitude of group delay function at a particular time is represented by the intensity or color of each point in the image.

  4. An ālāpana is a melodic improvisation within the constraints of a melody.

  5. A quarter tone corresponds to 50 cents in the quarter tone scale.

  6. In the quarter tone scale, an octave is divided into 24 equal steps (equal temperament). In this scale, the quarter tone is the smallest step.
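
Notes 5 and 6 describe the quarter tone scale: an octave (1200 cents) split into 24 equal steps, which gives 50 cents per step. As a small worked sketch, the conversion from a frequency to cents and to the nearest quarter tone step can be written as below. The function names and the 220 Hz reference are hypothetical and used only for illustration.

```python
import math

CENTS_PER_OCTAVE = 1200
STEPS_PER_OCTAVE = 24  # quarter tone scale, equal temperament
CENTS_PER_STEP = CENTS_PER_OCTAVE / STEPS_PER_OCTAVE  # 50 cents per step


def hz_to_cents(f_hz, f_ref_hz):
    """Interval between f_hz and a reference frequency, in cents."""
    return CENTS_PER_OCTAVE * math.log2(f_hz / f_ref_hz)


def nearest_quarter_tone_step(f_hz, f_ref_hz):
    """Index of the closest quarter tone step above the reference."""
    return round(hz_to_cents(f_hz, f_ref_hz) / CENTS_PER_STEP)


# Usage with a hypothetical 220 Hz reference: one octave up is 24 steps.
print(hz_to_cents(440.0, 220.0))
print(nearest_quarter_tone_step(440.0, 220.0))
```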

Author information

Corresponding author

Correspondence to Rajeev Rajan.

About this article

Cite this article

Rajan, R., Misra, M. & Murthy, H.A. Melody extraction from music using modified group delay functions. Int J Speech Technol 20, 185–204 (2017). https://doi.org/10.1007/s10772-017-9397-1

  • DOI: https://doi.org/10.1007/s10772-017-9397-1
