Melody extraction from music using modified group delay functions

Rajan, Rajeev; Misra, Manaswi; Murthy, Hema A.

doi:10.1007/s10772-017-9397-1

Published: 03 February 2017

Volume 20, pages 185–204 (2017)
International Journal of Speech Technology

Rajeev Rajan¹,
Manaswi Misra² &
Hema A. Murthy¹


Abstract

This paper discusses modified group delay based algorithms for estimating melodic pitch sequences from heterophonic/polyphonic music. Two variants of the modified group delay function are proposed: (a) system based, MODGD (Direct), and (b) source based, MODGD (Source). In (a), the standard modified group delay function (MODGDF) is used to estimate the prominent melodic pitch (\(f_0\)), which appears like a low-frequency formant in the MODGDF spectrum. In (b), the power spectrum of the signal is first flattened to emphasise the source. The flattened power spectrum behaves like a sinusoid in noise, with the frequency of the sinusoid related to the pitch frequency. The modified group delay function of this signal produces peaks at \(T_0\), \(2T_0, \ldots ,\) where \(T_0=\frac{1}{f_0}\). Continuity constraints are imposed across frames in a dynamic programming framework to reduce octave errors. Sudden changes in pitch are accommodated by changing the frame size dynamically using a multi-resolution framework. The proposed systems were evaluated on four datasets: ADC-2004, LabROSA, MIREX-2008 and a Carnatic music dataset. The results demonstrate the potential of group delay based methods for melody extraction.
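As a rough illustration of the quantity the abstract refers to, the sketch below computes a modified group delay function for a single frame using the commonly cited formulation: the group delay numerator is divided by a cepstrally smoothed spectrum and then compressed. The function name `modified_group_delay` and the values of `n_fft`, `lifter`, `alpha` and `gamma` are assumptions chosen for illustration; they are not taken from the paper, and the spectrum flattening, dynamic programming and multi-resolution steps are not shown.

```python
import numpy as np

def modified_group_delay(x, n_fft=1024, lifter=8, alpha=0.4, gamma=0.9):
    """Modified group delay of one frame (illustrative parameter values)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))

    # Spectra of x[n] and n*x[n]; frames longer than n_fft are truncated.
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)

    # Cepstrally smoothed magnitude spectrum: keep only low quefrencies.
    log_mag = np.log(np.abs(X) + 1e-12)
    cep = np.fft.irfft(log_mag, n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0  # zero the middle, keep symmetric ends
    S = np.exp(np.fft.rfft(cep, n_fft).real)

    # Group delay numerator divided by the smoothed spectrum, then compressed.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))
    return np.sign(tau) * np.abs(tau) ** alpha

# Sanity check: a unit impulse delayed by 5 samples has a constant group delay of 5.
frame = np.zeros(64)
frame[5] = 1.0
mgd = modified_group_delay(frame)
```

For the delayed impulse the smoothed spectrum is flat, so every bin of `mgd` equals `5 ** alpha`. For a music frame, peak picking on the returned array would be the next step, which this sketch leaves out.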

Notes

https://en.wikipedia.org/wiki/Heterophony.
https://en.wikipedia.org/wiki/Polyphony.
Visual representation of modified group delay functions with time on the vertical axis and frame index on the horizontal axis. A third dimension, the amplitude of the group delay function at a particular time, is represented by the intensity or color of each point in the image.
An ālāpana is a melodic improvisation within the constraints of a melody.
A tone corresponds to 50 cents in the quarter tone scale.
In the quarter tone scale, an octave is divided into 24 equal steps (equal temperament). In this scale, the quarter tone is the smallest step.
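The 50-cent figure follows from the standard definition of the cent (1200 cents per octave, logarithmic in frequency), which the notes do not state explicitly. The helper name `cents` and the 440 Hz reference below are illustrative assumptions.

```python
import math

def cents(f_ref, f):
    """Interval from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

# One step of a 24-step equal-tempered octave is a frequency ratio of 2**(1/24).
quarter_tone_ratio = 2.0 ** (1.0 / 24.0)
step_in_cents = cents(440.0, 440.0 * quarter_tone_ratio)
```

Here `step_in_cents` evaluates to 50 up to floating-point rounding, since 1200 / 24 = 50.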

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai, India
Rajeev Rajan & Hema A. Murthy
Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA, USA
Manaswi Misra

Corresponding author

Correspondence to Rajeev Rajan.

About this article

Cite this article

Rajan, R., Misra, M. & Murthy, H.A. Melody extraction from music using modified group delay functions. Int J Speech Technol 20, 185–204 (2017). https://doi.org/10.1007/s10772-017-9397-1

Received: 26 August 2016
Accepted: 07 January 2017
Published: 03 February 2017
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10772-017-9397-1
