A Multi-scale Convolutional Neural Network Architecture for Music Auto-Tagging

  • Conference paper
Soft Computing for Problem Solving

Abstract

Deep neural networks, particularly convolutional neural networks, have recently gained traction in music auto-tagging. These networks relieve engineers of the burden of handcrafting domain-specific features. However, musical features often show great temporal diversity, which traditional deep networks are unable to capture. With this in mind, we propose a convolutional neural network architecture that attempts to learn features over multiple timescales. The architecture runs multiple convolutions over various subsampled versions of the original audio spectrogram. The outputs of these convolution streams are then concatenated to make the tag predictions. We evaluate the architecture on the MagnaTagATune dataset and show that it yields results close to the state of the art and comprehensively beats shallow classifiers trained on handcrafted features.
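
The multi-scale idea described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the subsampling method (average pooling over time), the subsampling factors, the filter shapes, the global max pooling, the single sigmoid output layer, and all function names are assumptions chosen only to show how several convolution streams over subsampled spectrograms might be concatenated into one tag prediction. The weights are random and untrained.

```python
import math
import random


def subsample_time(spec, factor):
    """Average-pool a spectrogram (list of frequency rows) along time by `factor`."""
    usable = (len(spec[0]) // factor) * factor  # drop frames that do not fill a window
    return [[sum(row[t:t + factor]) / factor for t in range(0, usable, factor)]
            for row in spec]


def conv_stream(spec, filters):
    """Slide each filter over time (valid mode), apply ReLU, max-pool over time.

    spec: n_bins rows of n_frames values.
    filters: list of kernels, each n_bins rows of `width` weights.
    Returns one feature per filter.
    """
    n_frames = len(spec[0])
    features = []
    for kernel in filters:
        width = len(kernel[0])
        best = 0.0  # starting at 0 makes the running max equal to ReLU + max pooling
        for start in range(n_frames - width + 1):
            act = sum(w * row[start + i]
                      for k_row, row in zip(kernel, spec)
                      for i, w in enumerate(k_row))
            best = max(best, act)
        features.append(best)
    return features


def tag_probabilities(spec, stream_filters, factors, w_out, b_out):
    """Run one stream per subsampling factor, concatenate, apply a sigmoid layer."""
    merged = []
    for factor, filters in zip(factors, stream_filters):
        merged.extend(conv_stream(subsample_time(spec, factor), filters))
    logits = [sum(w * x for w, x in zip(w_row, merged)) + b
              for w_row, b in zip(w_out, b_out)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy usage with random data; every size below is a hypothetical choice.
random.seed(0)
n_bins, n_frames, n_filters, width, n_tags = 8, 64, 4, 3, 5
factors = (1, 2, 4)  # assumed timescales: original, 2x and 4x subsampled

spec = [[random.gauss(0, 1) for _ in range(n_frames)] for _ in range(n_bins)]
stream_filters = [
    [[[random.gauss(0, 0.1) for _ in range(width)] for _ in range(n_bins)]
     for _ in range(n_filters)]
    for _ in factors
]
w_out = [[random.gauss(0, 0.1) for _ in range(n_filters * len(factors))]
         for _ in range(n_tags)]
b_out = [0.0] * n_tags

probs = tag_probabilities(spec, stream_filters, factors, w_out, b_out)
print(len(probs))  # one probability per tag
```

Because each stream sees the same filter width applied to a progressively coarser time axis, the streams cover progressively longer spans of audio; concatenating their pooled outputs lets the output layer draw on all of these timescales at once.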

Author information

Corresponding author

Correspondence to Tanmaya Shekhar Dabral.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Dabral, T.S., Deshmukh, A.S., Malapati, A. (2019). A Multi-scale Convolutional Neural Network Architecture for Music Auto-Tagging. In: Bansal, J., Das, K., Nagar, A., Deep, K., Ojha, A. (eds) Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 816. Springer, Singapore. https://doi.org/10.1007/978-981-13-1592-3_60
