Designing a graph-based framework to support a multi-modal approach for music information retrieval

Multimedia Tools and Applications

Abstract

In this paper, we propose a graph-based framework that organizes low-level and high-level features of music objects in a unified way. This graph, called the power graph, is associated with operators that support a variety of music information retrieval applications, such as auto-tagging, link analysis, similarity measurement, and clustering. Among these operators, we identify node ranking by prestige value as an essential link analysis operator. For this operator, we propose two methods of computing prestige: the power method and the algebraic method. Although the algebraic method is derived for symmetric graphs, it can be applied as an approximate but efficient alternative to the power method. To demonstrate the feasibility of our framework, we carried out an auto-tagging experiment and a music object clustering experiment. In the auto-tagging experiment, the algebraic method achieved almost the same results as the power method in only one-fifth of the elapsed time. In the experiments we conducted, accuracy reached up to 75%.
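The abstract does not spell out either prestige computation, so the following is only a minimal sketch under stated assumptions. It assumes the power method is a PageRank-style iteration over a weighted adjacency matrix, and it uses the standard closed form for a random walk on a symmetric graph (prestige proportional to weighted degree) as a stand-in for an algebraic shortcut. That closed form may differ from the authors' algebraic method. The function names, the `damping` parameter, and the example graph are hypothetical.

```python
def power_method_prestige(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style prestige by repeated propagation (assumed formulation).

    adj[i][j] is the non-negative weight of the edge i -> j.
    """
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Prestige held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_weight[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(
                rank[i] * adj[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_rank.append(
                (1.0 - damping) / n + damping * (inflow + dangling / n)
            )
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


def degree_prestige(adj):
    """Closed form for a symmetric graph without damping (illustrative only).

    The stationary distribution of a random walk on an undirected graph is
    proportional to each node's weighted degree, so no iteration is needed.
    """
    degree = [sum(row) for row in adj]
    total = sum(degree)
    return [d / total for d in degree]


# Hypothetical symmetric graph: a triangle (0, 1, 2) with a pendant node 3.
example_graph = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

iterative = power_method_prestige(example_graph, damping=1.0)
closed_form = degree_prestige(example_graph)
```

On a connected, non-bipartite symmetric graph with `damping=1.0`, the two functions agree up to the iteration tolerance, which is why a closed form of this kind is cheap on symmetric graphs. On a directed graph it is only an approximation of the iterative result.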

Notes

  1. MusicBrainz, available at http://musicbrainz.org/

  2. EchoNest, available at http://the.echonest.com/

  3. AllMusic, available at http://www.allmusic.com/

  4. “Words and other instructions in musical scores used to define the speed and specify the manner of performance” [15].

  5. MIREX, available at http://www.music-ir.org/mirex/wiki/MIREX_HOME

References

  1. Bailloeul T, Zhu C, Xu Y (2008) Automatic image tagging as a random walk with priors on the canonical correlation subspace. Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR 2008) (pp. 75–82). ACM Press. doi:10.1145/1460096.1460110

  2. Barbedo JGA, Lopes A (2007) Automatic genre classification of musical signals. EURASIP J Adv Signal Proc, 2007(Article ID 64960). doi:10.1155/2007/64960

  3. Berenzweig A, Ellis D, Logan B, Whitman B (2004) A large scale evaluation of acoustic and subjective music similarity measures. In: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR’04), Barcelona, Spain, October 2004

  4. Bertin-Mahieux T, Ellis DPW, Whitman B, Lamere P (2011) The million song dataset. Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 591–596

  5. Breyer L (2002) Markovian page ranking distributions: some theory and simulations. Technical report. Available at http://www.lbreyer.com/preprints.html

  6. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Sys 30(1–7):107–117. doi:10.1016/j.comnet.2012.10.007

  7. Bryan NJ, Wang G (2011) Musical influence network analysis and rank of sample-based music. Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 329–334

  8. Cano P, Celma O, Koppenberger M, Martin-Buldu J (2006) Topology of music recommendation networks. Chaos Interdiscip J Nonlinear Sci 16

  9. Cano P, Koppenberger M (2004) The emergence of complex network patterns in music artist networks. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR 2004), 466–469

  10. Chen L, Wright P, Nejdl W (2009) Improving music genre classification using collaborative tagging data. Proceedings of Second ACM International Conference on Web Search and Data Mining (WSDM 2009).

  11. Coscia M, Giannotti F, Pedreschi D (2011) A classification for community discovery methods in complex networks. Stat Anal Data Min 4(5):512–546. doi:10.1002/sam, Wiley Periodicals, Inc

  12. Downie JS (2008) The music information retrieval evaluation exchange (2005–2007): a window into music information retrieval research. Acoust Sci Technol 29(4):247–255. doi:10.1250/ast.29.247

  13. Downie JS, Ehmann AF, Bay M, Cameron Jones M (2010) The music information retrieval evaluation eXchange: some observations and insights. Adv Music Inf Retr 274:93–115

  14. Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, New York

  15. Fallows D (accessed September 28, 2013) “Tempo and expression marks.” Grove Music Online. Oxford Music Online. Oxford University Press. Available at http://www.oxfordmusiconline.com/subscriber/article/grove/music/27650

  16. Fu Z, Lu G, Ting KM, Zhang D (2011) A survey of audio-based music classification and annotation. IEEE Trans Multimed 13(2):303–319. doi:10.1109/TMM.2010.2098858

  17. Gersho A, Gray RM (1991) Vector quantization and signal compression. Kluwer Academic Publishers, Norwell

  18. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. PNAS 99(12):7821–7826. doi:10.1073/pnas.122653799

  19. Gouyon F, Dixon S, Pampalk E, Widmer G (2004) Evaluating rhythmic descriptors for musical genre classification. In: Proceedings of the 25th International AES Conference, London, UK, June 2004

  20. Graphviz. Graph visualization software. Available at http://www.graphviz.org/Home.php

  21. Hsu J-L, Li Y-F (2012) A cross-modal method of labeling music tags. Multimedia Tools Appl 58(3):521–541. doi:10.1007/s11042-011-0729-x

  22. Jang R (2011) DCPR toolbox. Retrieved from http://neural.cs.nthu.edu.tw/jang/books/dcpr/

  23. Lartillot O, Toiviainen P, Eerola T (2011) MIRtoolbox. Retrieved from https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox

  24. Levy M, Sandler M (2009) Music information retrieval using social tags and audio. IEEE Trans Multimed 11(3):383–395. doi:10.1109/TMM.2009.2012913

  25. Lew MS, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state of the art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19. doi:10.1145/1126004.1126005

  26. Li Q, Myaeng SH, Kim BM (2007) A probabilistic music recommender considering user options and audio features. Inf Process Manag 43(2):473–487

  27. Lidy T, Rauber A (2005) Evaluation of feature extractors and psycho-acoustic transformations for music genre classification. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR’05), pp. 34–41, London, UK, September 2005

  28. Liu B (2011) Web data mining: exploring hyperlinks, contents, and usage data, 2nd edn. Springer

  29. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval (p. 496). Cambridge University Press

  30. Marcus SE, Moy M, Coffman T (2007) Social network analysis. In: Cook DJ, Holder LB (eds) Mining graph data (pp. 443–451). John Wiley & Sons, Inc

  31. McFee B, Bertin-Mahieux T, Ellis D, Lanckriet G (2012) The million song dataset challenge. In: Proceedings of the 4th International Workshop on Advances in Music Information Research (AdMIRe ’12)

  32. McKay C, Burgoyne JA, Hockman J, Smith JBL, Vigliensoni G (2010) Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In: Proceedings of the 11th International Society for Music Information Retrieval Conference

  33. McKay C, Fujinaga I (2008) Combining features extracted from audio, symbolic, and cultural sources. In: Proceedings of International Conference on Music Information Retrieval

  34. Miotto R, Orio N (2010) A probabilistic approach to merge context and content information for music retrieval. In: Downie JS, Veltkamp RC (eds) International Conference on Music Information Retrieval (pp. 15–20). International Society for Music Information Retrieval

  35. Miotto R, Orio N (2012) A probabilistic model to combine tags and acoustic similarity for music retrieval. ACM Trans Inf Syst 30(2):8.1–8.29. doi:10.1145/2180868.2180870

  36. Page L, Brin S, Motwani R, Winograd T (1998) The PageRank citation ranking: bringing order to the web. Proceedings of the World Wide Web Internet and Web Information Systems (pp. 1–17). Technical report, Stanford Digital Library Technologies Project, 1998. Retrieved from http://en.scientificcommons.org/42893894

  37. Pan J-Y, Yang H-J, Faloutsos C, Duygulu P (2004) Automatic multimedia cross-modal correlation discovery. Proceedings of the tenth ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2004) (pp. 653–658). Seattle, WA, USA: ACM Press. doi:10.1145/1014052.1014135

  38. Pohle T, Pampalk E, Widmer G (2005) Evaluation of frequently used audio features for classification of music into perceptual categories. In: Proceedings of the 4th International Workshop on Content-Based Multimedia Indexing (CBMI’05), Riga, Latvia, June 2005

  39. Sayood K (2012) Introduction to data compression, 4th edn., p. 768. Morgan Kaufmann Publishers

  40. Scaringella N, Zoia G, Mlynek D (2006) Automatic genre classification of music content: a survey. IEEE Signal Process Mag 23(2):133–141. doi:10.1109/MSP.2006.1598089

  41. Theodoridis S, Koutroumbas K (2006) Pattern recognition, 3rd edn. Academic Press

  42. Song Y, Dixon S, Pearce P (2012) Evaluation of musical features for emotion classification. In: Proceedings of the 13th International Society for Music Information Retrieval Conference

  43. Stober S, Nürnberger A (2013) Adaptive music retrieval: a state of the art. Multimedia Tools Appl 65(3):467–494. doi:10.1007/s11042-012-1042-z

  44. Tan S, Bu J, Chen C, Xu B, Wang C, He X (2011) Using rich social media information for music recommendation via hypergraph model. ACM Trans Multimed Comput Commun Appl 7S(1), 22:1–22:22. doi:10.1145/2037676.2037679

  45. Tong H, Faloutsos C, Pan J-Y (2007) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3):327–346. Springer-Verlag New York, Inc. doi:10.1007/s10115-007-0094-2

  46. Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Trans Speech Audio Process 10(5):293–302. doi:10.1109/TSA.2002.800560

  47. Wang C, Jing F, Zhang L, Zhang H-J (2006) Image annotation refinement using random walk with restarts. Proceedings of the 14th annual ACM International Conference on Multimedia (MULTIMEDIA 2006) (pp. 647–650). Santa Barbara, CA, USA: ACM press. doi:10.1145/1180639.1180774

  48. Zsuzsanna M, Sacarea C (2011) Using conceptual graphs to represent modern music. In: Proceedings of the 2011 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 137–140, 25–27 Aug. 2011. doi:10.1109/ICCP.2011.6047857

Acknowledgments

The authors would like to thank Professor George Tzanetakis for his valuable guidance and advice on experiments using the Million Song Dataset [4].

Author information

Corresponding author

Correspondence to Jia-Lien Hsu.

Additional information

This research was supported by Fu Jen Catholic University under Project No. 410031044042 and sponsored by the National Science Council under Contract Nos. NSC-100-2221-E-030-021 and NSC-101-2221-E-030-008.

About this article

Cite this article

Hsu, JL., Huang, CC. Designing a graph-based framework to support a multi-modal approach for music information retrieval. Multimed Tools Appl 74, 5401–5427 (2015). https://doi.org/10.1007/s11042-014-1860-2

  • DOI: https://doi.org/10.1007/s11042-014-1860-2
