Abstract
Multimedia sharing has grown enormously in recent years, and sound sharing is no exception. Online sound sharing sites now let users search, browse, and contribute large amounts of audio content such as sound effects, field and urban recordings, music tracks, and music samples. This poses many challenges for enabling the search, discovery, and, ultimately, reuse of this content. In this chapter we give an overview of different ways to approach these challenges. We describe how to build an audio database, outlining the aspects that need to be taken into account. We discuss metadata-based descriptions of audio content and the searching and browsing techniques that can be used to navigate the database. Beyond metadata, we show sound retrieval techniques based on audio features extracted from (possibly) unannotated audio. We end the chapter by discussing advanced approaches to sound retrieval and by drawing some conclusions about the present and future of sound sharing and retrieval. Alongside our explanations, we provide code examples that illustrate some of the concepts discussed.
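As a rough illustration of retrieval based on audio features rather than metadata, the Python sketch below ranks the sounds in a toy database by their distance to the feature vector of a query recording. The sound IDs, the three feature dimensions, their values, and the function names are all hypothetical. In practice such features would be extracted with a library such as Essentia or librosa, and the chapter's own examples may use different features and distance measures. Each dimension is z-score normalized before computing Euclidean distances (`math.dist`, Python 3.8+), so that features with large numeric ranges do not dominate the ranking.

```python
import math

# Hypothetical toy database: each sound ID maps to an illustrative feature vector
# [mean spectral centroid (kHz), loudness, zero-crossing rate]. Values are made up.
SOUND_FEATURES = {
    "dog_bark": [2.1, 0.8, 0.12],
    "rain": [5.4, 0.4, 0.35],
    "door_slam": [1.2, 0.9, 0.05],
    "birdsong": [4.8, 0.3, 0.30],
}


def zscore_normalize(db):
    """Scale each feature dimension to zero mean and unit variance so that
    no single feature dominates the distance computation."""
    dims = len(next(iter(db.values())))
    n = len(db)
    means = [sum(v[i] for v in db.values()) / n for i in range(dims)]
    stds = []
    for i in range(dims):
        var = sum((v[i] - means[i]) ** 2 for v in db.values()) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    normalized = {
        sid: [(v[i] - means[i]) / stds[i] for i in range(dims)]
        for sid, v in db.items()
    }
    return normalized, means, stds


def nearest_sounds(query, db, k=2):
    """Return the k sound IDs whose normalized feature vectors are closest
    (Euclidean distance) to the query, normalized with the same statistics."""
    normalized, means, stds = zscore_normalize(db)
    q = [(x - m) / s for x, m, s in zip(query, means, stds)]
    ranked = sorted(normalized, key=lambda sid: math.dist(q, normalized[sid]))
    return ranked[:k]


if __name__ == "__main__":
    # Feature vector of some unannotated query recording (hypothetical values).
    print(nearest_sounds([5.0, 0.35, 0.32], SOUND_FEATURES, k=2))
```

Because the query is compared only through its extracted features, the same ranking works for sounds that have no tags at all; a real system would replace the linear scan with an index suited to large collections.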
Notes
- 4. www.TODO:bookwebsite.
- 9. For this reason, histograms are provided as part of the documentation of the Freesound API: https://www.freesound.org/docs/api/analysis_docs.html.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Font, F., Roma, G., Serra, X. (2018). Sound Sharing and Retrieval. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_10
DOI: https://doi.org/10.1007/978-3-319-63450-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63449-4
Online ISBN: 978-3-319-63450-0
eBook Packages: Engineering, Engineering (R0)