Sound Sharing and Retrieval

Font, Frederic; Roma, Gerard; Serra, Xavier

doi:10.1007/978-3-319-63450-0_10



Abstract

Multimedia sharing has grown enormously in recent years, and sound sharing is no exception. Online sound sharing sites now let users search, browse, and contribute large amounts of audio content such as sound effects, field and urban recordings, music tracks, and music samples. This volume of content poses many challenges for search, discovery, and ultimately reuse. In this chapter we give an overview of different ways to approach these challenges. We describe how to build an audio database, outlining the aspects that need to be taken into account. We discuss metadata-based descriptions of audio content and the searching and browsing techniques that can be used to navigate the database. Beyond metadata, we show sound retrieval techniques based on audio features extracted from (possibly) unannotated audio. We close the chapter by discussing advanced approaches to sound retrieval and by drawing some conclusions about the present and future of sound sharing and retrieval. Throughout, we provide code examples that illustrate some of the concepts discussed.
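To make the idea of retrieval from extracted audio features concrete, here is a minimal query-by-example sketch. It assumes each sound has already been summarised as a fixed-length feature vector and ranks the database by Euclidean distance to a query vector. The sound ids, the three features, and their values are hypothetical and made up for illustration; this is a generic nearest-neighbour sketch under those assumptions, not one of the chapter's own code examples.

```python
import math

# Hypothetical toy database: sound id -> feature vector.
# In a real system these would be descriptors extracted from the audio
# (e.g. averaged spectral statistics); the numbers here are invented.
SOUNDS = {
    "dog_bark": [0.8, 1200.0, -20.0],
    "door_slam": [0.9, 800.0, -10.0],
    "rain": [0.1, 4000.0, -30.0],
    "birdsong": [0.3, 5200.0, -25.0],
}


def feature_stats(sounds):
    """Return per-dimension mean and standard deviation over the database."""
    vectors = list(sounds.values())
    dims = len(vectors[0])
    means, stds = [], []
    for d in range(dims):
        column = [v[d] for v in vectors]
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        means.append(mean)
        # Guard against constant features (zero variance).
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def standardize(vec, means, stds):
    """Z-score a vector so that no single feature dominates the distance."""
    return [(x - m) / s for x, m, s in zip(vec, means, stds)]


def query_by_example(query_vec, sounds, k=2):
    """Return the ids of the k sounds closest to query_vec (Euclidean distance)."""
    means, stds = feature_stats(sounds)
    q = standardize(query_vec, means, stds)
    scored = sorted(
        (math.dist(q, standardize(vec, means, stds)), sound_id)
        for sound_id, vec in sounds.items()
    )
    return [sound_id for _, sound_id in scored[:k]]


if __name__ == "__main__":
    # A query whose (hypothetical) features resemble the rain recording.
    print(query_by_example([0.15, 4200.0, -29.0], SOUNDS, k=2))
```

Standardising each dimension is the key design choice here: raw descriptors live on very different scales (a value near 1 next to values in the thousands), and without it the second feature alone would decide the ranking. Large collections would replace the linear scan with an index, but the ranking principle is the same.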




Author information

Authors and Affiliations

Music Technology Group (MTG), Universitat Pompeu Fabra, Barcelona, Spain
Frederic Font & Xavier Serra
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK
Gerard Roma


Corresponding author

Correspondence to Frederic Font.

Editor information

Editors and Affiliations

Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland
Tuomas Virtanen
Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, United Kingdom
Mark D. Plumbley
Google Inc., New York, New York, USA
Dan Ellis



About this chapter

Cite this chapter

Font, F., Roma, G., Serra, X. (2018). Sound Sharing and Retrieval. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_10


DOI: https://doi.org/10.1007/978-3-319-63450-0_10
Published: 22 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63449-4
Online ISBN: 978-3-319-63450-0
eBook Packages: Engineering, Engineering (R0)
