A spectrogram-based audio fingerprinting system for content-based copy detection

Multimedia Tools and Applications

Abstract

This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. Robustness is achieved by generating different versions of the spectrogram matrix of the audio signal, each pruned with a threshold based on the average of the spectral values. We transform each version of this pruned spectrogram matrix into a 2-D binary image. The multiple versions of these 2-D images suppress noise to varying degrees, which improves the likelihood that one of the images matches a reference image. To speed up matching, we convert each image into an n-dimensional vector and perform a nearest neighbor search on this vector. We give results with two different feature parameters and their combination. We test this method on the TRECVID 2010 content-based copy detection evaluation dataset and validate its performance on the TRECVID 2009 dataset. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely the NN-based and Shazam systems. Our method outperforms the Shazam system by a wide margin for all audio transformations (or distortions) in terms of detection performance, number of missed queries, and localization accuracy. Compared to the NN-based system, our approach reduces the minimal Normalized Detection Cost Rate (min NDCR) by 23% and improves localization accuracy by 24%.
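
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the threshold values, the image-to-vector mapping, or the search structure, so the threshold multipliers (`scales`), the tile-density vector in `image_to_vector`, the brute-force Euclidean search in `nearest`, and the synthetic test signals are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency bins x frames), Hann-windowed FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def binary_images(spec, scales=(0.5, 1.0, 2.0)):
    """One binary image per threshold; thresholds are multiples of the mean.
    The multipliers are hypothetical, not values from the paper."""
    mean = spec.mean()
    return [(spec > s * mean).astype(np.uint8) for s in scales]

def image_to_vector(img, tiles=(4, 4)):
    """Assumed feature: fraction of set pixels in each tile of the image."""
    rows = np.array_split(img, tiles[0], axis=0)
    return np.array([block.mean() for r in rows
                     for block in np.array_split(r, tiles[1], axis=1)])

def nearest(query_vecs, ref_vecs):
    """Brute-force search: closest reference over all query versions."""
    d = np.linalg.norm(query_vecs[:, None, :] - ref_vecs[None, :, :], axis=2)
    q, r = np.unravel_index(np.argmin(d), d.shape)
    return int(r), float(d[q, r])

# Synthetic data: two reference "recordings" and a noisy copy of the first.
sr = 8000
t = np.arange(sr) / sr

def tones(f1, f2):
    return np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

references = [tones(440, 880), tones(1500, 2300)]
rng = np.random.default_rng(0)
query = references[0] + 0.1 * rng.standard_normal(sr)

# One vector per reference (mean threshold), several versions of the query.
ref_vecs = np.array([
    image_to_vector(binary_images(spectrogram(r), scales=(1.0,))[0])
    for r in references
])
query_vecs = np.array([
    image_to_vector(img) for img in binary_images(spectrogram(query))
])

best_ref, best_dist = nearest(query_vecs, ref_vecs)
print("best matching reference:", best_ref, "distance:", round(best_dist, 3))
```

Lower thresholds keep more of the spectrogram, including noise, while higher thresholds keep only the strongest peaks. Searching over all query versions lets the least-corrupted one determine the match, which is the intuition the abstract gives for generating several images.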

Notes

  1. These are only approximations of the speed difference between the query and the corresponding reference. For example, −180 % means that the query is approximately 2.8 times slower than the reference (1 s of the query corresponds to approximately 0.36 s of the reference).

Author information

Corresponding author

Correspondence to Chahid Ouali.

About this article

Cite this article

Ouali, C., Dumouchel, P. & Gupta, V. A spectrogram-based audio fingerprinting system for content-based copy detection. Multimed Tools Appl 75, 9145–9165 (2016). https://doi.org/10.1007/s11042-015-3081-8

  • DOI: https://doi.org/10.1007/s11042-015-3081-8
