
Picognizer: A JavaScript Library for Detecting and Recognizing Synthesized Sounds

  • Conference paper
Advances in Computer Entertainment Technology (ACE 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10714)

Abstract

In this paper, we describe and evaluate Picognizer, a JavaScript library that detects and recognizes user-specified synthesized sounds using a template-matching approach. In daily life, people are surrounded by various synthesized sounds, so it is valuable to be able to recognize such sounds as triggers for invoking information systems. However, it is not easy for end-user programmers to build custom recognizers for each usage scenario through supervised learning. We therefore exploit the fact that synthesized sounds deviate little acoustically from one replay to the next, and implemented a JavaScript library that detects and recognizes them using traditional pattern-matching algorithms. We evaluate its performance quantitatively and demonstrate its effectiveness through various usage scenarios, such as autoplaying digital games and augmenting digital games, including gamification.
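The template-matching idea described in the abstract can be sketched as follows. This is a minimal illustration and not the Picognizer API: it assumes feature frames (e.g. MFCC vectors, as extractable with a library such as Meyda) have already been computed, and it compares a template feature sequence against sliding windows of an input stream using dynamic time warping (DTW), reporting a detection wherever the DTW distance falls below a threshold. The function names (`frameDist`, `dtw`, `detect`) and the fixed-size window with a distance threshold are illustrative assumptions.

```javascript
// Euclidean distance between two feature frames (arrays of numbers).
function frameDist(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Classic dynamic time warping distance between two frame sequences.
// cost[i][j] = distance of the best alignment of seqA[0..i) with seqB[0..j).
function dtw(seqA, seqB) {
  const n = seqA.length, m = seqB.length;
  const INF = Number.POSITIVE_INFINITY;
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(INF));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = frameDist(seqA[i - 1], seqB[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
    }
  }
  return cost[n][m];
}

// Slide a template-sized window over the stream and report the start
// indices where the DTW distance drops below the threshold.
function detect(template, stream, threshold) {
  const w = template.length;
  const hits = [];
  for (let start = 0; start + w <= stream.length; start++) {
    if (dtw(template, stream.slice(start, start + w)) < threshold) hits.push(start);
  }
  return hits;
}
```

Because synthesized sounds vary little between replays, a single recorded template and a simple distance threshold can suffice, whereas recognizing natural sounds would typically require many training examples.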


Notes

  1.

    A demo video of Picognizer is available from the following link: https://youtu.be/CoYJmNdxPNY.


Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers JP15H02735, JP16H02867, JP17H00749.

Author information

Corresponding author

Correspondence to Kazutaka Kurihara.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Kurihara, K., Itaya, A., Uemura, A., Kitahara, T., Nagao, K. (2018). Picognizer: A JavaScript Library for Detecting and Recognizing Synthesized Sounds. In: Cheok, A., Inami, M., Romão, T. (eds) Advances in Computer Entertainment Technology. ACE 2017. Lecture Notes in Computer Science, vol 10714. Springer, Cham. https://doi.org/10.1007/978-3-319-76270-8_24


  • DOI: https://doi.org/10.1007/978-3-319-76270-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76269-2

  • Online ISBN: 978-3-319-76270-8

  • eBook Packages: Computer Science (R0)
