Abstract
In this paper, we describe and evaluate Picognizer, a JavaScript library that detects and recognizes user-specified synthesized sounds using a template-matching approach. In daily life, people are surrounded by various synthesized sounds, so it is valuable to establish a way to recognize such sounds as triggers for invoking information systems. However, it is not easy for end-user programmers to create custom-built recognizers for each usage scenario through supervised learning. We therefore focus on a key property of synthesized sounds: their auditory deviation across replays is small. Exploiting this property, we implemented a JavaScript library that detects and recognizes such sounds using traditional pattern-matching algorithms. We evaluated its performance quantitatively and demonstrate its effectiveness through various usage scenarios, such as an autoplay system for digital games and the augmentation of digital games, including gamification.
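The template-matching idea the abstract describes can be sketched with dynamic time warping (DTW) over per-frame audio feature vectors: a stored template of the target sound is compared against incoming audio, and a detection fires when the warped distance falls below a threshold. The sketch below is illustrative only and under stated assumptions; `dtwDistance`, `euclidean`, and the threshold are hypothetical names, not Picognizer's actual API, and the toy feature vectors stand in for real spectral features.

```javascript
// Minimal sketch of DTW-based template matching between two feature
// sequences (arrays of per-frame feature vectors). Because synthesized
// sounds vary little across replays, a simple distance threshold on the
// best warped alignment can serve as a detector.
function dtwDistance(template, input, frameDist) {
  const n = template.length, m = input.length;
  // cost[i][j] = minimal accumulated distance aligning the first i
  // template frames with the first j input frames.
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = frameDist(template[i - 1], input[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j],      // insertion
                                cost[i][j - 1],      // deletion
                                cost[i - 1][j - 1]); // match
    }
  }
  return cost[n][m];
}

// Euclidean distance between two equal-length feature vectors.
function euclidean(a, b) {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += (a[k] - b[k]) ** 2;
  return Math.sqrt(s);
}

// Toy data: the observed sequence repeats one frame, simulating a
// slight time stretch; DTW absorbs it and the distance stays small.
const template = [[0, 1], [1, 2], [2, 3]];
const observed = [[0, 1], [1, 2], [1, 2], [2, 3]];
const dist = dtwDistance(template, observed, euclidean);
// A sound would be "detected" when dist is below a user-tuned
// threshold (the value 1.0 here is arbitrary).
console.log(dist < 1.0); // → true
```

In practice the per-frame vectors would come from a feature extractor such as Meyda (cited by the paper) rather than hand-written arrays, and the comparison would run over a sliding window of live audio.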
Notes
1. A demo video of Picognizer is available at: https://youtu.be/CoYJmNdxPNY.
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers JP15H02735, JP16H02867, JP17H00749.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Kurihara, K., Itaya, A., Uemura, A., Kitahara, T., Nagao, K. (2018). Picognizer: A JavaScript Library for Detecting and Recognizing Synthesized Sounds. In: Cheok, A., Inami, M., Romão, T. (eds) Advances in Computer Entertainment Technology. ACE 2017. Lecture Notes in Computer Science, vol 10714. Springer, Cham. https://doi.org/10.1007/978-3-319-76270-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76269-2
Online ISBN: 978-3-319-76270-8