Abstract
In this paper, we describe and evaluate Picognizer, a JavaScript library that detects and recognizes user-specified synthesized sounds using a template-matching approach. In daily life, people are surrounded by various synthesized sounds, so it is valuable to establish a way to recognize such sounds as triggers for invoking information systems. However, it is not easy for end-user programmers to create custom-built recognizers for each usage scenario through supervised learning. We therefore focus on a key property of synthesized sounds: their auditory deviation across replays is small. Exploiting this property, we implemented a JavaScript library that detects and recognizes such sounds using traditional pattern-matching algorithms. We evaluated its performance quantitatively and demonstrate its effectiveness through various usage scenarios, such as an autoplay system for digital games and the augmentation of digital games, including gamification.
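The template-matching idea the abstract describes can be sketched with dynamic time warping (DTW) over per-frame audio feature vectors: a stored template of the target sound is compared against incoming audio, and a detection fires when the warped distance falls below a threshold. The sketch below is illustrative only and under stated assumptions; `dtwDistance`, `euclidean`, and the threshold are hypothetical names, not Picognizer's actual API, and the toy feature vectors stand in for real spectral features.

```javascript
// Minimal sketch of DTW-based template matching between two feature
// sequences (arrays of per-frame feature vectors). Because synthesized
// sounds vary little across replays, a simple distance threshold on the
// best warped alignment can serve as a detector.
function dtwDistance(template, input, frameDist) {
  const n = template.length, m = input.length;
  // cost[i][j] = minimal accumulated distance aligning the first i
  // template frames with the first j input frames.
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = frameDist(template[i - 1], input[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j],      // insertion
                                cost[i][j - 1],      // deletion
                                cost[i - 1][j - 1]); // match
    }
  }
  return cost[n][m];
}

// Euclidean distance between two equal-length feature vectors.
function euclidean(a, b) {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += (a[k] - b[k]) ** 2;
  return Math.sqrt(s);
}

// Toy data: the observed sequence repeats one frame, simulating a
// slight time stretch; DTW absorbs it and the distance stays small.
const template = [[0, 1], [1, 2], [2, 3]];
const observed = [[0, 1], [1, 2], [1, 2], [2, 3]];
const dist = dtwDistance(template, observed, euclidean);
// A sound would be "detected" when dist is below a user-tuned
// threshold (the value 1.0 here is arbitrary).
console.log(dist < 1.0); // → true
```

In practice the per-frame vectors would come from a feature extractor such as Meyda (cited by the paper) rather than hand-written arrays, and the comparison would run over a sliding window of live audio.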
Notes
1. A demo video of Picognizer is available at: https://youtu.be/CoYJmNdxPNY.
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers JP15H02735, JP16H02867, JP17H00749.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Kurihara, K., Itaya, A., Uemura, A., Kitahara, T., Nagao, K. (2018). Picognizer: A JavaScript Library for Detecting and Recognizing Synthesized Sounds. In: Cheok, A., Inami, M., Romão, T. (eds) Advances in Computer Entertainment Technology. ACE 2017. Lecture Notes in Computer Science, vol 10714. Springer, Cham. https://doi.org/10.1007/978-3-319-76270-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76269-2
Online ISBN: 978-3-319-76270-8