Vocal Manipulation Based on Pitch Transcription and Its Application to Interactive Entertainment for Karaoke

Nakano, Kota; Morise, Masanori; Nishiura, Takanobu

doi:10.1007/978-3-642-22950-3_6


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6851)

Included in the following conference series:

International Workshop on Haptic and Audio Interaction Design


Abstract

This paper describes a real-time vocal manipulation system for improving karaoke. Karaoke is an interactive entertainment system, used all over the world, in which users sing along with recorded music. Although users are expected to sing at the correct pitch, this is difficult for tone-deaf people. The proposed system is intended to help them. It is built on a vocoder-based voice synthesis method that synthesizes voiced sounds from their fundamental frequency (pitch) and spectral envelope (timbre). Vocal manipulation is achieved through pitch transcription, by replacing the pitch of a tone-deaf singer with that of a professional singer. A subjective evaluation was carried out to verify the effectiveness of the proposed system. The results suggest that the system can manipulate vocal sounds in real time.
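The abstract does not specify how the transcription or the pitch replacement is carried out. The Python sketch below illustrates the general idea under simplifying assumptions that are ours, not the authors'. Both F0 contours are assumed to be already estimated frame by frame on the same time grid, with 0 Hz marking unvoiced frames. The professional singer's contour is "transcribed" by rounding each frame to the nearest equal-tempered semitone (A4 = 440 Hz = MIDI note 69, so note = 69 + 12 log2(f / 440)). The user's voiced frames then receive that pitch. The function names are hypothetical, and the vocoder resynthesis step, which would recombine the new F0 with the user's own spectral envelope, is not shown.

```python
import math

A4_HZ = 440.0   # reference tuning (assumption: standard concert pitch)
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz: float) -> float:
    """Convert a positive F0 in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number back to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def transcribe(f0_contour):
    """Hypothetical per-frame transcription: round each voiced frame
    (F0 > 0) to the nearest semitone; unvoiced frames become None."""
    return [round(hz_to_midi(f)) if f > 0 else None for f in f0_contour]


def replace_pitch(user_f0, reference_notes):
    """Give the user's voiced frames the pitch of the transcribed reference.

    Assumes both sequences are time-aligned and of equal length. Frames
    that are unvoiced in either track keep the user's original value.
    The spectral envelope (timbre) is untouched; only F0 is changed.
    """
    if len(user_f0) != len(reference_notes):
        raise ValueError("contours must be time-aligned and equal in length")
    out = []
    for f, note in zip(user_f0, reference_notes):
        if f > 0 and note is not None:
            out.append(midi_to_hz(note))
        else:
            out.append(f)
    return out


if __name__ == "__main__":
    professional_f0 = [0.0, 445.0, 438.0, 0.0]  # made-up example values
    user_f0 = [0.0, 415.0, 470.0, 300.0]        # off-key user, made-up
    notes = transcribe(professional_f0)
    print(notes)                           # [None, 69, 69, None]
    print(replace_pitch(user_f0, notes))   # [0.0, 440.0, 440.0, 300.0]
```

Rounding frame by frame discards vibrato and pitch bends; a more realistic transcriber would first segment the contour into notes and then quantize each note. Keeping the user's F0 where the reference is unvoiced is only one of several possible choices.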



Author information

Authors and Affiliations

Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
Kota Nakano
College of Information and Science, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
Masanori Morise & Takanobu Nishiura


Editor information

Editors and Affiliations

Faculty of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji-Higashi, 525-8577, Kusatsu, Shiga, Japan
Eric W. Cooper, Victor V. Kryssanov & Hitoshi Ogawa
School of Computing Science, University of Glasgow, G12 8QQ, Glasgow, UK
Stephen Brewster


About this paper

Cite this paper

Nakano, K., Morise, M., Nishiura, T. (2011). Vocal Manipulation Based on Pitch Transcription and Its Application to Interactive Entertainment for Karaoke. In: Cooper, E.W., Kryssanov, V.V., Ogawa, H., Brewster, S. (eds) Haptic and Audio Interaction Design. HAID 2011. Lecture Notes in Computer Science, vol 6851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22950-3_6


DOI: https://doi.org/10.1007/978-3-642-22950-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22949-7
Online ISBN: 978-3-642-22950-3
eBook Packages: Computer Science, Computer Science (R0)
