Chinese-Japanese Clause Alignment

Wang, Xiaojie; Ren, Fuji

doi:10.1007/978-3-540-30586-6_43

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Bi-text alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. This paper presents a method for Chinese-Japanese alignment at the clause level. After describing some characteristics of Chinese-Japanese bilingual texts, we first investigate statistical properties of a Chinese-Japanese bilingual corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then focus on n-m alignment modes (n > 1 or m > 1), which are prone to mismatch, and propose a similarity measure based on Hanzi character information for these modes. Using dynamic programming, we combine the statistical information and the Hanzi character information to find the overall least-cost alignment. Experiments show that our algorithm achieves good alignment accuracy.
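
The abstract does not give the cost function, so the following Python sketch only illustrates the general idea of combining a length-based cost with a shared-Hanzi similarity inside a dynamic program. Everything concrete in it is an assumption rather than the authors' method: the set of alignment modes and their penalties in MODES, the Dice coefficient in hanzi_similarity, the squared-deviation form of length_cost, and the hanzi_weight parameter. For simplicity it applies the Hanzi term to every non-empty mode, not only to n-m modes, and it treats a Chinese and a Japanese character as shared only when they have the same code point, which ignores simplified and Japanese variant forms.

```python
import math

# Hypothetical alignment modes (n Chinese clauses, m Japanese clauses) with
# made-up penalties; the paper's actual modes and weights are not given here.
MODES = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0,
         (2, 1): 2.0, (1, 2): 2.0, (2, 2): 3.0}

def hanzi(text):
    """Set of CJK unified ideographs in text; kana, Latin and punctuation are dropped."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}

def hanzi_similarity(zh, ja):
    """Dice coefficient over shared ideographs (an assumed stand-in measure)."""
    a, b = hanzi(zh), hanzi(ja)
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def length_cost(len_zh, len_ja, ratio=1.0, var=1.0):
    """Squared normalized deviation from an assumed length ratio (Gale-Church style)."""
    if len_zh == 0 and len_ja == 0:
        return 0.0
    mean = (len_zh + len_ja / ratio) / 2.0
    delta = (len_ja - len_zh * ratio) / math.sqrt(mean * var)
    return delta * delta

def align(zh_clauses, ja_clauses, hanzi_weight=5.0):
    """Return (zh_indices, ja_indices) groups forming the least-cost alignment."""
    n_zh, n_ja = len(zh_clauses), len(ja_clauses)
    inf = float("inf")
    cost = [[inf] * (n_ja + 1) for _ in range(n_zh + 1)]
    back = [[None] * (n_ja + 1) for _ in range(n_zh + 1)]
    cost[0][0] = 0.0
    # Forward pass: extend every reachable prefix pair by one alignment mode.
    for i in range(n_zh + 1):
        for j in range(n_ja + 1):
            if cost[i][j] == inf:
                continue
            for (n, m), penalty in MODES.items():
                if i + n > n_zh or j + m > n_ja:
                    continue
                zh = "".join(zh_clauses[i:i + n])
                ja = "".join(ja_clauses[j:j + m])
                step = penalty + length_cost(len(zh), len(ja))
                if n and m:  # Hanzi term only when both sides are non-empty
                    step += hanzi_weight * (1.0 - hanzi_similarity(zh, ja))
                if cost[i][j] + step < cost[i + n][j + m]:
                    cost[i + n][j + m] = cost[i][j] + step
                    back[i + n][j + m] = (n, m)
    # Trace the stored modes back from the end of both texts.
    groups, i, j = [], n_zh, n_ja
    while i > 0 or j > 0:
        n, m = back[i][j]
        groups.append((list(range(i - n, i)), list(range(j - m, j))))
        i, j = i - n, j - m
    return groups[::-1]

# Toy input (not from the paper's corpus): two clauses against one longer clause.
print(align(["東京大学", "研究開発"], ["東京大学研究開発"]))
```

On this toy input the two source clauses are grouped against the single target clause, because the 2-1 mode matches both the length and the shared characters. In practice the length ratio, variance, and mode penalties would be estimated from a corpus, as the statistical tests mentioned in the abstract suggest.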

Author information

Authors and Affiliations

School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Xiaojie Wang
Department of Information Science & Intelligent Systems, Tokushima University, Tokushima, Japan
Fuji Ren

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Wang, X., Ren, F. (2005). Chinese-Japanese Clause Alignment. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_43

DOI: https://doi.org/10.1007/978-3-540-30586-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
