Abstract
We present a new technique for estimating the reliability of individual words in automatically generated translations. We treat confidence estimation as a classification problem in which a confidence score is predicted from a feature vector representing each translated word. We describe a new set of prediction features designed to capture context information, and propose a model based on partial least squares to perform the classification. We report good empirical results on a large-domain news translation task.
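The general idea of scoring a word from its feature vector with partial least squares can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' setup: the two toy features, the 0/1 labels, the single latent component, the 0.5 decision threshold, and the function names `fit_pls1`, `confidence_score`, and `classify`. The sketch uses single-response PLS (PLS1) with sequential deflation, written in plain Python.

```python
import math

def fit_pls1(X, y, n_components=1):
    """Fit single-response PLS (PLS1) by sequential deflation."""
    n, d = len(X), len(X[0])
    # Center the features and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yr = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the response.
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        # Latent scores, feature loadings, and response loading.
        t = [sum(Xr[i][j] * w[j] for j in range(d)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate features and response before the next component.
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def confidence_score(model, x):
    """Predict a real-valued confidence score for one feature vector."""
    x_mean, y_mean, components = model
    xr = [a - m for a, m in zip(x, x_mean)]
    score = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(xr, w))
        score += q * t
        xr = [a - t * b for a, b in zip(xr, p)]
    return score

def classify(model, x, threshold=0.5):
    """Threshold the score into a correct/incorrect decision (assumed rule)."""
    return "correct" if confidence_score(model, x) >= threshold else "incorrect"

# Hypothetical toy data: two made-up features per word, label 1 = correct word.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7],
           [0.2, 0.1], [0.1, 0.3], [0.15, 0.2]]
y_train = [1, 1, 1, 0, 0, 0]

model = fit_pls1(X_train, y_train, n_components=1)
labels = [classify(model, x) for x in X_train]
```

In this sketch the 0/1 labels are regressed on the latent component and the resulting score is thresholded, which is one common way to use PLS for classification; the feature set and decision rule used in the paper may differ.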
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González-Rubio, J., Navarro-Cerdán, J.R., Casacuberta, F. (2013). Partial Least Squares for Word Confidence Estimation in Machine Translation. In: Sanches, J.M., Micó, L., Cardoso, J.S. (eds) Pattern Recognition and Image Analysis. IbPRIA 2013. Lecture Notes in Computer Science, vol 7887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38628-2_59
DOI: https://doi.org/10.1007/978-3-642-38628-2_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38627-5
Online ISBN: 978-3-642-38628-2
eBook Packages: Computer Science (R0)