Cross Lingual Snippet Generation Using Snippet Translation System

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Multi Lingual Snippet Generation (MLSG) systems provide users with snippets in multiple languages. However, collecting and managing documents in multiple languages efficiently is difficult, which complicates the process. This requirement can instead be met by translating snippets from one language to another with a Machine Translation (MT) system; the resulting system is called a Cross Lingual Snippet Generation (CLSG) system. This paper presents the development of a CLSG system based on snippet translation for the case where documents are available in only one language. We consider the English-Bengali language pair and translate snippets in one direction (English to Bengali). The work concentrates on translating snippets with simpler MT concepts and leaves deeper ones out of scope. In the experiments, an average BLEU score of 14.26 and NIST score of 4.93 are obtained.
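For readers unfamiliar with the BLEU metric reported above, the following is a minimal sketch of how such a score can be computed. It is not the authors' evaluation script: it assumes a single reference, scores one sentence at a time rather than a whole corpus, applies no smoothing, and reports on a 0-100 scale. The function names `ngrams` and `bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU on a 0-100 scale (no smoothing).

    Assumption: both arguments are already tokenized lists of strings.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matched == 0 or total == 0:
            return 0.0  # unsmoothed BLEU is zero if any order has no match
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)
```

With this sketch, a candidate identical to its reference scores 100, and any candidate that shares no 4-gram with the reference scores 0 at the default `max_n=4`. Published scores are usually computed at corpus level, so values from a per-sentence sketch like this one are not directly comparable to the figures in the abstract.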




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lohar, P., Bhaskar, P., Pal, S., Bandyopadhyay, S. (2014). Cross Lingual Snippet Generation Using Snippet Translation System. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_28

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science (R0)
