An Approach to Compress English Posts from Social Media Texts

  • Conference paper
Computational Intelligence in Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 990)

Abstract

Compressing sentences in Facebook posts and tweets is an important task in the automatic summarization of social media text. The task can be formally defined as transforming a sentence into a more concise form while preserving its original meaning. In this paper, we propose an approach for compressing sentences from English Facebook posts by dropping words that contribute little to the overall meaning of the sentence. We develop a parallel corpus of English Facebook posts and their corresponding compressed sentences for this research task. We also report evaluation results for our approach from experiments on the developed dataset.
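
The general idea of deletion-based compression can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how word importance is decided, so the hand-picked `LOW_CONTENT_WORDS` set and the `compress` and `compression_ratio` helpers below are hypothetical stand-ins used only to show the shape of the task.

```python
import re

# Hypothetical set of low-importance words (an assumption for illustration;
# the abstract does not specify how the paper decides which words to drop).
LOW_CONTENT_WORDS = {"a", "an", "the", "very", "really", "just", "so", "actually"}


def compress(sentence, droppable=LOW_CONTENT_WORDS):
    """Return the sentence with every droppable word deleted, order preserved."""
    kept = []
    for token in sentence.split():
        # Strip punctuation and case before the lookup, e.g. "So," -> "so".
        normalized = re.sub(r"\W", "", token).lower()
        if normalized not in droppable:
            kept.append(token)
    return " ".join(kept)


def compression_ratio(original, compressed):
    """Fraction of whitespace-separated tokens that survive compression."""
    n_original = len(original.split())
    if n_original == 0:
        return 0.0
    return len(compressed.split()) / n_original


post = "The weather is really very nice today"
short = compress(post)
print(short)
print(round(compression_ratio(post, short), 2))
```

A parallel corpus of the kind described above pairs each original sentence with its compressed counterpart, so a keep-or-drop decision for every word could in principle be learned from such pairs rather than fixed by hand as it is here.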

Notes

  1. https://developers.facebook.com/.
  2. https://www.facebook.com/cnn/.
  3. https://github.com/aritter/twitternlp.
  4. https://www.cs.waikato.ac.nz/ml/weka/.

Author information

Corresponding author

Correspondence to Dwijen Rudrapal.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rudrapal, D., Datta, M., Datta, A., Das, P., Das, S. (2020). An Approach to Compress English Posts from Social Media Texts. In: Behera, H., Nayak, J., Naik, B., Pelusi, D. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 990. Springer, Singapore. https://doi.org/10.1007/978-981-13-8676-3_8
