An Approach to Compress English Posts from Social Media Texts

Rudrapal, Dwijen; Datta, Manaswita; Datta, Ankita; Das, Puja; Das, Subhankar

doi:10.1007/978-981-13-8676-3_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 990)

Abstract

Compressing sentences in Facebook and Twitter posts is an important task for automatic summarization of social media text. The task can be formally defined as transforming a sentence into a more concise form while preserving its original meaning. In this paper, we propose an approach for compressing sentences from English Facebook posts by dropping words that contribute little to the overall meaning of the sentence. We develop a parallel corpus of English Facebook posts and their corresponding compressed sentences for this task. We also report the evaluation results of our approach through experiments on the developed dataset.
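
The idea of compressing a sentence by deleting low-importance words can be illustrated with a minimal sketch. It is not the method of the chapter, whose features and model are not given in the abstract. The word list LOW_IMPORTANCE and the helpers normalize, compress, and compression_rate are hypothetical names introduced only for this illustration, and a fixed word list stands in for whatever importance criterion the authors actually use.

```python
import string

# Hypothetical list of low-importance words; an assumption for this sketch,
# not the criterion used in the chapter.
LOW_IMPORTANCE = frozenset(
    {"really", "very", "just", "so", "actually", "basically", "literally"}
)


def normalize(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation for lookup."""
    return token.strip(string.punctuation).lower()


def compress(sentence: str, droppable=LOW_IMPORTANCE) -> str:
    """Delete whitespace-separated tokens whose normalized form is droppable."""
    kept = [tok for tok in sentence.split() if normalize(tok) not in droppable]
    return " ".join(kept)


def compression_rate(original: str, compressed: str) -> float:
    """Fraction of the original tokens that survive compression."""
    n_original = len(original.split())
    if n_original == 0:
        return 1.0
    return len(compressed.split()) / n_original


post = "I am really very happy today!"
short = compress(post)
print(short)
print(round(compression_rate(post, short), 2))
```

Because tokens are split on whitespace, punctuation attached to a dropped word is dropped with it. A real system would more plausibly score each word with a trained model over a parallel corpus such as the one the abstract describes.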

Author information

Authors and Affiliations

National Institute of Technology, Agartala, 799046, Tripura, India
Dwijen Rudrapal, Manaswita Datta, Ankita Datta, Puja Das & Subhankar Das

Corresponding author

Correspondence to Dwijen Rudrapal.

Editor information

Editors and Affiliations

Department of Information Technology, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India
Himansu Sekhar Behera
Department of Computer Science and Engineering, Sri Sivani College of Engineering, Srikakulam, Andhra Pradesh, India
Janmenjoy Nayak
Department of Computer Application, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India
Bighnaraj Naik
Faculty of Communication Sciences, University of Teramo, Teramo, Italy
Danilo Pelusi

About this paper

Cite this paper

Rudrapal, D., Datta, M., Datta, A., Das, P., Das, S. (2020). An Approach to Compress English Posts from Social Media Texts. In: Behera, H., Nayak, J., Naik, B., Pelusi, D. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 990. Springer, Singapore. https://doi.org/10.1007/978-981-13-8676-3_8

DOI: https://doi.org/10.1007/978-981-13-8676-3_8
Published: 18 August 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8675-6
Online ISBN: 978-981-13-8676-3
eBook Packages: Intelligent Technologies and Robotics (R0)
