Abstract
Compressing sentences in Facebook and Twitter posts is an important task in the automatic summarization of social media text. The task can be formally defined as transforming a sentence into a more concise form while preserving its original meaning. In this paper, we propose an approach for compressing sentences from English Facebook posts by dropping words that contribute little to the overall meaning of the sentence. We develop a parallel corpus of English Facebook posts and their corresponding compressed sentences for this task. We also report evaluation results for our approach from experiments on the developed dataset.
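To make the idea of deletion-based compression concrete, here is a minimal sketch. It is not the method of the paper: the abstract does not say how word importance is estimated, so the hand-written `LOW_CONTENT` word list, the `compress` function, and the `compression_ratio` helper are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical list of low-content words. This is an assumption for
# illustration only; the paper's actual importance criterion is not
# described in the abstract.
LOW_CONTENT = {"a", "an", "the", "very", "really", "just", "so", "actually"}


def compress(sentence, low_content=LOW_CONTENT):
    """Drop tokens found in low_content, keeping the remaining word order."""
    tokens = sentence.split()
    # Compare case-insensitively and ignore trailing punctuation on a token.
    kept = [t for t in tokens if t.lower().strip(".,!?") not in low_content]
    return " ".join(kept)


def compression_ratio(original, compressed):
    """Fraction of whitespace-delimited tokens retained after compression."""
    n = len(original.split())
    return len(compressed.split()) / n if n else 1.0


post = "I am just so very happy about the new phone!"
short = compress(post)
print(short)                            # I am happy about new phone!
print(compression_ratio(post, short))   # 0.6
```

A learned model would presumably replace the fixed word list with a per-token keep-or-drop decision, but the input and output have the same shape: a sentence in, a shorter subsequence of its words out.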
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rudrapal, D., Datta, M., Datta, A., Das, P., Das, S. (2020). An Approach to Compress English Posts from Social Media Texts. In: Behera, H., Nayak, J., Naik, B., Pelusi, D. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 990. Springer, Singapore. https://doi.org/10.1007/978-981-13-8676-3_8
DOI: https://doi.org/10.1007/978-981-13-8676-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8675-6
Online ISBN: 978-981-13-8676-3
eBook Packages: Intelligent Technologies and Robotics (R0)