Abstract
Following the traditional approach, we treat summarization as the selection of the top-ranked sentences from ranked sentence clusters. To rank the sentence clusters, we use word importance scores computed by running the PageRank algorithm on a reverse-directed word graph of the sentences. Next, to rank the sentences within each cluster, we introduce the use of the weighted clustering coefficient, which we compute from the PageRank scores of words. Finally, the most important issue is the presence of many noisy entries in the text, which degrade the performance of most text mining algorithms. To solve this problem, we introduce a phrase mapping scheme based on Wikipedia anchor text. Our experimental results on the DUC-2002 and DUC-2004 datasets show that our system performs better than unsupervised systems and better than or comparably to novel supervised systems in this area.
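The two ranking steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the reverse-directed word graph is built, so we assume an edge from each word to the word preceding it in a sentence. We also assume, hypothetically, that two sentences in a cluster are linked with a weight equal to the summed PageRank scores of their shared words, and that a cluster's score is the summed PageRank of its distinct words. For the weighted clustering coefficient we use the Onnela et al. definition: the geometric mean of normalized triangle weights, averaged over neighbour pairs. Sentences are assumed to be pre-tokenized lists; the Wikipedia anchor-text phrase mapping is not implemented. All function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def reverse_word_graph(sentences):
    """ASSUMED construction: for consecutive words (a, b) add the edge b -> a."""
    edges = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for sent in sentences:
        nodes.update(sent)
        for a, b in zip(sent, sent[1:]):
            edges[b][a] += 1.0
    return nodes, edges


def pagerank(nodes, edges, d=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_w = {v: sum(edges[v].values()) for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_w[v] == 0)
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for v, w in edges[u].items():
                new[v] += d * rank[u] * w / out_w[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def sentence_graph(sentences, pr):
    """HYPOTHETICAL weighting: edge weight = summed PageRank of shared words."""
    w = defaultdict(dict)
    for i, j in combinations(range(len(sentences)), 2):
        shared = set(sentences[i]) & set(sentences[j])
        if shared:
            w[i][j] = w[j][i] = sum(pr[t] for t in shared)
    return w


def weighted_clustering(w, i):
    """Onnela et al. coefficient: 2/(k(k-1)) * sum over neighbour pairs of
    (w_ij * w_ih * w_jh) ** (1/3) / max_weight."""
    nbrs = list(w.get(i, {}))
    k = len(nbrs)
    if k < 2:
        return 0.0
    wmax = max(x for row in w.values() for x in row.values())
    total = 0.0
    for j, h in combinations(nbrs, 2):
        if h in w[j]:
            total += (w[i][j] * w[i][h] * w[j][h]) ** (1 / 3) / wmax
    return 2 * total / (k * (k - 1))


def rank_clusters(clusters, pr):
    """ASSUMED cluster score: summed PageRank of the cluster's distinct words."""
    def score(cluster):
        return sum(pr[t] for t in {t for s in cluster for t in s})
    return sorted(clusters, key=score, reverse=True)


def summarize(clusters, n_sentences):
    """Take the best sentence of each cluster, in cluster-rank order."""
    all_sents = [s for c in clusters for s in c]
    pr = pagerank(*reverse_word_graph(all_sents))
    summary = []
    for cluster in rank_clusters(clusters, pr):
        w = sentence_graph(cluster, pr)
        best = max(range(len(cluster)), key=lambda i: weighted_clustering(w, i))
        summary.append(cluster[best])
        if len(summary) == n_sentences:
            break
    return summary


if __name__ == "__main__":
    clusters = [
        [["storm", "hits", "coast"], ["storm", "damage", "coast", "towns"],
         ["coast", "towns", "evacuate"]],
        [["officials", "estimate", "damage"], ["officials", "request", "aid"]],
    ]
    print(summarize(clusters, 2))
```

Picking one sentence per cluster in cluster-rank order is only the simplest selection policy consistent with the abstract; a word budget or a redundancy check could replace the `n_sentences` cutoff.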
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Kumar, N., Srinathan, K., Varma, V. (2012). Using Wikipedia Anchor Text and Weighted Clustering Coefficient to Enhance the Traditional Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_33
DOI: https://doi.org/10.1007/978-3-642-28601-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)