Abstract
This paper addresses an emerging research topic: mining microbloggers’ personalized interest tags from the microblogs they have posted. It is based on the intuition that microblogs reflect the daily interests and concerns of microbloggers. Previous studies treated all the microblogs posted by one microblogger as a single document and applied traditional keyword extraction approaches to select highly weighted nouns, without considering the characteristics of microblogs. Given the limited textual information in microblogs and the implicit way microbloggers express their interests, we propose a new research framework that mines microbloggers’ interests by exploiting Wikipedia, a huge online encyclopedia of world knowledge, to address these challenges. Based on a semantic graph constructed from Wikipedia, the proposed semantic spreading model (SSM) can discover and leverage semantically related interest tags that do not occur in a microblogger’s own microblogs. Based on SSM, an interest mining system has been implemented and deployed on Sina Weibo, the biggest microblogging platform in China. We also specify a suite of new evaluation metrics to compensate for the lack of evaluation measures in this research topic. Experiments conducted on a real-time dataset demonstrate that our approach outperforms state-of-the-art methods in identifying microbloggers’ interests.
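To illustrate how spreading over a semantic graph can surface tags that never appear in a user's posts, the following is a minimal sketch of generic spreading activation. It is not the authors' SSM: the function name `spread_interest`, the `decay` and `steps` parameters, the toy graph, and all edge weights are assumptions made purely for illustration.

```python
from collections import defaultdict


def spread_interest(seed_weights, graph, decay=0.5, steps=1):
    """Propagate tag weights along weighted edges of a semantic graph.

    seed_weights: tags observed in a user's microblogs, with initial weights.
    graph: adjacency dict mapping a concept to {neighbor: relatedness}.
    decay and steps are hypothetical parameters, not taken from the paper.
    """
    scores = defaultdict(float, seed_weights)
    frontier = dict(seed_weights)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, weight in frontier.items():
            for neighbor, relatedness in graph.get(node, {}).items():
                # Each hop passes on a damped share of the node's weight.
                next_frontier[neighbor] += weight * relatedness * decay
        for node, weight in next_frontier.items():
            scores[node] += weight
        frontier = next_frontier
    return dict(scores)


# Toy graph with invented relatedness values (illustration only).
graph = {
    "iPhone": {"Apple Inc.": 0.9, "Smartphone": 0.8},
    "iPad": {"Apple Inc.": 0.9, "Tablet computer": 0.8},
}
# Tags assumed to have been extracted from the user's posts.
seeds = {"iPhone": 1.0, "iPad": 1.0}

scores = spread_interest(seeds, graph, decay=0.5, steps=1)
ranked_tags = sorted(scores, key=scores.get, reverse=True)
```

In this toy run, "Apple Inc." receives weight from both observed tags even though it is not among the seeds, which is the kind of implicit interest the abstract describes. A real system would need a principled way to derive edge weights from Wikipedia and to stop the spread from drifting toward overly general concepts.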
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Fan, M., Zhou, Q., Zheng, T.F. (2014). Mining the Personal Interests of Microbloggers via Exploiting Wikipedia Knowledge. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_16
DOI: https://doi.org/10.1007/978-3-642-54903-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science (R0)