Fast Training of a Graph Boosting for Large-Scale Text Classification

  • Conference paper
  • In: PRICAI 2016: Trends in Artificial Intelligence (PRICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Abstract

This paper proposes a fast training method for graph classification based on a boosting algorithm, and applies it to sentiment analysis with input texts represented as graphs. The graph format is well suited to representing texts structured with Natural Language Processing techniques such as morphological analysis, Named Entity Recognition, and parsing. A number of classification methods that represent texts as graphs have been proposed, but many of them restrict the candidate features in advance because the feature space is extremely large. Instead of limiting the search space in advance, we propose two approximation methods for learning graph-based rules within a boosting framework. Experimental results on a sentiment analysis dataset show that our method improves training speed. In addition, the graph representation-based classifier exploits rich structural information of texts that cannot be captured by simpler input formats, and achieves higher accuracy.
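
For illustration only (this is not taken from the paper), the sketch below shows one way a sentence processed with morphological analysis, Named Entity Recognition, and dependency parsing could be encoded as a labeled graph from which subgraph features can later be drawn. The helper name sentence_to_graph, the attribute names, and the example parse are assumptions, not the paper's exact representation.

```python
# A minimal sketch (assumption, not the paper's exact encoding) of turning a
# dependency-parsed, NE-tagged sentence into a labeled graph.
import networkx as nx


def sentence_to_graph(tokens, heads, deprels):
    """tokens: list of (word, pos, ne) triples; heads: 1-based head index per
    token (0 = root); deprels: dependency relation label per token."""
    g = nx.DiGraph()
    for i, (word, pos, ne) in enumerate(tokens, start=1):
        # Each node carries the surface form, part of speech, and NE tag.
        g.add_node(i, word=word, pos=pos, ne=ne)
    for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
        if head != 0:  # skip the artificial root
            g.add_edge(head, i, deprel=rel)
    return g


# Example: "Alice liked it" with a hypothetical parse.
tokens = [("Alice", "NNP", "PERSON"), ("liked", "VBD", "O"), ("it", "PRP", "O")]
heads = [2, 0, 2]                      # "liked" is the root
deprels = ["nsubj", "root", "dobj"]
graph = sentence_to_graph(tokens, heads, deprels)
```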

Notes

  1.

    In Kudo et al. (2004), a weak classifier is defined to return \(-\alpha \) if \(g\not \subseteq x\). Based on the results of preliminary experiments, we use the definition given above instead (see the illustrative sketch after these notes).

  2.

    We may omit the iteration index j when no confusion can arise.

  3.

    With a slight modification, we can start the search from single-node graphs so that the result may contain single-node feature graphs.

  4.

    To convert the output of SENNA into tree format, we used Penn2Malt 0.2 (http://stp.lingfil.uu.se/~nivre/research/Penn2Malt.html) with the following options: the head rules at http://stp.lingfil.uu.se/~nivre/research/headrules.txt, deprel 1, and punctuation 1.
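
The following sketch, which is not from the paper, illustrates the subgraph-based weak classifier described in note 1: a rule is a pair of a feature graph \(g\) and a confidence value \(\alpha \), and, following the definition attributed to Kudo et al. (2004), it returns \(\alpha \) when \(g\) is contained in the input graph \(x\) and \(-\alpha \) otherwise. The paper's own modified definition is not reproduced here, and the networkx-based containment check is only an illustrative stand-in for the rule search used during training.

```python
# A minimal sketch (assumption, not the paper's implementation) of a
# subgraph-rule weak classifier and the resulting boosted scorer.
import networkx as nx
from networkx.algorithms import isomorphism


def contains_subgraph(x: nx.DiGraph, g: nx.DiGraph) -> bool:
    """Return True if an (induced) subgraph of x matches g, comparing the
    node labels (word, pos, ne) and the edge label (deprel)."""
    node_match = isomorphism.categorical_node_match(
        ["word", "pos", "ne"], [None, None, None])
    edge_match = isomorphism.categorical_edge_match("deprel", None)
    matcher = isomorphism.DiGraphMatcher(
        x, g, node_match=node_match, edge_match=edge_match)
    return matcher.subgraph_is_isomorphic()


class SubgraphRule:
    """A weak classifier: a feature graph g plus a confidence-rated weight alpha."""

    def __init__(self, feature_graph: nx.DiGraph, alpha: float):
        self.feature_graph = feature_graph
        self.alpha = alpha

    def predict(self, x: nx.DiGraph) -> float:
        # +alpha if g is contained in x, -alpha otherwise
        # (the definition attributed to Kudo et al. 2004 in note 1).
        if contains_subgraph(x, self.feature_graph):
            return self.alpha
        return -self.alpha


def boosted_score(rules, x: nx.DiGraph) -> float:
    # The boosted classifier sums the weak outputs; the predicted label
    # is the sign of this score.
    return sum(rule.predict(x) for rule in rules)
```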

References

  • Arora, S., Mayfield, E., Rosé, C.P., Nyberg, E.: Sentiment classification using automatically extracted subgraph features. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 131–139 (2010)

  • Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of Seventh International Conference on Language Resources and Evaluation, pp. 2200–2204 (2010)

  • Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 440–447 (2007)

  • Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Conference on Computational Learning Theory, pp. 144–152 (1992)

  • Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference on Artificial Intelligence and Statistics (2011)

  • Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  • Fei, H., Huan, J.: Structured sparse boosting for graph classification. ACM Trans. Knowl. Discov. Data 9, 1–22 (2014)

  • Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)

  • Freund, Y.: The alternating decision tree algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 124–133 (1999)

  • Gee, K.R., Cook, D.J.: Text classification using graph-encoded linguistic elements. In: Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, pp. 487–492 (2005)

  • Iwakura, T.: A boosting-based algorithm for classification of semi-structured text using frequency of substructures. In: Proceedings of 9th International Conference on Recent Advances in Natural Language Processing, pp. 319–326 (2013)

  • Iwakura, T., Okamoto, S.: A fast boosting-based learner for feature-rich tagging and chunking. In: Proceedings of Twelfth Conference on Computational Natural Language Learning, pp. 17–24 (2008)

  • Jiang, C., Coenen, F., Sanderson, R., Zito, M.: Text classification using graph mining-based feature extraction. Knowl.-Based Syst. 23, 302–308 (2010)

  • Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328 (2003)

  • Kudo, T., Maeda, E., Matsumoto, Y.: An application of boosting to graph classification. Adv. Neural Inf. Process. Syst. 17, 729–736 (2004)

  • Kudo, T., Matsumoto, Y.: A boosting algorithm for classification of semi-structured text. In: Proceedings of 9th Conference on Empirical Methods in Natural Language Processing, pp. 301–308 (2004)

  • Matsumoto, S., Takamura, H., Okumura, M.: Sentiment classification using word sub-sequences and dependency sub-trees. In: Ho, T.B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS, vol. 3518, pp. 301–311. Springer, Heidelberg (2005)

  • Okazaki, N.: Classias: a collection of machine-learning algorithms for classification (2009). http://www.chokkan.org/software/classias/

  • Pan, S., Wu, J., Zhu, X.: CogBoost: boosting for fast cost-sensitive graph classification. IEEE Trans. Knowl. Data Eng. 27, 2933–2946 (2015)

  • Pan, S., Wu, J., Zhu, X., Long, G., Zhang, C.: Boosting for graph classification with universum. Knowl. Inf. Syst. 47, 1–25 (2016)

  • Pan, S., Wu, J., Zhu, X., Zhang, C.: Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Trans. Cybern. 45, 940–954 (2015)

  • Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K.: gBoost: a mathematical programming approach to graph classification and regression. Mach. Learn. 75, 69–89 (2009)

  • Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37, 297–336 (1999)

  • Wu, J., Pan, S., Zhu, X., Cai, Z.: Boosting for multi-graph classification. IEEE Trans. Cybern. 45, 430–443 (2015)

  • Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: Proceedings of 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)

Author information

Corresponding author

Correspondence to Hiyori Yoshikawa.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yoshikawa, H., Iwakura, T. (2016). Fast Training of a Graph Boosting for Large-Scale Text Classification. In: Booth, R., Zhang, M.-L. (eds.) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol. 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_53

  • DOI: https://doi.org/10.1007/978-3-319-42911-3_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42910-6

  • Online ISBN: 978-3-319-42911-3

  • eBook Packages: Computer Science, Computer Science (R0)
