Fast Training of a Graph Boosting for Large-Scale Text Classification

  • Conference paper
  • In: PRICAI 2016: Trends in Artificial Intelligence (PRICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Abstract

This paper proposes a fast training method for graph classification based on a boosting algorithm, and applies it to sentiment analysis with input texts represented as graphs. The graph format is well suited to representing texts structured with Natural Language Processing techniques such as morphological analysis, Named Entity Recognition, and parsing. A number of classification methods that represent texts as graphs have been proposed, but many of them restrict the candidate features in advance because the feature space is extremely large. Instead of limiting the search space in advance, we propose two approximation methods for learning graph-based rules within a boosting framework. Experimental results on a sentiment analysis dataset show that our method improves training speed. In addition, the graph representation-based classifier exploits rich structural information of texts that cannot be captured by simpler input formats, and achieves higher accuracy.
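
For illustration only (this is not taken from the paper), the sketch below shows one way a sentence processed with morphological analysis, Named Entity Recognition, and dependency parsing could be encoded as a labeled graph from which subgraph features can later be drawn. The helper name sentence_to_graph, the attribute names, and the example parse are assumptions, not the paper's exact representation.

```python
# A minimal sketch (assumption, not the paper's exact encoding) of turning a
# dependency-parsed, NE-tagged sentence into a labeled graph.
import networkx as nx


def sentence_to_graph(tokens, heads, deprels):
    """tokens: list of (word, pos, ne) triples; heads: 1-based head index per
    token (0 = root); deprels: dependency relation label per token."""
    g = nx.DiGraph()
    for i, (word, pos, ne) in enumerate(tokens, start=1):
        # Each node carries the surface form, part of speech, and NE tag.
        g.add_node(i, word=word, pos=pos, ne=ne)
    for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
        if head != 0:  # skip the artificial root
            g.add_edge(head, i, deprel=rel)
    return g


# Example: "Alice liked it" with a hypothetical parse.
tokens = [("Alice", "NNP", "PERSON"), ("liked", "VBD", "O"), ("it", "PRP", "O")]
heads = [2, 0, 2]                      # "liked" is the root
deprels = ["nsubj", "root", "dobj"]
graph = sentence_to_graph(tokens, heads, deprels)
```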

Notes

  1.

    In Kudo et al. (2004), a weak classifier is defined to return \(-\alpha \) if \(g\not \subseteq x\). Based on the results of preliminary experiments, we use the definition given above instead (see the illustrative sketch after these notes).

  2.

    We may omit the iteration index j when no confusion can arise.

  3.

    With a slight modification, we can start the search from single-node graphs so that the result may contain single-node feature graphs.

  4.

    To convert the output of SENNA into tree format, we used Penn2Malt 0.2 (http://stp.lingfil.uu.se/~nivre/research/Penn2Malt.html) with the following options: the head rules at http://stp.lingfil.uu.se/~nivre/research/headrules.txt, deprel 1, and punctuation 1.
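
The following sketch, which is not from the paper, illustrates the subgraph-based weak classifier described in note 1: a rule is a pair of a feature graph \(g\) and a confidence value \(\alpha \), and, following the definition attributed to Kudo et al. (2004), it returns \(\alpha \) when \(g\) is contained in the input graph \(x\) and \(-\alpha \) otherwise. The paper's own modified definition is not reproduced here, and the networkx-based containment check is only an illustrative stand-in for the rule search used during training.

```python
# A minimal sketch (assumption, not the paper's implementation) of a
# subgraph-rule weak classifier and the resulting boosted scorer.
import networkx as nx
from networkx.algorithms import isomorphism


def contains_subgraph(x: nx.DiGraph, g: nx.DiGraph) -> bool:
    """Return True if an (induced) subgraph of x matches g, comparing the
    node labels (word, pos, ne) and the edge label (deprel)."""
    node_match = isomorphism.categorical_node_match(
        ["word", "pos", "ne"], [None, None, None])
    edge_match = isomorphism.categorical_edge_match("deprel", None)
    matcher = isomorphism.DiGraphMatcher(
        x, g, node_match=node_match, edge_match=edge_match)
    return matcher.subgraph_is_isomorphic()


class SubgraphRule:
    """A weak classifier: a feature graph g plus a confidence-rated weight alpha."""

    def __init__(self, feature_graph: nx.DiGraph, alpha: float):
        self.feature_graph = feature_graph
        self.alpha = alpha

    def predict(self, x: nx.DiGraph) -> float:
        # +alpha if g is contained in x, -alpha otherwise
        # (the definition attributed to Kudo et al. 2004 in note 1).
        if contains_subgraph(x, self.feature_graph):
            return self.alpha
        return -self.alpha


def boosted_score(rules, x: nx.DiGraph) -> float:
    # The boosted classifier sums the weak outputs; the predicted label
    # is the sign of this score.
    return sum(rule.predict(x) for rule in rules)
```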

References

  • Arora, S., Mayfield, E., Rosé, C.P., Nyberg, E.: Sentiment classification using automatically extracted subgraph features. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 131–139 (2010)

  • Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of Seventh International Conference on Language Resources and Evaluation, pp. 2200–2204 (2010)

  • Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 440–447 (2007)

  • Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Conference on Computational Learning Theory, pp. 144–152 (1992)

  • Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference on Artificial Intelligence and Statistics (2011)

  • Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  • Fei, H., Huan, J.: Structured sparse boosting for graph classification. ACM Trans. Knowl. Discov. Data 9, 1–22 (2014)

  • Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)

  • Freund, Y.: The alternating decision tree algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 124–133 (1999)

  • Gee, K.R., Cook, D.J.: Text classification using graph-encoded linguistic elements. In: Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, pp. 487–492 (2005)

  • Iwakura, T.: A boosting-based algorithm for classification of semi-structured text using frequency of substructures. In: Proceedings of 9th International Conference on Recent Advances in Natural Language Processing, pp. 319–326 (2013)

  • Iwakura, T., Okamoto, S.: A fast boosting-based learner for feature-rich tagging and chunking. In: Proceedings of Twelfth Conference on Computational Natural Language Learning, pp. 17–24 (2008)

  • Jiang, C., Coenen, F., Sanderson, R., Zito, M.: Text classification using graph mining-based feature extraction. Knowl.-Based Syst. 23, 302–308 (2010)

  • Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328 (2003)

  • Kudo, T., Maeda, E., Matsumoto, Y.: An application of boosting to graph classification. Adv. Neural Inf. Process. Syst. 17, 729–736 (2004)

  • Kudo, T., Matsumoto, Y.: A boosting algorithm for classification of semi-structured text. In: Proceedings of 9th Conference on Empirical Methods in Natural Language Processing, pp. 301–308 (2004)

  • Matsumoto, S., Takamura, H., Okumura, M.: Sentiment classification using word sub-sequences and dependency sub-trees. In: Ho, T.B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS, vol. 3518, pp. 301–311. Springer, Heidelberg (2005)

  • Okazaki, N.: Classias: a collection of machine-learning algorithms for classification (2009). http://www.chokkan.org/software/classias/

  • Pan, S., Wu, J., Zhu, X.: CogBoost: boosting for fast cost-sensitive graph classification. IEEE Trans. Knowl. Data Eng. 27, 2933–2946 (2015)

  • Pan, S., Wu, J., Zhu, X., Long, G., Zhang, C.: Boosting for graph classification with universum. Knowl. Inf. Syst. 47, 1–25 (2016)

  • Pan, S., Wu, J., Zhu, X., Zhang, C.: Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Trans. Cybern. 45, 940–954 (2015)

  • Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K.: gBoost: a mathematical programming approach to graph classification and regression. Mach. Learn. 75, 69–89 (2009)

  • Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37, 297–336 (1999)

  • Wu, J., Pan, S., Zhu, X., Cai, Z.: Boosting for multi-graph classification. IEEE Trans. Cybern. 45, 430–443 (2015)

  • Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: Proceedings of 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)

Author information

Corresponding author

Correspondence to Hiyori Yoshikawa.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yoshikawa, H., Iwakura, T. (2016). Fast Training of a Graph Boosting for Large-Scale Text Classification. In: Booth, R., Zhang, M.-L. (eds.) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol. 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_53

  • DOI: https://doi.org/10.1007/978-3-319-42911-3_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42910-6

  • Online ISBN: 978-3-319-42911-3

  • eBook Packages: Computer Science, Computer Science (R0)
