A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

Kabán, Ata

doi:10.1007/978-3-540-88411-8_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5255)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The need for non-standard text categorisation, i.e. categorisation based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality traits of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly in this task. Here we propose a very simple probabilistic approach, which achieves accurate predictions and demonstrates that this peculiar problem is still solvable by means of simple statistical text representations. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.

Author information

Authors and Affiliations

School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK
Ata Kabán

Editor information

Editors and Affiliations

INSA Lyon, LIRIS CNRS UMR 5205, University of Lyon, 69621, Villeurbanne Cedex, France
Jean-François Boulicaut
Department of Computer and Information Science, University of Konstanz, Box M 712, 78457, Konstanz, Germany
Michael R. Berthold
University of Bonn and Fraunhofer IAIS, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Tamás Horváth

About this paper

Cite this paper

Kabán, A. (2008). A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation. In: Boulicaut, JF., Berthold, M.R., Horváth, T. (eds) Discovery Science. DS 2008. Lecture Notes in Computer Science, vol 5255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88411-8_31

DOI: https://doi.org/10.1007/978-3-540-88411-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88410-1
Online ISBN: 978-3-540-88411-8
eBook Packages: Computer Science, Computer Science (R0)
