Semi-supervised Metrics for Textual Data Visualization

Blanco, Ángela; Martín-Merino, Manuel

doi:10.1007/978-3-540-74695-9_45


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4669)

Included in the following conference series:

International Conference on Artificial Neural Networks


Abstract

Multidimensional Scaling (MDS) algorithms are useful tools for discovering relationships among objects in high-dimensional spaces. They have been applied to a wide range of practical problems, in particular to visualizing the semantic relations among documents or terms in textual databases.
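
For reference, the classical Torgerson MDS step that the proposed method builds on can be sketched as follows. This is a minimal NumPy sketch of the textbook algorithm (double-centre the squared dissimilarities, then keep the leading eigenvectors), not the authors' semi-supervised variant; the function name `torgerson_mds` is illustrative.

```python
import numpy as np


def torgerson_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n objects in k dimensions
    from an n x n symmetric matrix of pairwise dissimilarities D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix J = I - (1/n) 1 1^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centred inner-product matrix B = -1/2 J D^2 J
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues (non-Euclidean D) are clipped to zero
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)
```

Given any symmetric dissimilarity matrix `D`, `torgerson_mds(D, k=2)` returns 2-D coordinates suitable for a map; when `D` is exactly Euclidean in two dimensions, the embedding reproduces the original distances up to rotation, reflection and translation.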

The MDS algorithms proposed in the literature often suffer from low discriminant power, owing to their unsupervised nature and to the ‘curse of dimensionality’. Fortunately, textual databases frequently provide a manually created classification for a subset of the documents, which may help to overcome this problem.
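
One facet of the ‘curse of dimensionality’ is distance concentration: as the dimension grows, the nearest and farthest neighbours of a point become almost equally distant, which makes distance-based methods less discriminating. The sketch below illustrates this on uniform random data (an assumption made for illustration only; real term vectors are sparse and far from uniform), using a hypothetical helper `relative_contrast`.

```python
import numpy as np


def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min of the distances from one random query point
    to n_points points drawn uniformly from the unit cube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dist = np.linalg.norm(data - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

The printed ratio typically shrinks sharply as `dim` increases, i.e. all points look roughly equidistant from the query.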

In this paper we propose a semi-supervised version of the Torgerson MDS algorithm that takes advantage of this partial document classification to improve the discriminant power of the generated word maps. The algorithm has been applied to the visualization of term relationships, and the experimental results show that the proposed model outperforms well-known unsupervised alternatives.
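
The abstract does not spell out the semi-supervised metric, so the following is only a hypothetical illustration of the general idea, not the authors' method: a term dissimilarity that blends an unsupervised cosine dissimilarity over all documents with a dissimilarity between the terms' class profiles, computed from the labelled documents only. The names `semi_supervised_dissimilarity` and `cosine_dissimilarity`, the weight `lam`, and the convention that unlabelled documents carry label `-1` are all assumptions. The resulting matrix could then be embedded with classical (Torgerson) MDS.

```python
import numpy as np


def cosine_dissimilarity(M):
    """Pairwise 1 - cosine similarity between the rows of M."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    # Avoid division by zero for all-zero rows (their similarity to others is 0)
    U = M / np.where(norms == 0.0, 1.0, norms)
    D = np.clip(1.0 - U @ U.T, 0.0, None)
    np.fill_diagonal(D, 0.0)
    return D


def semi_supervised_dissimilarity(X, doc_labels, lam=0.5):
    """HYPOTHETICAL blend of unsupervised and label-based term dissimilarities.

    X          : (n_terms, n_docs) non-negative term-document weights
    doc_labels : (n_docs,) integer class per document, -1 if unlabelled
    lam        : weight in [0, 1] given to the label-based part
    """
    X = np.asarray(X, dtype=float)
    doc_labels = np.asarray(doc_labels)
    # Unsupervised part: compare terms across all documents
    d_unsup = cosine_dissimilarity(X)
    classes = np.unique(doc_labels[doc_labels >= 0])
    if classes.size == 0:
        return d_unsup  # no labelled documents: fall back to the unsupervised metric
    # Class profile of each term: its total weight in the labelled docs of each class
    profiles = np.stack([X[:, doc_labels == c].sum(axis=1) for c in classes], axis=1)
    d_sup = cosine_dissimilarity(profiles)
    return (1.0 - lam) * d_unsup + lam * d_sup
```

With `lam=0` the function reduces to the purely unsupervised cosine dissimilarity; raising `lam` pulls together terms that occur in documents of the same class, even when their raw document co-occurrence alone does not separate them.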



Author information

Authors and Affiliations

Universidad Pontificia de Salamanca, C/Compañía 5, 37002, Salamanca, Spain
Ángela Blanco & Manuel Martín-Merino


Editor information

Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, Danilo Mandic


About this paper

Cite this paper

Blanco, Á., Martín-Merino, M. (2007). Semi-supervised Metrics for Textual Data Visualization. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_45


DOI: https://doi.org/10.1007/978-3-540-74695-9_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74693-5
Online ISBN: 978-3-540-74695-9
eBook Packages: Computer Science, Computer Science (R0)
