Abstract
Multidimensional Scaling (MDS) algorithms are useful tools for discovering relationships among high-dimensional objects. They have been applied to a wide range of practical problems, particularly to the visualization of semantic relations among documents or terms in textual databases.
The MDS algorithms proposed in the literature often suffer from low discriminant power due to their unsupervised nature and to the ‘curse of dimensionality’. Fortunately, textual databases frequently provide a manually created classification for a subset of documents that may help to overcome this problem.
In this paper we propose a semi-supervised version of the Torgerson MDS algorithm that takes advantage of this document classification to improve the discriminant power of the generated word maps. The algorithm has been applied to the visualization of term relationships. The experimental results show that the proposed model outperforms well-known unsupervised alternatives.
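For orientation, the following is a minimal sketch of the classical (unsupervised) Torgerson MDS that the proposed method builds on. It is not the semi-supervised variant described in this paper, whose details the abstract does not give. The function name `torgerson_mds`, the parameter `n_components`, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def torgerson_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a symmetric dissimilarity matrix D.

    Hypothetical helper for illustration; unsupervised baseline only.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the n_components largest eigenvalues and their eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues caused by non-Euclidean input or round-off
    lam = np.clip(eigvals[order], 0.0, None)
    # Coordinates are eigenvectors scaled by the square roots of the eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)


# Toy data (assumed): four points at the corners of a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = torgerson_mds(D, n_components=2)
# Pairwise distances in the recovered map, for comparison with D
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For Euclidean input such as this toy example, the recovered map reproduces the original pairwise distances up to rotation and reflection. For word maps, `D` would instead hold term dissimilarities, and a semi-supervised approach would modify those dissimilarities using the available document labels before the projection.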
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blanco, Á., Martín-Merino, M. (2007). Semi-supervised Metrics for Textual Data Visualization. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_45
DOI: https://doi.org/10.1007/978-3-540-74695-9_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74693-5
Online ISBN: 978-3-540-74695-9
eBook Packages: Computer Science (R0)