Extensible Markup Language (XML) is a standard format for representing data on the Internet. An XML document usually organizes textual data according to a predefined logical structure. It has been shown that storing documents with similar structures together can reduce fragmentation and improve query efficiency. Unlike a flat text document, an XML document has no vectorial representation, which most existing classification algorithms require.
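One way to read the last point is that a flat vector built from an XML document discards its nesting. The following is a minimal sketch of that idea, not the chapter's method: the helper names `tag_counts` and `tag_paths` and the two sample documents are hypothetical and chosen only for illustration. It compares a flat "bag of tags" vector with a count of root-to-node tag paths for two documents that use the same elements in different arrangements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Flat 'bag of tags' vector: count each element name, ignoring nesting."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def tag_paths(xml_text):
    """Structure-aware view: count each root-to-node tag path."""
    root = ET.fromstring(xml_text)
    paths = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


# Hypothetical documents: same element names, different logical structure.
doc_a = "<article><section><title>t</title></section><para>p</para></article>"
doc_b = "<article><title>t</title><section><para>p</para></section></article>"

# The flat vectors coincide even though the trees differ.
same_flat = tag_counts(doc_a) == tag_counts(doc_b)
# The path-based view tells the two documents apart.
same_paths = tag_paths(doc_a) == tag_paths(doc_b)

print("flat vectors equal:", same_flat)
print("path vectors equal:", same_paths)
```

Under these assumptions the two documents are indistinguishable to a classifier that sees only tag counts, which suggests why structure-aware similarity measures are of interest for XML classification.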
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Xia, Z., Xing, G., Qi, H., Li, Q. (2008). Applications of Semidefinite Programming in XML Document Classification. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_7
DOI: https://doi.org/10.1007/978-1-84800-046-9_7
Publisher Name: Springer, London
Print ISBN: 978-1-84800-045-2
Online ISBN: 978-1-84800-046-9
eBook Packages: Computer Science, Computer Science (R0)