Applications of Semidefinite Programming in XML Document Classification

Xia, Zhonghang; Xing, Guangming; Qi, Houduo; Li, Qi

doi:10.1007/978-1-84800-046-9_7

Extensible Markup Language (XML) is widely used as a standard format for data representation over the Internet. An XML document typically organizes textual data according to a predefined logical structure. Storing documents with similar structures together has been shown to reduce fragmentation and improve query efficiency. Unlike a flat text document, an XML document has no natural vector representation, which most existing classification algorithms require.
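
To make the last point concrete, here is a minimal sketch in Python. It is not taken from the chapter: the two sample documents and the helper names `bag_of_words` and `tag_paths` are assumptions made purely for illustration. The two documents contain the same words, so a flat bag-of-words vector cannot tell them apart, yet their element structures differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical documents: identical text, different nesting.
DOC_A = "<article><title>semidefinite programming</title><body>kernel methods</body></article>"
DOC_B = "<article><body><title>semidefinite programming</title>kernel methods</body></article>"


def bag_of_words(xml_text):
    """Flat-text view: count the words and ignore all markup."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).split())


def tag_paths(xml_text):
    """Structural view: list root-to-node tag paths in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


if __name__ == "__main__":
    print(bag_of_words(DOC_A) == bag_of_words(DOC_B))  # same word counts
    print(tag_paths(DOC_A))
    print(tag_paths(DOC_B))
```

The tag-path list is only one simple way to expose structure. It is shown here to illustrate why a word-count vector alone loses information, not as the representation the chapter proposes.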

Author information

Authors and Affiliations

Department of Computer Science, Western Kentucky University, 1906 College Heights Boulevard #11076, Bowling Green, KY 42101-1076
Zhonghang Xia, Guangming Xing & Qi Li
Department of Mathematics, University of Southampton, Highfield Southampton, SO17 1BJ, UK
Houduo Qi

Editor information

Editors and Affiliations

Department of Computer Science, University of Tennessee, USA
Michael W. Berry
Hewlett-Packard Laboratories, Palo Alto, California, USA
Malu Castellanos

About this chapter

Cite this chapter

Xia, Z., Xing, G., Qi, H., Li, Q. (2008). Applications of Semidefinite Programming in XML Document Classification. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_7

DOI: https://doi.org/10.1007/978-1-84800-046-9_7
Publisher Name: Springer, London
Print ISBN: 978-1-84800-045-2
Online ISBN: 978-1-84800-046-9
eBook Packages: Computer Science, Computer Science (R0)
