
Applications of Semidefinite Programming in XML Document Classification

Chapter in Survey of Text Mining II

Extensible Markup Language (XML) is a standard format for representing data on the Internet. An XML document usually organizes textual data according to a predefined logical structure. It has been shown that storing documents with similar structures together can reduce fragmentation and improve query efficiency. Unlike a flat text document, an XML document has no vectorial representation, which most existing classification algorithms require.
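
The point about vectorial representation can be illustrated with a small sketch. It is not the chapter's method: the helper name tag_paths and the two sample documents are hypothetical, chosen only to show that parsing XML yields trees whose size and shape vary from document to document, so there is no fixed-length feature vector to hand to a standard classifier.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the root-to-node tag path of every element, in document order.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        # Build the path for this element, then recurse into its children.
        path = prefix + "/" + node.tag
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


# Two made-up documents with different logical structures.
doc_a = "<article><title>SDP</title><body><sec/><sec/></body></article>"
doc_b = "<note><to>Ann</to></note>"

paths_a = tag_paths(doc_a)
paths_b = tag_paths(doc_b)

# The two descriptions differ in length and in vocabulary, so they cannot be
# compared coordinate by coordinate the way two term vectors can.
print(len(paths_a), paths_a)
print(len(paths_b), paths_b)
```

Because the structural descriptions have different lengths and different tag vocabularies, comparing such documents generally calls for a pairwise similarity between trees rather than a per-document vector.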

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Xia, Z., Xing, G., Qi, H., Li, Q. (2008). Applications of Semidefinite Programming in XML Document Classification. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_7

  • DOI: https://doi.org/10.1007/978-1-84800-046-9_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-045-2

  • Online ISBN: 978-1-84800-046-9

  • eBook Packages: Computer Science (R0)
