Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

In law, a judgment is a decision by a court that resolves a controversy and determines the rights and liabilities of parties in a legal action or proceeding. The China Judgments Online system was officially launched in 2013 for record keeping and notification, and it has since recorded over 23 million electronic judgment documents. This large body of judgment documents reflects improvements in judicial justice and openness, and document categorization is becoming increasingly important for indexing judgments and for further analysis. However, manual categorization is almost impossible given the volume and rapid growth of the documents. In this paper, we propose an approach that automatically classifies Chinese judgment documents using machine learning algorithms, including Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM). After word segmentation, each judgment document is represented in the vector space model (VSM) using TF-IDF weights. To improve performance, we construct a set of judicial stop words. In addition, because TF-IDF generates a high-dimensional feature vector, which leads to extremely high time complexity, we apply three dimensionality reduction methods. Extensive experiments on 6735 judgment documents demonstrate the effectiveness and high classification performance of the proposed method.
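The pipeline outlined in the abstract (stop-word removal, TF-IDF weighting, then a classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `SimpleTfidf` and `NaiveBayes`, the stop-word set, the toy documents, and the labels are all hypothetical, the documents are assumed to be already segmented into tokens, and the dimensionality reduction step is omitted. Naive Bayes is used here only because it is the simplest of the four classifiers the abstract names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop word for illustration; the paper's judicial list is not reproduced here.
JUDICIAL_STOP_WORDS = frozenset({"本院"})


class SimpleTfidf:
    """TF-IDF over pre-segmented documents (each document is a list of tokens)."""

    def __init__(self, stop_words=frozenset()):
        self.stop_words = stop_words
        self.idf = {}

    def _clean(self, tokens):
        return [t for t in tokens if t not in self.stop_words]

    def fit(self, docs):
        n_docs = len(docs)
        # Document frequency: number of documents that contain each term.
        df = Counter(t for doc in docs for t in set(self._clean(doc)))
        # Smoothed IDF, so terms present in every document keep a non-zero weight.
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
        return self

    def transform(self, doc):
        # Terms unseen during fit are dropped; the result is a sparse term -> weight dict.
        counts = Counter(t for t in self._clean(doc) if t in self.idf)
        total = sum(counts.values())
        return {t: (c / total) * self.idf[t] for t, c in counts.items()}


class NaiveBayes:
    """Multinomial Naive Bayes that treats TF-IDF weights as fractional counts."""

    def fit(self, vectors, labels):
        self.priors = Counter(labels)
        self.weights = defaultdict(Counter)
        for vec, label in zip(vectors, labels):
            self.weights[label].update(vec)  # accumulate per-class term weights
        self.vocab = {t for w in self.weights.values() for t in w}
        return self

    def predict(self, vec):
        n = sum(self.priors.values())
        scores = {}
        for label, n_label in self.priors.items():
            w = self.weights[label]
            denom = sum(w.values()) + len(self.vocab)  # add-one (Laplace) smoothing
            scores[label] = math.log(n_label / n) + sum(
                x * math.log((w[t] + 1.0) / denom) for t, x in vec.items()
            )
        return max(scores, key=scores.get)


# Toy, hand-written examples (not data from the paper).
train_docs = [
    ["本院", "被告人", "盗窃", "判处", "有期徒刑"],
    ["本院", "被告人", "故意伤害", "判处", "有期徒刑"],
    ["本院", "原告", "被告", "合同", "违约金"],
    ["本院", "原告", "被告", "借款", "利息"],
]
train_labels = ["criminal", "criminal", "civil", "civil"]

vectorizer = SimpleTfidf(JUDICIAL_STOP_WORDS).fit(train_docs)
model = NaiveBayes().fit([vectorizer.transform(d) for d in train_docs], train_labels)
print(model.predict(vectorizer.transform(["本院", "被告人", "抢劫", "判处"])))
```

In practice the sparse dictionaries would be replaced by a matrix representation and an established library, and the feature space would be reduced before training, as the abstract notes.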

Acknowledgement

This work was supported by the Key Program of Research and Development of China (2016YFC0800803), the National Natural Science Foundation, China (Nos. 61572162, 61572251), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Jidong Ge.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lei, M. et al. (2017). Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_1

  • DOI: https://doi.org/10.1007/978-3-319-55705-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55704-5

  • Online ISBN: 978-3-319-55705-2

  • eBook Packages: Computer Science, Computer Science (R0)
