Abstract
In law, a judgment is a decision by a court that resolves a controversy and determines the rights and liabilities of the parties in a legal action or proceeding. In 2013, the China Judgments Online system was officially launched for record keeping and notification; to date, it has recorded over 23 million electronic judgment documents. This large volume of judgment documents reflects the improvement of judicial justice and openness. Document categorization is becoming increasingly important for indexing judgments and for further analysis. However, it is almost impossible to categorize the documents manually because of their large volume and rapid growth. In this paper, we propose an approach that automatically classifies Chinese judgment documents using machine learning algorithms, including Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM). After word segmentation, each judgment document is represented in a vector space model (VSM) using TF-IDF weights. To improve performance, we construct a set of judicial stop words. In addition, because TF-IDF generates a high-dimensional feature vector, which leads to extremely high time complexity, we apply three dimensionality reduction methods. Extensive experiments on 6735 judgment documents demonstrate the effectiveness and high classification performance of the proposed method.
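As a rough illustration of the representation step described above, the following minimal sketch builds TF-IDF vectors from documents that are assumed to be segmented into words already, after removing stop words. The stop-word set, the English placeholder tokens, the function name `tfidf_vectors`, and the particular weighting (term frequency normalized by document length, idf = log(N / df)) are assumptions made for illustration; the abstract does not specify the judicial stop-word list or the exact TF-IDF variant the authors use.

```python
import math
from collections import Counter

# Hypothetical stop words; the paper's judicial stop-word list is not
# given in the abstract.
STOP_WORDS = {"the", "court", "hereby"}


def tfidf_vectors(docs, stop_words=STOP_WORDS):
    """Return (vocabulary, vectors) for pre-segmented documents.

    docs: list of token lists (word segmentation is assumed done).
    Weighting: tf = count / document length, idf = log(N / df).
    This is one common TF-IDF variant, chosen here as an assumption.
    """
    # Drop stop words before counting anything.
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in filtered:
        df.update(set(doc))

    vocab = sorted(df)
    n_docs = len(filtered)

    vectors = []
    for doc in filtered:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against documents emptied by filtering
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Placeholder tokens standing in for segmented Chinese judgment text.
docs = [
    ["the", "court", "contract", "dispute"],
    ["the", "court", "theft", "sentence"],
    ["contract", "breach", "dispute"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` has one entry per vocabulary term, so the dimensionality grows with the vocabulary. This is the cost that motivates the dimensionality reduction mentioned in the abstract; the resulting vectors could then be passed to any of the classifiers listed there.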
Acknowledgement
This work was supported by the Key Program of Research and Development of China (2016YFC0800803), the National Natural Science Foundation of China (No. 61572162, 61572251), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lei, M. et al. (2017). Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_1
DOI: https://doi.org/10.1007/978-3-319-55705-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science (R0)