Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms

Lei, Miaomiao; Ge, Jidong; Li, Zhongjin; Li, Chuanyi; Zhou, Yemao; Zhou, Xiaoyu; Luo, Bin

doi:10.1007/978-3-319-55705-2_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

In law, a judgment is a decision by a court that resolves a controversy and determines the rights and liabilities of parties in a legal action or proceeding. In 2013, the China Judgments Online system was officially launched for record keeping and notification; to date, it has recorded over 23 million electronic judgment documents. This huge volume of judgment documents reflects the improvement of judicial justice and openness. Document categorization is becoming increasingly important for indexing judgments and for further analysis. However, categorizing them manually is almost impossible because of their large volume and rapid growth. In this paper, we propose an approach that automatically classifies Chinese judgment documents using machine learning algorithms, including Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM). After word segmentation, a judgment document is represented in the vector space model (VSM) using TF-IDF. To improve performance, we construct a set of judicial stop words. In addition, because TF-IDF generates a high-dimensional feature vector, which leads to extremely high time complexity, we apply three dimensionality reduction methods. Extensive experiments on 6735 judgment documents demonstrate the effectiveness and high classification performance of the proposed method.
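
The pipeline described in the abstract (stop-word filtering, TF-IDF weighting, then a classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop words, training sentences, and class labels are invented for illustration, the text is assumed to be already word-segmented and space-delimited, the smoothed IDF formula is one common variant, and the dimensionality reduction step is omitted. Of the four classifiers named in the abstract, only a simple multinomial Naive Bayes is shown, with TF-IDF weights used as fractional counts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop words; the paper's judicial stop-word set is not reproduced here.
STOP_WORDS = {"的", "了", "本院", "认为"}

def tokenize(text):
    """Split pre-segmented, space-delimited text and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def fit_idf(token_lists):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen during fitting are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def train(texts, labels, alpha=1.0):
    """Fit IDF weights and a multinomial Naive Bayes model on TF-IDF features."""
    token_lists = [tokenize(t) for t in texts]
    idf = fit_idf(token_lists)
    term_weight = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(token_lists, labels):
        for term, w in tfidf(tokens, idf).items():
            term_weight[label][term] += w
    class_count = Counter(labels)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_count.items()}
    # Laplace-smoothed normaliser: total class weight + alpha * vocabulary size.
    denom = {c: sum(term_weight[c].values()) + alpha * len(idf) for c in class_count}
    return {"idf": idf, "alpha": alpha, "log_prior": log_prior,
            "term_weight": term_weight, "denom": denom}

def predict(model, text):
    """Return the class with the highest posterior log-score."""
    vec = tfidf(tokenize(text), model["idf"])
    scores = {}
    for c, prior in model["log_prior"].items():
        score = prior
        for term, w in vec.items():
            smoothed = model["term_weight"][c].get(term, 0.0) + model["alpha"]
            score += w * math.log(smoothed / model["denom"][c])
        scores[c] = score
    return max(scores, key=scores.get)

# Toy, invented training data (already segmented); labels are illustrative only.
train_texts = [
    "原告 被告 合同 纠纷 赔偿",
    "原告 请求 被告 返还 借款",
    "被告人 盗窃 罪 判处 有期徒刑",
    "被告人 故意 伤害 罪 判处 拘役",
]
train_labels = ["civil", "civil", "criminal", "criminal"]
model = train(train_texts, train_labels)

print(predict(model, "被告人 犯 盗窃 罪"))
print(predict(model, "原告 合同 借款"))
```

In practice the segmentation would come from a Chinese word segmenter and the vectors would be reduced in dimension before training, as the abstract describes; the dictionary-based sparse vectors here only keep the sketch dependency-free.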

Acknowledgement

This work was supported by the Key Program of Research and Development of China (2016YFC0800803), the National Natural Science Foundation, China (No. 61572162, 61572251), and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Software Institute, Nanjing University, Nanjing, 210093, Jiangsu, China
Miaomiao Lei, Jidong Ge, Zhongjin Li, Chuanyi Li, Yemao Zhou, Xiaoyu Zhou & Bin Luo

Corresponding author

Correspondence to Jidong Ge.

Editor information

Editors and Affiliations

Royal Melbourne Institute of Technology, Melbourne, Australia
Zhifeng Bao
Northwestern University, Evanston, Illinois, USA
Goce Trajcevski
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Lei, M. et al. (2017). Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_1

DOI: https://doi.org/10.1007/978-3-319-55705-2_1
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science, Computer Science (R0)
