Abstract
The Brazilian judicial system is currently one of the largest in the world, with more than 77 million legal cases awaiting decision. Machine learning could help speed up case processing through text classification. This paper compares several supervised learning techniques applied to documents represented with TF-IDF. The classification methods compared are Random Forest, AdaBoost with decision trees, Support Vector Machine, K-Nearest Neighbors, Naive Bayes, and Multilayer Perceptron. The data set consists of 30,000 documents distributed among ten classes, which represent possible procedural movements resulting from court decisions. The classification results are satisfactory: some techniques exceeded an F1-score of 90%.
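The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses an invented two-class toy corpus in place of the 30,000-document, ten-class data set, and leaves most hyperparameters at library defaults, since the abstract specifies none of these. A linear SVM stands in for "Support Vector Machine", and AdaBoost uses scikit-learn's default decision-tree base learner.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: invented phrases, NOT the paper's data.
grant = ["the appeal is granted", "the court grants the request",
         "motion granted in full", "request upheld by the panel"]
dismiss = ["the appeal is dismissed", "the court denies the request",
           "motion denied in full", "request rejected by the panel"]
docs, labels = [], []
for i in range(5):
    for phrase in grant:
        docs.append(f"{phrase} in case number {i}")
        labels.append("grant")
    for phrase in dismiss:
        docs.append(f"{phrase} in case number {i}")
        labels.append("dismiss")

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0)

# One entry per method named in the abstract; settings are assumptions.
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": MultinomialNB(),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                         random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # TF-IDF is fitted on the training split only, inside the pipeline.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = f1_score(y_test, predictions, average="macro")

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: macro F1 = {score:.3f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary and document frequencies from being computed on test documents. Macro averaging is one reasonable choice for a multi-class F1-score; the abstract does not say which averaging the authors used.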
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Araújo, D.C., Lima, A., Lima, J.P., Costa, J.A. (2021). A Comparison of Classification Methods Applied to Legal Text Data. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_6
DOI: https://doi.org/10.1007/978-3-030-86230-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science, Computer Science (R0)