A Comparison of Classification Methods Applied to Legal Text Data

Araújo, Diógenes Carlos; Lima, Alexandre; Lima, João Pedro; Costa, José Alfredo

doi:10.1007/978-3-030-86230-5_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12981)

Included in the following conference series: EPIA Conference on Artificial Intelligence

Abstract

The Brazilian judicial system is currently one of the largest in the world, with more than 77 million legal cases awaiting decision. Machine learning could help speed up case processing through text classification. This paper compares several supervised machine learning techniques, using a TF-IDF text representation. The methods compared are Random Forest, AdaBoost with decision trees, Support Vector Machine, K-Nearest Neighbors, Naive Bayes, and Multilayer Perceptron. The data set consists of 30,000 documents distributed among ten classes, which represent possible procedural movements resulting from court decisions. The classification results are quite satisfactory, since some techniques achieved an F1-score above 90%.
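As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds TF-IDF vectors, classifies with K-Nearest Neighbors, and scores predictions with a macro-averaged F1. It is not the authors' implementation: the abstract names neither a library nor any hyperparameters, so the smoothed IDF formula, the cosine similarity, the value of `k`, the macro averaging, and the tiny English training set with its two labels (`archive`, `appeal`) are all assumptions made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's preprocessing is not given)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen during fitting are dropped."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus, invented for illustration only.
TRAIN_TEXTS = [
    "case dismissed and archived",
    "records archived after final judgment",
    "appeal filed against the decision",
    "party filed appeal to higher court",
]
TRAIN_LABELS = ["archive", "archive", "appeal", "appeal"]

train_docs = [tokenize(text) for text in TRAIN_TEXTS]
IDF = fit_idf(train_docs)
TRAIN_VECS = [tfidf_vector(doc, IDF) for doc in train_docs]

test_texts = ["appeal filed by the party", "case archived"]
test_labels = ["appeal", "archive"]
predictions = [
    knn_predict(tfidf_vector(tokenize(text), IDF), TRAIN_VECS, TRAIN_LABELS)
    for text in test_texts
]
print(predictions, macro_f1(test_labels, predictions))
```

The other classifiers listed in the abstract would slot in where `knn_predict` is called, with the same TF-IDF vectors as input and the same F1 computation applied to their predictions.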


Author information

Authors and Affiliations

Universidade Federal do Rio Grande do Norte, Avenue Senador Salgado Filho 3000, Natal, Brazil
Diógenes Carlos Araújo, Alexandre Lima, João Pedro Lima & José Alfredo Costa

Editor information

Editors and Affiliations

ISEP/GECAD, Polytechnic Institute of Porto, Porto, Portugal
Goreti Marreiros
IST/INESC-ID, University of Lisbon, Porto Salvo, Portugal
Francisco S. Melo
DETI/IEETA, University of Aveiro, Aveiro, Portugal
Nuno Lau
FEUP/LIACC, University of Porto, Porto, Portugal
Henrique Lopes Cardoso
FEUP/LIACC, University of Porto, Porto, Portugal
Luís Paulo Reis

Cite this paper

Araújo, D.C., Lima, A., Lima, J.P., Costa, J.A. (2021). A Comparison of Classification Methods Applied to Legal Text Data. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_6

DOI: https://doi.org/10.1007/978-3-030-86230-5_6
Published: 03 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science (R0)
