The analysis of judicial cases is an expensive task that demands considerable time from judges and their advisors, whether to reach decisions or to classify cases according to current jurisprudence. However, much of this work is repetitive, and extracting the semantics of the case corpus can help support it. The purpose of this research is to develop a methodology capable of automatically classifying legal documents using natural language processing techniques. First, we collected 430,000 Brazilian labor court judgments from 2006 to 2018. Second, we propose using word embedding techniques to represent the data. Third, we apply clustering techniques to group semantically similar judicial decisions. Fourth, we use the resulting clusters to create artificial labels for each document. Finally, we apply classification techniques to produce models able to capture the semantics of judicial texts. The results are promising for capturing the semantic context of legal texts, suggesting that this methodology can support decision-making in the Brazilian judiciary.
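The pipeline of embedding, clustering, pseudo-labelling and classification can be illustrated with a minimal, dependency-free sketch. The abstract does not name the specific models, so every concrete choice here is an assumption: unit-length bag-of-words vectors stand in for the word-embedding step, k-means with deterministic farthest-first seeding is used for clustering, and a k-nearest-neighbour classifier is trained on the cluster ids. The function names (`vectorize`, `kmeans`, `knn_predict`) and the six-sentence corpus are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def vectorize(text, index):
    """Unit-length bag-of-words vector (stand-in for a word-embedding model)."""
    v = [0.0] * len(index)
    for word, count in Counter(text.lower().split()).items():
        if word in index:  # words outside the training vocabulary are ignored
            v[index[word]] = float(count)
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def kmeans(vectors, k, max_iter=100):
    # Farthest-first seeding (assumed; k-means++ or random restarts also work).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], c) for c in centroids))
        centroids.append(list(vectors[far]))
    for _ in range(max_iter):
        labels = [nearest(v, centroids) for v in vectors]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # Keep the previous centroid if a cluster becomes empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged: assignments no longer move
            break
        centroids = updated
    return [nearest(v, centroids) for v in vectors], centroids


def knn_predict(train_vecs, train_labels, query, k=3):
    # Vectors have unit length, so the dot product is the cosine similarity.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: -sum(a * b for a, b in zip(train_vecs[i], query)))
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy corpus with two themes (overtime pay vs. dismissal).
docs = [
    "overtime hours unpaid overtime wages",
    "unpaid overtime hours claim",
    "overtime wages hours worked",
    "unfair dismissal without just cause",
    "dismissal without notice unfair",
    "just cause dismissal reinstatement",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
index = {w: i for i, w in enumerate(vocab)}
vectors = [vectorize(d, index) for d in docs]

# Cluster ids become the artificial labels used to train the classifier.
pseudo_labels, _ = kmeans(vectors, k=2)
print(pseudo_labels)  # → [0, 0, 0, 1, 1, 1]
print(knn_predict(vectors, pseudo_labels,
                  vectorize("claim for unpaid overtime", index)))  # → 0
```

On a corpus of hundreds of thousands of judgments, dense document embeddings and an optimized clustering library would replace these toy components; the sketch only shows how cluster assignments can serve as training labels for a downstream classifier.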