Multi-label charge predictions leveraging label co-occurrence in imbalanced data scenario

Cited by: 0
Authors
Hongsong Dong
Fengbao Yang
Xiaoxia Wang
Affiliations
[1] North University of China, School of Information and Communication Engineering
[2] Shanxi Agricultural University, College of Information Science and Engineering
Source
Soft Computing | 2020 / Volume 24
Keywords
Charge prediction; Imbalanced data; Few-shot charges; Multi-label classification; Label co-occurrence;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Charge prediction aims to predict the applicable charges from the fact description of a case and plays a significant role in legal aid systems. Automatically predicting charges in the multi-label classification paradigm, which fits real applications, is a fundamental and challenging task. Existing works focus either on multiple charges in a balanced data scenario or on few-shot charges with a single label. Moreover, previous models use a special initialization with label patterns to improve multi-label classification performance, which is applicable only when training data are scarce and results in poor robustness. To this end, CBLLC, a multi-task framework that combines a convolutional neural network with bidirectional long short-term memory and leverages label co-occurrence, is introduced to predict multiple charges with article information in an imbalanced data scenario. We develop a new learning mechanism that trains the framework on charge and article patterns when a large amount of training data is available, increasing its robustness. In CBLLC, data preprocessing helps training generalize and reduces overfitting. Salient word annotation is introduced to deal with few-shot charges. The processed data yield better classification results and improve the generality of the model. Experimental results on the Chinese AI and Law Challenge test set show the superiority of the proposed method over state-of-the-art methods. In particular, a macro-F1 score of 92.9% for charges and 86.6% for articles is achieved with co-occurrence of charges and patterns of articles.
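The abstract relies on two notions that are easy to state concretely: label co-occurrence (how often two charges are assigned to the same case) and the macro-F1 score (the unweighted mean of per-label F1, which gives rare charges the same weight as frequent ones). The following is a minimal illustrative sketch in Python, not the CBLLC implementation; the function names and the toy charge labels are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count how often each unordered pair of labels appears on the same case."""
    counts = Counter()
    for labels in label_sets:
        # Sort so that each pair is stored under one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set().union(*y_true, *y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted positives contributes 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical charges, not taken from the paper's dataset).
y_true = [{"theft", "robbery"}, {"theft"}, {"fraud"}, {"theft", "robbery"}]
y_pred = [{"theft", "robbery"}, {"theft"}, {"theft"}, {"theft"}]

pairs = cooccurrence_counts(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy data the frequent label is predicted well while the rare label `fraud` is missed entirely, so the macro average is pulled down sharply. This is why macro-F1 is a natural metric for imbalanced, few-shot settings like the one the abstract describes.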
Pages: 17821-17846
Page count: 25