CN110119449B

CN110119449B - Criminal case criminal name prediction method based on sequence-enhanced capsule network

Info

Publication number: CN110119449B
Application number: CN201910396510.8A
Authority: CN
Inventors: 彭黎; 何从庆
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2020-12-25
Anticipated expiration: 2039-05-14
Also published as: CN110119449A

Abstract

The invention relates to the field of intelligent law, and in particular to a charge prediction method for criminal cases based on a sequence-enhanced capsule network. The method comprises the following steps: S1, constructing a training data set by collecting the fact descriptions of cases and their charge judgments as training data; S2, constructing a sequence-enhanced capsule network model and training it on the training data; S3, inputting the fact description text of a new case into the sequence-enhanced capsule network model trained in S2, whereupon the model automatically predicts the corresponding charge as the charge prediction result. The proposed model not only better captures the salient features and semantic information of legal texts, but is also more competitive on low-frequency charge prediction. In addition, a focal loss function is adopted as the loss function of the sequence-enhanced capsule network model, further mitigating the severe class imbalance among charges in the low-frequency charge prediction task.

Description

Criminal case criminal name prediction method based on sequence-enhanced capsule network

Technical Field

The invention relates to the field of intelligent law, and in particular to a charge prediction method for criminal cases based on a sequence-enhanced capsule network.

Background

In recent years, artificial intelligence technologies, represented by deep learning and natural language processing, have achieved major breakthroughs, drawing the attention of both academia and industry to the field of intelligent law. Intelligent law gives machines the ability to understand legal texts and analyze cases, enabling intelligent case handling.

Automatic charge prediction is one of the most representative subtasks of intelligent law. It plays an important role in legal assistant systems and is widely applied in practice. For example, such a system can provide legal experts (such as lawyers and judges) with charge references for a case, assisting judges in reaching decisions and improving their efficiency, while also offering legal consultation to ordinary people who are unfamiliar with legal terminology and complex procedures. Automatic charge prediction uses machine learning or deep learning techniques to determine, from the fact description of a case, the charge (such as theft, robbery or causing a traffic accident) against the defendant. Previous research has proposed a number of methods for automatic charge prediction, which fall mainly into three categories: (1) traditional methods; (2) machine learning methods; and (3) deep learning methods.

Traditional methods usually rely on mathematical formulas or quantitative calculation. Kort [Fred Kort. Predicting Supreme Court decisions mathematically: A quantitative analysis of the "right to counsel" cases. American Political Science Review, 1957, 51(1): 1-12] attempted to use quantitative methods to predict events generally considered highly uncertain, namely the decisions of the United States Supreme Court. The study set out to show that, at least in one area of judicial review, already-decided cases can be used to identify the factual elements that influence the decision, these elements can be assigned numerical values, and the decisions of the remaining cases in that area can then be predicted correctly. Nagel [Stuart S. Nagel. Applying correlation analysis to case prediction. Tex. L. Rev., 1963, 42: 1006] argued that litigation outcomes can be predicted scientifically and, using the reassignment example, showed that prediction is possible by assigning correlation coefficients to four variables that occur in the cases. Such predictions can help parties plan litigation, theorists understand judicial processes, legislators explain judicial responses, and the public comply with the law. Keown [R. Keown. Mathematical models for legal prediction. Computer/LJ, 1980, 2: 829] proposed predicting judicial decisions mathematically. He correctly predicted 99% of the decisions in over 1,000 cases using the linear models of Haar, Sawyer and Cummings and the nearest-neighbor method of Mackaay and Robillard. This success offers a real opportunity, and an urgent need, to develop linear models in other specific areas, both to verify empirically that the approach is generally effective and to provide additional predictive models for the legal profession. These traditional methods have been effective in some scenarios, but they are limited to small datasets with few labels.

Owing to the success of machine learning in many fields, researchers began applying machine learning methods to charge prediction. Such work typically focuses on extracting features from case facts and then applying machine learning algorithms for prediction. Liu et al. [Chao-Lin Liu, Cheng-Tsung Chang, Jim-How Ho. Case instance generation and refinement for case-based criminal summary judgments in Chinese. 2004; Chao-Lin Liu, Chwen-Dar Hsieh. Exploring phrase-based classification of judicial documents for criminal charges in Chinese. Proc of International Symposium on Methodologies for Intelligent Systems. Springer, 2006: 681-690] proposed a k-nearest-neighbor-based algorithm that automatically generates and refines case instances from judgment texts for simple case decisions. The algorithm extracts important legal information from past litigation documents to construct case instances, which are then refined by merging similar cases and removing relatively irrelevant information. Lin et al. [Wan-Chen Lin, Tsung-Ting Kuo, Tung-Jia Chang, et al. Exploiting machine learning models for Chinese legal documents labeling, case classification, and sentencing prediction. ROCLING XXIV (2012), 2012: 140] defined 21 legal element labels for "robbery" and "intimidation", used this legal element information to classify the two crimes, and predicted their sentences. Mackaay et al. [Ejan Mackaay, Pierre Robillard. Predicting judicial decisions: The nearest neighbour rule and visual representation of case patterns. 1974] extracted features by clustering semantically similar N-grams. Sulea et al. [Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, et al. Exploring the use of text classification in the legal domain. CoRR, 2017, abs/1710.09306] studied the application of text classification methods in the legal domain using the cases and rulings of the French Supreme Court, and proposed a support-vector-machine-based system that uses case descriptions, time spans and ruling features to predict the legal area of a case and its ruling. However, these methods extract only shallow text features or rely on manual labels, which are difficult to collect for large datasets; consequently, they perform poorly when the amount of data is large.

In recent years, following the success of deep neural networks in natural language processing (NLP), computer vision (CV) and speech, some work has begun to apply them to automatic charge prediction, yielding substantial performance gains. Luo et al. [Bingfeng Luo, Yansong Feng, Jianbo Xu, et al. Learning to predict charges for criminal cases with legal basis. arXiv preprint arXiv:1707.09168, 2017] argue that relevant law articles play a very important role in charge prediction. They therefore propose an attention-based neural network that jointly models charge prediction and relevant-article extraction in a unified framework, effectively predicting the appropriate charges for cases expressed in different ways. However, this work addresses neither low-frequency charge prediction nor multiple-charge prediction. Zhong et al. [Haoxi Zhong, Zhipeng Guo, Cunchao Tu, et al. Legal judgment prediction via topological learning. In: Proc of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 3540-3549] consider the topological dependencies among the legal judgment subtasks (charges, law articles, penalties and terms of penalty) and propose a topological multi-task learning framework that incorporates these dependencies into judgment prediction. Hu et al. [Zikun Hu, Xiang Li, Cunchao Tu, et al. Few-shot charge prediction with discriminative legal attributes. In: Proc of the 27th International Conference on Computational Linguistics. 2018: 487-498] introduce several discriminative attributes of charges as an internal mapping between fact descriptions and charges; these attributes provide additional information for low-frequency charges and effective features for distinguishing confusing charges. They then propose an attribute-attentive charge prediction model that infers the attributes and the charge simultaneously. A further analysis of this research shows that, although academia and industry have proposed a series of deep-learning-based automatic charge prediction algorithms and made considerable progress, existing methods still have the following shortcomings: (1) most existing works [9,10] ignore the low-frequency charge scenario of automatic charge prediction and consider only high-frequency charges, so they cannot handle low-frequency charge prediction well; (2) Hu et al. [11] achieved good results on low-frequency charges using manually constructed auxiliary information, but manual labeling is very time-consuming and prevents an end-to-end deep learning model.

A prior patent application (published 2019.02.22) discloses a charge prediction method for criminal cases based on a memory neural network. A training data set is built from standard case descriptions and their charges, the constructed memory neural network model is trained on this data set, case description feature vectors and charge codes are converted into key-value pairs stored in the memory neural network model, and the charges of criminal cases are determined by a multi-layer perceptron classifier.

Disclosure of Invention

Based on an in-depth analysis of the research results of many scholars in China and abroad, and to address the problems in the prior art, the invention provides a charge prediction method for criminal cases based on a sequence-enhanced capsule network, so as to alleviate the problem of low-frequency charge prediction in criminal cases.

To achieve this purpose, the invention adopts the following technical solution: a charge prediction method for criminal cases based on a sequence-enhanced capsule network, comprising the following steps:

S1, constructing a training data set by collecting the fact descriptions of cases and their charge judgments as training data;

S2, constructing a sequence-enhanced capsule network model and training it on the training data, which comprises the following steps:

S2.1, constructing the sequence-enhanced capsule network model, which specifically comprises the following steps:

S2.1.1, constructing the initial capsule layer: segmenting the fact description text of the case into words, mapping it to a word-vector sequence, and taking the word-vector sequence as the initial capsule layer $u = \{u_1, u_2, \dots, u_n\}$;

S2.1.2, constructing the Multiple seq-caps layer: applying the Multiple seq-caps layer, which consists of two seq-caps layers, to the initial capsule layer u obtained in S2.1.1 to extract features and obtain the main feature vector of the case fact description text;

S2.1.3, constructing an attention-based residual unit layer (attention layer), and applying the attention mechanism to the initial capsule layer u obtained in S2.1.1 to obtain the auxiliary feature vector c of the case fact description text:

The attention layer works as follows: each of the n initial capsules $u_i$ ($i = 1, 2, \dots, n$) in the initial capsule layer u is transformed by a weight matrix W into a vector $e_i$; a softmax function is then applied to the $e_i$ to obtain the importance weight $\alpha_i$ of each initial capsule $u_i$; finally, all initial capsules are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text. The formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is the weight matrix and b is the bias vector.

S2.1.4, constructing the output layer: the main feature vector of the case fact description text obtained in S2.1.2 and the auxiliary feature vector c obtained in S2.1.3 are combined and fed into a fully connected network.
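The output layer of S2.1.4 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: "combined" is taken to mean concatenating the flattened capsule outputs v with the attention vector c, and the fully connected network is assumed to be a single linear layer followed by a softmax over charges. The names `output_layer`, `W_out` and `b_out` are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(v_capsules: List[List[float]], c: List[float],
                 W_out: List[List[float]], b_out: List[float]) -> List[float]:
    """Concatenate main features (v) and auxiliary features (c), then apply
    one fully connected layer and a softmax (assumed architecture)."""
    # Flatten the high-level capsules v_1..v_n and append the attention vector c.
    features = [x for v_j in v_capsules for x in v_j] + list(c)
    logits = []
    for row, bias in zip(W_out, b_out):  # one row per charge
        if len(row) != len(features):
            raise ValueError("weight row size must match feature size")
        logits.append(sum(w * x for w, x in zip(row, features)) + bias)
    return softmax(logits)  # probability of each charge

v = [[1.0, 0.0], [0.0, 1.0]]
c = [1.0, 1.0]
print(output_layer(v, c, [[0.0] * 6, [0.0] * 6], [0.0, 0.0]))  # [0.5, 0.5]
```

The predicted charge in S3 would then be the index of the largest probability.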

S2.2, training the sequence-enhanced capsule network model;

S3, inputting the fact description text of a new case into the sequence-enhanced capsule network model trained in S2; the model automatically predicts the corresponding charge as the charge prediction result.

Further, the data set in S1 comes from real criminal cases published on China Judgments Online; each case includes two parts, the fact description of the case and the charge judgment, which are used as training data.

Further, in S2.1.1, word segmentation uses pkuseg, the open-source segmentation tool from Peking University, and an embedding layer maps the words to a word-vector sequence using word vectors trained with Word2vec.

Further, in S2.2, a focal loss function is used to train the sequence-enhanced capsule network model.

Compared with the prior art, the invention has the following advantages:

(1) The invention provides a sequence-enhanced capsule network model that better captures the salient features and semantic information of legal texts and is more competitive in low-frequency charge prediction.

(2) A focal loss function is adopted as the loss function of the sequence-enhanced capsule network model, further mitigating the severe class imbalance among charges in the low-frequency charge prediction task.

(3) Compared with the current state-of-the-art methods, the sequence-enhanced capsule network model of the invention improves F1 by 4.5% and 6.4% on the real-world datasets Criminal-S and Criminal-L, respectively. These experimental results demonstrate the superiority and competitiveness of the model in low-frequency charge scenarios.

Drawings

FIG. 1 is a flow chart of the method of the invention;

FIG. 2 is a schematic diagram of the sequence-enhanced capsule network model of the invention;

FIG. 3 is a schematic diagram of the seq-caps layer of the invention;

FIG. 4 is a schematic diagram of the attention layer of the invention.

Detailed Description

The invention is further described below with reference to the drawings and specific embodiments.

The overall flow of the invention is shown in FIG. 1. The charge prediction method for criminal cases based on the sequence-enhanced capsule network model comprises the following steps:

S1, constructing the training data set: experiments were carried out on three public real-world datasets, all derived from criminal cases published on China Judgments Online, and the fact description and charge judgment of each case are taken as training data. Since only the principal charge of each case is retained in the public datasets, each charge only needs to be mapped to a unique integer for encoding.
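The integer encoding of charges described above amounts to a label index. The sketch below assigns ids in first-seen order; the ordering and the English charge strings are illustrative assumptions, not details from the datasets.

```python
from typing import Dict, Iterable, List

def build_charge_index(charges: Iterable[str]) -> Dict[str, int]:
    """Map each distinct charge name to a unique integer id (first-seen order)."""
    index: Dict[str, int] = {}
    for charge in charges:
        if charge not in index:
            index[charge] = len(index)
    return index

def encode_charges(charges: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Replace each charge name with its integer id."""
    return [index[charge] for charge in charges]

# Hypothetical charge labels standing in for the dataset's principal charges.
train_charges = ["theft", "robbery", "theft", "traffic accident"]
charge_index = build_charge_index(train_charges)
labels = encode_charges(train_charges, charge_index)
print(charge_index)  # {'theft': 0, 'robbery': 1, 'traffic accident': 2}
print(labels)        # [0, 1, 0, 2]
```

Because each case keeps only one principal charge, a single integer per case suffices; no multi-label encoding is needed.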

S2.1, constructing the sequence-enhanced capsule network model, which is shown in FIG. 2. The construction comprises the following steps:

S2.1.1, constructing the initial capsule layer: the fact description text of the case is segmented into words, an embedding layer maps the words to a word-vector sequence using Word2vec-trained word vectors, and these word vectors are taken as the initial capsule layer $u = \{u_1, u_2, \dots, u_n\}$.
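A minimal sketch of building the initial capsule layer from a token list is shown below. In practice the tokens would come from pkuseg (for example `pkuseg.pkuseg().cut(text)`) and the vectors from a trained Word2vec model; here both are toy stand-ins, and mapping out-of-vocabulary words to a zero vector is an assumption, since the source does not say how unknown words are handled.

```python
from typing import Dict, List, Sequence

def build_initial_capsules(tokens: Sequence[str],
                           word_vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Return u = [u_1, ..., u_n]: one pretrained word vector per token.

    Unknown tokens get a zero vector (assumed behaviour).
    """
    zero = [0.0] * dim
    capsules = []
    for tok in tokens:
        vec = word_vectors.get(tok, zero)
        if len(vec) != dim:
            raise ValueError(f"vector for {tok!r} has the wrong size")
        capsules.append(list(vec))  # copy so callers cannot mutate the table
    return capsules

# Hypothetical segmented text and Word2vec vectors.
tokens = ["defendant", "stole", "phone"]
word_vectors = {"defendant": [0.1, 0.2, 0.3], "stole": [0.5, -0.1, 0.0]}
u = build_initial_capsules(tokens, word_vectors, dim=3)
print(u)  # [[0.1, 0.2, 0.3], [0.5, -0.1, 0.0], [0.0, 0.0, 0.0]]
```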

S2.1.2, constructing the Multiple seq-caps layer: the Multiple seq-caps layer is applied to the initial capsule layer u obtained in S2.1.1 to obtain the main feature vector of the case fact description text.

The Multiple seq-caps layer consists of two seq-caps layers, and each seq-caps layer consists of a sequence information encoder (Sequence Information Encoder) and a dynamic routing converter (Dynamic Routing), as shown in FIG. 3. The invention uses a long short-term memory network (LSTM) as the sequence information encoder. Taking the first seq-caps layer as an example, the initial capsule layer $u = \{u_1, u_2, \dots, u_n\}$ is fed into the seq-caps layer, and the LSTM is computed as follows:

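$$f_t = \sigma(W_f u_t + U_f h_{t-1} + b_f)$$

$$i_t = \sigma(W_i u_t + U_i h_{t-1} + b_i)$$

$$o_t = \sigma(W_o u_t + U_o h_{t-1} + b_o)$$

$$\tilde{c}_t = \tanh(W_c u_t + U_c h_{t-1} + b_c)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh(c_t)$$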

These formulas give $h_t$, the sequence information at time step t, where $f_t$, $i_t$ and $o_t$ are the forget gate, input gate and output gate of the LSTM, respectively; $\tilde{c}_t$ denotes the candidate value at the current time step, $c_t$ the cell state at the current time step, and $h_t$ the output at the current time step; $W_f$, $W_i$, $W_o$, $W_c$ and $U_f$, $U_i$, $U_o$, $U_c$ are weight matrices; $b_f$, $b_i$, $b_o$, $b_c$ are bias vectors; $u_t$ is the current input; $c_{t-1}$ is the cell state at the previous time step; $h_{t-1}$ is the output at the previous time step; σ denotes the sigmoid function; and ⊙ denotes element-wise multiplication.
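The LSTM equations above can be traced step by step with the pure-Python sketch below. It is an illustrative implementation of a standard LSTM cell, not the inventors' code; the random initialisation in `init_params` and the zero initial state are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product M @ x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(u_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step following the f/i/o/candidate/c/h equations."""
    def gate(name, act):
        wu = matvec(p["W_" + name], u_t)
        uh = matvec(p["U_" + name], h_prev)
        return [act(x + y + bias) for x, y, bias in zip(wu, uh, p["b_" + name])]

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # input gate
    o_t = gate("o", sigmoid)        # output gate
    c_tilde = gate("c", math.tanh)  # candidate value
    c_t = [f * cp + i * ct for f, cp, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(cell) for o, cell in zip(o_t, c_t)]
    return h_t, c_t

def lstm_encode(inputs: List[Vector], p: Dict[str, list], hidden: int) -> List[Vector]:
    """Run the LSTM over u_1..u_n from a zero state and return h_1..h_n."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for u_t in inputs:
        h, c = lstm_step(u_t, h, c, p)
        outputs.append(h)
    return outputs

def init_params(input_dim: int, hidden: int, scale: float = 0.1,
                seed: int = 0) -> Dict[str, list]:
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = random.Random(seed)
    p: Dict[str, list] = {}
    for g in "fioc":  # forget, input, output, candidate
        p["W_" + g] = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
                       for _ in range(hidden)]
        p["U_" + g] = [[rng.uniform(-scale, scale) for _ in range(hidden)]
                       for _ in range(hidden)]
        p["b_" + g] = [0.0] * hidden
    return p

h_seq = lstm_encode([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], init_params(2, 4), hidden=4)
print(len(h_seq), len(h_seq[0]))  # 3 4
```

Each $h_t$ stays inside (-1, 1) because it is a gate value in (0, 1) times a tanh.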

The output of the sequence information encoder is then passed to the dynamic routing converter. First, each lower-level capsule (the encoder output at position i) is mapped by a matrix $w_j$ to a lower-level capsule copy $u_{j|i}$. Next, the dynamic routing mechanism aggregates the copies $u_{j|i}$ into a higher-level capsule layer, yielding the output of the dynamic routing converter $v = \{v_1, v_2, \dots, v_n\}$, where v denotes the main feature vector of the case fact description text.
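The text does not specify the squashing non-linearity, the number of routing iterations, or whether $w_j$ is shared across positions. The sketch below therefore assumes the routing-by-agreement procedure and squash function of Sabour et al. (2017, "Dynamic Routing Between Capsules"), three iterations, and one matrix per higher-level capsule shared across all lower-level capsules. It illustrates that technique and is not the patent's exact converter.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(z: Vector) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]

def dynamic_routing(lower: List[Vector], W: List[Matrix],
                    iterations: int = 3) -> List[Vector]:
    """Route lower-level capsules (e.g. LSTM outputs) to len(W) higher-level capsules.

    W[j] plays the role of w_j: it maps every lower capsule i to its copy u_{j|i}.
    """
    n_in, n_out = len(lower), len(W)
    # Lower-level capsule copies u_hat[i][j] = w_j h_i.
    u_hat = [[matvec(W[j], h) for j in range(n_out)] for h in lower]
    logits = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij
    v: List[Vector] = []
    for it in range(iterations):
        coupling = [softmax(row) for row in logits]  # c_ij, softmax over j
        v = []
        for j in range(n_out):
            dim = len(u_hat[0][j])
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        if it < iterations - 1:
            # Agreement update: b_ij += u_{j|i} . v_j
            for i in range(n_in):
                for j in range(n_out):
                    logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v

lower = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, -0.5]]]
v = dynamic_routing(lower, W)
print(len(v))  # 2
```

Stacking two such seq-caps layers (LSTM then routing, twice) would give the Multiple seq-caps layer described in S2.1.2.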

S2.1.3, constructing the attention-based residual unit layer (attention layer): the attention mechanism is applied to the initial capsule layer $u = \{u_1, u_2, \dots, u_n\}$ to obtain the auxiliary feature vector c of the case fact description text.

The attention layer is shown in FIG. 4:

Each of the n initial capsules $u_i$ ($i = 1, 2, \dots, n$) in the initial capsule layer u is transformed by a weight matrix W into a vector $e_i$; a softmax function is then applied to the $e_i$ to obtain the importance weight $\alpha_i$ of each initial capsule $u_i$; finally, all initial capsule vectors are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text. The formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is the weight matrix and b is the bias vector.
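A minimal sketch of the attention layer follows. It assumes W is a single row vector `w`, so that each $e_i$ is a scalar score and the softmax over $e_1, \dots, e_n$ yields one importance weight per capsule; the source calls $e_i$ a vector, so this scalar projection is an assumption made to keep the weights well defined. `attention_pool` is a hypothetical name.

```python
import math
from typing import List, Tuple

def attention_pool(capsules: List[List[float]], w: List[float],
                   b: float) -> Tuple[List[float], List[float]]:
    """Return (c, alphas) for the attention layer.

    e_i = tanh(w . u_i + b) is a scalar score (assumed), alpha = softmax(e),
    and c = sum_i alpha_i * u_i.
    """
    scores = [math.tanh(sum(wk * uk for wk, uk in zip(w, u)) + b) for u in capsules]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # importance weights, sum to 1
    dim = len(capsules[0])
    c = [sum(a * u[d] for a, u in zip(alphas, capsules)) for d in range(dim)]
    return c, alphas

c, alphas = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0], b=0.0)
print(alphas, c)  # [0.5, 0.5] [0.5, 0.5]
```

With zero weights every capsule scores equally, so c is simply the mean of the word vectors; trained weights shift mass toward the more informative words.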

S2.2, training the sequence-enhanced capsule network model: the sequence-enhanced capsule network model obtained in S2.1 is trained with the focal loss function, which is expressed as follows:

$$\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the probability of the true charge estimated by the model through the softmax function, and α is the α-balanced variable of the focal loss.

$(1 - p_t)^{\gamma}$ is the modulating factor, in which γ (γ ≠ 0) is a tunable parameter that adjusts the strength of the modulating factor.
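The focal loss above can be computed as in the sketch below, averaged over a batch. The defaults α = 0.25 and γ = 2 are borrowed from the original focal loss paper (Lin et al., 2017) and are not values stated in this patent; using a single scalar α for all charges is also an assumption.

```python
import math
from typing import List, Sequence

def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits_batch: List[List[float]], labels: List[int],
               alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Mean alpha-balanced focal loss FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).

    p_t is the softmax probability of the true charge for each example.
    """
    if len(logits_batch) != len(labels) or not labels:
        raise ValueError("need a non-empty batch with one label per example")
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        p_t = max(softmax(logits)[y], eps)  # clamp to avoid log(0)
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)

easy = [[4.0, 0.0]]  # confidently correct example for class 0
print(focal_loss(easy, [0], alpha=1.0, gamma=2.0) < focal_loss(easy, [0], alpha=1.0, gamma=0.0))  # True
```

With α = 1 and γ = 0 the expression reduces to ordinary cross-entropy; a positive γ shrinks the loss of well-classified (typically high-frequency) examples, so rare charges contribute relatively more to the gradient.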

To demonstrate the effectiveness of the proposed charge prediction method based on the sequence-enhanced capsule network, the invention compares it on three datasets with several classical text classification methods and the two most advanced charge prediction methods in the prior art. In addition, to demonstrate the model's effectiveness on low-frequency charge prediction, a set of charge prediction experiments at different charge frequencies was carried out.

Table 1 shows the results of the proposed model and the baseline models on the three datasets. Overall, the proposed method outperforms all baselines on all three datasets by a clear margin. Specifically, compared with the most advanced charge prediction method, the model of the invention achieves absolute F1 improvements of 4.5%, 2.5% and 6.4% on the three datasets, respectively, demonstrating its effectiveness on the charge prediction task. This trend indicates that the method can capture the high-level semantic representations of legal texts that are crucial for charge prediction.

Table 1: Comparison of charge prediction results on the real-world datasets, where MP denotes macro precision, MR macro recall, and F1 macro F1.
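The macro metrics in Table 1 average per-charge scores with equal weight, so rare charges count as much as common ones. The sketch below assumes macro-F1 is the mean of per-class F1 scores (one common convention; the harmonic mean of MP and MR is another) and that a class with no predictions or no gold examples scores 0.

```python
from typing import Sequence, Tuple

def macro_prf(y_true: Sequence[int], y_pred: Sequence[int],
              num_classes: int) -> Tuple[float, float, float]:
    """Return (MP, MR, macro-F1) as unweighted means over classes 0..num_classes-1."""
    precisions, recalls, f1s = [], [], []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = float(num_classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

mp, mr, f1 = macro_prf([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(mp, 4), round(mr, 4), round(f1, 4))  # 0.8333 0.75 0.7333
```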

Low-frequency charge comparison

Table 2: Comparison of low-frequency charges on the real-world dataset

To further illustrate the effectiveness of the method in handling low-frequency charges, a set of experiments on charges of different frequencies was carried out. The charges are divided into three groups by frequency (low, medium and high): low-frequency charges are those appearing 10 times or fewer in all the data, high-frequency charges are those appearing more than 100 times, and all others are medium-frequency.
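The frequency split above can be reproduced by counting labels, as in this sketch; the single-letter charge names are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable

def bucket_by_frequency(labels: Iterable[Hashable], low_max: int = 10,
                        high_above: int = 100) -> Dict[Hashable, str]:
    """Assign each charge to 'low' (count <= low_max), 'high' (count > high_above)
    or 'medium' (everything in between)."""
    buckets: Dict[Hashable, str] = {}
    for charge, count in Counter(labels).items():
        if count <= low_max:
            buckets[charge] = "low"
        elif count > high_above:
            buckets[charge] = "high"
        else:
            buckets[charge] = "medium"
    return buckets

labels = ["a"] * 10 + ["b"] * 11 + ["c"] * 100 + ["d"] * 101
print(bucket_by_frequency(labels))
# {'a': 'low', 'b': 'medium', 'c': 'medium', 'd': 'high'}
```

The boundary cases match the definition: exactly 10 occurrences is low, exactly 100 is still medium.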

Table 2 shows the performance of the proposed method on the Criminal-S dataset at different charge frequencies, comparing the macro-F1 of the model of the invention on low-, medium- and high-frequency charges with that of the most advanced charge prediction model and the most advanced text classification model. As the table shows, the low-frequency macro-F1 reaches 53.8%, an improvement of more than 65% over the LSTM-200 model and of 4.1% over the most advanced charge prediction model. The SECaps (sequence-enhanced capsule network) model thus alleviates the problem of low-frequency charge prediction while providing an end-to-end model that reduces manual data labeling. Because the SECaps model has strong vector representation and sequence representation capabilities, and the focal loss performs well on imbalanced and hard-to-classify examples, the model overcomes the weakness of existing methods on low-frequency charges.

Claims

1. A charge prediction method for criminal cases based on a sequence-enhanced capsule network, characterized by comprising the following steps:

S2.1.1, constructing the initial capsule layer: segmenting the fact description text of the case into words, mapping it to a word-vector sequence, and taking the word-vector sequence as the initial capsule layer $u = \{u_1, u_2, \dots, u_n\}$;

S2.1.2, constructing the Multiple seq-caps layer: applying the Multiple seq-caps layer to the initial capsule layer u obtained in S2.1.1 to extract features and obtain the main feature vector of the case fact description text;

the Multiple seq-caps layer consists of two seq-caps layers, and each seq-caps layer consists of a sequence information encoder and a dynamic routing converter;

S2.1.3, constructing an attention layer, and applying the attention mechanism to the initial capsule layer u obtained in S2.1.1 to obtain the auxiliary feature vector c of the case fact description text;

S2.1.4, constructing the output layer: combining the main feature vector of the case fact description text obtained in S2.1.2 with the auxiliary feature vector c obtained in S2.1.3, and feeding the result into a fully connected network;

S2.2, training the sequence-enhanced capsule network model.

2. The charge prediction method for criminal cases based on a sequence-enhanced capsule network according to claim 1, characterized in that the data set in S1 comes from real criminal cases published on China Judgments Online, each case comprising two parts, the fact description of the case and the charge judgment, which are used as training data.

3. The charge prediction method for criminal cases based on a sequence-enhanced capsule network according to claim 1, characterized in that in S2.1.1, word segmentation uses pkuseg, the open-source segmentation tool from Peking University, and an embedding layer maps the words to a word-vector sequence using Word2vec-trained word vectors.

4. The charge prediction method for criminal cases based on a sequence-enhanced capsule network according to claim 1, characterized in that in S2.1.2, a long short-term memory network is used as the sequence information encoder.

5. The charge prediction method for criminal cases based on a sequence-enhanced capsule network according to claim 1, characterized in that in S2.1.3, the attention layer works as follows: each of the n initial capsules $u_i$ ($i = 1, 2, \dots, n$) in the initial capsule layer u is transformed by a weight matrix W into a vector $e_i$; a softmax function is then applied to the $e_i$ to obtain the importance weight $\alpha_i$ of each initial capsule $u_i$; finally, all initial capsules are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text. The formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is the weight matrix and b is the bias vector.

6. The charge prediction method for criminal cases based on a sequence-enhanced capsule network according to claim 1, characterized in that in S2.2, a focal loss function is used to train the sequence-enhanced capsule network model.

7. The charge prediction method for criminal cases based on a sequence-enhanced capsule network, characterized in that the focal loss function is expressed as follows:

$$\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the probability estimated by the model through the softmax function, α is the α-balanced variable of the focal loss,