CN107577785B - Hierarchical multi-label classification method suitable for legal identification - Google Patents
- Publication number: CN107577785B (application CN201710832304.8A)
- Authority: CN (China)
- Prior art keywords: label, class, feature, category, case
- Legal status: Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hierarchical multi-label classification method suitable for legal identification, comprising the following steps. Step 1: extract case facts and their applicable legal provisions from preprocessed referee documents. Step 2: expand the legal provisions corresponding to the case facts based on the hierarchical structure of the label space, so that the class label set of each case sample is a subset of the label space. Step 3: perform word segmentation and part-of-speech tagging on the case fact texts, apply feature selection to the segmentation results, and select feature words that adequately represent the case facts to construct feature vectors. Step 4: construct the prediction model: find the k-nearest-neighbor sample set N(x) of an unseen instance x in the expanded multi-label training set, assign a weight to each neighbor sample, compute the confidence that the unseen instance belongs to each class from the classification weights of the k neighbors for each class, and finally predict the class label set of the unseen instance.
Description
Technical Field
The invention belongs to the field of computer data analysis and mining, and relates to a hierarchical multi-label classification method suitable for legal identification.
Background
Hierarchical multi-label classification is a special case of multi-label classification. Unlike general multi-label classification, in a hierarchical multi-label classification problem each sample can have multiple class labels, while the label space is organized in a tree or directed acyclic graph (DAG) hierarchy. In a DAG, a node may have several parent nodes, which makes it more complex than a tree structure and harder to design algorithms for; current research on hierarchical multi-label classification therefore mainly targets tree-structured class labels. According to how an algorithm views the class hierarchy, hierarchical multi-label classification algorithms can be divided into local algorithms and global algorithms.
A local algorithm examines the local classification information of each internal node in the class hierarchy one by one, converting the hierarchical multi-label classification problem into several multi-label classification problems. When training the multi-label classifier at an internal node, an appropriate local sample set must be selected; in the prediction stage, a top-down prediction mode is adopted so that the prediction result satisfies the hierarchy constraint. The literature (ESULI A, FAGNI T, SEBASTIANI F. TreeBoost.MH: a boosting algorithm for multi-label hierarchical text categorization [C] // String Processing and Information Retrieval, 2006: 13-24) proposes the TreeBoost.MH algorithm to handle the hierarchical multi-label text classification problem. The algorithm recursively trains multi-label classifiers on each non-leaf node in the class label tree, with AdaBoost.MH as the base classifier. Experiments show that TreeBoost.MH outperforms AdaBoost.MH in both time efficiency and prediction performance. The literature (CERRI R, BARROS R C, DE CARVALHO A C. Hierarchical multi-label classification using local neural networks [J]. Journal of Computer and System Sciences, 2014, 80(1): 39-56) proposes a local hierarchical multi-label classification algorithm based on multi-layer perceptrons: a multi-layer perceptron network is trained at each level of the class hierarchy, each network is associated with one level and predicts the class labels at that level, and the prediction of the network at one level serves as input to the network at the next level. Because the networks at all levels are trained on the same sample set, the prediction results may violate the hierarchy constraint, and post-processing is needed to ensure that the constraint is satisfied.
The local algorithm has two disadvantages. On one hand, many classifiers must be trained, which makes the model relatively complex and hurts its interpretability. On the other hand, a blocking problem occurs during prediction: samples misclassified at an upper level never reach the classifiers at lower levels. Although three strategies (threshold reduction, restricted voting, and extended multiplicative thresholds) have been proposed to address the blocking problem, local algorithms remain unsatisfactory in prediction accuracy.
The global algorithm treats the class hierarchy as a whole, training a single multi-label classifier and using it to predict unseen instances. Global algorithms can be grouped by how they process the class label hierarchy. One kind uses class clustering: first compute the similarity of the test sample to each class, then assign the test sample to the closest class. Another kind converts the hierarchical multi-label classification problem into an ordinary multi-label classification problem: the literature (KIRITCHENKO S, MATWIN S, FAMILI A F. Functional annotation of genes using hierarchical text categorization [C], 2005) expands the class labels of the training samples by adding their ancestor class labels, thereby converting the hierarchical problem into a multi-label one. In the testing stage, the adopted multi-label algorithm AdaBoost.MH does not consider the class hierarchy, so it faces the same problem as local algorithms: the predicted results may be hierarchically inconsistent, and the model output must be corrected to satisfy the hierarchy constraint. A third kind of global algorithm adapts existing non-hierarchical classification algorithms to process hierarchical information directly and exploit it to improve performance. The literature (VENS C, STRUYF J, SCHIETGAT L, et al. Decision trees for hierarchical multilabel classification [J]. Machine Learning, 2008, 73(2): 185-214) proposes decision-tree methods of this kind. Experimental results show that the global Clus-HMC algorithm outperforms the Clus-SC and Clus-HSC algorithms in prediction performance and is also more time-efficient.
In general, global algorithms have two features: they consider the class hierarchy as a whole, in one pass; and they lack the modularity characteristic of local algorithms. The key difference between global and local algorithms lies in the training process; in the testing stage, a global algorithm may even predict the classes of unseen instances top-down, like a local algorithm.
Since the class labels in a hierarchical multi-label classification problem are organized hierarchically, if a sample has a class label c_i, then it implicitly also has all ancestor class labels of c_i. Likewise, when predicting the class of an unseen instance, the hierarchy constraint must be satisfied: it cannot happen that the unseen instance belongs to a class but not to an ancestor class of that class. A general hierarchical multi-label classification algorithm often cannot guarantee that its predictions satisfy the hierarchy constraint, or fails to achieve the best learning effect because it does not exploit the hierarchical structure of the label space. A hierarchical multi-label classification algorithm therefore needs both to make full use of the associations and the hierarchy among class labels to improve the prediction performance of the classification model, and to ensure that its predictions satisfy the hierarchy constraint.
The problem of automatically identifying the law applicable to a case is essentially a hierarchical multi-label classification problem: the class labels of the samples, i.e., the legal provisions applicable to the cases, are organized in a tree structure; one case may be governed by several legal provisions, and those provisions may differ in specificity. A hierarchical multi-label classification algorithm for this problem must therefore be able to handle tree-shaped class hierarchies and perform non-mandatory leaf node prediction, i.e., a predicted class label may correspond to any node in the class hierarchy.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problem of providing an effective hierarchical multi-label classification method suitable for legal identification aiming at the defects of the prior art.
The technical scheme is as follows: the invention discloses a hierarchical multi-label classification method suitable for legal identification, which comprises the following steps:
Step 2: because legal provisions in a legal system are organized in a tree structure, the label space formed by the class labels in the multi-label training set is correspondingly tree-structured. Based on the hierarchical structure of the label space formed by the class labels in the multi-label training sample set, the legal provisions corresponding to the case facts of all case samples are expanded, so that the class label set corresponding to each case fact is a subset of the label space and satisfies the hierarchy constraint;
The step 2 comprises the following steps:
Step 2-1: In the hierarchical multi-label classification problem, a d-dimensional instance space X ⊆ R^d is given (R is the set of real numbers), together with a label space Y = {y_1, y_2, …, y_q} containing q classes, where y_v denotes the v-th class label, 1 ≤ v ≤ q. The hierarchy of the class label space can be represented by the pair (Y, <), where < denotes the partial order relation on class labels and can be understood as the "belongs to" relation: if y_v, y_u ∈ Y and y_v < y_u, then class label y_v belongs to class label y_u, y_v is a descendant class label of y_u, and y_u is an ancestor class label of y_v. The partial order < is asymmetric, irreflexive, and transitive, and can be described by the following four properties:
a) The unique root node of the class label hierarchy is represented by a virtual class label R: for any y_i ∈ Y, y_i < R;
b) For any y_i, y_j ∈ Y, if y_i < y_j, then it does not hold that y_j < y_i (asymmetry);
c) For any y_i ∈ Y, it does not hold that y_i < y_i (irreflexivity);
d) For any y_i, y_j, y_k ∈ Y, if y_i < y_j and y_j < y_k, then y_i < y_k (transitivity).
A multi-label classification problem whose class label structure satisfies the above four properties can be regarded as a hierarchical multi-label classification problem. From this formal definition, in a hierarchical class label space, all other class nodes (excluding the starting node) on the unique path traced from any class node back to the root node are ancestor class nodes of that node. Thus if a sample has a class label y_i, then it implicitly also has all ancestor class labels of y_i. This requires the classifier's predicted class label set h(x) for an unseen instance x to satisfy the hierarchy constraint: for any y'' ∈ h(x) and any y' with y'' < y', it must hold that y' ∈ h(x), where y'' is a class label in h(x) and y' is an ancestor class label of y'';
Step 2-2: For each multi-label case sample (x_i, h_i), 1 ≤ i ≤ m, where m is the number of referee document samples obtained, x_i ∈ X is a d-dimensional feature vector representing the case fact part, and h_i ⊆ Y is the class label set corresponding to x_i, i.e., the legal provisions applicable to x_i. Let the expanded class label set be h_i'; then h_i' contains all class labels in h_i together with all of their ancestor class labels. Formally,

h_i' = h_i ∪ { y' | y' is an ancestor class label of some y ∈ h_i };
The label expansion process makes the hierarchical relationships among class labels explicit in each sample's label set: if a sample is marked with some class, the ancestor classes of that class are also explicitly assigned to the sample through label expansion. The class label set of each sample can thus be viewed as a subtree of the label space tree, with the root node at the top of every subtree. It follows that if y_i, y_j ∈ Y and y_i < y_j, then in the expanded multi-label training set the number of samples having class label y_j is no less than the number of samples having class label y_i. Label expansion is an essential step in guaranteeing that the learning algorithm's prediction results satisfy the hierarchy constraint.
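As a sketch, the label expansion of step 2-2 can be written as follows; the `parent` mapping and the provision names `law`, `art26`, `art26_1` are hypothetical illustrations of a label tree, not part of the invention:

```python
def expand_labels(labels, parent):
    """Expand a sample's class label set with all ancestor labels.

    labels: set of class labels originally assigned to the sample.
    parent: dict mapping each label to its parent label; children of the
            virtual root map to None (an assumed encoding).
    """
    expanded = set()
    for y in labels:
        while y is not None:        # walk up the tree, excluding the virtual root
            expanded.add(y)
            y = parent.get(y)
    return expanded

# Hypothetical three-level provision tree: art26_1 < art26 < law
parent = {"law": None, "art26": "law", "art26_1": "art26"}
expanded = expand_labels({"art26_1"}, parent)
```

After expansion, a sample labeled with a specific provision also carries the article and the law it belongs to, so each label set forms a subtree rooted at the top of the hierarchy.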
The step 3 comprises the following steps:
Step 3-1: The purpose of feature selection is to reduce the dimensionality of the features. Since common text feature selection algorithms cannot directly process multi-label data sets, the multi-label sample data must first be converted to single-label sample data. The conversion method is as follows: for each multi-label case sample (x_i, h_i), 1 ≤ i ≤ m, let |h_i| denote the number of class labels in its label set h_i; replace the sample with |h_i| single-label case samples (x_i', y_i'), 1 ≤ i' ≤ |h_i|, y_i' ∈ h_i, where the class label y_i' of each single-label sample is one class label from the set h_i. The multi-label case samples include both the multi-label training samples and the multi-label test samples. Table 1 gives an example of converting a multi-label sample into single-label samples according to this strategy.
Table 1. Multi-label sample conversion process
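The conversion strategy of step 3-1 can be sketched as follows; the sample texts and provision labels are hypothetical examples:

```python
def to_single_label(samples):
    """Replace each multi-label sample (x, h) with |h| single-label samples."""
    out = []
    for x, h in samples:
        for y in h:
            out.append((x, y))
    return out

# Hypothetical samples: case fact texts paired with applicable provision labels
multi = [("facts of case 1", ["art26", "art39"]),
         ("facts of case 2", ["art26"])]
single = to_single_label(multi)
```

A sample with two class labels becomes two single-label samples sharing the same fact text, so standard single-label feature selection algorithms can then be applied.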
Step 3-2: After the conversion process of step 3-1 turns each multi-label case sample into several single-label case samples, a general feature selection algorithm is applied to the word segmentation results obtained from the original training set in step 1, and a certain number of discriminative feature words are selected to form the feature space (usually the total information gain of the selected feature words should be as large as possible while their number stays moderate; for example, when selecting features with the information gain algorithm, at least 100 feature words are generally chosen). Feature words from this feature space are then used to represent the case fact part of each case sample. The attribute value corresponding to each feature word, i.e., its feature weight, is computed with the widely used TF-IDF algorithm. The case fact part of each single-label case sample is regarded as a segmented document, and the case fact parts of all single-label case samples together form a document set. The feature weight tf-idf_ij of the j-th dimension feature in the i-th document of the document set is defined as:

tf-idf_ij = tf_ij · idf_j = tf_ij · log(N / n_j), divided by the normalization factor sqrt( Σ_j (tf_ij · log(N / n_j))² ),

where tf_ij denotes the frequency of feature word t_j in document d_i, idf_j denotes the inverse document frequency of t_j in the document set, N is the total number of documents in the document set, and n_j is the document frequency of t_j, i.e., the number of documents in the document set in which t_j appears; the denominator is the normalization factor.
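The TF-IDF weighting described above can be sketched as follows; this is a minimal illustration assuming the cosine-normalized form of the weight (terms that occur in every document receive weight 0 under log(N/n_j)):

```python
import math

def tfidf_vectors(docs):
    """TF-IDF weights for a document set; docs is a list of token lists.

    Returns one {term: weight} dict per document, cosine-normalized
    (the denominator is the vector norm, i.e. the normalization factor).
    """
    N = len(docs)
    df = {}                                   # document frequency n_j
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1    # term frequency tf_ij
        w = {term: tf[term] * math.log(N / df[term]) for term in tf}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        vectors.append({term: v / norm for term, v in w.items()})
    return vectors

# Hypothetical mini corpus of segmented case facts
docs = [["contract", "breach", "damages"],
        ["contract", "loan"],
        ["loan", "interest"]]
vecs = tfidf_vectors(docs)
```

Rare terms like "breach" receive larger weights than common terms like "contract", and each document vector has unit length after normalization.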
Step 3-3: Apply the information gain algorithm or the chi-square statistic algorithm to the word segmentation results of step 1 and select about 100 feature words with the strongest discriminative power to form the feature vector. Commonly used text feature selection methods are mainly based on document frequency (DF), mutual information (MI), information gain (IG), and the chi-square statistic (χ², CHI). Feature selection based on document frequency is too simple to pick out the feature words carrying the most classification information, and mutual information has the drawback of being easily affected by the marginal probabilities of feature words, so this hierarchical multi-label classification method selects features with the information gain or chi-square statistic algorithm.
Step 3-3 comprises feature selection with the information gain algorithm. The information gain IG(t) of a feature word t is defined as:

IG(t) = − Σ_{v=1}^{q} p(y_v) log p(y_v) + p(t) Σ_{v=1}^{q} p(y_v | t) log p(y_v | t) + p(t̄) Σ_{v=1}^{q} p(y_v | t̄) log p(y_v | t̄),

where p(y_v) denotes the probability that class label y_v occurs, p(t) denotes the probability that feature word t occurs, p(y_v | t) denotes the probability that class label y_v occurs given that t occurs, p(t̄) denotes the probability that t does not occur, and p(y_v | t̄) denotes the probability that y_v occurs given that t does not occur. The information gain is computed for every feature word in the document set; feature words whose information gain falls below a set threshold (for example 0.15; the threshold is set so that the total information gain of the selected feature words is as large as possible while their number stays moderate) are excluded from the feature space;
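The information gain computation can be sketched as follows, operating on the single-label samples produced in step 3-1; the token lists and labels are hypothetical:

```python
import math

def info_gain(term, docs):
    """Information gain of `term` over single-label documents.

    docs: list of (token_list, label) pairs, i.e. the single-label
    samples produced by the conversion in step 3-1.
    """
    N = len(docs)
    labels = {y for _, y in docs}
    with_t = [(d, y) for d, y in docs if term in d]
    without_t = [(d, y) for d, y in docs if term not in d]

    def plogp_sum(subset):
        # sum over classes of p(y | subset) * log p(y | subset)
        n = len(subset)
        if n == 0:
            return 0.0
        total = 0.0
        for y in labels:
            p = sum(1 for _, yy in subset if yy == y) / n
            if p > 0:
                total += p * math.log(p)
        return total

    p_t = len(with_t) / N
    # IG(t) = -sum p(y) log p(y) + p(t) sum p(y|t) log p(y|t)
    #         + p(not t) sum p(y|not t) log p(y|not t)
    return (-plogp_sum(docs)
            + p_t * plogp_sum(with_t)
            + (1 - p_t) * plogp_sum(without_t))

docs = [(["a", "c"], "pos"), (["b", "c"], "neg")]
```

A word occurring in exactly one class attains the maximum gain, while a word occurring in every document contributes no information and would fall below any positive threshold.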
Step 3-3 may instead use the chi-square statistic algorithm for feature selection. The null hypothesis is that a feature word is independent of a class; the more the test value computed from the CHI distribution deviates above the threshold, the more confidently the null hypothesis can be rejected in favor of the alternative hypothesis, namely that the feature word is highly correlated with the class.
Let A be the number of documents that contain feature word t and belong to class y_v, 1 ≤ v ≤ q; B the number that contain t but do not belong to y_v; C the number that do not contain t but belong to y_v; D the number that neither contain t nor belong to y_v; and N the total number of documents in the document set. Then the chi-square statistic χ²(t, y_v) of feature word t and class label y_v is defined as:

χ²(t, y_v) = N (A·D − C·B)² / ( (A + C)(B + D)(A + B)(C + D) ).

When feature word t and class y_v are independent, the chi-square statistic is 0. For each feature word, the chi-square statistic with respect to every class is computed, and then the mean χ²_avg(t) = Σ_{v=1}^{q} p(y_v) χ²(t, y_v) and the maximum χ²_max(t) = max_{1≤v≤q} χ²(t, y_v) are computed. Considering χ²_avg(t) and χ²_max(t) together, a certain number (about 100) of discriminative feature words are selected, where p(y_v) denotes the probability that class label y_v occurs.
the main advantage of the chi-squared statistical feature selection algorithm over mutual information is that it is a normalized value, and therefore can better scale different feature words in the same category.
In step 4, when finding the k nearest neighbors, the distance d(x, x_a) between the unseen instance x and a neighbor sample (x_a, h_a) in the expanded multi-label training sample set, where (x_a, h_a) ∈ N'(x), 1 ≤ a ≤ k, and h_a is the class label set corresponding to x_a, is computed as the reciprocal of the cosine similarity of their feature vectors. The cosine similarity cos(γ, λ) between the feature vector γ of instance x and the feature vector λ of the neighbor sample is computed as:

cos(γ, λ) = Σ_{s=1}^{S} γ_s λ_s / ( sqrt(Σ_{s=1}^{S} γ_s²) · sqrt(Σ_{s=1}^{S} λ_s²) ),

where s denotes the index of a component of the feature vector, i.e., the position of the component within the vector, S denotes the dimension of the feature vector, γ_s denotes the s-th component of γ, and λ_s denotes the s-th component of λ.
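The neighbor search can be sketched as follows; since the distance is the reciprocal of cosine similarity, ranking by decreasing similarity is equivalent to ranking by increasing distance (the sample vectors and label sets are hypothetical):

```python
import math

def cosine(gamma, lam):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(g * l for g, l in zip(gamma, lam))
    ng = math.sqrt(sum(g * g for g in gamma))
    nl = math.sqrt(sum(l * l for l in lam))
    if ng == 0.0 or nl == 0.0:
        return 0.0
    return dot / (ng * nl)

def k_nearest(x, samples, k):
    """k training samples nearest to x under distance = 1 / cosine similarity."""
    ranked = sorted(samples, key=lambda s: cosine(x, s[0]), reverse=True)
    return ranked[:k]

samples = [([1.0, 0.0], {"a"}),
           ([0.0, 1.0], {"b"}),
           ([0.7, 0.7], {"a", "b"})]
nearest = k_nearest([1.0, 0.0], samples, 1)
```

For high-dimensional sparse TF-IDF vectors the same ranking can be computed over only the shared nonzero components.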
In step 4, d(x, x_a) denotes the distance between the unseen instance x and the neighbor sample (x_a, h_a) in the expanded multi-label training sample set. The classification weight w_aj, 1 ≤ j ≤ q, assigned by neighbor sample a to class label y_j in h_a is computed with either the full-label distance weight method or the entropy label distance weight method;
The full-label distance weight method computes w_aj as:
The entropy label distance weight method computes w_aj as:
The confidence c(x, y_j) that the unseen instance x belongs to class label y_j is calculated by the following formula:
where w_ar denotes the classification weight of class label y_r in h_a;
The predicted class label set h(x) for the unseen instance x is:
A decision threshold of 0.5 is selected; when the confidence of the unseen instance x belonging to every class label is below the decision threshold, the class label with the maximum confidence is returned as the class label of the unseen instance.
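The prediction step can be sketched as follows. As a simplifying assumption for illustration, each neighbor's weight for a class it carries is taken to be its cosine similarity to x, and the confidence is normalized by the total neighbor weight; the patent's actual full-label and entropy label distance weight formulas are as defined above:

```python
import math

def predict_labels(x, train, k=3, t=0.5):
    """Sketch of the prediction step with simplified, illustrative weights.

    x: feature vector of the unseen instance.
    train: list of (feature_vector, expanded_label_set) pairs from the
           expanded multi-label training set.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    # k nearest neighbors under distance = reciprocal of cosine similarity
    neighbors = sorted(train, key=lambda s: cos(x, s[0]), reverse=True)[:k]

    labels = set().union(*(ha for _, ha in neighbors)) if neighbors else set()
    total = sum(cos(x, xa) for xa, _ in neighbors) or 1.0
    conf = {y: sum(cos(x, xa) for xa, ha in neighbors if y in ha) / total
            for y in labels}

    predicted = {y for y, c in conf.items() if c > t}
    if not predicted and conf:
        # below the threshold everywhere: return the most confident label
        predicted = {max(conf, key=conf.get)}
    return predicted

# Hypothetical expanded training set over the tree art26_1 < art26 < law
train = [([1.0, 0.0], {"law", "art26"}),
         ([1.0, 0.1], {"law", "art26", "art26_1"}),
         ([0.0, 1.0], {"law", "art39"})]
result = predict_labels([1.0, 0.0], train, k=2)
```

Because the training label sets are expanded, an ancestor class accumulates at least as much neighbor weight as any of its descendants, so the sketched predictions respect the hierarchy constraint discussed below.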
As a hierarchical multi-label classification method, the prediction result must satisfy the hierarchy constraint: for any y'' ∈ h(x) and any y' with y'' < y', it holds that y' ∈ h(x). This is demonstrated as follows. By the confidence calculation formula, if the algorithm predicts that the unseen instance x has class label y_a (y_a ∈ Y), then the confidence c(x, y_a) of x belonging to class y_a is either greater than the threshold t or the maximum over all classes. Consider an ancestor class y_b of y_a (y_b ∈ Y, y_a < y_b). If y_b corresponds to the virtual root node of the class hierarchy, then x having class label y_a clearly satisfies the hierarchy constraint. Otherwise, for any neighbor sample (x_i, h_i) ∈ N(x) of x, if y_a ∈ h_i then y_b ∈ h_i as well (the converse does not necessarily hold); the label expansion of the training set guarantees this conclusion. Therefore, under both the full-label distance weight method and the entropy label distance weight method, it can be derived that:
on the denominatormax1≤r≤qwirRemain unchanged, so x belongs to class ybConfidence of (c) (x, y)b) X belongs to the category yaConfidence of (c) (x, y)a) If there is c (x, y)a)>t, must also have c (x, y)b)>t, so the prediction results satisfy the hierarchical constraint.
Finally, the performance of the learning method is evaluated with hierarchical indices: hierarchical precision (hP), hierarchical recall (hR), and the hierarchical F-measure (hF), defined as follows:

hP = Σ_i |P̂_i ∩ T̂_i| / Σ_i |P̂_i|,  hR = Σ_i |P̂_i ∩ T̂_i| / Σ_i |T̂_i|,  hF = 2 · hP · hR / (hP + hR),

where P̂_i is the set consisting of the predicted classes of test sample i together with their ancestor classes, T̂_i is the set consisting of the classes test sample i actually belongs to together with their ancestor classes, and the summations run over all test samples.
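The hierarchical indices above can be sketched as follows; the label sets passed in are assumed to be already augmented with ancestor classes:

```python
def hierarchical_prf(pred_sets, true_sets):
    """Hierarchical precision, recall and F-measure.

    pred_sets[i] / true_sets[i]: predicted / true class set of test
    sample i, each already augmented with its ancestor classes.
    """
    inter = sum(len(p & t) for p, t in zip(pred_sets, true_sets))
    hp = inter / sum(len(p) for p in pred_sets)
    hr = inter / sum(len(t) for t in true_sets)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Hypothetical case: predicting the article but missing the specific paragraph
hp, hr, hf = hierarchical_prf([{"law", "art26"}],
                              [{"law", "art26", "art26_1"}])
```

A prediction that stops one level short of the true provision is still partially credited, which is the point of the hierarchical measures.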
To make the identification of applicable law practically useful, the target class predicted by the algorithm should preferably be a specific legal provision, not merely a broad law; the method therefore evaluates prediction performance both when the target classes are all legal provisions and when they are specific legal provisions only. Below, hP_all, hR_all and hF_all denote the hierarchical precision, recall and F-measure when the target classes are all legal provisions, and hP_partial, hR_partial and hF_partial denote the same measures when the target classes are specific legal provisions.
Besides the hierarchical evaluation indices, the precision, recall and F-measure of each class can be computed separately, and their averages over all classes used as evaluation indices of system performance, i.e., the macro-averages (Macro-averaging) of precision, recall and F-measure. For each class, let TP denote the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives. With per-class P = TP / (TP + FP), R = TP / (TP + FN) and F = 2PR / (P + R), the macro-averages Macro-P, Macro-R and Macro-F are the means of P, R and F over all classes.
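The macro-averaging above can be sketched as follows, taking per-class (TP, FP, FN) counts (TN is not needed for these three measures):

```python
def macro_prf(per_class_counts):
    """Macro-averaged precision, recall and F-measure.

    per_class_counts: one (TP, FP, FN) tuple per class.
    """
    ps, rs, fs = [], [], []
    for tp, fp, fn in per_class_counts:
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(per_class_counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

# Hypothetical counts: one perfectly predicted class, one half-right class
macro_p, macro_r, macro_f = macro_prf([(1, 0, 0), (1, 1, 1)])
```

Macro-averaging gives every class equal influence, so rarely applied provisions weigh as much as frequent ones in the final score.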
the invention relates to a global hierarchical multi-label classification method, which considers the hierarchical structure of class labels on the whole and ensures that the prediction result also meets the hierarchical limitation. The learning method is an inertia learning algorithm, a clear prediction model is not required to be constructed on a training set, and only the original multi-label sample is subjected to label expansion and then stored, so that incremental learning is supported; in the prediction stage, k adjacent samples of the unseen examples in the training set are firstly found, the confidence coefficient of the examples belonging to each class is determined according to the classification weight of the adjacent samples to each class, and then the class of the unseen examples is predicted. The learning method is simple in model, supports incremental learning, and can be well applied to automatic identification of the problem of multi-level multi-label classification which contains massive data and continuously increases data in case-applicable law.
Beneficial effects: the hierarchical multi-label classification method suitable for legal identification provided by the invention fully considers, as a whole, the tree-shaped hierarchy of the legal provision label space, so that the prediction results satisfy the hierarchy constraint without any additional correction. Meanwhile, the method has a simple model, supports incremental learning, and is well suited to the automatic identification of applicable law for cases, a hierarchical multi-label classification problem involving massive and continuously growing data.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a main flow chart of the present invention.
FIG. 2 is a sample referee document.
FIG. 3 is the tree structure of the legal provision label space.
FIG. 4 is the frequency distribution of legal provision combinations.
FIG. 5 compares the performance of the hierarchical indices under different numbers of neighbors.
FIG. 6 compares the performance of the macro-average indices under different numbers of neighbors.
FIG. 7 compares the performance of the indices under different weighting strategies.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The invention discloses a hierarchical multi-label classification method suitable for legal identification, which comprises the following steps:
Extract the case facts and their applicable legal provisions from the referee documents according to their textual structure, where the case facts are used to generate the feature vectors of the case samples and the applicable legal provisions represent the class labels of the case samples; convert the original text data set into a semi-structured multi-label training set and test set;
correcting errors and format inconsistency in case-applicable legal provisions;
and utilizing a language technology platform LTP of the Hadamard to perform word segmentation and part of speech tagging on the case fact description.
Step 2: because legal provisions in a legal system are organized in a tree structure, the label space formed by the class labels in the multi-label training set is correspondingly tree-structured. Based on the hierarchical structure of the label space, the legal provisions corresponding to the case facts of all samples are expanded so that the class label set corresponding to each case fact is a subset of the label space and satisfies the hierarchy constraint;
The step 2 comprises the following steps:
Step 2-1: In the hierarchical multi-label classification problem, a d-dimensional instance space X ⊆ R^d is given, together with a label space Y = {y_1, y_2, …, y_q} containing q classes, where y_i denotes the i-th class. The hierarchy of the class label space can be represented by the pair (Y, <), where < denotes the partial order relation on class labels and can be understood as the "belongs to" relation: if y_i, y_j ∈ Y and y_i < y_j, then class y_i belongs to class y_j, y_i is a descendant class of y_j, and y_j is an ancestor class of y_i. The partial order < is asymmetric, irreflexive, and transitive, and can be described by the following four properties:
a) The unique root node of the class label hierarchy is represented by a virtual class label R: for any y_i ∈ Y, y_i < R;
b) For any y_i, y_j ∈ Y, if y_i < y_j, then it does not hold that y_j < y_i (asymmetry);
c) For any y_i ∈ Y, it does not hold that y_i < y_i (irreflexivity);
d) For any y_i, y_j, y_k ∈ Y, if y_i < y_j and y_j < y_k, then y_i < y_k (transitivity).
A multi-label classification problem whose class label structure satisfies the above four properties can be regarded as a hierarchical multi-label classification problem. From this formal definition, in a hierarchical class label space, all other class nodes (excluding the starting node) on the unique path traced from any class node back to the root node are ancestor class nodes of that node. Thus if a sample has a class label c_i, then it implicitly also has all ancestor class labels of c_i. This requires the classifier's predicted class label set h(x) for an unseen instance to satisfy the hierarchy constraint: for any y'' ∈ h(x) and any y' with y'' < y', it holds that y' ∈ h(x).
Step 2-2: For any training sample (x_i, y_i), 1 ≤ i ≤ m, where m is the number of referee document samples obtained, x_i ∈ X is a d-dimensional feature vector and y_i ⊆ Y is the class label set corresponding to x_i. Let the expanded class label set be y_i'; then y_i' contains all class labels in y_i together with all of their ancestor class labels. Formally,

y_i' = y_i ∪ { y' | y' is an ancestor class label of some y ∈ y_i }.
the label extension process explicitly expresses the hierarchical relationship of the category labels in the category labels of the sample: if a sample is marked as a certain category, then the ancestor categories of the categories are also explicitly assigned to the sample through label expansion; the category label of each sample can be viewed as a subtree of the label space tree, and the top level of each subtree is the root node. It can be seen that if there is yi,yjE is Y and Yi<yjFor example, in the k neighbor sample in the extended multi-label training set, there is a class label yiMust not be less than having a class label yjThe number of samples of (1). The label expansion is an important step for ensuring that the prediction result of the learning algorithm meets the level limit.
The step 3 comprises the following steps:
Step 3-1: the purpose of feature selection is to reduce the dimensionality of the features. Since general text feature selection algorithms cannot directly process a multi-label data set, the multi-label data must first be converted into single-label data. The conversion proceeds as follows: for each multi-label sample (x, h), let |h| denote the number of category labels in the label set h; the sample is replaced by |h| new single-label samples (x, y_i) (1 ≤ i ≤ |h|, y_i ∈ h), the class y_i of each new sample being one category label from the original label set h. Table 1 gives an example of converting a multi-label sample into single-label samples according to this strategy.
Table 1: multi-label sample conversion process
Step 3-2: through the conversion of step 3-1, the multi-label case samples are converted into single-label case samples. A general feature selection algorithm is applied to the word-segmentation results obtained from the original training set in step 1, and roughly the 100 feature words with the highest discriminating power are selected to form the feature space. The case fact part of each case sample is then represented with feature words from this space, the attribute value (feature weight) of each feature word being computed with the common TF-IDF algorithm. Treating the case fact part of each sample as an already-segmented document, the case fact parts of all samples form a document set. The feature weight tf-idf_ij of the j-th dimension feature in the i-th document is defined as:

tf-idf_ij = tf_ij · log(N / n_j) / sqrt( Σ_t [tf_it · log(N / n_t)]² )
where tf_ij is the frequency of feature word t_j in document d_i, idf_j is the inverse document frequency of t_j in the document set, N is the total number of documents in the document set, n_j is the document frequency of t_j in the document set, i.e., the number of documents in which feature word t_j occurs, and the denominator is a normalization factor.
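The tf·idf weighting of step 3-2 can be sketched as below. This is a minimal illustration of the unnormalized form tf_ij · log(N / n_j); the document tokens are invented for the example:

```python
import math

def tf_idf_weights(docs):
    """docs: list of token lists. Returns one {term: tf*idf} dict per document,
    using tf_ij * log(N / n_j) (the normalization factor is omitted here)."""
    n = len(docs)
    df = {}  # document frequency n_j of each term
    for doc in docs:
        for t in set(doc):
            df[t] = df.get(t, 0) + 1
    weights = []
    for doc in docs:
        tf = {}  # term frequency tf_ij within this document
        for t in doc:
            tf[t] = tf.get(t, 0) + 1
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["loan", "repay", "interest"], ["loan", "contract"], ["divorce"]]
w = tf_idf_weights(docs)
# "loan" appears in 2 of 3 documents, so its weight in doc 0 is 1 * log(3/2)
```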
Step 3-3: perform feature selection on the word-segmentation results obtained from the original training set in step 1, selecting a certain number of discriminating feature words to form the feature vectors. Commonly used text feature selection methods are mainly based on document frequency (DF), mutual information (MI), information gain (IG), and the chi-square statistic (χ² statistic, CHI). Feature selection based on document frequency is too simple and often fails to select the feature words carrying the most class information, while mutual information has the drawback of being easily influenced by the marginal probabilities of the feature words; this hierarchical multi-label classification method therefore performs feature selection with the information gain or chi-square statistic algorithm.
Step 3-3 comprises selecting features with the information gain algorithm. The information gain IG(t) of feature word t is defined as:

IG(t) = −Σ_{i=1..q} P(y_i) log P(y_i) + P(t) Σ_{i=1..q} P(y_i | t) log P(y_i | t) + P(t̄) Σ_{i=1..q} P(y_i | t̄) log P(y_i | t̄)
where P(y_i) represents the probability that category y_i occurs, P(t) the probability that feature word t occurs, P(y_i | t) the probability that category y_i occurs given that feature t occurs, P(t̄) the probability that feature t does not occur, and P(y_i | t̄) the probability that category y_i occurs given that feature t does not occur. The information gain of every feature word in the document set is computed, and feature words whose information gain value falls below a set threshold are excluded from the feature space.
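The information gain of a feature word can be computed as H(Y) − H(Y | t), which is equivalent to the definition above. A minimal sketch over invented labeled documents:

```python
import math

def information_gain(docs_with_label, term):
    """IG of `term` over labeled docs [(tokens, label)], as H(Y) - H(Y | term)."""
    n = len(docs_with_label)
    labels, with_t_labels, without_t_labels = {}, {}, {}
    with_t = 0
    for tokens, y in docs_with_label:
        labels[y] = labels.get(y, 0) + 1
        if term in tokens:
            with_t += 1
            with_t_labels[y] = with_t_labels.get(y, 0) + 1
        else:
            without_t_labels[y] = without_t_labels.get(y, 0) + 1

    def entropy(counts, total):
        return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

    h_y = entropy(labels, n)          # class entropy H(Y)
    p_t = with_t / n                  # P(t)
    h_cond = 0.0                      # conditional entropy H(Y | t)
    if with_t:
        h_cond += p_t * entropy(with_t_labels, with_t)
    if n - with_t:
        h_cond += (1 - p_t) * entropy(without_t_labels, n - with_t)
    return h_y - h_cond

data = [(["loan"], "debt"), (["loan"], "debt"),
        (["divorce"], "family"), (["custody"], "family")]
print(information_gain(data, "loan"))  # perfectly separates the classes -> 1.0
```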
Step 3-3 may instead perform feature selection on the case fact texts of the training set with the chi-square statistic algorithm. The null hypothesis is that a feature word is independent of a class; the further the test value computed from the CHI distribution deviates from the threshold, the more confidently the null hypothesis can be rejected in favor of the alternative hypothesis: that the feature word is highly correlated with the class. Let A be the number of documents containing feature word t and belonging to category y, B the number containing t but not belonging to y, C the number not containing t but belonging to y, D the number neither containing t nor belonging to y, and N the total number of documents. The chi-square statistic χ²(t, y) of feature word t and category y is then defined as:

χ²(t, y) = N (AD − BC)² / [(A + B)(C + D)(A + C)(B + D)]
When feature word t and category y are independent, the chi-square statistic is 0. For each feature word the chi-square statistic is computed with respect to every category, and then the mean χ²_avg(t) and the maximum χ²_max(t) are obtained; the two are considered together to select the most discriminating feature words:

χ²_avg(t) = Σ_{i=1..q} P(y_i) χ²(t, y_i),

χ²_max(t) = max_{i=1..q} χ²(t, y_i).
Here P(y_i) represents the probability that category y_i occurs. The main advantage of chi-square feature selection over mutual information is that the statistic is a normalized value and therefore allows different feature words within the same category to be compared more fairly.
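The chi-square statistic follows directly from the contingency counts A, B, C, D defined above; a minimal sketch:

```python
def chi_square(a, b, c, d):
    """chi2(t, y) computed from the contingency counts A, B, C, D above."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# When the feature word is independent of the class, the statistic is 0.
print(chi_square(10, 10, 10, 10))  # -> 0.0
# Perfect association gives the maximum value N.
print(chi_square(20, 0, 0, 20))    # -> 40.0
```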
In step 4, when finding the k nearest neighbors, the distance d(x, x_i) between an unseen example x and a sample (x_i, h_i) is measured by the reciprocal of the cosine similarity of their feature vectors. The cosine similarity cos(γ, λ) between the feature vector γ of the unseen example and the feature vector λ of a neighboring sample is computed as:

cos(γ, λ) = Σ_{s=1..S} γ_s λ_s / ( sqrt(Σ_{s=1..S} γ_s²) · sqrt(Σ_{s=1..S} λ_s²) )
where s denotes the index of a vector component, i.e., the position of the component in the vector, S denotes the vector dimension, γ_s denotes the s-th component of vector γ, and λ_s denotes the s-th component of vector λ.
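The distance measure of step 4, reciprocal cosine similarity, can be sketched directly from the formula above:

```python
import math

def cosine_similarity(g, l):
    """cos(gamma, lambda) over two equal-length feature vectors."""
    dot = sum(gs * ls for gs, ls in zip(g, l))
    norm = math.sqrt(sum(gs * gs for gs in g)) * math.sqrt(sum(ls * ls for ls in l))
    return dot / norm

def distance(g, l):
    """Distance used for the k-NN search: reciprocal of cosine similarity."""
    return 1.0 / cosine_similarity(g, l)

print(distance([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> minimum distance 1.0
```

Note that under this measure identical vectors have distance 1 (the minimum), and the distance grows without bound as the angle between the vectors approaches 90 degrees.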
In step 4, d(x, x_i) denotes the distance between example x and sample (x_i, h_i), and the classification weight w_ij of each neighboring sample (x_i, h_i) ∈ N(x) for class y_j is computed with the full-label distance weight method or the entropy label distance weight method.

The full-label distance weight method computes w_ij as:

The entropy label distance weight method computes w_ij as:
The confidence c(x, y_j) that the unseen example belongs to category y_j is computed as:
A decision threshold of 0.5 is selected: class labels whose confidence exceeds the threshold are predicted for the unseen instance, and when the confidence of the unseen instance for every class falls below the decision threshold, the single class with the highest confidence is returned as the class to which the unseen instance belongs.
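The patent's exact weight and confidence formulas are given in its figures; purely as an illustration of the step-4 decision rule (threshold 0.5 with a highest-confidence fallback), the sketch below uses a simple distance-weighted vote standing in for the classification weights:

```python
def predict_labels(neighbors, threshold=0.5):
    """neighbors: list of (distance, label_set) for the k nearest samples.
    Each neighbor votes for its labels with weight 1/distance; confidence is
    the weighted vote fraction (an assumed stand-in for the patent's formulas)."""
    total = sum(1.0 / d for d, _ in neighbors)
    conf = {}
    for d, labels in neighbors:
        for y in labels:
            conf[y] = conf.get(y, 0.0) + (1.0 / d) / total
    predicted = {y for y, c in conf.items() if c > threshold}
    if not predicted:  # fallback: return the single most confident class
        predicted = {max(conf, key=conf.get)}
    return predicted, conf

nbrs = [(1.0, {"CPL", "CPL.art26"}), (1.0, {"CPL"}), (2.0, {"MarriageLaw"})]
pred, conf = predict_labels(nbrs)
print(pred)  # -> {'CPL'}; only "CPL" exceeds the 0.5 threshold
```

Because every expanded neighbor label set contains its ancestors, a label can never collect more vote weight than its ancestor, so predictions made this way respect the hierarchy constraint.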
Examples
As shown in fig. 1, the steps of the present invention are:
Step one: crawl the required corpus of original referee documents from the Internet with a Jsoup-based crawler, and randomly split it into a training set and a test set at a ratio of 7:3. Then preprocess the referee documents, mainly completing the following work:
extracting case facts and their applicable legal provisions from each document according to the writing structure of referee documents (the case facts are used to generate the feature vectors of the case samples; the applicable legal provisions represent the category labels of the case samples), thereby converting the original text data set into a semi-structured multi-label training set and test set;
correcting errors and format inconsistency in case-applicable legal provisions;
performing word segmentation and part-of-speech tagging on the case fact descriptions with the Language Technology Platform (LTP) of Harbin Institute of Technology.
Step two: expand the legal provisions corresponding to the case facts of all samples based on the hierarchical structure of the label space, so that the category label set corresponding to each case fact is a subset of the label space and satisfies the hierarchy constraint;
Step three: perform feature selection on the word-segmentation results obtained from the original training set in step one, selecting feature words that sufficiently characterize the case facts to construct feature vectors; through text representation, obtain the structured extended multi-label training set Tr and test set Te;
Step four: construct the prediction model. First, for each unseen example x from the extended multi-label test set Te, find its k-nearest-neighbor set N(x) in the extended multi-label training set Tr and set a weight for each neighboring sample. From the classification weights of the k neighbors for each category in the label space, compute the confidence that the unseen example belongs to each category and predict its category label set h(x), which satisfies the hierarchy constraint. Finally, remove the hierarchy closure from the predicted set h(x) according to the tree structure of the label space (the inverse of label expansion) to obtain the specific legal provisions applicable to the unseen example.
The implementation data is obtained from official documents of people's court at all levels of Zhejiang province published by Zhejiang court.
FIG. 2 is a sample referee document in which the part marked with a straight underline is the case fact part and the part marked with a wavy underline contains the legal provisions applicable to the case. Case facts and their legal provisions are extracted according to the structure of the referee document. Preprocessing mainly consists of cleaning and correcting the applicable-legal-provision part of each case.
In fig. 3, a tree structure of a legal provision tag space is shown. Based on the hierarchical structure, the legal provision corresponding to each case fact is subjected to label expansion.
FIG. 4 is a histogram of legal-provision combinations. According to the citation frequency of each legal provision, 26 laws, such as the Civil Procedure Law of the People's Republic of China, together with the 451 specific legal provisions they contain were selected as category labels to form the label space; the dimension of the label space is therefore 477. The category label set of each case sample is represented as a label vector, each dimension of which corresponds to one category label in the label space, i.e., one complete legal provision. If a legal provision applies to a case, the label entries of that provision and of every law containing it are set to 1 in the label vector; otherwise they are 0. Each sample's label vector therefore corresponds to one legal-provision combination, the frequency of each combination is the number of corresponding case samples, and these frequencies also reflect properties of the case sample set. Counting the frequency of every combination, selecting the more frequent ones, and arranging them in descending order yields FIG. 4. The figure shows that the occurrence frequencies of the combinations roughly follow a long-tailed distribution: a few combinations occur extremely often, indicating that a large number of case samples share those combinations, while the frequencies of most other combinations are relatively balanced.
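Building the combination-frequency histogram described above amounts to counting distinct label sets; a minimal sketch, with invented label names:

```python
from collections import Counter

# Each sample's (expanded) label set corresponds to one legal-provision
# combination; count how often each combination occurs across the samples.
samples = [
    frozenset({"CPL", "CPL.art26"}),
    frozenset({"CPL", "CPL.art26"}),
    frozenset({"CPL", "CPL.art64"}),
]
freq = Counter(samples)
for combo, count in freq.most_common():
    print(sorted(combo), count)
```

Sorting the counts in descending order, as `most_common` does, reproduces the long-tailed ranking plotted in FIG. 4.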
In step three, the information gain algorithm is chosen for feature selection. Computing the information gain of every feature word shows that most words with higher gain are verbs or nouns; Table 2 gives the proportion of verbs and nouns among the feature words with the highest information gain values. Nouns and verbs therefore discriminate better in the legal identification problem than words of other parts of speech; conversely, words other than verbs and nouns can be removed via part-of-speech tagging, reducing the number of words in a text and simplifying subsequent computation.
Table 2: proportion of verbs and nouns among the selected feature words:

Number of feature words | Proportion of verbs and nouns | Proportion of total information gain from verbs and nouns
---|---|---
100 | 88.0% | 87.9%
200 | 80.0% | 82.3%
300 | 81.0% | 82.5%
400 | 80.5% | 82.0%
500 | 76.8% | 79.7%
Table 3: summary of the experimental training set and test set:

Set | Number of samples | Average number of class labels per sample
---|---|---
Training set | 102608 | 7.6344
Test set | 44210 | 7.6397
FIGS. 5 and 6 compare the algorithm's performance on the hierarchical indexes and the macro-averaged indexes for different numbers of neighbors.
As can be seen from FIG. 5, when the number of neighbors is even the algorithm's precision is high and its recall is low, while when the number of neighbors is odd the precision is lower and the recall higher; the difference shrinks as the number of neighbors grows. The algorithm's principle explains this phenomenon: the decision threshold is 0.5, and when the number of neighbors is even, the smoothing parameter means that only class labels whose occurrence frequency among the neighbors exceeds k/2 are predicted for the unseen instance; a class label occurring exactly k/2 times is not assigned. The condition for assigning each class label is therefore stricter when the number of neighbors is even, giving higher prediction precision and correspondingly lower recall. This effect diminishes as the number of neighbors increases, so the difference becomes smaller. The figure also shows that when the target categories are all legal provisions, every prediction index is higher than when the target categories are the specific legal provisions; the broader legal categories contain more case samples, so the model predicts them better. Overall, the algorithm's combined prediction performance is best when the number of neighbors k is 5.
From FIG. 6 it can be seen that as the number of neighbors increases, the macro-averaged precision, recall, and F-measure of the algorithm all decrease. A likely reason is that with more neighbors it becomes harder for classes with few samples to reach the decision threshold, lowering the prediction performance of most classes and ultimately the corresponding macro-averaged performance.
FIG. 7 shows the algorithm's performance on each evaluation index with the number of neighbors fixed at 5 and the sample weighting strategy set to the full-label distance weight method and the entropy label distance weight method, respectively. In summary, for both the hierarchical and the macro-averaged indexes, the entropy label distance weighting strategy achieves better precision while the full-label distance weighting strategy achieves better recall and F-measure. The entropy label weighting strategy is biased toward samples with fewer class labels; in the expanded hierarchical multi-label samples, the more specific a sample's classes, the more class labels it carries and the smaller its classification weight under the entropy strategy, so predictions made with the entropy strategy are biased toward upper-level classes and incur a larger generalization error. Although performance declines when the target categories are the specific legal provisions, hierarchical precision still approaches 80% and hierarchical recall exceeds 65%, indicating that identification of case-applicable law based on this hierarchical multi-label classification algorithm is effective.
Considering the two cases in which the target categories are all legal provisions and the specific legal provisions, the macro-averaged precision, recall, and F-measure of the algorithm when the target categories are all legal provisions are denoted mP_all, mR_all, and mF_all, and those when the target categories are the specific legal provisions are denoted mP_partial, mR_partial, and mF_partial.
In this implementation, two common hierarchical multi-label classification algorithms, the local algorithm TreeBoost.MH and the global algorithm Clus-HMC, were selected and compared with the prediction performance of the present hierarchical multi-label classification algorithm. Table 5 compares the algorithms on each hierarchical index and Table 6 compares their prediction performance on each macro-averaged index.
Table 5 comparison of hierarchical index performance of each algorithm:
table 6 macro-average performance comparison of algorithms:
The results demonstrate that this hierarchical multi-label classification algorithm achieves better prediction performance than existing methods. Combined with the Lazy-HMC algorithm's support for incremental learning, an effective and practical system for automatically identifying the law applicable to a case can be built on it.
The present invention provides a hierarchical multi-label classification method suitable for legal identification, and there are many methods and ways to implement this technical scheme; the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be realized by the prior art.
Claims (2)
1. A hierarchical multi-label classification method suitable for legal identification is characterized by comprising the following steps:
step 1, acquiring an original text data set of referee documents, dividing it into a training sample set and a testing sample set, and preprocessing: extracting case facts and their applicable legal provisions from each document according to the writing structure of referee documents, wherein the case facts are used to generate the feature vectors of case samples, the case samples comprise training samples and testing samples, the applicable legal provisions represent the class labels of the case samples, and the original text data set is converted into a semi-structured multi-label training sample set and multi-label testing sample set; performing word segmentation and part-of-speech tagging on the case fact descriptions;
step 2, expanding legal provisions corresponding to case facts of all case samples based on a hierarchical structure of a label space formed by category labels in a multi-label training sample set, so that the category label corresponding to each case fact is a subset of the label space and meets the hierarchical limitation;
step 3, performing feature selection on the word segmentation result in the step 1, and selecting feature words capable of representing case facts to construct feature vectors; obtaining a structured extended multi-label training sample set Tr and an extended multi-label testing sample set Te through text representation;
step 4, constructing a prediction model: finding k neighbor sample sets N' (x) of unseen examples x from an extended multi-label test sample set Te in an extended multi-label training sample set Tr, wherein the unseen examples are the case facts to be classified, setting weight for each neighbor sample, calculating confidence coefficients of the unseen examples x belonging to each class according to the classification weight of the k neighbor samples to each class, predicting class label sets h (x) of the unseen examples x, wherein h (x) meets the hierarchical constraint, and finally removing the hierarchical constraint in the class label sets h (x) of the unseen examples x according to the tree structure of a label space to obtain the specific applicable legal provisions of the unseen examples;
in the step 1, randomly dividing an original text data set of a referee document into a training sample set and a testing sample set according to the proportion of 7: 3;
the step 2 comprises the following steps:
step 2-1, in the hierarchical multi-label classification problem, given a d-dimensional instance space X = R^d, where R is the set of real numbers, and a label space Y = {y_1, y_2, ..., y_q} containing q classes, where y_v denotes the v-th class label and 1 ≤ v ≤ q, the class-label space hierarchy is represented by the pair (Y, <), where < denotes the partial order relation on category labels: if y_v, y_u ∈ Y and y_u < y_v, then category label y_v belongs to category label y_u, y_v is a descendant class label of y_u, and y_u is an ancestor class label of y_v. The classifier must satisfy the hierarchy constraint on the class label set h(x) predicted for an unseen instance x, i.e., for all y″ ∈ h(x) and y′ < y″, y′ ∈ h(x), where y″ is a class label in h(x) and y′ is an ancestor class label of y″;
step 2-2, for each multi-labeled case sample (x_i, h_i), 1 ≤ i ≤ m, m being the number of referee document samples obtained, x_i ∈ X is a d-dimensional feature vector representing the case fact part and h_i ⊆ Y is the corresponding class label set, i.e., the legal provisions applicable to x_i. The expanded class label set h_i′ contains all class labels in h_i and all of their ancestor class labels:

h_i′ = h_i ∪ {y′ ∈ Y | y′ < y, y ∈ h_i},

where y′ is an ancestor class label of a class label y ∈ h_i;
The step 3 comprises the following steps:
step 3-1, converting the multi-label sample data into single-label sample data for processing: for each multi-labeled case sample (x_i, h_i) (1 ≤ i ≤ m), |h_i| denotes the number of label categories in the class label set h_i of the multi-labeled case sample; the sample is replaced by |h_i| single-label case samples (x_i′, y_i′) (1 ≤ i′ ≤ |h_i|, y_i′ ∈ h_i), the class label y_i′ of each single-label sample being one category label from the set h_i; the multi-label case samples comprise multi-label training samples and multi-label testing samples;
step 3-2, through the conversion of step 3-1, the multi-label case samples are converted into a plurality of single-label case samples; the case fact part of each single-label case sample is regarded as an already-segmented document, and the case fact parts of all single-label case samples form a document set. The feature weight tf-idf_{i″j″} of the j″-th dimension feature in the i″-th document of the document set is defined as:

tf-idf_{i″j″} = tf_{i″j″} · log(N / n_{j″}) / sqrt( Σ_t [tf_{i″t} · log(N / n_t)]² ),

where tf_{i″j″} represents the frequency of feature word t_{j″} in document d_{i″}, idf_{j″} represents the inverse document frequency of t_{j″} in the document set, N represents the total number of documents in the document set, n_{j″} represents the document frequency of t_{j″} in the document set, i.e., the number of documents in which feature word t_{j″} occurs, and the denominator is a normalization factor;
3-3, performing feature selection on the word segmentation result in the step 1 by using an information gain algorithm or a chi-square statistical algorithm, and selecting a certain number of feature words with distinguishing capability to form a feature space;
Selecting features with the information gain algorithm: the information gain IG(t) of feature word t is defined as:

IG(t) = −Σ_{v=1..q} p(y_v) log p(y_v) + p(t) Σ_{v=1..q} p(y_v | t) log p(y_v | t) + p(t̄) Σ_{v=1..q} p(y_v | t̄) log p(y_v | t̄),

where p(y_v) denotes the probability that class label y_v occurs, p(t) the probability that feature word t occurs, p(y_v | t) the probability that class label y_v occurs given that feature word t occurs, p(t̄) the probability that feature word t does not occur, and p(y_v | t̄) the probability that class label y_v occurs given that feature word t does not occur. The information gain of every feature word in the document set is computed, and feature words whose information gain value falls below a set threshold are excluded from the feature space;
and (3) selecting features by adopting a chi-square statistical algorithm:
Let A be the number of documents containing feature word t and belonging to class label y_v, 1 ≤ v ≤ q, B the number containing t but not belonging to y_v, C the number not containing t but belonging to y_v, D the number neither containing t nor belonging to y_v, and N the total number of documents in the document set. The chi-square statistic χ²(t, y_v) of feature word t and class label y_v is defined as:

χ²(t, y_v) = N (AD − BC)² / [(A + B)(C + D)(A + C)(B + D)].

When feature word t and category y_v are independent, the chi-square statistic is 0. For each feature word the chi-square statistic with respect to every category is computed, and then the mean χ²_avg(t) and the maximum χ²_max(t) are obtained; considering χ²_avg(t) and χ²_max(t) together, a certain number of discriminating feature words are selected, where p(y_v) denotes the probability that class label y_v occurs:

χ²_avg(t) = Σ_{v=1..q} p(y_v) χ²(t, y_v),

χ²_max(t) = max_{v=1..q} χ²(t, y_v);
in step 4, when finding the k nearest neighbors, the distance d(x, x_a) between an unseen example x and a neighboring sample (x_a, h_a) in the extended multi-label training sample set, where (x_a, h_a) ∈ N′(x), 1 ≤ a ≤ k, and h_a is the class label set corresponding to x_a, is computed as the reciprocal of the cosine similarity of their feature vectors; the cosine similarity cos(γ, λ) between the feature vector γ of example x and the feature vector λ of the neighboring sample is computed as:

cos(γ, λ) = Σ_{s=1..S} γ_s λ_s / ( sqrt(Σ_{s=1..S} γ_s²) · sqrt(Σ_{s=1..S} λ_s²) ),

where s denotes the index of a component of the feature vector, i.e., the position of the component in the feature vector, S denotes the dimension of the feature vector, γ_s denotes the s-th component of feature vector γ, and λ_s denotes the s-th component of feature vector λ.
2. The method of claim 1, wherein: in step 4, d(x, x_a) denotes the distance between the unseen example x and a neighboring sample (x_a, h_a) in the extended multi-label training sample set, and the classification weight w_aj of each neighboring sample for a class label y_j ∈ h_a, 1 ≤ j ≤ q, is computed with the full-label distance weight method or the entropy label distance weight method;
The full-label distance weight method computes w_aj as:
The entropy label distance weight method computes w_aj as:
The confidence c(x, y_j) that the unseen instance x belongs to class label y_j is computed as:
where w_ar represents the classification weight of the r-th class label y_r in h_a;
the class label set h(x) predicted for the unseen instance x is:
a decision threshold of 0.5 is selected; class labels whose confidence exceeds the threshold are assigned to the unseen example x, and when the confidence of x for every class label falls below the decision threshold, the class label with the maximum confidence is returned as the class label of the unseen example.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710832304.8A CN107577785B (en) | 2017-09-15 | 2017-09-15 | Hierarchical multi-label classification method suitable for legal identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107577785A CN107577785A (en) | 2018-01-12 |
CN107577785B true CN107577785B (en) | 2020-02-07 |
Family
ID=61035969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710832304.8A Active CN107577785B (en) | 2017-09-15 | 2017-09-15 | Hierarchical multi-label classification method suitable for legal identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107577785B (en) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304386A (en) * | 2018-03-05 | 2018-07-20 | 上海思贤信息技术股份有限公司 | A kind of logic-based rule infers the method and device of legal documents court verdict |
CN108334500B (en) * | 2018-03-05 | 2022-02-22 | 上海思贤信息技术股份有限公司 | Referee document labeling method and device based on machine learning algorithm |
CN110245907A (en) * | 2018-03-09 | 2019-09-17 | 北京国双科技有限公司 | The generation method and device of court's trial notes content |
CN108664924B (en) * | 2018-05-10 | 2022-07-08 | 东南大学 | Multi-label object identification method based on convolutional neural network |
CN108763361A (en) * | 2018-05-17 | 2018-11-06 | 南京大学 | A kind of multi-tag taxonomy model method based on topic model |
CN110895703B (en) * | 2018-09-12 | 2023-05-23 | 北京国双科技有限公司 | Legal document case recognition method and device |
CN110909157B (en) * | 2018-09-18 | 2023-04-11 | 阿里巴巴集团控股有限公司 | Text classification method and device, computing equipment and readable storage medium |
CN111126053B (en) * | 2018-10-31 | 2023-07-04 | 北京国双科技有限公司 | Information processing method and related equipment |
CN109543178B (en) * | 2018-11-01 | 2023-02-28 | 银江技术股份有限公司 | Method and system for constructing judicial text label system |
CN109685158B (en) * | 2019-01-08 | 2020-10-16 | 东北大学 | Clustering result semantic feature extraction and visualization method based on strong item set |
CN109919368B (en) * | 2019-02-26 | 2020-11-17 | 西安交通大学 | Law recommendation prediction system and method based on association graph |
CN109961094B (en) * | 2019-03-07 | 2021-04-30 | 北京达佳互联信息技术有限公司 | Sample acquisition method and device, electronic equipment and readable storage medium |
CN110046256A (en) * | 2019-04-22 | 2019-07-23 | 成都四方伟业软件股份有限公司 | The prediction technique and device of case differentiation result |
CN110163849A (en) * | 2019-04-28 | 2019-08-23 | 上海鹰瞳医疗科技有限公司 | Training data processing method, disaggregated model training method and equipment |
CN110245229B (en) * | 2019-04-30 | 2023-03-28 | 中山大学 | Deep learning theme emotion classification method based on data enhancement |
CN110135592B (en) * | 2019-05-16 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Classification effect determining method and device, intelligent terminal and storage medium |
CN110287287B (en) * | 2019-06-18 | 2021-11-23 | 北京百度网讯科技有限公司 | Case prediction method and device and server |
CN110347839B (en) * | 2019-07-18 | 2021-07-16 | 湖南数定智能科技有限公司 | Text classification method based on generative multi-task learning model |
CN110633365A (en) * | 2019-07-25 | 2019-12-31 | 北京国信利斯特科技有限公司 | Word vector-based hierarchical multi-label text classification method and system |
CN110442722B (en) * | 2019-08-13 | 2022-05-13 | 北京金山数字娱乐科技有限公司 | Method and device for training classification model and method and device for data classification |
CN110543634B (en) * | 2019-09-02 | 2021-03-02 | 北京邮电大学 | Corpus data set processing method and device, electronic equipment and storage medium |
CN110825879B (en) * | 2019-09-18 | 2024-05-07 | 平安科技(深圳)有限公司 | Adjudication result determination method, apparatus, device and computer-readable storage medium |
CN110751188B (en) * | 2019-09-26 | 2020-10-09 | 华南师范大学 | User label prediction method, system and storage medium based on multi-label learning |
CN110851596B (en) * | 2019-10-11 | 2023-06-27 | 平安科技(深圳)有限公司 | Text classification method, apparatus and computer readable storage medium |
CN110968693A (en) * | 2019-11-08 | 2020-04-07 | 华北电力大学 | Multi-label text classification calculation method based on ensemble learning |
CN110837735B (en) * | 2019-11-17 | 2023-11-03 | 内蒙古中媒互动科技有限公司 | Intelligent data analysis and identification method and system |
US11379758B2 (en) | 2019-12-06 | 2022-07-05 | International Business Machines Corporation | Automatic multilabel classification using machine learning |
CN111143569B (en) * | 2019-12-31 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Data processing method, device and computer readable storage medium |
CN110781650B (en) * | 2020-01-02 | 2020-04-14 | 四川大学 | Method and system for automatically generating referee document based on deep learning |
CN111540468B (en) * | 2020-04-21 | 2023-05-16 | 重庆大学 | ICD automatic coding method and system for visualizing diagnostic reasons |
CN111738303B (en) * | 2020-05-28 | 2023-05-23 | 华南理工大学 | Long-tail distribution image recognition method based on hierarchical learning |
CN111723208B (en) * | 2020-06-28 | 2023-04-18 | 西南财经大学 | Conditional classification tree-based legal decision document multi-classification method and device and terminal |
CN111930944B (en) * | 2020-08-12 | 2023-08-22 | 中国银行股份有限公司 | File label classification method and device |
CN112464973B (en) * | 2020-08-13 | 2024-02-02 | 浙江师范大学 | Multi-label classification method based on average distance weight and value calculation |
CN112016430B (en) * | 2020-08-24 | 2022-10-11 | 郑州轻工业大学 | Hierarchical action identification method for multi-mobile-phone wearing positions |
CN111737479B (en) * | 2020-08-28 | 2020-11-17 | 深圳追一科技有限公司 | Data acquisition method and device, electronic equipment and storage medium |
CN112182213B (en) * | 2020-09-27 | 2022-07-05 | 中润普达(十堰)大数据中心有限公司 | Modeling method based on abnormal lacrimation feature cognition |
CN112131884B (en) * | 2020-10-15 | 2024-03-15 | 腾讯科技(深圳)有限公司 | Method and device for entity classification, method and device for entity presentation |
CN112232524B (en) * | 2020-12-14 | 2021-06-29 | 北京沃东天骏信息技术有限公司 | Multi-label information identification method and device, electronic equipment and readable storage medium |
CN113407727B (en) * | 2021-03-22 | 2023-01-13 | 天津汇智星源信息技术有限公司 | Qualitative measure and era recommendation method based on legal knowledge graph and related equipment |
CN114117040A (en) * | 2021-11-08 | 2022-03-01 | 重庆邮电大学 | Text data multi-label classification method based on label specific features and relevance |
CN114860892B (en) * | 2022-07-06 | 2022-09-06 | 腾讯科技(深圳)有限公司 | Hierarchical category prediction method, device, equipment and medium |
CN117216688B (en) * | 2023-11-07 | 2024-01-23 | 西南科技大学 | Enterprise industry identification method and system based on hierarchical label tree and neural network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104199857A (en) * | 2014-08-14 | 2014-12-10 | 西安交通大学 | Tax document hierarchical classification method based on multi-label classification |
CN104881689A (en) * | 2015-06-17 | 2015-09-02 | 苏州大学张家港工业技术研究院 | Method and system for multi-label active learning classification |
CN105868773A (en) * | 2016-03-23 | 2016-08-17 | 华南理工大学 | Hierarchical random forest based multi-label classification method |
CN106126972A (en) * | 2016-06-21 | 2016-11-16 | 哈尔滨工业大学 | A hierarchical multi-label classification method for protein function prediction |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150161198A1 (en) * | 2013-12-05 | 2015-06-11 | Sony Corporation | Computer ecosystem with automatically curated content using searchable hierarchical tags |
2017-09-15: Application CN201710832304.8A filed in China; granted as patent CN107577785B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN107577785A (en) | 2018-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107577785B (en) | Hierarchical multi-label classification method suitable for legal identification | |
CN110825877A (en) | Semantic similarity analysis method based on text clustering | |
CN107798033B (en) | Case text classification method in public security field | |
CN112256939B (en) | Text entity relation extraction method for chemical field | |
CN108009135B (en) | Method and device for generating document abstract | |
CN112632228A (en) | Text mining-based auxiliary bid evaluation method and system | |
CN111832289A (en) | Service discovery method based on clustering and Gaussian LDA | |
WO2020063071A1 (en) | Sentence vector calculation method based on chi-square test, and text classification method and system | |
Joshi et al. | Categorizing the document using multi class classification in data mining | |
CN116501875B (en) | Document processing method and system based on natural language and knowledge graph | |
CN114265935A (en) | Science and technology project establishment management auxiliary decision-making method and system based on text mining | |
CN115952292B (en) | Multi-label classification method, apparatus and computer readable medium | |
CN112836029A (en) | Graph-based document retrieval method, system and related components thereof | |
Gao et al. | A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization | |
Ikram et al. | Arabic text classification in the legal domain | |
CN116128544A (en) | Active auditing method and system for electric power marketing abnormal business data | |
Alsaidi et al. | English poems categorization using text mining and rough set theory | |
CN113590827B (en) | Scientific research project text classification device and method based on multiple angles | |
Abdollahpour et al. | Image classification using ontology based improved visual words | |
CN112270189B (en) | Question type analysis node generation method, system and storage medium | |
Balaneshin-kordan et al. | Sequential query expansion using concept graph | |
Hamdi et al. | Machine learning vs deterministic rule-based system for document stream segmentation | |
Xiao et al. | Revisiting table detection datasets for visually rich documents | |
Wang et al. | A Method of Hot Topic Detection in Blogs Using N-gram Model. | |
Zhang et al. | Extending associative classifier to detect helpful online reviews with uncertain classes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||