CN113254636A - Remote supervision entity relationship classification method based on example weight dispersion - Google Patents
- Publication number: CN113254636A
- Application number: CN202110456426.8A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
- G06F16/367 — Creation of semantic tools; Ontology
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/047 — Neural networks; Probabilistic or stochastic networks
- G06N3/08 — Neural networks; Learning methods
Abstract
The invention relates to a remote supervision entity relation classification method based on example weight dispersion. Sentence examples generated by a remote supervision method are grouped into packages, and a feature vector of each example is obtained through a segmented convolution network; the relevance weight between each example and the package containing it is calculated with an attention mechanism; the mean and standard deviation of all example relevance weights in the package are calculated, and the example relevance weights are updated against a threshold designed from the mean and standard deviation; the example feature vectors in the package are combined into a package feature vector according to the updated relevance weights; the package feature vector is input into a classifier to obtain a classification result for the package, which is compared with the package label to calculate a loss function. If the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameters are updated according to the loss function and the next round of training begins. The method reduces the influence of mislabeled examples on model training in remote supervision, and improves the accuracy of the remote supervision entity relationship classification model.
Description
Technical Field
The invention relates to the field of multi-instance learning and information extraction, in particular to a remote supervision entity relationship classification method based on instance weight dispersion.
Background
Entity relationship classification is one of the most important tasks in information extraction. Given text with labeled entities, it aims to assign a predefined relation type to an entity pair according to context semantics. Existing methods can be divided into supervised, semi-supervised, unsupervised and remote supervision methods.
The supervised entity relationship classification method requires a large amount of accurately labeled data and is labor-intensive; the semi-supervised method is sensitive to the given seeds, suffers from semantic drift, and has low accuracy; the unsupervised method clusters corpus information and defines relations on the clustering results, but the resulting relations are hard to describe and low-frequency instances have low recall.
The current mainstream approach is remote supervision relation classification: a structured knowledge base is aligned with unstructured text, and a labeled data set is generated automatically for model training, avoiding a large amount of labor cost. The method assumes that if two entities have a certain relationship in the knowledge base, then any sentence containing both entities expresses this relationship. However, this assumption can produce mislabeled examples, i.e. a sentence containing the target entity pair may not actually describe the relation type of that entity pair in the knowledge base. Remote supervision entity relationship classification can therefore be combined with multi-instance learning to reduce the impact of mislabeled data.
Multiple Instance Learning (MIL) introduces the concept of a package, defined as a collection of multiple instances. The input of the model is not a single instance with a category label but a set of labeled packages; a package is positive as long as it contains at least one positive instance, and negative otherwise. In remote supervision relation extraction, when a certain relation exists between an entity pair, at least one example in the package formed for that entity pair is assumed to express the relation. Because of this assumption, the examples in the package that do not actually describe the relation can seriously interfere with model training. Therefore, how to mitigate the influence of mislabeled examples within a package on model training is an urgent technical problem, whose solution would also provide powerful support for applying multi-example learning in other fields.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a remote supervision entity relation classification method based on example weight dispersion. A relevance weight threshold is designed from the mean and standard deviation of the example weights in a package; the relevance weights of the examples in the package are then updated according to the threshold, and the examples whose relevance weights are small and deviate strongly from the mean are filtered out. This reduces the influence of mislabeled examples on the model and provides powerful support for applying multi-example learning in other fields.
In order to achieve the purpose, the invention adopts the following technical scheme:
A remote supervision entity relation classification method based on example weight dispersion comprises the following steps:
step 1, grouping the sentence examples generated by the remote supervision method into packages, and obtaining a feature vector of each example through a segmented convolution network;
step 2, calculating the relevance weight between each example and the package containing it using an attention mechanism;
step 3, calculating the mean and standard deviation of all example relevance weights in the package, and updating the example relevance weights according to a threshold designed from the mean and standard deviation;
step 4, combining the example feature vectors in the package into a package feature vector according to the updated relevance weights;
step 5, inputting the package feature vector into a classifier to obtain a classification result for the package, comparing it with the package label, and calculating a loss function;
step 6, if the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, ending the training;
otherwise, updating the parameters according to the loss function and performing the next round of training.
Preferably, in step 1, the sentence examples generated by the remote supervision method are grouped into packages, and the feature vector of each example is obtained through a segmented convolution network. The specific steps are as follows:
(1-1) Instances containing the same entity pair are put into the same set to form a multi-instance package, yielding a remote supervision data set $D=\{B_1, B_2, \ldots, B_{n_s}\}$, where $n_s$ is the number of packages, $B_L=\{S_1, S_2, \ldots, S_m\}$ is a multi-instance package in the data set, $L \in [1, n_s]$, $m$ is the number of instances in the package, and $S_i$ is an instance in the package, $i \in [1, m]$.
(1-2) The feature vectors of the examples in a package, $b_l=\{s_1, s_2, \ldots, s_m\}$, $l \in [1, n_s]$, are obtained through the segmented convolutional network, where $s_j$ is the feature vector of each example, $j \in [1, m]$.
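As a hedged illustration of step (1-1), grouping distantly supervised sentence instances into multi-instance packages can be sketched as follows (the field names and sample sentences are hypothetical, not taken from the patent):

```python
from collections import defaultdict

def build_bags(labeled_sentences):
    """Group distantly supervised sentence instances into multi-instance bags.

    Each element of `labeled_sentences` is (head, tail, sentence, relation).
    All sentences sharing the same entity pair go into one bag; the bag
    inherits the relation label produced by distant supervision.
    """
    bags = defaultdict(lambda: {"sentences": [], "relation": None})
    for head, tail, sentence, relation in labeled_sentences:
        bag = bags[(head, tail)]
        bag["sentences"].append(sentence)
        bag["relation"] = relation  # same entity pair -> same distant label
    return dict(bags)

sample = [
    ("Bill Gates", "Microsoft", "Bill Gates is the founder of Microsoft.",
     "/business/person/company"),
    ("Bill Gates", "Microsoft", "Bill Gates spoke at a Microsoft event.",
     "/business/person/company"),
]
bags = build_bags(sample)
```

Both sentences contain the same entity pair, so they end up in one bag of size two; in the model, each sentence in the bag would then be encoded by the PCNN.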
Preferably, in step 2, the relevance weight between each example and the package containing it is calculated using an attention mechanism. The specific steps are as follows:
(2-1) The inner product of the feature vector of each example in the package and the package label vector is taken, and the result is used as the relevance weight between the example and the package, i.e. the example weight:

$e_j = s_j A q \quad (1)$

where $s_j$ is the example feature vector output in step 1, $A$ is a weight parameter matrix, and $q$ is a query vector used to query the feature vector corresponding to the relation label from $A$;
(2-2) The example weights are normalized:

$\alpha_j = \dfrac{\exp(e_j)}{\sum_{k=1}^{m}\exp(e_k)} \quad (2)$

where $k \in [1, m]$.
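Steps (2-1) and (2-2) — a bilinear attention score followed by softmax normalization over the bag — can be sketched in plain Python. This is a minimal sketch: in the model $A$ and $q$ are learned parameters, while here they are toy values.

```python
import math

def attention_weights(S, A, q):
    """Bilinear attention over a bag: e_j = s_j . (A q), then softmax.

    S: list of instance feature vectors (each of length d)
    A: d x d weight parameter matrix
    q: relation query vector of length d
    Returns normalized relevance weights that sum to 1.
    """
    d = len(q)
    Aq = [sum(A[i][k] * q[k] for k in range(d)) for i in range(d)]
    e = [sum(s[i] * Aq[i] for i in range(d)) for s in S]
    mx = max(e)                          # subtract max for numerical stability
    exps = [math.exp(v - mx) for v in e]
    z = sum(exps)
    return [v / z for v in exps]
```

With an identity $A$ and $q = (1, 0)$, an instance vector aligned with $q$ receives a larger weight than an orthogonal one.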
Preferably, in step 3, the mean and standard deviation of all example relevance weights in the package are calculated, and the example relevance weights are updated according to the designed threshold. The specific steps are as follows:
(3-1) The relevance weight threshold is calculated from the output of step 2:

$M = \frac{1}{m}\sum_{j=1}^{m}\alpha_j \quad (3), \qquad \delta = \sqrt{\frac{1}{m}\sum_{j=1}^{m}(\alpha_j - M)^2} \quad (4)$

where $M$ is the mean of the example relevance weights, $\delta$ is their standard deviation, and $M-\delta$ is the relevance weight threshold;
(3-2) The example relevance weights are updated according to the threshold; if an example relevance weight is smaller than the threshold, it is set to 0, otherwise it is kept:

$\alpha_j' = \begin{cases}\alpha_j, & \alpha_j \ge M-\delta \\ 0, & \alpha_j < M-\delta\end{cases} \quad (5)$

(3-3) The updated weights are normalized:

$\beta_j = \dfrac{\alpha_j'}{\sum_{k=1}^{m}\alpha_k'} \quad (6)$
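The core of step 3 — zeroing out weights below the mean-minus-standard-deviation threshold and renormalizing the survivors — can be sketched as follows. The fallback for a degenerate bag in which every weight is filtered out is an assumption of this sketch, not specified by the patent.

```python
import math

def filter_by_dispersion(alpha):
    """Zero out instance weights below M - delta, then renormalize.

    Instances whose relevance weight falls more than one standard
    deviation below the bag mean are treated as likely mislabeled
    and excluded from the bag representation.
    """
    m = len(alpha)
    M = sum(alpha) / m                                   # bag mean
    delta = math.sqrt(sum((a - M) ** 2 for a in alpha) / m)  # population std
    threshold = M - delta
    kept = [a if a >= threshold else 0.0 for a in alpha]
    z = sum(kept)
    if z == 0.0:               # assumed fallback: keep original weights
        kept, z = alpha, sum(alpha)
    return [a / z for a in kept]
```

For the weights [0.5, 0.45, 0.05], the threshold is about 0.13, so the third (likely mislabeled) instance is dropped and the remaining two weights are renormalized.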
Preferably, in step 4, the example feature vectors in the package are combined into a package feature vector according to the example relevance weights. The specific steps are as follows:
The package feature vector is obtained from the example relevance weights output in step 3:

$x_t = \sum_{k=1}^{m}\beta_k s_k \quad (7)$

where $x_t$ represents the 230-dimensional package feature vector, $t \in [1, n_s]$, $\beta_k$ is the example relevance weight output in step (3-3), and $s_k$ is the example feature vector in the package output in step (1-2).
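The weighted combination of step 4 is a plain weighted sum of the instance vectors; a minimal sketch (toy dimensions, not the 230-dimensional vectors of the patent):

```python
def bag_vector(betas, S):
    """Combine instance vectors into one bag vector: x = sum_k beta_k * s_k.

    betas: normalized relevance weights, one per instance
    S:     list of instance feature vectors, all of the same length
    """
    d = len(S[0])
    return [sum(b * s[i] for b, s in zip(betas, S)) for i in range(d)]
```

An instance whose weight was reset to 0 by the dispersion filter contributes nothing to the bag vector.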
Preferably, in step 5, the package feature vector is input into the classifier to obtain a classification result for the package, which is compared with the package label to calculate the loss function. The classifier comprises a fully connected layer and a normalization layer. The specific steps are as follows:
(5-1) The package feature vector output in step 4 is mapped through the fully connected layer of the classifier into a 53-dimensional vector:

$O_t = W x_t + d \quad (8)$

where $O_t \in \mathbb{R}^{n_r}$ is the output vector of the fully connected layer, representing the score for each relation type, $n_r$ is the number of relation types, $W$ is a trainable parameter matrix, $x_t$ is the package feature vector output in step (4-1), and $d$ is a bias vector.
(5-2) The result of step (5-1) is normalized to output a probability distribution over the relation categories:

$p(r_L \mid B_L; \theta) = \dfrac{\exp(o_{r_L})}{\sum_{c=1}^{n_r}\exp(o_c)} \quad (9)$

where $B_L$ is the example package described in step (1-1), $r_L \in [1, n_r]$ is the index of the relation type assigned to package $B_L$, $\theta$ represents the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1) and the fully connected layer parameters in step (5-1), and $o_c$ is the score of the package on the $c$-th relation type, $c \in [1, n_r]$.
(5-3) The loss function used to update the model parameters is calculated as:

$J(\theta) = -\sum_{L=1}^{n_s}\log p(r_L \mid B_L; \theta) \quad (10)$
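Steps (5-1) to (5-3) — a fully connected layer, softmax normalization, and the negative log-likelihood of the bag's distant label — can be sketched as follows (toy dimensions rather than the 230-input / 53-output layer of the patent):

```python
import math

def classify_bag(x, W, d, gold):
    """Score O = W x + d over relation types, softmax, then NLL loss.

    x:    bag feature vector
    W, d: weight matrix (one row per relation type) and bias vector
    gold: index of the bag's distant-supervision relation label
    Returns (loss, probs).
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W, d)]
    mx = max(scores)                      # stabilize the softmax
    exps = [math.exp(v - mx) for v in scores]
    z = sum(exps)
    probs = [v / z for v in exps]
    loss = -math.log(probs[gold])         # negative log-likelihood
    return loss, probs
```

A bag whose label receives a high score yields a probability near 1 for that relation and a loss near 0.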
Preferably, in step 6, whether to continue the iterative training of the model is determined. The specific steps are as follows:
If the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameter set $\theta$ is updated according to the loss function and the next round of training is performed. The parameter update formula is:

$\theta \leftarrow \theta - \lambda \dfrac{\partial J(\theta)}{\partial \theta} \quad (11)$

where $\lambda$ is the learning rate.
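The stopping rule of step 6 — end training when F1 has not improved for three consecutive rounds, or when the preset round count is reached — can be sketched as follows (the default maximum round count is a hypothetical value, not specified by the patent):

```python
def should_stop(f1_history, patience=3, max_rounds=50):
    """Early stopping on validation F1.

    f1_history: F1 score after each completed training round, oldest first
    patience:   stop if this many consecutive rounds fail to beat the
                best F1 seen before them
    max_rounds: hard cap on the number of training rounds (assumed value)
    """
    if len(f1_history) >= max_rounds:
        return True
    if len(f1_history) <= patience:
        return False                     # not enough rounds to judge yet
    best_before = max(f1_history[:-patience])
    recent = f1_history[-patience:]
    return all(f1 <= best_before for f1 in recent)
```

With history [0.5, 0.6, 0.59, 0.58, 0.57], the last three rounds never beat the earlier best of 0.6, so training stops.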
Compared with the prior art, the invention has the following substantive features and technical progress:
1. The method designs an example weight threshold from the mean and standard deviation of the example relevance weights in the package, updates the example weights based on the threshold, and filters out examples with small weights that deviate strongly from the mean, thereby reducing the influence of mislabeled data on model training in multi-example learning;
2. The method improves the attention-based calculation of example relevance weights within the package, and improves the accuracy of the remote supervision entity relationship classification model.
Drawings
FIG. 1 is a flow of remote supervised entity relationship classification based on example weight dispersion.
FIG. 2 is a comparison of the PR curves of the experimental results of the method of the present invention with other methods.
FIG. 3 is an example weight calculation process based on mean and standard deviation.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the embodiments of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be emphasized that the specific embodiments described herein are merely illustrative and do not limit the invention.
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
the first embodiment is as follows:
In this embodiment, referring to fig. 1, a remote supervision entity relationship classification method based on example weight dispersion includes the following steps:
step 1, grouping the sentence examples generated by the remote supervision method into packages, and obtaining a feature vector of each example through a segmented convolution network;
step 2, calculating the relevance weight between each example and the package containing it using an attention mechanism;
step 3, calculating the mean and standard deviation of all example relevance weights in the package, and updating the example relevance weights according to a threshold designed from the mean and standard deviation;
step 4, combining the example feature vectors in the package into a package feature vector according to the updated relevance weights;
step 5, inputting the package feature vector into a classifier to obtain a classification result for the package, comparing it with the package label, and calculating a loss function;
step 6, if the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, ending the training;
otherwise, updating the parameters according to the loss function and performing the next round of training.
In the remote supervision entity relationship classification method based on example weight dispersion, a relevance weight threshold is designed from the mean and standard deviation of the example weights in a package; the relevance weights of the examples in the package are then updated according to the threshold, and the examples whose relevance weights are small and deviate strongly from the mean are filtered out, reducing the influence of mislabeled examples on the model.
Example two:
Building on the foregoing embodiment, a remote supervision entity relationship classification method based on example weight dispersion includes the following steps:
Step 1: the examples in the training set generated by the remote supervision method are grouped into packages, and the feature vector of each example is obtained through a segmented convolution network. The specific process is as follows:
(1-1) Instances containing the same entity pair are put into the same set to form a multi-instance package, yielding a remote supervision data set $D=\{B_1, B_2, \ldots, B_{n_s}\}$, where $n_s$ is the number of packages, $B_L=\{S_1, S_2, \ldots, S_m\}$ is a multi-instance package in the data set, $L \in [1, n_s]$, $m$ is the number of instances in the package, and $S_i$ is an instance in the package, $i \in [1, m]$.
(1-2) The feature vectors of the examples in a package, $b_l=\{s_1, s_2, \ldots, s_m\}$, $l \in [1, n_s]$, are obtained through the segmented convolutional network, where $s_j$ is the feature vector of each example, $j \in [1, m]$.
Step 2: the relevance weight between each example and the package containing it is calculated using the attention mechanism. The specific process is as follows:
(2-1) The inner product of the feature vector of each example in the package and the package label vector is taken, and the result is used as the relevance weight between the example and the package, i.e. the example weight:

$e_j = s_j A q \quad (1)$

where $s_j$ is the example feature vector output in step (1-2), $A$ is a weight parameter matrix, and $q$ is a query vector used to query the feature vector corresponding to the relation label from $A$.
(2-2) The example weights are normalized:

$\alpha_j = \dfrac{\exp(e_j)}{\sum_{k=1}^{m}\exp(e_k)} \quad (2)$

where $k \in [1, m]$.
Step 3: the mean and standard deviation of all example relevance weights in the package are calculated, and the example relevance weights are updated according to the designed threshold. The process is as follows:
(3-1) The relevance weight threshold is calculated from the output of step (2-2):

$M = \frac{1}{m}\sum_{j=1}^{m}\alpha_j \quad (3), \qquad \delta = \sqrt{\frac{1}{m}\sum_{j=1}^{m}(\alpha_j - M)^2} \quad (4)$

where $M$ is the mean of the example relevance weights, $\delta$ is their standard deviation, and $M-\delta$ is the relevance weight threshold.
(3-2) The example relevance weights are updated according to the threshold; if an example relevance weight is smaller than the threshold, it is set to 0, otherwise it is kept:

$\alpha_j' = \begin{cases}\alpha_j, & \alpha_j \ge M-\delta \\ 0, & \alpha_j < M-\delta\end{cases} \quad (5)$

(3-3) The updated weights are normalized:

$\beta_j = \dfrac{\alpha_j'}{\sum_{k=1}^{m}\alpha_k'} \quad (6)$
Step 4: the example feature vectors in the package are combined into a package feature vector according to the example relevance weights. The process is as follows:
(4-1) The package feature vector is obtained from the example relevance weights output in step (3-3):

$x_t = \sum_{k=1}^{m}\beta_k s_k \quad (7)$

where $x_t$ represents the 230-dimensional package feature vector, $t \in [1, n_s]$, $\beta_k$ is the example relevance weight output in step (3-3), and $s_k$ is the example feature vector in the package output in step (1-2).
Step 5: the package feature vector is input into the classifier to obtain a classification result for the package, which is compared with the package label to calculate the loss function. The classifier comprises a fully connected layer and a normalization layer. The process is as follows:
(5-1) The package feature vector output in step (4-1) is mapped through the fully connected layer:

$O_t = W x_t + d \quad (8)$

where $O_t \in \mathbb{R}^{n_r}$ is the output vector of the fully connected layer, representing the score for each relation type, $n_r$ is the number of relation types, $W$ is a trainable parameter matrix, $x_t$ is the package feature vector output in step (4-1), and $d$ is a bias vector.
(5-2) The result of step (5-1) is normalized to output a probability distribution over the relation categories:

$p(r_L \mid B_L; \theta) = \dfrac{\exp(o_{r_L})}{\sum_{c=1}^{n_r}\exp(o_c)} \quad (9)$

where $B_L$ is the example package described in step (1-1), $r_L \in [1, n_r]$ is the index of the relation type assigned to package $B_L$, $\theta$ represents the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1) and the fully connected layer parameters in step (5-1), and $o_c$ is the score of the package on the $c$-th relation type, $c \in [1, n_r]$.
(5-3) The loss function used to update the model parameters is calculated as:

$J(\theta) = -\sum_{L=1}^{n_s}\log p(r_L \mid B_L; \theta) \quad (10)$
Step 6: whether to continue the iterative training of the model is determined. The process is as follows:
(6-1) If the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameter set $\theta$ is updated according to the loss function and the next round of training is performed. The parameter update formula is:

$\theta \leftarrow \theta - \lambda \dfrac{\partial J(\theta)}{\partial \theta} \quad (11)$

where $\lambda$ is the learning rate.
In this embodiment, an example weight threshold is designed from the mean and standard deviation of the example relevance weights in a package, the example weights are updated based on the threshold, and examples with small weights that deviate strongly from the mean are filtered out, thereby reducing the influence of mislabeled data on model training in multi-example learning.
Example three:
In order to verify the effectiveness of the method, experiments were carried out on the New York Times news corpus; the method is further explained below with reference to the accompanying drawings.
In this embodiment, the remote supervision entity relationship classification method based on example weight dispersion (DSRC-SWD) first groups the examples in the training set generated by the remote supervision method into packages and obtains the feature vector of each example in a package through a segmented convolution network (PCNN); it then calculates the relevance weight between each example and the package containing it using an attention mechanism. The mean and standard deviation of all example relevance weights in the package are calculated, and an example relevance weight threshold is designed from them; if an example relevance weight is smaller than the threshold, it is set to 0, otherwise it is kept and normalized. The example feature vectors in the package are then combined into a package feature vector according to the example relevance weights. Finally, the package feature vector is input into a classifier to output a classification result for the package, which is compared with the package label to calculate a loss function. If the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameters are updated according to the loss function and the next round of training is performed.
Referring to the remote supervision relationship extraction flow chart of fig. 1, this embodiment designs an example weight threshold from the mean and standard deviation and updates the example weights accordingly, reducing the influence of mislabeled examples on the model in multi-example learning.
Step 1: the sentence examples (examples for short) in the training set generated by the remote supervision method are grouped into packages, and the feature vector of each example is obtained through a segmented convolution network. The specific process is as follows:
(1-1) The examples in the training set generated by the remote supervision method are packaged. For example, given the sentence "Bill Gates is the founder of Microsoft", the entities Bill Gates and Microsoft appear in the knowledge-graph triple (Microsoft, found, Bill Gates), so the relation label of the sentence is (business, person, company). The examples containing the same entity pair are put into the same set to form a multi-instance package, yielding a remote supervision data set $D=\{B_1, B_2, \ldots, B_{n_s}\}$, where $n_s$ is the number of packages, $B_L=\{S_1, S_2, \ldots, S_m\}$ is a multi-instance package in the data set, $L \in [1, n_s]$, $m$ is the number of instances in the package, and $S_i$ is an instance in the package, $i \in [1, m]$.
(1-2) The feature vectors of the examples in a package, $b_l=\{s_1, s_2, \ldots, s_m\}$, $l \in [1, n_s]$, are obtained through the segmented convolutional network, where $s_j$ is the feature vector of each example, $j \in [1, m]$.
Step 2: the relevance weight between each example and the package containing it is calculated using the attention mechanism. The specific process is as follows:
(2-1) The inner product of the feature vector of each example in the package and the package label vector is taken, and the result is used as the relevance weight between the example and the package, i.e. the example weight:

$e_j = s_j A q \quad (1)$

where $s_j$ is the example feature vector output in step (1-2), $A$ is a weight parameter matrix, and $q$ is a query vector used to query the feature vector corresponding to the relation label from $A$;
(2-2) The example weights are normalized:

$\alpha_j = \dfrac{\exp(e_j)}{\sum_{k=1}^{m}\exp(e_k)} \quad (2)$

where $k \in [1, m]$.
Step 3: the mean and standard deviation of all example relevance weights within the package are calculated, and the example relevance weights are updated according to the designed threshold, as shown in fig. 3. The process is as follows:
(3-1) The relevance weight threshold is calculated from the output of step (2-2):

$M = \frac{1}{m}\sum_{j=1}^{m}\alpha_j \quad (3), \qquad \delta = \sqrt{\frac{1}{m}\sum_{j=1}^{m}(\alpha_j - M)^2} \quad (4)$

where $M$ is the mean of the example relevance weights, $\delta$ is their standard deviation, and $M-\delta$ is the relevance weight threshold.
(3-2) The example relevance weights are updated according to the threshold; if an example relevance weight is smaller than the threshold, it is set to 0, otherwise it is kept:

$\alpha_j' = \begin{cases}\alpha_j, & \alpha_j \ge M-\delta \\ 0, & \alpha_j < M-\delta\end{cases} \quad (5)$

(3-3) The updated weights are normalized:

$\beta_j = \dfrac{\alpha_j'}{\sum_{k=1}^{m}\alpha_k'} \quad (6)$
Step 4: the example feature vectors in the package are combined into a package feature vector according to the example relevance weights. The process is as follows:
(4-1) The package feature vector is obtained from the example relevance weights output in step (3-3):

$x_t = \sum_{k=1}^{m}\beta_k s_k \quad (7)$

where $x_t$ represents the 230-dimensional package feature vector, $t \in [1, n_s]$, $\beta_k$ is the example relevance weight output in step (3-3), and $s_k$ is the example feature vector in the package output in step (1-2).
Step 5: the package feature vector is input into the classifier to obtain a classification result for the package, which is compared with the package label to calculate the loss function. The classifier comprises a fully connected layer and a normalization layer. The process is as follows:
(5-1) The package feature vector output in step (4-1) is mapped through the fully connected layer:

$O_t = W x_t + d \quad (8)$

where $O_t \in \mathbb{R}^{n_r}$ is the output vector of the fully connected layer, representing the score for each relation type, $n_r = 53$ is the number of relation types, $W$ is a trainable parameter matrix, $x_t$ is the package feature vector output in step (4-1), and $d$ is a bias vector.
(5-2) The result of step (5-1) is normalized to output a probability distribution over the relation categories:

$p(r_L \mid B_L; \theta) = \dfrac{\exp(o_{r_L})}{\sum_{c=1}^{n_r}\exp(o_c)} \quad (9)$

where $B_L$ is the example package described in step (1-1), $r_L \in [1, n_r]$ is the index of the relation type assigned to package $B_L$, $\theta$ represents the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1) and the fully connected layer parameters in step (5-1), and $o_c$ is the score of the package on the $c$-th relation type, $c \in [1, n_r]$.
(5-3) The loss function used to update the model parameters is calculated as:

$J(\theta) = -\sum_{L=1}^{n_s}\log p(r_L \mid B_L; \theta) \quad (10)$

where $n_s$ is the number of packages and $\theta$ represents the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1) and the fully connected layer parameters in step (5-1).
Step 6: whether to continue the iterative training of the model is determined. The process is as follows:
(6-1) If the F1 score does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameter set $\theta$ is updated according to the loss function and the next round of training is performed. The parameter update formula is:

$\theta \leftarrow \theta - \lambda \dfrac{\partial J(\theta)}{\partial \theta} \quad (11)$

where $\lambda$ is the learning rate.
Compared with other methods, the method of this embodiment designs an example weight threshold from the mean and standard deviation of the example weights in the package, updates the example weights in the package accordingly, and filters out examples with small weights that deviate strongly from the mean, thereby reducing the influence of mislabeled data in multi-example learning.
Description of experimental tests and results:
The dataset used in this example is the New York Times dataset (http://t.cn/RPsjAY), divided into a training set, a validation set and a test set: the training set contains 466876 sentence examples, the validation set 55167, and the test set 172448. The experimental metrics are the precision-recall curve (PR curve) and P@N: the PR curve shows the model's precision at each recall level (the closer the curve lies to the upper-right corner of the coordinate system, the better the model's overall performance), and P@N is the precision over the top-N test predictions.
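The P@N metric described above, precision among the N highest-confidence test predictions, can be sketched as follows; the list of (score, is_correct) pairs is an illustrative assumption about how predictions are stored:

```python
def precision_at_n(scored_predictions, n):
    """P@N: fraction of correct predictions among the n highest-scored ones.

    scored_predictions: list of (score, is_correct) pairs, one per prediction.
    """
    top = sorted(scored_predictions, key=lambda x: x[0], reverse=True)[:n]
    return sum(1 for _, ok in top if ok) / n
```

E.g. if the two top-scored predictions contain one correct relation, P@2 = 0.5.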
TABLE 1 results of P @ N
P@N (%) | 100 | 200 | 300 | Average
---|---|---|---|---
ONE | 64.3 | 62.6 | 58.2 | 61.7
AVG | 67.8 | 64.4 | 60.4 | 64.2
ATT | 71.1 | 67.6 | 64.5 | 67.7
DSRC-SWD | 72.3 | 69.7 | 66.1 | 69.3
FIG. 2 compares the PR curves of the proposed method with those of other methods, and Table 1 compares their P@N scores. Every method uses the segmented convolutional network for feature extraction; they differ only in how the example weights are calculated. ONE keeps only the weight of the example with the largest relevance weight and resets the others to 0; AVG gives all examples the same relevance weight; ATT calculates relevance weights with the attention mechanism; DSRC-SWD is the example weight calculation method of the present invention. The PR curves show that the proposed weight calculation method achieves the highest precision at low recall; between recall 0.05 and 0.17 the precision gap between the proposed method and ATT is small, but both remain higher than the other methods; beyond recall 0.17 the precision of the proposed method clearly exceeds that of all the others. The P@N results show the proposed method is the most accurate at P@100, P@200 and P@300, with an average precision of 69.3%, improvements of 7.6%, 5.1% and 1.6% over the other methods respectively.
The ONE method selects only the highest-scoring example in each packet; the other examples, although scored relatively low, may still carry the packet's relational semantics, so this method loses a large amount of valid information. The AVG method gives all examples the same weight and so fails to distinguish mislabeled examples from correct ones, which strongly harms the model. The ATT method assigns different weights to examples through the attention mechanism, but mislabeled examples still participate in model training, limiting the model's relation extraction performance. On top of the attention mechanism, the proposed method calculates the standard deviation of the example weight distribution, derives an example weight threshold from it, and filters out examples whose scores are low and widely dispersed from the mean, so that only high-scoring examples take part in model training. The experiments therefore show that, with identical example feature extraction, the proposed method achieves better results in multi-example-learning remote supervised entity relation classification.
In summary, the above embodiment of the remote supervised entity relation classification method based on example weight dispersion packs the sentence examples (examples for short) generated by the remote supervision method and obtains each example's feature vector through the segmented convolutional network. The relevance weight between each example and its package is calculated with an attention mechanism. The mean and standard deviation of all example relevance weights within a package are calculated, and the example relevance weights are updated against the threshold designed from them. The example feature vectors in the package are then combined into a package feature vector according to the updated relevance weights. The package feature vector is input into a classifier to obtain the package's classification result, which is compared with the package label to calculate a loss function. If the F1 value does not rise for three consecutive training rounds, or the current round reaches the preset number of training iterations, training ends; otherwise the parameters are updated according to the loss function and the next round of training is performed. The method improves the attention-based calculation of example relevance weights within a package: it designs a weight threshold from the mean and standard deviation of the relevance weights and filters out examples whose relevance weights are small and widely dispersed from the mean, thereby reducing the influence of mislabeled examples in remote supervision on model training and improving the accuracy of the remote supervised entity relation classification model.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention.
Claims (7)
1. An example weight calculation method for multi-example learning is characterized by comprising the following steps:
Step 1, packing sentence examples generated by a remote supervision method, and obtaining a feature vector of each example through a segmented convolutional network.
Step 2, calculating the relevance weight between each example and the package in which it resides by using the attention mechanism.
Step 3, calculating the mean and standard deviation of all example relevance weights in the package, and updating the example relevance weights according to the designed threshold.
Step 4, combining the example feature vectors in the package into a package feature vector according to the updated example relevance weights.
Step 5, inputting the package feature vector into a classifier to obtain the package's classification result, comparing it with the package label, and calculating a loss function.
Step 6, if the F1 value does not rise for three consecutive training rounds, or the current round reaches the preset number of training iterations, ending training; otherwise, updating the parameters according to the loss function and performing the next round of training.
2. The method of claim 1, wherein: the step 1 is to pack sentence examples generated based on a remote supervision method, and obtain a feature vector of each example through a segmented convolution network, and the process is as follows:
(2-1) putting instances that contain the same entity pair into the same set to form a multi-instance package, and constructing a remote supervision dataset {B_1, B_2, ..., B_{n_s}}, where n_s is the number of packets, B_L = {S_1, S_2, ..., S_m} is a multi-instance packet in the dataset, L ∈ [1, n_s], m is the number of instances in the packet, and S_i is an instance in the packet, i ∈ [1, m].
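The packing step (2-1), grouping sentence instances that share an entity pair into one bag, can be sketched as below; the (head, tail, sentence) triple layout is an assumption made for illustration:

```python
from collections import defaultdict

def pack_instances(instances):
    """Group sentence instances by entity pair into multi-instance packages.

    instances: iterable of (head_entity, tail_entity, sentence) triples.
    Returns {(head, tail): [sentence, ...]}; each value is one package B_L.
    """
    bags = defaultdict(list)
    for head, tail, sentence in instances:
        bags[(head, tail)].append(sentence)
    return dict(bags)
```

Every sentence mentioning the same pair lands in the same package, which is what lets remote supervision assign one relation label per package.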
(2-2) obtaining the feature vectors of the instances in a packet through the segmented convolutional network, b_l = {s_1, s_2, ..., s_m}, l ∈ [1, n_s], where s_j is the feature vector of each instance, j ∈ [1, m].
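The segmented (piecewise) convolutional network itself is not detailed in this text. Its characteristic step is piecewise max pooling: the convolution output is split into three segments around the two entity positions and each segment is max-pooled per filter. A minimal sketch, assuming a precomputed convolution output (seq_len rows of n_filters values) and entity positions e1 < e2:

```python
import math

def pcnn_features(conv_out, e1, e2):
    """Piecewise max pooling sketch: split conv_out (seq_len x n_filters,
    as a list of lists) into three segments around entity positions
    e1 < e2, max-pool each segment per filter, concatenate, and apply
    tanh, yielding a 3 * n_filters instance feature vector."""
    segments = [conv_out[:e1 + 1], conv_out[e1 + 1:e2 + 1], conv_out[e2 + 1:]]
    n_filters = len(conv_out[0])
    pooled = []
    for seg in segments:
        # per-filter maximum over the positions in this segment
        pooled.extend(max(row[f] for row in seg) for f in range(n_filters))
    return [math.tanh(v) for v in pooled]
```

With 2 filters the instance vector has 6 dimensions; the actual filter count used by the patent is not stated here.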
3. The method of claim 1, wherein: in the step 2, the relevance weight of the example and the packet where the example is located is calculated by using an attention mechanism, and the process is as follows:
(3-1) taking the inner product of the feature vector of each example in the packet with the packet's relation label vector; the result serves as the relevance weight between the example and the packet, i.e., the example weight. The specific calculation formula is as follows:
e_j = s_j A q    (1)
where s_j is the example feature vector output in step (2-2), A is a weight parameter matrix, and q is a query vector used to retrieve from A the feature vector corresponding to the relation label.
(3-2) normalizing the example weights, wherein a specific calculation formula is as follows:
α_j = exp(e_j) / Σ_{k=1}^{m} exp(e_k)    (2)

where k ∈ [1, m].
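Steps (3-1) and (3-2), the inner product e_j = s_j A q followed by a softmax over the packet, can be sketched in plain Python (list-based linear algebra to keep the example self-contained):

```python
import math

def attention_weights(S, A, q):
    """Example-to-packet relevance: e_j = s_j A q, then a softmax.

    S: m example feature vectors (each length d), A: d x d weight matrix,
    q: length-d query vector for the relation label, all plain lists.
    """
    # A q: the feature vector corresponding to the relation label
    Aq = [sum(a * v for a, v in zip(row, q)) for row in A]
    e = [sum(s_i * a_i for s_i, a_i in zip(s, Aq)) for s in S]  # e_j
    mx = max(e)  # subtract the max to keep exp() numerically stable
    exps = [math.exp(v - mx) for v in e]
    z = sum(exps)
    return [v / z for v in exps]
```

The weights always sum to 1, and an example better aligned with the relation label receives a larger share.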
4. The method of claim 1, wherein: in step 3, the average and standard deviation of all example relevance weights in the package are calculated, and the example relevance weights are updated according to the design threshold, and the process is as follows:
(4-1) calculating a threshold value of the correlation weight according to the output of the step (3-2), wherein a specific calculation formula is as follows:
where M is the mean of the example relevance weights in the packet, δ is their standard deviation, and M - δ is the relevance weight threshold.
(4-2) updating each example relevance weight against the threshold: if the weight is smaller than the threshold it is set to 0, otherwise it is kept. The specific calculation formula is as follows:
(4-3) normalizing the updated weight, wherein the specific calculation formula is as follows:
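Steps (4-1) to (4-3), computing the threshold M - δ, zeroing sub-threshold weights, and renormalising the survivors, are the weight-dispersion step that distinguishes this method. A minimal sketch:

```python
import math

def disperse_filter(alpha):
    """Weight-dispersion update: compute the in-packet mean M and standard
    deviation delta of the example weights, zero every weight below the
    threshold M - delta, and renormalise the remaining weights to sum to 1."""
    m = sum(alpha) / len(alpha)
    delta = math.sqrt(sum((a - m) ** 2 for a in alpha) / len(alpha))
    threshold = m - delta
    kept = [a if a >= threshold else 0.0 for a in alpha]
    total = sum(kept)
    return [a / total for a in kept]
```

For weights [0.5, 0.3, 0.15, 0.05] the threshold is about 0.08, so the last example (a likely mislabeled instance) is dropped and the rest are rescaled.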
5. the method of claim 1, wherein: the step 4 combines the example feature vectors in the packet into a packet feature vector according to the example relevance weights, and the process is as follows:
(5-1) obtaining a feature vector of the packet according to the example relevance weight output in the step (4-3), wherein a specific calculation formula is as follows:
x_t = Σ_{k=1}^{m} β_k s_k

where x_t is the 230-dimensional package feature vector, t ∈ [1, n_s], β_k is the example relevance weight output in step (4-3), and s_k is the example feature vector output in step (2-2).
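The combination in step (5-1) is a weighted sum of the example feature vectors using the updated relevance weights; sketched as:

```python
def bag_vector(beta, S):
    """Package feature x_t = sum_k beta_k * s_k over the examples of one
    package; beta are the filtered, renormalised relevance weights."""
    dim = len(S[0])
    return [sum(b * s[j] for b, s in zip(beta, S)) for j in range(dim)]
```

Examples whose weight was zeroed in step (4-2) contribute nothing to the package vector.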
6. The method of claim 1, wherein: and 5, inputting the packet feature vectors into a classifier to obtain a classification result of the packet, comparing the classification result with the label of the packet, and calculating a loss function. The classifier comprises a full connection layer and a normalization layer, and the process is as follows:
(6-1) adjusting the packet feature vector output in the step (5-1) through the full connection layer, wherein a specific calculation formula is as follows:
O_t = W x_t + d    (8)
where O_t ∈ R^{n_r} is the output vector of the fully connected layer and represents the score of each relation type, n_r is the number of relation types, W is the trainable parameter matrix, x_t is the packet feature vector output in step (5-1), and d is a bias vector.
(6-2) normalizing the result of the step (6-1) and outputting probability distribution of each relation category, wherein the specific calculation formula is as follows:
p(r_L | B_L; θ) = exp(o_{r_L}) / Σ_{c=1}^{n_r} exp(o_c)

where B_L is the example package described in step (2-1), r_L ∈ [1, n_r] is the index of the relation type assigned to package B_L, θ denotes the trainable parameter set of the model, comprising the segmented convolutional network in step (2-2), the weight parameter matrix in step (3-1) and the fully connected layer parameters in step (6-1), and o_c denotes the score of the package on the c-th relation type, c ∈ [1, n_r].
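The classifier head of steps (6-1) and (6-2), a fully connected layer producing per-relation scores o = W x + d followed by a softmax, can be sketched as:

```python
import math

def classify_bag(x, W, d):
    """Classifier head: scores o = W x + d through the fully connected
    layer, then a softmax giving the probability of each relation type;
    o_c is the package's score on relation type c."""
    o = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + d_c
         for row, d_c in zip(W, d)]
    mx = max(o)  # stabilise the softmax
    exps = [math.exp(v - mx) for v in o]
    z = sum(exps)
    return [v / z for v in exps]
```

The output is a probability distribution over relation types, so the package's predicted relation is simply the arg-max entry.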
(6-3) calculating a loss function for updating the model parameters:

L(θ) = - Σ_{L=1}^{n_s} log p(r_L | B_L; θ)
7. The method of claim 1, wherein step 6 judges whether iterative training of the model needs to continue; the process is as follows:
(7-1) If the F1 value does not increase for three consecutive training rounds, or the current round reaches the preset number of training iterations, training ends; otherwise, the parameter set θ is updated according to the loss function and the next round of training is performed; the parameter update formula is as follows:
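The stopping rule in (7-1), ending when validation F1 has not improved for three consecutive rounds or the round budget is exhausted, can be sketched as a training loop; `update_fn` and `eval_f1` are hypothetical callables standing in for one round of parameter updates and a validation F1 evaluation:

```python
def train(update_fn, eval_f1, max_epochs, patience=3):
    """Run training rounds until F1 fails to improve for `patience`
    consecutive rounds or `max_epochs` rounds have run.

    update_fn(): performs one round of loss-driven parameter updates.
    eval_f1(): returns the validation F1 after that round.
    """
    best_f1, stale, epoch = -1.0, 0, 0
    while epoch < max_epochs and stale < patience:
        update_fn()
        f1 = eval_f1()
        if f1 > best_f1:
            best_f1, stale = f1, 0  # improvement: reset the patience counter
        else:
            stale += 1              # no improvement this round
        epoch += 1
    return best_f1, epoch
```

With F1 readings 0.5, 0.6, 0.6, 0.6, 0.6 the loop stops after round 5: three stale rounds follow the round that reached 0.6.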
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110456426.8A CN113254636A (en) | 2021-04-27 | 2021-04-27 | Remote supervision entity relationship classification method based on example weight dispersion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113254636A true CN113254636A (en) | 2021-08-13 |
Family
ID=77222101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110456426.8A Pending CN113254636A (en) | 2021-04-27 | 2021-04-27 | Remote supervision entity relationship classification method based on example weight dispersion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254636A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361059A (en) * | 2014-11-03 | 2015-02-18 | 中国科学院自动化研究所 | Harmful information identification and web page classification method based on multi-instance learning |
CN106682696A (en) * | 2016-12-29 | 2017-05-17 | 华中科技大学 | Multi-example detection network based on refining of online example classifier and training method thereof |
CN111191031A (en) * | 2019-12-24 | 2020-05-22 | 上海大学 | Entity relation classification method of unstructured text based on WordNet and IDF |
CN111414749A (en) * | 2020-03-18 | 2020-07-14 | 哈尔滨理工大学 | Social text dependency syntactic analysis system based on deep neural network |
CN111966917A (en) * | 2020-07-10 | 2020-11-20 | 电子科技大学 | Event detection and summarization method based on pre-training language model |
Non-Patent Citations (1)
Title |
---|
LE Jinxiong: "Research on Remote Supervised Entity Relation Extraction Methods Based on Internal and External Semantic Features and a Priority Attention Mechanism, and Their Applications", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's), Information Science and Technology series *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103336766B (en) | Short text garbage identification and modeling method and device | |
WO2019218514A1 (en) | Method for extracting webpage target information, device, and storage medium | |
CN102289522B (en) | Method of intelligently classifying texts | |
CN102411563B (en) | Method, device and system for identifying target words | |
CN110110225B (en) | Online education recommendation model based on user behavior data analysis and construction method | |
CN105843799B (en) | A kind of academic paper label recommendation method based on multi-source heterogeneous information graph model | |
CN107862089B (en) | Label extraction method based on perception data | |
WO2020253043A1 (en) | Intelligent text classification method and apparatus, and computer-readable storage medium | |
CN109376352A (en) | A kind of patent text modeling method based on word2vec and semantic similarity | |
CN113626607B (en) | Abnormal work order identification method and device, electronic equipment and readable storage medium | |
CN107292097A (en) | The feature selection approach of feature based group and traditional Chinese medical science primary symptom system of selection | |
CN109818971B (en) | Network data anomaly detection method and system based on high-order association mining | |
CN103336852A (en) | Cross-language ontology construction method and device | |
Baralis et al. | I‐prune: Item selection for associative classification | |
CN110019820A (en) | Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history | |
CN105224955A (en) | Based on the method for microblogging large data acquisition network service state | |
CN111831822A (en) | Text multi-classification method for unbalanced data set based on text multi-classification mixed equipartition clustering sampling algorithm | |
Zhang et al. | Medical document clustering using ontology-based term similarity measures | |
CN107038224A (en) | Data processing method and data processing equipment | |
CN112256865B (en) | Chinese text classification method based on classifier | |
CN113298253B (en) | Model training method, recognition method and device for named entity recognition | |
US20120005207A1 (en) | Method and system for web extraction | |
CN109446522A (en) | A kind of examination question automatic classification system and method | |
WO2022061877A1 (en) | Event extraction and extraction model training method, apparatus and device, and medium | |
CN113254636A (en) | Remote supervision entity relationship classification method based on example weight dispersion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210813 |