CN113254636A - Remote supervision entity relationship classification method based on example weight dispersion - Google Patents


Info

Publication number: CN113254636A
Application number: CN202110456426.8A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 陈雪, 刘振贤, 骆祥峰
Current and original assignee: University of Shanghai for Science and Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by University of Shanghai for Science and Technology; priority to CN202110456426.8A; publication of CN113254636A

Classifications

    • G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
    • G06F16/367 — Creation of semantic tools (e.g. ontology or thesauri); ontology
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06N3/047 — Neural network architectures; probabilistic or stochastic networks
    • G06N3/08 — Neural networks; learning methods


Abstract

The invention relates to a remote supervision entity relation classification method based on example weight dispersion. Sentence examples generated by the remote supervision method are grouped into packages, and a feature vector for each example is obtained through a segmented convolution network. An attention mechanism calculates the relevance weight between each example and the package it belongs to; the mean and standard deviation of all example relevance weights in the package define a threshold, against which the weights are updated. The example feature vectors in the package are then combined into a package feature vector according to the updated relevance weights. The package feature vector is input into a classifier to obtain the classification result of the package, which is compared with the package label to calculate a loss function. If the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameters are updated according to the loss function and the next round of training is performed. The method reduces the influence of mislabeled examples on model training under remote supervision and improves the accuracy of the remote supervision entity relationship classification model.

Description

Remote supervision entity relationship classification method based on example weight dispersion
Technical Field
The invention relates to the field of multi-instance learning and information extraction, in particular to a remote supervision entity relationship classification method based on instance weight dispersion.
Background
Entity relationship classification is one of the most important tasks in information extraction. Its aim is to assign a predefined relation type to an entity pair according to context semantics, given text with labeled entities. Existing methods can be divided into supervised, semi-supervised, unsupervised, and remote supervision approaches.
Supervised entity relationship classification methods require large, accurately labeled data sets and are labor-intensive. Semi-supervised methods are sensitive to the given seeds, suffer from semantic drift, and have low accuracy. Unsupervised methods cluster corpus information and define relations on the clustering results; the resulting relations are difficult to describe, and recall on low-frequency instances is low.
The current mainstream approach is remote supervision relation classification: a structured knowledge base is aligned with unstructured text to automatically generate a labeled data set for model training, avoiding a large amount of labor cost. The method assumes that if two entities have some relationship in the knowledge base, then any sentence containing both entities expresses that relationship. However, this assumption leads to mislabeled examples: a sentence containing the target entity pair may not actually describe the relation type that the pair has in the knowledge base. Remote supervision entity relationship classification is therefore combined with multi-instance learning to reduce the impact of mislabeled data.
Multiple Instance Learning (MIL) introduces the concept of a package, defined as a collection of multiple examples. The input to the model is not a single example with a category label but a set of labeled packages; a package is a positive package as long as it contains at least one positive example, and is negative otherwise. In remote supervision entity relation extraction, when a certain relation exists between an entity pair, at least one example in the package formed for that pair is assumed to express the relation. Because of this assumption, examples in the package that do not correctly describe the relation can seriously interfere with model training. How to reduce the influence of mislabeled examples in the package on model training is therefore an urgent technical problem, whose solution would also provide powerful support for applying multi-instance learning in other fields.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a remote supervision entity relation classification method based on example weight dispersion. A relevance weight threshold is designed from the mean and standard deviation of the example weights in a package; the relevance weights of the examples in the package are then updated according to this threshold, filtering out examples whose relevance weights are small and deviate strongly from the mean. This reduces the influence of mislabeled examples on the model and provides support for applying multi-instance learning in other fields.
In order to achieve the purpose, the invention adopts the following technical scheme:
a remote supervision entity relation classification method based on example weight dispersion comprises the following steps:
step 1, packing sentence examples generated by the remote supervision method, and obtaining the feature vector of each example through a segmented convolution network;
step 2, calculating the relevance weight between each example and the package it belongs to using an attention mechanism;
step 3, calculating the mean and standard deviation of all example relevance weights in the package, and updating the example relevance weights according to the designed threshold;
step 4, combining the example feature vectors in the package into a package feature vector according to the updated relevance weights;
step 5, inputting the package feature vector into a classifier to obtain the classification result of the package, comparing it with the package label, and calculating the loss function;
step 6, if the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, finishing training;
otherwise, updating the parameters according to the loss function and performing the next round of training.
Preferably, in step 1, the sentence examples generated based on the remote supervision method are packaged, and the feature vector of each example is obtained through a segmented convolution network, and the specific steps are as follows:
(1-1) putting examples containing the same entity pair into the same set to form a multi-example package, and constructing the remote supervision data set D = {B_1, B_2, ..., B_{n_s}}, where n_s is the number of packages, B_L = {S_1, S_2, ..., S_m} is a multi-example package in the data set, L ∈ [1, n_s], m is the number of examples in the package, and S_i is each example in the package, i ∈ [1, m].
(1-2) obtaining the feature vectors of the examples in a package through a segmented convolution network: b_l = {s_1, s_2, ..., s_m}, l ∈ [1, n_s], where s_j is the feature vector of each example, j ∈ [1, m].
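As a hedged illustration of the packing in step (1-1) (not code from the patent), grouping distantly supervised sentences into multi-example packages amounts to a grouping by entity pair; the tuple layout and the sample sentences below are assumptions made for the demonstration:

```python
from collections import defaultdict

def build_bags(examples):
    """Group distantly supervised sentence examples into multi-example packages.

    Each example is assumed to be a (head, tail, sentence, relation) tuple;
    all sentences sharing the same entity pair form one package, which
    inherits the pair's knowledge-base relation as its package label.
    """
    bags = defaultdict(list)
    labels = {}
    for head, tail, sentence, relation in examples:
        bags[(head, tail)].append(sentence)   # same entity pair -> same package
        labels[(head, tail)] = relation       # package-level (distant) label
    return dict(bags), labels

# Hypothetical toy corpus: both sentences mention the same entity pair,
# so they land in a single package with the distant label "founder".
examples = [
    ("Bill Gates", "Microsoft", "Bill Gates is the founder of Microsoft.", "founder"),
    ("Bill Gates", "Microsoft", "Bill Gates spoke at a Microsoft event.", "founder"),
]
bags, labels = build_bags(examples)
```

Note that the second sentence illustrates the mislabeling problem the patent targets: it shares the entity pair but does not actually express the relation.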
Preferably, in step 2, the relevance weight between each example and the package it belongs to is calculated using an attention mechanism. The specific steps are as follows:
(2-1) taking the inner product of the feature vector of each example in the package with the package label vector; the result is the relevance weight between the example and the package, i.e. the example weight. The specific calculation formula is:
e_j = s_j A q (1)
where s_j is the example feature vector output in step 1, A is a weight parameter matrix, and q is a query vector used to query the feature vector corresponding to the relation label from A;
(2-2) normalizing the example weights:
α_j = exp(e_j) / Σ_{k=1}^{m} exp(e_k) (2)
where k ∈ [1, m].
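Steps (2-1) and (2-2) can be sketched with NumPy as below; this is a minimal illustration under assumed dimensions and random parameters, not the patent's implementation:

```python
import numpy as np

def instance_weights(S, A, q):
    """Relevance weight of each example to its package, per equations (1)-(2).

    S: (m, d) matrix of example feature vectors from the encoder,
    A: (d, d) weight parameter matrix, q: (d,) query vector for the
    package's relation label.  Returns softmax-normalised weights alpha.
    """
    e = S @ A @ q                        # e_j = s_j A q        (equation 1)
    e = e - e.max()                      # subtract max to stabilise softmax
    alpha = np.exp(e) / np.exp(e).sum()  # alpha_j              (equation 2)
    return alpha
```

The weights sum to 1 over the package, so they can be compared directly against the mean-based threshold of step 3.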
Preferably, in step 3, the mean and standard deviation of all example relevance weights in the package are calculated, and the example relevance weights are updated according to the designed threshold. The specific steps are as follows:
(3-1) calculating the threshold of the relevance weights from the output of step 2:
M = (1/m) Σ_{j=1}^{m} α_j (3)
δ = √( (1/m) Σ_{j=1}^{m} (α_j − M)² ) (4)
where M is the mean of the example relevance weights, δ is their standard deviation, and M − δ is the relevance weight threshold;
(3-2) updating the example relevance weights according to the threshold: if an example relevance weight is smaller than the threshold, it is set to 0; otherwise it is kept:
α'_j = α_j if α_j ≥ M − δ; otherwise α'_j = 0 (5)
(3-3) normalizing the updated weights:
β_j = α'_j / Σ_{k=1}^{m} α'_k (6)
preferably, in the step 4, the example feature vectors in the packet are combined into a packet feature vector according to the example relevance weight, and the specific steps are as follows:
and (3) obtaining the feature vector of the packet according to the example relevance weight output in the step (3), wherein a specific calculation formula is as follows:
Figure BDA0003040668950000036
wherein x istRepresenting a 230-dimensional package feature vector, t ∈ [1, ns],βkExemplary correlation weights, s, output for step (3-3)kAnd (5) illustrating the feature vectors in the packet output in the step (1-2).
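Steps 3 and 4 together amount to a mean-minus-standard-deviation filter followed by a weighted sum; a minimal sketch, assuming NumPy arrays for the weights and example features:

```python
import numpy as np

def filter_and_aggregate(alpha, S):
    """Steps 3-4: zero out weights below the M - delta threshold
    (equations (3)-(5)), renormalise them (equation (6)), and combine
    the surviving example vectors into one package vector (equation (7)).

    alpha: (m,) attention weights, S: (m, d) example feature matrix.
    """
    M, delta = alpha.mean(), alpha.std()              # equations (3)-(4)
    kept = np.where(alpha >= M - delta, alpha, 0.0)   # equation (5)
    beta = kept / kept.sum()                          # equation (6)
    x = beta @ S                                      # equation (7): sum_k beta_k s_k
    return beta, x
```

The largest weight always satisfies α ≥ M ≥ M − δ, so at least one example survives and the renormalisation in equation (6) is well defined.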
Preferably, in step 5, the package feature vector is input into the classifier to obtain the classification result of the package, which is compared with the package label to calculate the loss function. The classifier comprises a full connection layer and a normalization layer. The specific steps are as follows:
(5-1) mapping the package feature vector output in step 4 into a 53-dimensional vector through the full connection layer of the classifier:
O_t = W x_t + d (8)
where O_t ∈ R^{n_r} is the output vector of the full connection layer, representing the score of each relation type, n_r is the number of relation types, W is a trainable parameter matrix, x_t is the package feature vector output in step (4-1), and d is a bias vector.
(5-2) normalizing the result of step (5-1) and outputting the probability distribution over the relation categories:
p(r_L | B_L; θ) = exp(o_{r_L}) / Σ_{c=1}^{n_r} exp(o_c) (9)
where B_L is the example package described in step (1-1), r_L ∈ [1, n_r] is the number of the relation type assigned to package B_L, θ is the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1), and the full connection layer parameters in step (5-1), and o_c is the score of the package on the c-th relation type, c ∈ [1, n_r].
(5-3) calculating the loss function used to update the model parameters:
J(θ) = − Σ_{L=1}^{n_s} log p(r_L | B_L; θ) (10)
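Step 5 can be sketched as a linear layer followed by a softmax and a negative log-likelihood; a minimal NumPy illustration under assumed small dimensions (the patent itself uses n_r = 53 relation types and 230-dimensional package vectors):

```python
import numpy as np

def bag_loss(x, r, W, d):
    """Step 5: score the package vector with a full connection layer
    (equation (8)), turn the scores into a softmax distribution over
    the n_r relation types (equation (9)), and return the negative
    log-likelihood of the package's gold relation r (one term of (10))."""
    o = W @ x + d                       # O_t = W x_t + d      (equation 8)
    o = o - o.max()                     # stabilise the softmax
    p = np.exp(o) / np.exp(o).sum()     # p(r | B; theta)      (equation 9)
    return -np.log(p[r])                # one summand of J     (equation 10)
```

Summing this quantity over all packages gives the loss J(θ) of equation (10).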
preferably, in the step 6, it is determined whether the iterative training of the model is required to be continued, and the specific steps are as follows:
if the F1 value of the continuous three-wheel training is not lifted or the training of the current wheel reaches the preset training times, finishing the training; otherwise, updating the parameter set theta according to the loss function, and performing the next round of training, wherein the parameter updating calculation formula is as follows:
Figure BDA0003040668950000043
wherein epsilon is the learning rate,
Figure BDA0003040668950000044
is the gradient of the loss function.
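The stopping rule of step 6 (no F1 improvement for three consecutive rounds, or the round budget reached) can be sketched as a small helper; the function name and the parameter defaults are assumptions for illustration:

```python
def should_stop(f1_history, patience=3, max_epochs=50):
    """Step 6 stopping rule: end training when the F1 value has not
    improved for `patience` consecutive rounds, or when the preset
    number of rounds is reached.  `f1_history` holds the F1 score
    measured after each completed round."""
    if len(f1_history) >= max_epochs:
        return True                      # round budget exhausted
    if len(f1_history) <= patience:
        return False                     # not enough rounds to judge yet
    best_before = max(f1_history[:-patience])
    # stop if none of the last `patience` rounds beat the earlier best
    return max(f1_history[-patience:]) <= best_before
```

When the helper returns False, the parameter update of equation (11) is applied and training continues.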
Compared with the prior art, the invention has the following substantive characteristics and technical progress:
1. the method designs an example weight threshold from the mean and standard deviation of the example relevance weights in the package, updates the example weights based on the threshold, and filters out examples with small weights that deviate strongly from the mean, thereby reducing the influence of mislabeled data on model training in multi-instance learning;
2. the method improves the attention-based calculation of example relevance weights in the package and improves the accuracy of the remote supervision entity relationship classification model.
Drawings
FIG. 1 is a flow of remote supervised entity relationship classification based on example weight dispersion.
FIG. 2 is a comparison of the PR curves of the experimental results of the method of the present invention with other methods.
FIG. 3 is an example weight calculation process based on mean and standard deviation.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the advantageous effects of the embodiments of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be emphasized that the specific embodiments described herein are merely illustrative of the invention and are not limiting.
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
the first embodiment is as follows:
in this embodiment, referring to fig. 1, a remote supervision entity relationship classification method based on example weight dispersion comprises the following steps:
step 1, packing sentence examples generated by the remote supervision method, and obtaining the feature vector of each example through a segmented convolution network;
step 2, calculating the relevance weight between each example and the package it belongs to using an attention mechanism;
step 3, calculating the mean and standard deviation of all example relevance weights in the package, and updating the example relevance weights according to the designed threshold;
step 4, combining the example feature vectors in the package into a package feature vector according to the updated relevance weights;
step 5, inputting the package feature vector into a classifier to obtain the classification result of the package, comparing it with the package label, and calculating the loss function;
step 6, if the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, finishing training;
otherwise, updating the parameters according to the loss function and performing the next round of training.
In this remote supervision entity relationship classification method based on example weight dispersion, a relevance weight threshold is designed from the mean and standard deviation of the example weights in a package; the relevance weights of the examples in the package are then updated according to the threshold, filtering out examples with small relevance weights that deviate strongly from the mean, thereby reducing the influence of mislabeled examples on the model.
Example two:
based on the foregoing embodiment, the remote supervision entity relationship classification method based on example weight dispersion comprises the following steps:
Step 1, pack the examples in the training set generated by the remote supervision method, and obtain the feature vector of each example through a segmented convolution network. The specific process is as follows:
(1-1) putting examples containing the same entity pair into the same set to form a multi-example package, and constructing the remote supervision data set D = {B_1, B_2, ..., B_{n_s}}, where n_s is the number of packages, B_L = {S_1, S_2, ..., S_m} is a multi-example package in the data set, L ∈ [1, n_s], m is the number of examples in the package, and S_i is each example in the package, i ∈ [1, m].
(1-2) obtaining the feature vectors of the examples in a package through the segmented convolution network: b_l = {s_1, s_2, ..., s_m}, l ∈ [1, n_s], where s_j is the feature vector of each example, j ∈ [1, m].
Step 2, calculate the relevance weight between each example and the package it belongs to using the attention mechanism. The specific process is as follows:
(2-1) taking the inner product of the feature vector of each example in the package with the package label vector; the result is the relevance weight between the example and the package, i.e. the example weight:
e_j = s_j A q (1)
where s_j is the example feature vector output in step (1-2), A is a weight parameter matrix, and q is a query vector used to query the feature vector corresponding to the relation label from A.
(2-2) normalizing the example weights:
α_j = exp(e_j) / Σ_{k=1}^{m} exp(e_k) (2)
where k ∈ [1, m].
Step 3, calculate the mean and standard deviation of all example relevance weights in the package, and update the example relevance weights according to the designed threshold. The process is as follows:
(3-1) calculating the threshold of the relevance weights from the output of step (2-2):
M = (1/m) Σ_{j=1}^{m} α_j (3)
δ = √( (1/m) Σ_{j=1}^{m} (α_j − M)² ) (4)
where M is the mean of the example relevance weights, δ is their standard deviation, and M − δ is the relevance weight threshold.
(3-2) updating the example relevance weights according to the threshold: if an example relevance weight is smaller than the threshold, it is set to 0; otherwise it is kept:
α'_j = α_j if α_j ≥ M − δ; otherwise α'_j = 0 (5)
(3-3) normalizing the updated weights:
β_j = α'_j / Σ_{k=1}^{m} α'_k (6)
Step 4, combine the example feature vectors in the package into a package feature vector according to the example relevance weights. The process is as follows:
(4-1) obtaining the feature vector of the package from the example relevance weights output in step (3-3):
x_t = Σ_{k=1}^{m} β_k s_k (7)
where x_t is the 230-dimensional package feature vector, t ∈ [1, n_s], β_k is the example relevance weight output in step (3-3), and s_k is the example feature vector in the package output in step (1-2).
Step 5, input the package feature vector into the classifier to obtain the classification result of the package, compare it with the package label, and calculate the loss function. The classifier comprises a full connection layer and a normalization layer. The process is as follows:
(5-1) mapping the package feature vector output in step (4-1) through the full connection layer:
O_t = W x_t + d (8)
where O_t ∈ R^{n_r} is the output vector of the full connection layer, representing the score of each relation type, n_r is the number of relation types, W is a trainable parameter matrix, x_t is the package feature vector output in step (4-1), and d is a bias vector.
(5-2) normalizing the result of step (5-1) and outputting the probability distribution over the relation categories:
p(r_L | B_L; θ) = exp(o_{r_L}) / Σ_{c=1}^{n_r} exp(o_c) (9)
where B_L is the example package described in step (1-1), r_L ∈ [1, n_r] is the number of the relation type assigned to package B_L, θ is the trainable parameter set of the model, comprising the segmented convolution network in step (1-2), the weight matrix in step (2-1), and the full connection layer parameters in step (5-1), and o_c is the score of the package on the c-th relation type, c ∈ [1, n_r].
(5-3) calculating the loss function used to update the model parameters:
J(θ) = − Σ_{L=1}^{n_s} log p(r_L | B_L; θ) (10)
Step 6, judge whether iterative training of the model should continue. The process is as follows:
(6-1) if the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, finish training; otherwise, update the parameter set θ according to the loss function and perform the next round of training. The parameter update formula is:
θ ← θ − ε ∇J(θ) (11)
where ε is the learning rate and ∇J(θ) is the gradient of the loss function.
In this embodiment, an example weight threshold is designed from the mean and standard deviation of the example relevance weights in the package, the example weights are updated based on the threshold, and examples with small weights that deviate strongly from the mean are filtered out, thereby reducing the influence of mislabeled data on model training in multi-instance learning.
Example three:
in order to verify the effectiveness of the method, experiments were carried out using the New York Times news corpus as data; the method is further explained below with reference to the accompanying drawings.
In this embodiment, the remote supervision entity relationship classification method based on example weight dispersion (DSRC-SWD) first packs the examples in the training set generated by the remote supervision method and obtains the feature vector of each example in the package through a segmented convolution network (PCNN). An attention mechanism then calculates the relevance weight between each example and the package it belongs to. The mean and standard deviation of all example relevance weights in the package are calculated and used to design an example relevance weight threshold: if an example relevance weight is smaller than the threshold, it is set to 0; otherwise it is kept, and the weights are normalized. The example feature vectors in the package are then combined into a package feature vector according to these weights. Finally, the package feature vector is input into the classifier to output the classification result of the package, which is compared with the package label to calculate the loss function. If the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training ends; otherwise, the parameters are updated according to the loss function and the next round of training is performed.
Referring to the remote supervision relationship extraction flow chart of fig. 1, this embodiment designs the example weight threshold from the mean and standard deviation and updates the example weights accordingly, reducing the influence of mislabeled examples on the model in multi-instance learning.
Step 1: pack the sentence examples (examples for short) in the training set generated by the remote supervision method, and obtain the feature vector of each example through a segmented convolution network. The specific process is as follows:
(1-1) packing examples in the training set generated by the remote supervision method. For example, given the sentence "Bill Gates is the founder of Microsoft", the entities Bill Gates and Microsoft appear in the knowledge-graph triple (Microsoft, founder, Bill Gates), so the relation label of the sentence is /business/person/company. Examples containing the same entity pair are put into the same set, forming a multi-example package, and the remote supervision data set D = {B_1, B_2, ..., B_{n_s}} is constructed, where n_s is the number of packages, B_L = {S_1, S_2, ..., S_m} is a multi-example package in the data set, L ∈ [1, n_s], m is the number of examples in the package, and S_i is each example in the package, i ∈ [1, m].
(1-2) obtaining the feature vectors of the examples in a package through the segmented convolution network: b_l = {s_1, s_2, ..., s_m}, l ∈ [1, n_s], where s_j is the feature vector of each example, j ∈ [1, m].
Step 2: calculate the relevance weight between each example and the package it belongs to using the attention mechanism. The specific process is as follows:
(2-1) taking the inner product of the feature vector of each example in the package with the package label vector; the result is the relevance weight between the example and the package, i.e. the example weight:
e_j = s_j A q (1)
where s_j is the example feature vector output in step (1-2), A is a weight parameter matrix, and q is a query vector used to query the feature vector corresponding to the relation label from A;
(2-2) normalizing the example weights:
α_j = exp(e_j) / Σ_{k=1}^{m} exp(e_k) (2)
where k ∈ [1, m].
3. Calculate the mean and standard deviation of all example relevance weights within the package and update the example relevance weights according to the designed threshold, as shown in fig. 3. The process is as follows:
(3-1) calculating the threshold of the relevance weights from the output of step (2-2):
M = (1/m) Σ_{j=1}^{m} α_j (3)
δ = √( (1/m) Σ_{j=1}^{m} (α_j − M)² ) (4)
where M is the mean of the example relevance weights, δ is their standard deviation, and M − δ is the relevance weight threshold.
(3-2) updating the example relevance weights according to the threshold: if an example relevance weight is smaller than the threshold, it is set to 0; otherwise it is kept:
α'_j = α_j if α_j ≥ M − δ; otherwise α'_j = 0 (5)
(3-3) normalizing the updated weights:
β_j = α'_j / Σ_{k=1}^{m} α'_k (6)
4. Combine the example feature vectors in the package into a package feature vector according to the example relevance weights. The process is as follows:
(4-1) obtaining the feature vector of the package from the example relevance weights output in step (3-3):
x_t = Σ_{k=1}^{m} β_k s_k (7)
where x_t is the 230-dimensional package feature vector, t ∈ [1, n_s], β_k is the example relevance weight output in step (3-3), and s_k is the example feature vector in the package output in step (1-2).
5. And inputting the packet feature vector into a classifier to obtain a classification result of the packet, comparing the classification result with the label of the packet, and calculating a loss function. The classifier is divided into a full connection layer and a normalization layer, and the process is as follows:
(5-1) transforming the packet feature vector output in step (4-1) through the fully connected layer, wherein the specific calculation formula is as follows:

O_t = W x_t + d  (8)

where O_t ∈ ℝ^{n_r} is the output vector of the fully connected layer and represents the score of each relation type, n_r = 53 is the number of relation types, W is the trainable parameter matrix, x_t is the packet feature vector output in step (4-1), and d is a bias vector.
(5-2) normalizing the result of step (5-1) and outputting the probability distribution over the relation categories, wherein the specific calculation formula is as follows:

p(r_L | B_L; θ) = exp(o_{r_L}) / Σ_{c=1}^{n_r} exp(o_c)  (9)

where B_L is the example package described in step (1-1), r_L ∈ [1, n_r] is the number of the relation type assigned to package B_L, θ denotes the trainable parameter set of the model (the segmented convolution network of step (1-2), the weight matrix of step (2-1), and the fully connected layer parameters of step (5-1)), and o_c is the score of the package on the c-th relation type, c ∈ [1, n_r].
(5-3) calculating the loss function used to update the model parameters, wherein the calculation formula is as follows:

J(θ) = − Σ_{L=1}^{n_s} log p(r_L | B_L; θ)  (10)

where n_s is the number of packages and θ denotes the trainable parameter set of the model, comprising the segmented convolution network of step (1-2), the weight matrix of step (2-1), and the fully connected layer parameters of step (5-1).
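Steps (5-1) to (5-3) are a linear layer followed by a softmax and a negative log-likelihood loss. A minimal sketch with shrunk shapes (n_r = 3 instead of 53; all names are illustrative):

```python
import numpy as np

def classify(x, W, d):
    """Eq. (8): relation scores O = Wx + d; eq. (9): softmax probabilities."""
    o = W @ x + d                     # (n_r,) score per relation type
    o = o - o.max()                   # numerically stable softmax
    p = np.exp(o) / np.exp(o).sum()
    return p

def nll_loss(p, r):
    """One bag's contribution to the loss of eq. (10): -log p(r | B)."""
    return -np.log(p[r])

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # n_r = 3, feature dim = 2
d = np.zeros(3)
p = classify(np.array([2.0, 0.0]), W, d)
loss = nll_loss(p, 0)   # gold relation is type 0
print(p.argmax(), loss)
```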
6. Judging whether iterative training of the model needs to continue, the process being as follows:
(6-1) if the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training is finished; otherwise, the parameter set θ is updated according to the loss function and the next round of training is performed, wherein the parameter update formula is as follows:

θ ← θ − ε·∇_θ J(θ)  (11)

where ε = 0.5 is the learning rate and ∇_θ J(θ) is the gradient of the loss function.
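The stopping rule of step 6 can be sketched as a plain training-loop skeleton. Here `update_step` and `eval_f1` are placeholders for the gradient update of equation (11) and validation F1 scoring; this is an assumption-laden sketch, not the patent's implementation:

```python
def train(update_step, eval_f1, max_rounds=30, patience=3):
    """Stop when the validation F1 has not improved for `patience`
    consecutive rounds, or when the preset number of rounds is reached."""
    best_f1, stale = -1.0, 0
    for _ in range(max_rounds):
        update_step()            # theta <- theta - eps * grad J(theta), eq. (11)
        f1 = eval_f1()
        if f1 > best_f1:
            best_f1, stale = f1, 0
        else:
            stale += 1
            if stale >= patience:
                break            # three stale rounds: finish training
    return best_f1

# toy run: F1 plateaus, so training stops before reaching the 0.90 score
scores = iter([0.50, 0.60, 0.60, 0.60, 0.60, 0.90])
best = train(lambda: None, lambda: next(scores))
print(best)
```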
Compared with other methods, the method of this embodiment designs an example weight threshold from the mean and standard deviation of the example weights within a package, updates the example weights in the package accordingly, and filters out examples whose weights are small and far below the mean, thereby reducing the influence of wrongly labeled data in multi-example learning.
Description of experimental tests and results:
the data set used in this example is the data set of "New York Times" (http:// t. cn/RPsjAY), and is divided into a training set, a verification set and a test set. Wherein the training set includes 466876 sentence examples, the verification set includes 55167 sentence examples, and the test set includes 172448 sentence examples. The experimental indexes adopt a Precision-Recall ratio Curve (PR Curve) and P @ N, wherein the PR Curve represents the Precision ratio of the model under different Recall ratios, the Curve is closer to the upper right corner of a coordinate system to indicate that the comprehensive performance of the model is better, and the P @ N represents the accuracy of the first N pieces of test data.
TABLE 1. P@N results

Method     P@100(%)  P@200(%)  P@300(%)  Average(%)
ONE        64.3      62.6      58.2      61.7
AVG        67.8      64.4      60.4      64.2
ATT        71.1      67.6      64.5      67.7
DSRC-SWD   72.3      69.7      66.1      69.3
FIG. 2 compares the PR curves of the method of the present invention with those of other methods, and Table 1 compares their P@N scores. Each method uses a segmented convolutional network for feature extraction; the difference lies in the example weight calculation method. ONE keeps only the weight of the example with the largest relevance weight and resets the others to 0; AVG gives all examples the same relevance weight; ATT calculates relevance weights via an attention mechanism; DSRC-SWD is the example weight calculation method of the present invention. The PR curves show that the weight calculation method of the invention achieves the highest precision at low recall; between recall 0.05 and 0.17, the precision of the inventive method differs little from that of ATT but remains higher than the other methods; beyond recall 0.17, the precision of the inventive method is markedly higher than that of all other methods. The P@N results show that the inventive method attains the highest precision at P@100, P@200, and P@300, with an average precision of 69.3%, an improvement of 7.6%, 5.1%, and 1.6% over the other methods, respectively.
The ONE method selects only the highest-scoring example in each packet; the other examples, although scored lower, may still contain the relational semantic information of the packet, so this method loses a large amount of valid information. The AVG method gives all examples the same weight and cannot distinguish wrongly labeled examples from correct ones, which has a strong negative influence on the model. The ATT method assigns different weights to the examples through an attention mechanism, but wrongly labeled examples still participate in the training of the model, limiting the relation extraction performance. On the basis of the attention mechanism, the present method calculates the standard deviation of the example weight distribution, derives an example weight threshold from it, and filters out examples with low scores that deviate far from the mean, so that only highly scored examples participate in model training. The experimental conclusion is therefore that, with the same example feature extraction method, the proposed method achieves a better effect in multi-example-learning remote supervised entity relation classification.
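The four weighting schemes compared above differ only in how a packet's attention weights are post-processed, which fits in a few lines. A hedged sketch (ATT is represented by the softmax weights themselves; all names and the sample weights are illustrative):

```python
import numpy as np

alpha = np.array([0.50, 0.30, 0.15, 0.05])   # attention (ATT) weights in one packet

def one_weights(a):
    """ONE: keep only the highest-scoring instance, reset the rest to 0."""
    w = np.zeros_like(a)
    w[np.argmax(a)] = 1.0
    return w

def avg_weights(a):
    """AVG: every instance gets the same weight."""
    return np.full_like(a, 1.0 / len(a))

def swd_weights(a):
    """DSRC-SWD: zero out weights below mean - std, renormalize the rest."""
    kept = np.where(a >= a.mean() - a.std(), a, 0.0)
    return kept / kept.sum()

print(one_weights(alpha))   # only instance 0 survives
print(avg_weights(alpha))   # uniform 0.25 each
print(swd_weights(alpha))   # low-scoring instance 3 filtered, others rescaled
```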
In summary, the above embodiment of the remote supervised entity relationship classification method based on example weight dispersion packs the sentence examples (examples for short) generated by the remote supervision method and obtains the feature vector of each example through a segmented convolution network. The relevance weights of the examples and their packages are calculated using an attention mechanism. The mean and standard deviation of all example relevance weights within a package are calculated, and the example relevance weights are updated according to the threshold designed from them. The example feature vectors in the package are combined into a package feature vector according to the updated relevance weights. The package feature vector is input into a classifier to obtain the classification result of the package, the result is compared with the label of the package, and the loss function is calculated. If the F1 value does not improve for three consecutive training rounds or the current round reaches the preset number of training rounds, training finishes; otherwise, the parameters are updated according to the loss function and the next round of training is performed. The method improves the attention-based calculation of in-package example relevance weights: it designs a weight threshold from the mean and standard deviation of the relevance weights and, according to this threshold, filters out examples whose relevance weight is small and far below the mean, thereby reducing the influence of wrongly labeled examples in remote supervision on model training and improving the accuracy of the remote supervised entity relationship classification model.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention.

Claims (7)

1. An example weight calculation method for multi-example learning is characterized by comprising the following steps:
step 1, packing sentence examples generated based on a remote supervision method, and obtaining a feature vector of each example through a segmented convolution network;
step 2, calculating the relevance weight of each example and the package in which it is located by using an attention mechanism;
step 3, calculating the mean and standard deviation of all the example relevance weights in the package, and updating the example relevance weights according to the designed threshold;
step 4, combining the example feature vectors in the package into a package feature vector according to the updated example relevance weights;
step 5, inputting the package feature vector into a classifier to obtain the classification result of the package, comparing the result with the label of the package, and calculating the loss function;
step 6, if the F1 value does not improve for three consecutive training rounds or the current round reaches the preset number of training rounds, finishing training; otherwise, updating the parameters according to the loss function and performing the next round of training.
2. The method of claim 1, wherein step 1 packs sentence examples generated based on a remote supervision method and obtains a feature vector of each example through a segmented convolution network, the process being as follows:

(2-1) putting instances containing the same entity pair into the same set to form a multi-instance package, and constructing a remote supervision data set {B_1, B_2, ..., B_{n_s}}, where n_s is the number of packets, B_L = {S_1, S_2, ..., S_m} is a multi-instance packet in the data set, L ∈ [1, n_s], m is the number of instances in the packet, and S_i is an example in the package, i ∈ [1, m];

(2-2) obtaining the feature vectors of the examples in the packet through a segmented convolutional network: b_l = {s_1, s_2, ..., s_m}, l ∈ [1, n_s], where s_j is the feature vector of each example, j ∈ [1, m].
3. The method of claim 1, wherein step 2 calculates the relevance weight of each example and the packet in which it is located by using an attention mechanism, the process being as follows:

(3-1) taking the inner product of the feature vector of each example in the packet and the packet label vector, the result serving as the correlation weight of the example and the packet, i.e. the example weight, wherein the specific calculation formula is as follows:

e_j = s_j A q  (1)

where s_j is the example feature vector output in step (2-2), A denotes a weight parameter matrix, and q denotes a query vector used to query the feature vector corresponding to the relation label from A;

(3-2) normalizing the example weights, wherein the specific calculation formula is as follows:

α_k = exp(e_k) / Σ_{j=1}^{m} exp(e_j)  (2)

where k ∈ [1, m].
4. The method of claim 1, wherein step 3 calculates the mean and standard deviation of all example relevance weights in the package and updates the example relevance weights according to the designed threshold, the process being as follows:

(4-1) calculating the threshold of the correlation weights according to the output of step (3-2), wherein the specific calculation formulas are as follows:

M = (1/m) Σ_{k=1}^{m} α_k  (3)

δ = √( (1/m) Σ_{k=1}^{m} (α_k − M)² )  (4)

where M is the mean of the example correlation weights, δ is their standard deviation, and M − δ is the correlation weight threshold;

(4-2) updating the example correlation weights according to the threshold: if an example correlation weight is smaller than the threshold, it is updated to 0; otherwise it is kept; the specific calculation formula is as follows:

α′_k = α_k, if α_k ≥ M − δ;  α′_k = 0, otherwise  (5)

(4-3) normalizing the updated weights, wherein the specific calculation formula is as follows:

β_k = α′_k / Σ_{j=1}^{m} α′_j  (6)
5. The method of claim 1, wherein step 4 combines the example feature vectors in the packet into a packet feature vector according to the updated example relevance weights, the process being as follows:

(5-1) obtaining the feature vector of the packet according to the example relevance weights output in step (4-3), wherein the specific calculation formula is as follows:

x_t = Σ_{k=1}^{m} β_k s_k  (7)

where x_t represents the 230-dimensional packet feature vector, t ∈ [1, n_s], β_k is the example correlation weight output in step (4-3), and s_k is the example feature vector output in step (2-2).
6. The method of claim 1, wherein step 5 inputs the packet feature vector into a classifier to obtain the classification result of the packet, compares the result with the label of the packet, and calculates the loss function; the classifier comprises a fully connected layer and a normalization layer, the process being as follows:

(6-1) transforming the packet feature vector output in step (5-1) through the fully connected layer, wherein the specific calculation formula is as follows:

O_t = W x_t + d  (8)

where O_t ∈ ℝ^{n_r} is the output vector of the fully connected layer and represents the score of each relation type, n_r is the number of relation types, W is the trainable parameter matrix, x_t is the packet feature vector output in step (5-1), and d is a bias vector;

(6-2) normalizing the result of step (6-1) and outputting the probability distribution over the relation categories, wherein the specific calculation formula is as follows:

p(r_L | B_L; θ) = exp(o_{r_L}) / Σ_{c=1}^{n_r} exp(o_c)  (9)

where B_L is the example package described in step (2-1), r_L ∈ [1, n_r] is the number of the relation type assigned to package B_L, θ denotes the trainable parameter set of the model (the segmented convolution network of step (2-2), the weight parameter matrix of step (3-1), and the fully connected layer parameters of step (6-1)), and o_c is the score of the package on the c-th relation type, c ∈ [1, n_r];

(6-3) calculating the loss function used to update the model parameters, wherein the calculation formula is as follows:

J(θ) = − Σ_{L=1}^{n_s} log p(r_L | B_L; θ)  (10)
7. The method of claim 1, wherein step 6 judges whether iterative training of the model needs to continue, the process being as follows:

(7-1) if the F1 value does not improve for three consecutive training rounds, or the current round reaches the preset number of training rounds, training is finished; otherwise, the parameter set θ is updated according to the loss function and the next round of training is performed, wherein the parameter update formula is as follows:

θ ← θ − ε·∇_θ J(θ)  (11)

where ε is the learning rate and ∇_θ J(θ) is the gradient of the loss function.
CN202110456426.8A 2021-04-27 2021-04-27 Remote supervision entity relationship classification method based on example weight dispersion Pending CN113254636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456426.8A CN113254636A (en) 2021-04-27 2021-04-27 Remote supervision entity relationship classification method based on example weight dispersion


Publications (1)

Publication Number Publication Date
CN113254636A true CN113254636A (en) 2021-08-13

Family

ID=77222101


Country Status (1)

Country Link
CN (1) CN113254636A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361059A (en) * 2014-11-03 2015-02-18 中国科学院自动化研究所 Harmful information identification and web page classification method based on multi-instance learning
CN106682696A (en) * 2016-12-29 2017-05-17 华中科技大学 Multi-example detection network based on refining of online example classifier and training method thereof
CN111191031A (en) * 2019-12-24 2020-05-22 上海大学 Entity relation classification method of unstructured text based on WordNet and IDF
CN111414749A (en) * 2020-03-18 2020-07-14 哈尔滨理工大学 Social text dependency syntactic analysis system based on deep neural network
CN111966917A (en) * 2020-07-10 2020-11-20 电子科技大学 Event detection and summarization method based on pre-training language model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Le Jinxiong (乐金雄): "Research on Distantly Supervised Entity Relation Extraction Methods Based on Internal/External Semantic Features and a Priority Attention Mechanism and Their Applications", China Master's Theses Full-text Database, Information Science and Technology Series *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813