CN114077676A - Knowledge graph noise detection method based on path confidence - Google Patents

Knowledge graph noise detection method based on path confidence

Info

Publication number
CN114077676A
Authority
CN
China
Prior art keywords
path
confidence
matrix
triples
triplet
Legal status
Granted
Application number
CN202111393836.9A
Other languages
Chinese (zh)
Other versions
CN114077676B (en)
Inventor
马江涛
周辰宇
王艳军
李端阳
贾泽臣
马宇科
李霆
卢威光
张蓓蕾
李清扬
赵一帆
Current Assignee
Henan Tupu Information Technology Co ltd
Zhengzhou University of Light Industry
Original Assignee
Henan Tupu Information Technology Co ltd
Zhengzhou University of Light Industry
Application filed by Henan Tupu Information Technology Co ltd and Zhengzhou University of Light Industry
Priority to CN202111393836.9A
Publication of CN114077676A
Application granted
Publication of CN114077676B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computational Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a knowledge graph noise detection method based on path confidence, comprising the following steps. First, the triples are initialized, all paths of every triple are found, each triple on each path is embedded with the translation model TransE, and all paths of a triple are represented as path embedding sequences, in which each pair of adjacent triples forms a node. Second, the nodes are fed in sequence into the correlation-based probabilistic logic layer (CPLL) to compute the confidence score of each node on each path. Third, the node scores of each path are passed through a Bi-GRU to obtain a score matrix for that path. Finally, the L2 norm of each path's score matrix is taken as the path confidence, and the score matrix of the path with the highest confidence is taken as the optimal embedding matrix of the triple. By combining path-based and rule-based methods, the invention improves the efficiency of detecting noise in a knowledge graph and thereby improves its quality.

Description

Knowledge graph noise detection method based on path confidence
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a knowledge graph noise detection method based on path confidence.
Background
Knowledge graphs now play an important role in artificial intelligence tasks. However, knowledge graphs constructed manually or automatically have a number of quality issues and often contain erroneous or missing triples. Noise in a knowledge graph may be caused by human error or by errors in the source data, and most noise appears as erroneous entities or relations in triples. A growing number of researchers have turned their attention to knowledge graph noise and have proposed many solutions.
Noise detection methods for knowledge graphs can be broadly divided into path-based methods and rule-based methods. Path-based methods originate with translation models such as TransE, TransH, and TransR, which, although mostly used for knowledge graph embedding and completion, can also be used to detect noise in a knowledge graph. The PaTyBRED model proposed by Melo et al. incorporates type and path features into a local relation classifier and retains specific paths for each relation to indicate whether a triple is erroneous. Xie et al. proposed the CKRL model, which uses the local and global information of triples to represent the probability that a triple is erroneous. However, path-based approaches have a weak ability to find noise and are not suitable for knowledge graphs containing complex relations. Rule-based methods generally detect noise better than path-based methods. The PSL model proposed by Brocheler et al. uses first-order predicate logic and weighted rules to extract the most likely correct triples from uncertain ones. Abedini et al. proposed Correction Tower, which identifies discrepancies, inconsistencies, and erroneous relations in triples in three steps. However, rule-based methods lack knowledge representation capability: after they detect and remove noise, the knowledge graph must still be mapped into a continuous vector space so that it can be manipulated more easily in downstream tasks.
Combining path-based and rule-based approaches makes it possible not only to find noise but also to construct a noise-free knowledge graph representation. Specifically, rules are first applied along the paths of each triple to select effective features. These features must distinguish noisy information from correct information, where correct information includes both global and local triple information. The features are then used to perform noise detection and triple representation, which improves the quality of the knowledge graph and the user experience.
Disclosure of Invention
The invention provides a knowledge graph noise detection method based on path confidence, which addresses two technical problems: existing path-based methods have a weak ability to find noise and are unsuitable for knowledge graphs with complex relations, while rule-based methods lack knowledge representation capability.
The technical scheme of the invention is realized as follows:
A knowledge graph noise detection method based on path confidence includes the following steps:
Step one: initialize the number of triples, find all paths of all triples, embed each triple of each path with the translation model TransE, and represent all paths of the triples as path embedding sequences; each pair of adjacent triples in a path embedding sequence forms a node, and the number of nodes is n;
Step two: input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL) and calculate the confidence score matrix of each node in each path;
Step three: input the confidence score matrices of all nodes in each path into the Bi-GRU to obtain the score matrix of each path;
Step four: take the L2 norm of the score matrix of each path as the path confidence, and take the score matrix corresponding to the highest path confidence as the optimal embedding matrix of the triple.
Preferably, in the second step, the specific method is as follows:
S21: initialize the input node T:
$T = N'_i \cdot (N'_{i+1})^{\top}$ (1);
$N'_i = (x'_i, r'_i, x'_{i+1})$ (2);
$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2})$ (3);
where $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ the embedding matrix of the (i+1)-th triple on the path, and $(N'_{i+1})^{\top}$ the transpose of $N'_{i+1}$; $x'_i$, $x'_{i+1}$, and $x'_{i+2}$ denote entities, and $r'_i$ and $r'_{i+1}$ denote relations;
S22: multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$GTT(i, i+1) = T \cdot W_0$ (4);
where GTT(i, i+1) is the global triple confidence;
S23: the node T enters the Separate & Pad layer, which separates the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E, and F; correlation-based logical operations are performed on D, E, and F, and the results are summed to obtain the local confidence between the triples, i.e., the local triple confidence:
$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2}$ (5);
$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3$ (6);
[Equations (7) to (11) appear as images in the original and are not reproduced here.]
where MIN(·) denotes the minimum of a matrix, MAX(·) the maximum of a matrix, $\mathbf{1}$ a matrix whose elements are all 1, and $-\mathbf{1}$ a matrix whose elements are all -1; the logical-operator symbols (images in the original) denote different logical operations, and LTT(i, i+1) is the local triple confidence;
S24: multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of node T:
$G_i = GTT(i, i+1) \cdot LTT(i, i+1)$ (12).
Preferably, in step three, the specific method is as follows:
S31: take the confidence score $G_i$ of each node, together with the confidence scores $G_{i-1}$ and $G_{i+1}$ of its neighboring nodes, as the input of the bidirectional GRU; the i-th forward and backward GRU outputs are computed as follows:
[Equations (13) and (14) appear as images in the original and are not reproduced here.]
where the two output symbols (images in the original) denote the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent unit network.
S32: concatenate the final outputs of the forward and backward GRUs and apply linear and normalization operations to obtain the path score matrix:
$h(p) = \mathrm{softmax}\big(\mathrm{linear}(\mathrm{concat}(\overrightarrow{h}, \overleftarrow{h}))\big)$ (15);
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, $\overrightarrow{h}$ the final output of the forward GRU, $\overleftarrow{h}$ the final output of the backward GRU, concat(·) the concatenation function, linear(·) the linear function, and softmax(·) the normalization function.
Preferably, in step four, the path confidence and the optimal triple are calculated as follows:
$g(p_j) = \lVert h(p_j) \rVert_2$ (16);
when $g(p_j) = g(f_k) = \max_m g(p_m)$, $h(f_k) = h(p_j)$ (17);
where g(p) denotes the path confidence, $h(p_j)$ the score matrix of path $p_j$, $\lVert \cdot \rVert_2$ the L2 norm of a matrix, $g(f_k)$ the maximum path confidence, and $h(f_k)$ the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
Preferably, the loss function is designed as follows:
$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\big(l(h,r,t) \cdot P(h,r,t)\big)\right]$ (18);
[Equation (19) appears as an image in the original and is not reproduced here.]
where exp(·) denotes the exponential function with base e, log(·) the logarithmic function, L the loss, P(h, r, t) the path from head entity h to tail entity t, r a relation, T' the set of valid triples, and T'' the set of invalid triples; invalid triples are formed by randomly replacing the head or tail entity of an original triple, and valid triples are the original triples.
Compared with the prior art, the invention has the following beneficial effects:
1) Building on the internal structural information used by path-based knowledge graph methods, the invention introduces a correlation-based probabilistic model and integrates it into a neural network to detect noise in the knowledge graph and to represent the knowledge graph.
2) The invention constructs a path confidence network that computes the global and local triple confidences and, combined with a bidirectional gated recurrent network, obtains the path confidence and path score matrix of each triple; the path confidence is used to judge whether a triple is correct, and the path score matrix is used to represent the triple.
3) The invention addresses the problem of knowledge graph noise, completes the representation of the knowledge graph, and achieves good results in knowledge graph noise detection tests.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a subgraph of all paths from the entity "champion" to the entity "team";
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a flow chart of the proposed model of the present invention;
FIG. 4 is a block diagram of a correlation-based probabilistic logic model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
In general, relationships between triples in a knowledge graph can be expressed in the form of paths. For a triple f = (h, r, t), the path P = (h, R, t) from the head entity h to the tail entity t is an option that cannot be ignored. Here R contains at least one relation and possibly several entities; these entities and relations may form several triples N, which the present invention calls path triples. Every two adjacent path triples constitute a node. R is at least as long as r, and when R = r, the path P equals f, indicating that f itself is the shortest path.
There may be multiple paths from the head entity to the tail entity, but some paths are incorrect, some are incomplete, and the information in some paths is unsuitable for representing the triple. FIG. 1 shows the set of all paths for $f_1$ = ("champion", "joining", "team"), i.e., the set of paths from the entity "champion" to the entity "team". In FIG. 1, $f_1$ is the shortest path and is also the triple itself, while $f_2$ and $f_3$ are erroneous; the correct triple is ("basketball game", "equals", "match"). Thus, any path that combines $f_2$ or $f_3$ with other triples is noisy. Such noisy paths must be processed before their path score matrices can be used to represent the triple.
However, most path-based knowledge graph representation methods do not exclude the noise contained in paths, whereas rule-based approaches are well suited to this problem. Specifically, each node in a path is assigned a confidence indicating how likely the node is to be correct, and the node confidences are combined probabilistically into a path confidence indicating how likely the path is to be correct. If the only path from the head entity to the tail entity is the triple itself, that triple is the only node in the path, and the triple confidence, node confidence, and path confidence are equal. In practice there may be multiple paths, and the path with the highest path confidence is the most appropriate one to represent the triple. When triples are represented as matrices, the path score matrix is obtained by probabilistically combining the node confidences, and the L2 norm of the path score matrix is used as the path confidence.
As shown in fig. 2, an embodiment of the present invention provides a method for detecting noise in a knowledge-graph based on a path confidence, which includes the following specific steps:
Step one: for a set of E triples, find all paths of every triple: traverse all E triples and, for each triple, traverse all of its paths, the number of paths being P (the triple, path, and node counters are initialized to E, P, and N, respectively). Embed each triple of each path with the translation model TransE and represent all paths of the triples as path embedding sequences; each pair of adjacent triples in a path embedding sequence forms a node, and the number of nodes is n. The structure of the invention is shown in FIG. 3.
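The path search and node construction of step one can be sketched as follows. This is a minimal pure-Python sketch, not the patented implementation: it assumes directed edges, a hypothetical hop limit `max_hops` (the text does not state one), and toy entity and relation names; `find_paths` and `path_nodes` are illustrative helpers, and the TransE embedding stage is omitted.

```python
from collections import defaultdict


def find_paths(triples, head, tail, max_hops=3):
    """Enumerate simple directed paths (lists of triples) from head to tail.

    `max_hops` is a hypothetical length limit; the patent does not state one.
    """
    graph = defaultdict(list)
    for h, r, t in triples:
        graph[h].append((r, t))

    paths = []

    def dfs(entity, visited, path):
        if entity == tail and path:
            paths.append(list(path))  # reached the tail entity: record a copy
            return
        if len(path) == max_hops:
            return
        for r, nxt in graph[entity]:
            if nxt not in visited:  # keep paths simple (no repeated entities)
                visited.add(nxt)
                path.append((entity, r, nxt))
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(head, {head}, [])
    return paths


def path_nodes(path):
    """Pair each path triple with its successor: node i = (N_i, N_{i+1}).

    A single-triple path (the triple itself) is returned as one 1-tuple node.
    """
    if len(path) == 1:
        return [(path[0],)]
    return list(zip(path, path[1:]))


triples = [("a", "r1", "d"), ("a", "r2", "b"), ("b", "r3", "c"), ("c", "r4", "d")]
for p in find_paths(triples, "a", "d"):
    print(len(p), "triple(s),", len(path_nodes(p)), "node(s)")
# 1 triple(s), 1 node(s)
# 3 triple(s), 2 node(s)
```

Depth-first search with a visited set keeps every path simple; a real knowledge graph would also need a cap on the number of paths per triple.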
Step two: as shown in FIG. 4, input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL) and calculate the confidence score of each node in each path;
in the second step, the specific method is as follows:
S21: initialize the input node T:
$T = N'_i \cdot (N'_{i+1})^{\top}$ (1);
$N'_i = (x'_i, r'_i, x'_{i+1})$ (2);
$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2})$ (3);
where $N'_i$ and $N'_{i+1}$ denote the embedding matrices of the i-th and (i+1)-th triples on a path, respectively, and $(N'_{i+1})^{\top}$ the transpose of $N'_{i+1}$; $x'_i$, $x'_{i+1}$, and $x'_{i+2}$ denote entities, and $r'_i$ and $r'_{i+1}$ denote relations.
S22: multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$GTT(i, i+1) = T \cdot W_0$ (4);
where GTT(i, i+1) is the global triple confidence.
S23: the node T enters the Separate & Pad layer, which separates the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E, and F; correlation-based logical operations are performed on D, E, and F, and the results are summed to obtain the local confidence between the triples, i.e., the local triple confidence:
$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2}$ (5);
$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3$ (6);
[Equations (7) to (11) appear as images in the original and are not reproduced here.]
where MIN(·) denotes the minimum of a matrix, MAX(·) the maximum of a matrix, $\mathbf{1}$ a matrix whose elements are all 1, and $-\mathbf{1}$ a matrix whose elements are all -1; the logical-operator symbols (images in the original) denote different logical operations, and LTT(i, i+1) is the local triple confidence.
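Assuming each embedding is a d-dimensional row vector, so that $N'_i$ and $N'_{i+1}$ are 3 x d matrices and T is 3 x 3, equations (1) and (4) to (6) can be sketched in plain Python as below. The sketch illustrates that the diagonal of T reproduces the three dot products of equation (5). The embeddings and the parameter matrices W0 to W3 are hypothetical values, and the logical operations of equations (7) to (11) are not implemented because they are not reproduced in the text.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scale(s, m):
    """Scalar times matrix."""
    return [[s * v for v in row] for row in m]


def node_matrix(n_i, n_next):
    """Eq. (1): T = N'_i . (N'_{i+1})^T, each N' a 3 x d stack of row embeddings."""
    return matmul(n_i, transpose(n_next))


def diagonal_blocks(t):
    """Eq. (5): with row-vector embeddings, T1, T2, T3 are T's diagonal entries."""
    return t[0][0], t[1][1], t[2][2]


# Hypothetical 2-d embeddings for x'_i, r'_i, x'_{i+1}, r'_{i+1}, x'_{i+2}.
x_i, r_i, x_j, r_j, x_k = [1, 0], [0, 1], [1, 1], [2, 0], [0, 3]
N_i = [x_i, r_i, x_j]     # eq. (2)
N_next = [x_j, r_j, x_k]  # eq. (3)

T = node_matrix(N_i, N_next)
T1, T2, T3 = diagonal_blocks(T)

W0 = [[1.0], [1.0], [1.0]]  # hypothetical 3 x 1 parameter matrix
GTT = matmul(T, W0)         # eq. (4)

W1, W2, W3 = [[0.5, 0.5]], [[1.0, -1.0]], [[0.2, 0.8]]  # hypothetical 1 x 2
D, E, F = scale(T1, W1), scale(T2, W2), scale(T3, W3)   # eq. (6)

print(T)             # [[1, 2, 0], [1, 0, 3], [2, 2, 3]]
print((T1, T2, T3))  # (1, 0, 3)
print(GTT)           # [[3.0], [4.0], [7.0]]
```

If the embeddings were themselves matrices, $T_1$ to $T_3$ would be genuine sub-matrix blocks rather than scalars; the vector case is only the simplest reading.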
S24: multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of node T:
$G_i = GTT(i, i+1) \cdot LTT(i, i+1)$ (12).
Step three: input the confidence scores of all nodes in each path, in order, into the Bi-GRU (bidirectional gated recurrent unit network) to obtain the score matrix of each path;
in the third step, the specific method is as follows:
S31: take the confidence score $G_i$ of each node, together with the confidence scores $G_{i-1}$ and $G_{i+1}$ of its neighboring nodes, as the input of the bidirectional GRU; the i-th forward and backward GRU outputs are computed as follows:
[Equations (13) and (14) appear as images in the original and are not reproduced here.]
where the two output symbols (images in the original) denote the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent unit network.
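Because equations (13) and (14) are not reproduced here, the bidirectional pass of S31 can only be sketched with a standard GRU cell. The sketch assumes each node score $G_i$ has been reduced to a scalar and uses untrained, randomly initialized weights; `TinyGRU` and `bi_gru` are hypothetical names, and a practical implementation would use a deep learning framework's GRU layer instead.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyGRU:
    """A standard GRU cell for scalar inputs (hidden size `hidden`), pure Python."""

    def __init__(self, hidden, seed):
        rng = random.Random(seed)  # hypothetical random init, untrained

        def vec():
            return [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

        self.hidden = hidden
        self.w = {g: vec() for g in "zrn"}                           # input weights
        self.u = {g: [vec() for _ in range(hidden)] for g in "zrn"}  # recurrent weights
        self.b = {g: vec() for g in "zrn"}                           # biases

    def _gate(self, g, x, h, act):
        rec = [sum(a * c for a, c in zip(row, h)) for row in self.u[g]]
        return [act(self.w[g][k] * x + rec[k] + self.b[g][k]) for k in range(self.hidden)]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)  # update gate
        r = self._gate("r", x, h, sigmoid)  # reset gate
        n = self._gate("n", x, [rk * hk for rk, hk in zip(r, h)], math.tanh)  # candidate
        return [(1 - zk) * nk + zk * hk for zk, nk, hk in zip(z, n, h)]

    def run(self, xs):
        h, outputs = [0.0] * self.hidden, []
        for x in xs:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


def bi_gru(scores, hidden=4):
    """Forward pass reads G_1..G_n; backward pass reads G_n..G_1."""
    forward = TinyGRU(hidden, seed=1).run(scores)
    backward = TinyGRU(hidden, seed=2).run(scores[::-1])[::-1]
    return forward, backward


G = [0.2, 0.7, 0.1]  # hypothetical scalar node scores G_i
fwd, bwd = bi_gru(G)
h_fwd_final, h_bwd_final = fwd[-1], bwd[0]  # final outputs passed on to S32
```

Each hidden state is a convex combination of the previous state and a tanh candidate, so every output component stays in (-1, 1); at position i the two directions together see $G_{i-1}$, $G_i$, and $G_{i+1}$.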
S32: to retain as much useful information as possible, concatenate the final outputs of the forward and backward GRUs and apply linear and normalization operations to obtain the path score matrix:
$h(p) = \mathrm{softmax}\big(\mathrm{linear}(\mathrm{concat}(\overrightarrow{h}, \overleftarrow{h}))\big)$ (15);
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, $\overrightarrow{h}$ the final output of the forward GRU, $\overleftarrow{h}$ the final output of the backward GRU, concat(·) the concatenation function, linear(·) the linear function, and softmax(·) the normalization function.
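A minimal sketch of the concat, linear, and softmax operations of S32, assuming the final forward and backward GRU outputs are plain vectors; `path_score` is a hypothetical helper, and the weight matrix and bias standing in for linear(·) are made-up values.

```python
import math


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def path_score(h_fwd_final, h_bwd_final, weight, bias):
    """Sketch of h(p) = softmax(linear(concat(forward, backward)))."""
    z = h_fwd_final + h_bwd_final  # concat: list concatenation
    logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


# Hypothetical final GRU states (hidden size 2) and a 2 x 4 linear layer.
h_f, h_b = [0.5, -0.5], [1.0, 0.0]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0]
print([round(v, 3) for v in path_score(h_f, h_b, W, b)])  # [0.378, 0.622]
```

Subtracting the largest logit before exponentiating leaves the softmax result unchanged but prevents overflow for large logits.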
Step four: take the L2 norm of the score matrix of each path as the path confidence, and take the score matrix corresponding to the highest path confidence as the optimal embedding matrix of the triple.
In step four, the path confidence and the optimal triple are calculated as follows:
$g(p_j) = \lVert h(p_j) \rVert_2$ (16);
when $g(p_j) = g(f_k) = \max_m g(p_m)$, $h(f_k) = h(p_j)$ (17);
where g(p) denotes the path confidence, $h(p_j)$ the score matrix of path $p_j$, $\lVert \cdot \rVert_2$ the L2 norm of a matrix, $g(f_k)$ the maximum path confidence, and $h(f_k)$ the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
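Step four amounts to an argmax over path norms. The sketch below assumes the L2 norm of a matrix means the entrywise (Frobenius) norm, which is one common reading; `l2_norm` and `best_path` are illustrative names, and the score vectors are made up.

```python
import math


def l2_norm(matrix):
    """Entrywise L2 (Frobenius) norm; a flat list is treated as a 1 x n matrix."""
    rows = matrix if matrix and isinstance(matrix[0], list) else [matrix]
    return math.sqrt(sum(v * v for row in rows for v in row))


def best_path(score_matrices):
    """Sketch of eqs. (16)-(17): g(p_j) = ||h(p_j)||_2, keep the argmax path."""
    confidences = [l2_norm(m) for m in score_matrices]
    k = max(range(len(confidences)), key=confidences.__getitem__)
    return k, confidences[k], score_matrices[k]


# Hypothetical softmax score vectors h(p_j) for three paths of one triple.
h_paths = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
k, g_best, h_best = best_path(h_paths)
print(k, round(g_best, 4))  # 1 0.9055
```

If h(p) is a softmax output over n entries, its L2 norm lies between $1/\sqrt{n}$ (uniform) and 1 (one-hot), so under this reading the highest-confidence path is the one whose score distribution is most concentrated.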
In order to train the model proposed by the present invention, the designed loss function is as follows:
$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\big(l(h,r,t) \cdot P(h,r,t)\big)\right]$ (18);
[Equation (19) appears as an image in the original and is not reproduced here.]
where exp(·) denotes the exponential function with base e, log(·) the logarithmic function, L the loss, P(h, r, t) the path from head entity h to tail entity t, r a relation, T' the set of valid triples, and T'' the set of invalid triples; invalid triples are formed by randomly replacing the head or tail entity of an original triple, and valid triples are the original triples.
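Equation (18) has the form of a softplus loss summed over valid and invalid triples. The sketch below evaluates it in a numerically stable way. Because equation (19) is not reproduced, it assumes, hypothetically, that l(h, r, t) is -1 for valid triples and +1 for invalid ones, so that minimizing L raises the path confidence of valid triples and lowers that of invalid ones; P(h, r, t) is treated as a scalar path confidence, and `path_loss` is an illustrative name.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def path_loss(samples):
    """Sketch of eq. (18): sum of log(1 + exp(l(h, r, t) * P(h, r, t))).

    `samples` holds (label, path_confidence) pairs. ASSUMPTION: label = -1 for
    valid triples (T') and +1 for invalid triples (T''); the patent's own
    definition of l(h, r, t) is equation (19), which is not reproduced here.
    """
    return sum(softplus(label * conf) for label, conf in samples)


batch = [(-1, 0.9), (-1, 0.7), (+1, 0.2)]  # two valid triples, one corrupted
loss = path_loss(batch)
```

Writing log(1 + e^x) as max(x, 0) + log1p(e^(-|x|)) gives the same value but never exponentiates a large positive number.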
The present invention uses three benchmark datasets for knowledge graph noise detection, FB15K, WN18, and NELL995, which are constructed from information extracted from the Freebase, WordNet, and NELL knowledge bases, respectively. Their statistics are listed in Table 1.
Table 1. Statistics of the benchmark datasets FB15K, WN18, and NELL995
[Table 1 appears as an image in the original and is not reproduced here.]
To evaluate the performance of the model, noise must be added to the above datasets. The basic procedure is as follows: for a given positive triple (h, r, t), either the head or the tail entity is randomly replaced to form a negative triple (h', r, t) or (h, r, t') as noise. In this way, datasets containing 10%, 20%, and 40% noise are constructed for each benchmark dataset. These noisy datasets share the same entities, relations, validation sets, and test sets as the original datasets, and all generated noise is merged into the original training set.
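The noise-injection procedure can be sketched as follows. `inject_noise` is a hypothetical helper that assumes the noise ratio is measured relative to the original training-set size and that corrupted triples already present in the training set are skipped; the text specifies neither detail.

```python
import random


def inject_noise(train, entities, ratio, seed=0):
    """Corrupt random positives by replacing the head or the tail entity.

    ASSUMPTIONS: `ratio` is noise count / original training size, and corrupted
    triples that already exist in `train` (or were already generated) are skipped.
    Returns (noisy_training_set, noise_only).
    """
    rng = random.Random(seed)
    seen = set(train)
    target = round(len(train) * ratio)
    noise, attempts = [], 0
    while len(noise) < target:
        attempts += 1
        if attempts > 100 * (target + 1):
            raise ValueError("could not generate enough distinct noisy triples")
        h, r, t = rng.choice(train)
        e = rng.choice(entities)
        # Form (h', r, t) or (h, r, t') with equal probability.
        corrupted = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if corrupted not in seen:
            seen.add(corrupted)
            noise.append(corrupted)
    return train + noise, noise


train = [(f"e{i}", "r", f"e{i + 1}") for i in range(10)]
entities = [f"e{i}" for i in range(11)]
for ratio in (0.1, 0.2, 0.4):
    noisy, noise = inject_noise(train, entities, ratio)
    print(ratio, len(noisy), len(noise))
# 0.1 11 1
# 0.2 12 2
# 0.4 14 4
```

The fixed seed makes each noisy dataset reproducible, and the attempt limit avoids an endless loop on very small graphs.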
The invention takes the L2 norm of the path score matrix, $\lVert h(p) \rVert_2$, as the path confidence, and then ranks all triples in the training set by their path confidence. The higher the path confidence of a triple, the more likely the triple is to be valid.
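Ranking triples by path confidence can be sketched as below. The `suspected_noise` helper, which flags a fixed fraction of the lowest-ranked triples, is a hypothetical extension for illustration; the text itself only describes the ranking, and the confidence values are made up.

```python
def rank_triples(confidence):
    """Order triples from highest to lowest path confidence g(p)."""
    return sorted(confidence, key=confidence.get, reverse=True)


def suspected_noise(confidence, fraction):
    """Hypothetical helper: flag the lowest-confidence `fraction` of triples."""
    ranked = rank_triples(confidence)
    k = round(len(ranked) * fraction)
    return ranked[len(ranked) - k:]


# Hypothetical path confidences for four training triples.
conf = {
    ("a", "r", "b"): 0.91,
    ("b", "r", "c"): 0.22,
    ("c", "r", "d"): 0.64,
    ("d", "r", "e"): 0.47,
}
print(rank_triples(conf)[0])        # ('a', 'r', 'b')
print(suspected_noise(conf, 0.25))  # [('b', 'r', 'c')]
```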
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A knowledge graph noise detection method based on path confidence is characterized by comprising the following steps:
Step one: initialize the number of triples, find all paths of all triples, embed each triple of each path with the translation model TransE, and represent all paths of the triples as path embedding sequences; each pair of adjacent triples in a path embedding sequence forms a node, and the number of nodes is n;
Step two: input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL) and calculate the confidence score matrix of each node in each path;
Step three: input the confidence score matrices of all nodes in each path into the Bi-GRU to obtain the score matrix of each path;
Step four: take the L2 norm of the score matrix of each path as the path confidence, and take the score matrix corresponding to the highest path confidence as the optimal embedding matrix of the triple.
2. The method for detecting knowledge-graph noise based on path confidence as claimed in claim 1, wherein in the second step, the specific method is:
S21: initialize the input node T:
$T = N'_i \cdot (N'_{i+1})^{\top}$ (1);
$N'_i = (x'_i, r'_i, x'_{i+1})$ (2);
$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2})$ (3);
where $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ the embedding matrix of the (i+1)-th triple on the path, and $(N'_{i+1})^{\top}$ the transpose of $N'_{i+1}$; $x'_i$, $x'_{i+1}$, and $x'_{i+2}$ denote entities, and $r'_i$ and $r'_{i+1}$ denote relations;
S22: multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$GTT(i, i+1) = T \cdot W_0$ (4);
where GTT(i, i+1) is the global triple confidence;
S23: the node T enters the Separate & Pad layer, which separates the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E, and F; correlation-based logical operations are performed on D, E, and F, and the results are summed to obtain the local confidence between the triples, i.e., the local triple confidence:
$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2}$ (5);
$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3$ (6);
[Equations (7) to (11) appear as images in the original and are not reproduced here.]
where MIN(·) denotes the minimum of a matrix, MAX(·) the maximum of a matrix, $\mathbf{1}$ a matrix whose elements are all 1, and $-\mathbf{1}$ a matrix whose elements are all -1; the logical-operator symbols (images in the original) denote different logical operations, and LTT(i, i+1) is the local triple confidence;
S24: multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of node T:
$G_i = GTT(i, i+1) \cdot LTT(i, i+1)$ (12).
3. The method for detecting knowledge-graph noise based on path confidence as claimed in claim 2, wherein in step three, the specific method is:
S31: take the confidence score $G_i$ of each node, together with the confidence scores $G_{i-1}$ and $G_{i+1}$ of its neighboring nodes, as the input of the bidirectional GRU; the i-th forward and backward GRU outputs are computed as follows:
[Equations (13) and (14) appear as images in the original and are not reproduced here.]
where the two output symbols (images in the original) denote the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent unit network.
S32: concatenate the final outputs of the forward and backward GRUs and apply linear and normalization operations to obtain the path score matrix:
$h(p) = \mathrm{softmax}\big(\mathrm{linear}(\mathrm{concat}(\overrightarrow{h}, \overleftarrow{h}))\big)$ (15);
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, $\overrightarrow{h}$ the final output of the forward GRU, $\overleftarrow{h}$ the final output of the backward GRU, concat(·) the concatenation function, linear(·) the linear function, and softmax(·) the normalization function.
4. The method for knowledge-graph noise detection based on path confidence as claimed in claim 3, wherein in step four, the path confidence and the optimal triplet are calculated by:
$g(p_j) = \lVert h(p_j) \rVert_2$ (16);
when $g(p_j) = g(f_k) = \max_m g(p_m)$, $h(f_k) = h(p_j)$ (17);
where g(p) denotes the path confidence, $h(p_j)$ the score matrix of path $p_j$, $\lVert \cdot \rVert_2$ the L2 norm of a matrix, $g(f_k)$ the maximum path confidence, and $h(f_k)$ the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
5. The method of knowledge-graph noise detection based on path confidence of claim 4, wherein the designed loss function is as follows:
$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\big(l(h,r,t) \cdot P(h,r,t)\big)\right]$ (18);
[Equation (19) appears as an image in the original and is not reproduced here.]
where exp(·) denotes the exponential function with base e, log(·) the logarithmic function, L the loss, P(h, r, t) the path from head entity h to tail entity t, r a relation, T' the set of valid triples, and T'' the set of invalid triples; invalid triples are formed by randomly replacing the head or tail entity of an original triple, and valid triples are the original triples.
CN202111393836.9A 2021-11-23 2021-11-23 Knowledge graph noise detection method based on path confidence Active CN114077676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111393836.9A CN114077676B (en) 2021-11-23 2021-11-23 Knowledge graph noise detection method based on path confidence

Publications (2)

Publication Number Publication Date
CN114077676A 2022-02-22
CN114077676B 2022-09-30

Family

ID=80284076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111393836.9A Active CN114077676B (en) 2021-11-23 2021-11-23 Knowledge graph noise detection method based on path confidence

Country Status (1)

Country Link
CN (1) CN114077676B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691896A (en) * 2022-05-31 2022-07-01 Zhejiang University Knowledge graph data cleaning method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060733A1 (en) * 2016-08-31 2018-03-01 International Business Machines Corporation Techniques for assigning confidence scores to relationship entries in a knowledge graph
CN112035672A (en) * 2020-07-23 2020-12-04 Shenzhen Technology University Knowledge graph complementing method, device, equipment and storage medium
CN112732931A (en) * 2021-01-08 2021-04-30 National University of Defense Technology of the Chinese People's Liberation Army Method and equipment for noise detection and knowledge completion of knowledge graph
CN112819162A (en) * 2021-02-02 2021-05-18 Northeastern University Quality inspection method for knowledge graph triple
CN112836064A (en) * 2021-02-24 2021-05-25 Jilin University Knowledge graph complementing method and device, storage medium and electronic equipment
CN113420163A (en) * 2021-06-25 2021-09-21 National University of Defense Technology of the Chinese People's Liberation Army Heterogeneous information network knowledge graph completion method and device based on matrix fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
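MA J: "A high-accuracy link prediction approach for knowledge graph completion", Symmetry *
XIE Wenhao: "Knowledge graph completion based on joint representation learning of structure and text", China Master's Theses Full-text Database, Information Science and Technology *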

Also Published As

Publication number Publication date
CN114077676B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
WO2021159742A1 (en) Image segmentation method and apparatus, and storage medium
WO2021248938A1 (en) Image defogging method based on generative adversarial network fused with feature pyramid
CN113407759B (en) Multi-modal entity alignment method based on adaptive feature fusion
Ren et al. Exploring models and data for image question answering
Mallya et al. Learning informative edge maps for indoor scene layout prediction
CN110880019B (en) Method for adaptively training target domain classification model through unsupervised domain
CN113656596B (en) Multi-modal entity alignment method based on triple screening fusion
CN109753571B (en) Scene map low-dimensional space embedding method based on secondary theme space projection
CN110413704B (en) Entity alignment method based on weighted neighbor information coding
CN112258486B (en) Retinal vessel segmentation method for fundus image based on evolutionary neural architecture search
JP2011133988A5 (en)
JP2022018066A (en) Loop detection method based on convolutional perception hash algorithm
WO2022179384A1 (en) Social group division method and division system, and related apparatuses
CN110851491A (en) Network link prediction method based on multiple semantic influences of multiple neighbor nodes
CN114077676B (en) Knowledge graph noise detection method based on path confidence
CN107451617B (en) Graph transduction semi-supervised classification method
CN112364747A (en) Target detection method under limited sample
Osting et al. Statistical ranking using the l1-norm on graphs
CN114283315A (en) RGB-D significance target detection method based on interactive guidance attention and trapezoidal pyramid fusion
CN117009547A (en) Multi-mode knowledge graph completion method and device based on graph neural network and countermeasure learning
CN115471885A (en) Action unit correlation learning method and device, electronic device and storage medium
CN114942998A (en) Entity alignment method for sparse knowledge graph neighborhood structure fusing multi-source data
CN114782503A (en) Point cloud registration method and system based on multi-scale feature similarity constraint
CN116955846B (en) Cascade information propagation prediction method integrating theme characteristics and cross attention
Zhang et al. Heuristic search for homology localization problem and its application in cardiac trabeculae reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant