CN111026865A

CN111026865A - Relation alignment method, device and equipment of knowledge graph and storage medium

Info

Publication number: CN111026865A
Application number: CN201910992324.0A
Authority: CN
Inventors: 凌岚; 刘嘉伟; 于修铭; 陈晨; 李可; 汪伟
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-10-18
Filing date: 2019-10-18
Publication date: 2020-04-17
Anticipated expiration: 2039-10-18
Also published as: CN111026865B; WO2021072891A1

Abstract

The invention relates to the field of artificial intelligence, and discloses a relation alignment method, a device, equipment and a storage medium of a knowledge graph. The method comprises the following steps: acquiring triple group data in the knowledge graph; converting the ternary group data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; clustering initial relation vectors of the same relation in the triple data through a preset clustering algorithm to obtain a relation sub-vector of each relation; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector; and classifying the relation according to the relation similarity.

Description

Relation alignment method, device and equipment of knowledge graph and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to a relation alignment method, a device, equipment and a storage medium of a knowledge graph.

Background

Knowledge graph technology is increasingly becoming the basis of artificial intelligence, and is an important method for machine understanding of natural language and construction of knowledge network. In recent years, different knowledge graph fusion is invisibly created, and the knowledge graph fusion method helps workers in different working fields to retrieve related contents, so that the working quality and the working efficiency are improved.

The legal knowledge system is very complex and is a combination of various logics. In order to construct a complete knowledge system, knowledge from various fields needs to be fused, the knowledge system is continuously updated, new entities and relationships are added in the updating process, and a problem and a challenge are presented in how to correspond the new relationships to the existing knowledge graph. Relationships of the same meaning may have different expressions, such as "country of the country" and "nationality", and the same relationship also has multiple meanings, such as < new york, belong to, usa > and < display, belong to, tv > and partial relationship. However, no relevant research on relation classification exists at present, and the efficiency of relation classification in the knowledge graph is low.

Disclosure of Invention

The invention provides a relation alignment method, a device, equipment and a storage medium of a knowledge graph, which can determine whether each relation represents the same meaning and classify the relation by vectorizing the relation based on the upper and lower hierarchical structure information of the relation and calculating the similarity between the relation vectors, thereby improving the efficiency of the relation alignment in the knowledge graph.

A first aspect of an embodiment of the present invention provides a relation alignment method for a knowledge graph, including: acquiring triple data in a knowledge graph, wherein the triple comprises a main body, a relation and an attribute; converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; clustering the initial relation vectors of the same relation in the triple data through the preset clustering algorithm to obtain a relation sub-vector of each relation; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector; and classifying the relation according to the relation similarity.

Optionally, in a first implementation manner of the first aspect of the embodiment of the present invention, the converting the triple data by using a preset vector conversion model to obtain a relationship initial vector of each relationship includes: mapping subjects and objects of a target triplet into low-dimensional vectors

And

the target triple is any triple in the triple data, and i is the total number of the triples in the triple data; adjustment by preset loss function

And

up to

At a minimum, wherein

When said

At the minimum, will

Said target triplets arranged as vector type, said

Is the relationship initial vector.

Optionally, in a second implementation manner of the first aspect of the embodiment of the present invention, the clustering all the relationship initial vectors by using a preset clustering algorithm to obtain a relationship clustering vector of each relationship includes: selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.

Optionally, in a third implementation manner of the first aspect of the embodiment of the present invention, the clustering the initial relationship vectors of the same relationship in the triple data by using the preset clustering algorithm to obtain a relationship sub-vector of each relationship includes: selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.

Optionally, in a fourth implementation manner of the first aspect of the embodiment of the present invention, the calculating a relationship similarity between any two relationships based on the relationship initial vector, the relationship cluster vector, and the relationship sub-vector includes: generating a relationship vector of the relationship in each triplet through a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of the relationship,

representing the set of relational cluster vectors in a cluster of,

the initial vector of said relationship is represented by,

representing the relationship subvectors;

calculating the relationship similarity by cosine distance:

wherein

And

the relationship vector representing any two relationships.

Optionally, in a fifth implementation manner of the first aspect of the embodiment of the present invention, the classifying the relationship according to the relationship similarity includes: determining relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similarity and dissimilarity; if the relationship similarity results of the relationship pairs to be classified are similar, and the relationship pairs to be classified are any two relationships, bisecting the relationship pairs to be classified into a cluster; and if the similar results of the relations of the relation pairs to be classified are not similar, keeping the original clustering of each relation in the relation pairs to be classified.

Optionally, the determining a relationship similarity result between all relationships according to the relationship similarity includes: judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.

A second aspect of an embodiment of the present invention provides a relation alignment apparatus for a knowledge graph, including: the acquisition unit is used for acquiring triple data in the knowledge graph, wherein the triple comprises a main body, a relation and an attribute; the conversion unit is used for converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; the first clustering unit is used for clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; the second clustering unit is used for clustering the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship; the calculating unit is used for calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector; and the classification unit is used for classifying the relationship according to the relationship similarity.

Optionally, in a first implementation manner of the second aspect of the embodiment of the present invention, the conversion unit is specifically configured to: mapping subjects and objects of a target triplet into low-dimensional vectors

And

And

up to

At a minimum, wherein

When said

At the minimum, will

Said target triplets arranged as vector type, said

Is the relationship initial vector.

Optionally, in a second implementation manner of the second aspect of the embodiment of the present invention, the first clustering unit is specifically configured to: selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.

Optionally, in a third implementation manner of the second aspect of the embodiment of the present invention, the second clustering unit is specifically configured to: selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.

Optionally, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the calculation unit includes: a generating module, configured to generate a relationship vector of the relationship in each triplet according to a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of the relationship,

representing the set of relational cluster vectors in a cluster of,

the initial vector of said relationship is represented by,

representing the relationship subvectors; the calculating module is used for calculating the relation similarity through the cosine distance:

wherein

And

the relationship vector representing any two relationships.

Optionally, in a fifth implementation manner of the second aspect of the embodiment of the present invention, the classifying unit includes: the determining module is used for determining relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similarity and dissimilarity; the classification module is used for bisecting the relation to be classified into a cluster if the relation similarity results of the relation pairs to be classified are similar and the relation pairs to be classified are any two relations; and the classification module is also used for keeping the original clustering of each relation in the relation pairs to be classified if the similar results of the relations in the relation pairs to be classified are not similar.

Optionally, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the determining module is specifically configured to: judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.

A third aspect of the embodiments of the present invention provides a relation alignment apparatus of a knowledge graph, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the relation alignment method of the knowledge graph according to any one of the above embodiments when executing the computer program.

A fourth aspect of an embodiment of the present invention provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to perform the steps of the relation alignment method of a knowledge-graph according to any one of the above embodiments.

In the technical scheme provided by the embodiment of the invention, triple data in a knowledge graph is obtained, wherein the triple comprises a main body, a relation and an attribute; converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; clustering the initial relation vectors of the same relation in the triple data through the preset clustering algorithm to obtain a relation sub-vector of each relation; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector; and classifying the relation according to the relation similarity. According to the embodiment of the invention, the upper and lower hierarchical structures of the relation of the knowledge graph are constructed, the relation is vectorized based on the upper and lower hierarchical structure information, whether each relation represents the same meaning or not is determined by calculating the similarity between the relation vectors, and the relation is classified, so that the efficiency of relation alignment in the knowledge graph is improved.

Drawings

FIG. 1 is a diagram of an embodiment of a relation alignment method of a knowledge graph in an embodiment of the present invention;

FIG. 2 is a diagram of another embodiment of the relation alignment method of the knowledge-graph in the embodiment of the invention;

FIG. 3 is a diagram of an embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the present invention;

FIG. 4 is a schematic diagram of another embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the invention;

FIG. 5 is a diagram of an embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a relation alignment method, a device, equipment and a storage medium of a knowledge graph, which can determine whether each relation represents the same meaning and classify the relation by vectorizing the relation based on the upper and lower hierarchical structure information of the relation and calculating the similarity between relation vectors, thereby improving the efficiency of relation alignment in the knowledge graph.

In order to make the technical field of the invention better understand the scheme of the invention, the embodiment of the invention will be described in conjunction with the attached drawings in the embodiment of the invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, a flowchart of a relation alignment method of a knowledge graph according to an embodiment of the present invention specifically includes:

101. and acquiring triple data in the knowledge graph, wherein the triple comprises a main body, a relation and an attribute.

The server acquires triple data in the legal knowledge graph, wherein the triple comprises a main body, a relation and an attribute. The data structure of the knowledge graph is a triplet, in which in a triplet (h, r, t), h is subject, t is object, and r is a relationship, such as a triplet (new york, belong to, usa), new york is subject, usa is object, belong to relationship. A triplet is an intuitive data structure, where a subject and an object are collectively referred to as an entity (entity), and the relationship has irreversible attributes, and in a triplet, the subject and the object cannot be interchanged.

102. And converting the ternary group data through a preset vector conversion model to obtain a relation initial vector of each relation.

And the server converts the ternary group data through a preset vector conversion model to obtain a relationship initial vector of each relationship. The preset vector conversion model converts character-type triple data into vector-type triple data, converts character-type relational data into relational initial vectors, converts character-type subject data into subject vectors, and converts character-type object data into object vectors. Each triplet corresponds to a set of vectors.

Specifically, the subject and the object of the target triple are mapped into low-dimensional vectors

And

And

up to

At a minimum, wherein

When in use

At the minimum, will

The target triplets are set to be of the vector type,

for the relationship initial vector, for example, the triplet (taiwan, belong to, china), the server maps the subject "taiwan" and the object "china" into a low-dimensional vector and adjusts the vectors to be (1, 0) and (0, 1), resulting in that the relationship initial vector of the relationship "belong to" is (1, -1).

It should be noted that, in a knowledge graph, the vector form of a subject, object or relationship in any triplet is unique.

The preset vector conversion model is a conversion embedding (TransE) model, the TransE model is trained based on samples in the legal knowledge graph, and triple data in the legal knowledge graph are converted into a vector form.

103. And clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation.

And the server clusters all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation. In the knowledge graph, different relations may have the same meaning, for example, "country of the country" and "nationality" have the same meaning, the server classifies different relations representing the same meaning into one class, and sets a relation clustering vector of the same class of relations. In the same type of relationship, the Euclidean distance between the relationship initial vector of each relationship and the relationship clustering vector of the type is the minimum.

Specifically, the server selects k relation initial vectors from all relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; the server calculates the Euclidean distance between each relation initial vector and k first centroid vectors, and determines a first clustering vector of each relation initial vector, wherein the first clustering vector is the first centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; the server calculates a second centroid vector of each first cluster class based on a mean principle; the server calculates Euclidean distances between each relation initial vector and k second centroid vectors, and determines a second clustering vector of each relation initial vector, wherein the second clustering vector is the second centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding second cluster vector into a cluster class to obtain k second cluster classes; the server calculates a third centroid vector of each second cluster based on a mean principle; and the server repeatedly calculates the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, the N-th cluster is used as k clusters, the (N + 1) th centroid vector is used as a relational clustering vector, and N is an integer greater than 1.

For example, when the relation initial vectors include (1, 0), (1, 2), (2, 1), (0, 3), the server selects 2 relation initial vectors (1, 0) and (2, 1) as the first centroid vector, and the euclidean distance between the relation initial vector (1, 2) and the first centroid vector (1, 0) is 2 and the euclidean distance between the relation initial vector (1, 2) and the first centroid vector (2, 1)

The first cluster vector of the relationship initial vector (1, 2) is (2, 1), the first cluster vector of the relationship initial vector (0, 3) is (2, 1), then (1, 0) is a first cluster class, (1, 2), (0, 3) and (2, 1) are another cluster class, and the server obtains the second centroid vectors of the first cluster class by calculation as (1/1, 0/1) and (2, 1)

I.e., (1, 0) and (1, 2), respectivelyAnd the server determines that the first centroid vector is the same as the second centroid vector, and then the server takes the second centroid vector as a relational clustering vector, and the 2 first clusters are 2 clusters.

104. And clustering the initial relation vectors of the same relation in the triple data through a preset clustering algorithm to obtain a relation sub-vector of each relation.

And the server clusters all the initial relation vectors of each relation through a preset clustering algorithm to obtain a relation sub-vector of each relation. The server converts the multiple triples of the relation through a preset vector conversion model to obtain multiple initial relation vectors of the relation. In a triplet, the subject, the object and the relationship are all converted into vectors through a preset vector conversion model, and the vectors obtained by converting different triples are different.

It should be noted that a relationship exists in different triples, a relationship has the same or different meaning in different triples, and an initial relationship vector converted from a relationship in different triples is different, for example, a "belonging" relationship exists in a triplet (new york, belong to, usa), and in a triplet (new york, belong to, usa), the "belonging" relationship represents a regional dependency relationship, and the initial relationship vector of the "belonging" relationship is (1,0,1, 1); the "belonging" relationship also exists in the triplet (display, belonging, tv), where the "belonging" relationship represents partial and global dependencies, and the "belonging" relationship's initial relationship vector is (1,0,0, 1); the "belonging" relationship also exists in the triplet (beijing, belonged to, china), in which the initial relationship vector of "belonging" is (1,0, 1), and the meaning of "belonging" relationship in the triplet (beijing, belonged to, china) is the same as that in the triplet (new york, belonged to, usa).

Specifically, the server selects n relation initial vectors from all initial relation vectors of a target relation as n first sub-centroid vectors, the target relation is the same at least one relation, and n is an integer greater than 1; the server calculates Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determines a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is a first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; the server takes each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; the server calculates a second sub-centroid vector of each first sub-cluster class based on a mean principle;

the server calculates Euclidean distances between each relation initial vector of the target relation and n second sub-centroid vectors, and determines a second sub-vector of each relation initial vector of the target relation, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relation initial vector; the server classifies each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; the server calculates a third sub-centroid vector of each second sub-cluster class based on a mean principle; and the server repeatedly calculates the M +1 sub-centroid vector of each M sub-cluster until the M sub-centroid vector is the same as the M +1 sub-centroid vector, the n M sub-clusters are taken as the subclasses of the target relationship, the M +1 sub-centroid vector is taken as the relationship sub-vector of the target relationship, and M is an integer greater than 1.

105. And calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector.

And the server calculates the relationship similarity between any two relationships based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector. The relationship similarity is calculated by combining the relationship initial vector, the relationship clustering vector and the relationship sub-vector, and whether the meaning of the relationship in each triple is the same or not can be more accurately determined by utilizing the hierarchical structure information of the relationship, so that the relationship can be classified, and the accuracy of identifying the relationship is improved.

106. And classifying the relation according to the relation similarity.

The server classifies the relations according to the relation similarity, the similar relations are divided into a cluster, and if the two relations are a cluster, the similar relations are divided into a subclass.

It can be understood that, when a new relationship is added to the knowledge graph spectrum, the server first divides the distance of the relationship by calculating the euclidean distance between the initial relationship vector of the relationship and each relationship clustering vector; then, according to whether the same relation exists in the knowledge graph or not, the subclass of the relation is divided, if the same relation exists in the knowledge graph, the subclass of the relation is divided by calculating the Euclidean distance between the initial relation vector of the relation and each relation sub-vector of the same relation; if the relationship does not have the same relationship in the knowledge graph, the initial relationship vector of the relationship is used as a subclass of the relationship.

According to the embodiment of the invention, the relations are vectorized based on the upper and lower hierarchical structure information, whether the relations represent the same meaning or not is determined by calculating the similarity between the relation vectors, and the relations are classified, so that the efficiency of relation alignment in the knowledge graph is improved.

Referring to fig. 2, another embodiment of the method for aligning relationships of a knowledge graph according to an embodiment of the present invention includes:

201. and acquiring triple data in the knowledge graph, wherein the triple comprises a main body, a relation and an attribute.

The server acquires triple data in the legal knowledge graph, wherein the triple comprises a main body, a relation and an attribute. The data structure of the knowledge graph is a triplet, in which in a triplet (h, r, t), h is subject, t is object, and r is a relationship, such as a triplet (new york, belong to, usa), new york is subject, usa is object, belong to relationship. A triplet is an intuitive data structure, where the subject and object are collectively referred to as entity, and the relationship has irreversible attributes, and in a triplet, the subject and object cannot be interchanged.

202. And converting the ternary group data through a preset vector conversion model to obtain a relation initial vector of each relation.

And

And

up to

At a minimum, wherein

When in use

At the minimum, will

The target triplets are set to be of the vector type,

for a relationship initial vector, e.g., a triplet (taiwan, of, china), the server maps the subject "taiwan" and the object "china" to a low dimensional vector and adjusts the settings to (1,0) and (0, 1), resulting in a relationship initial vector of (1, -1) for the relationship "belonging".

The preset vector conversion model is a TransE model, the TransE model is trained based on samples in the legal knowledge graph, and the triple data in the legal knowledge graph are converted into a vector form.

203. And clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation.

Namely (1, 0) and (1, 2), the server determines that the first centroid vector is the same as the second centroid vector, and then the server takes the second centroid vector as a relational cluster vector, and 2 first clusters are 2 clusters.

204. And clustering the initial relation vectors of the same relation in the triple data through a preset clustering algorithm to obtain a relation sub-vector of each relation.

Specifically, the server selects n relation initial vectors from all initial relation vectors of a target relation as n first sub-centroid vectors, the target relation is the same at least one relation, and n is an integer greater than 1; the server calculates Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determines a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is a first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; the server takes each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; the server calculates a second sub-centroid vector of each first sub-cluster class based on a mean principle; the server calculates Euclidean distances between each relation initial vector of the target relation and n second sub-centroid vectors, and determines a second sub-vector of each relation initial vector of the target relation, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relation initial vector; the server classifies each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; the server calculates a third sub-centroid vector of each second sub-cluster class based on a mean principle; and the server repeatedly calculates the M +1 sub-centroid vector of each M sub-cluster until the M sub-centroid vector is the same as the M +1 sub-centroid vector, the n M sub-clusters are taken as the subclasses of the target relationship, the M +1 sub-centroid vector is taken as the relationship sub-vector of the target relationship, and M is an integer greater than 1.

205. Generating a relation vector of the relation in each triple through a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of relationships,

a cluster vector of the relationships is represented,

the initial vector of the relationship is represented,

representing the relationship subvectors.

The server generates a relation vector of the relation in each triple through a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of relationships,

a cluster vector of the relationships is represented,

the initial vector of the relationship is represented,

representing the relationship subvectors. The preset formula combines the relation vector with the relation clustering vector, the relation initial vector and the relation sub-vector, so that the accuracy of identifying the relation is improved.

For example, in a triplet (taiwan, belonging, china), the relationship "belonging" has the relationship initial vector of (1, -1), the relationship clustering vector of (1, 0), and the relationship word vector of (0, 1), and then in the triplet (taiwan, belonging, china), the relationship vector of the relationship "belonging" is (2, 0).

206. Calculating the relationship similarity by cosine distance:

wherein

And

a relationship vector representing any two relationships.

And the server calculates the relationship similarity between any two relationships based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector. The relationship similarity is calculated by combining the relationship initial vector, the relationship clustering vector and the relationship sub-vector, and whether the meaning of the relationship in each triple is the same or not can be more accurately determined by utilizing the hierarchical structure information of the relationship, so that the relationship can be classified, and the accuracy of identifying the relationship is improved. The cosine similarity calculation formula of the relationship similarity is as follows:

wherein

And

a relationship vector representing any two relationships.

It is understood that, in addition to calculating the relationship similarity through the cosine distance, the relationship similarity may also be calculated through other distance formulas, and is not limited herein.

207. And determining a relationship similarity result between all relationships according to the relationship similarity, wherein the relationship similarity result comprises similarity and dissimilarity.

And the server determines a relationship similarity result among all the relationships according to the relationship similarity. The server obtains a preset threshold value based on the relation similarity rule of the relation vectors in the database, judges that the relations are similar when the relation similarity exceeds the preset threshold value, and judges that the relations are dissimilar when the relation similarity does not exceed the preset threshold value. Specifically, the server judges whether the relationship similarity of the relationship pair to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, the server determines that the relationship pair to be classified is similar; and if the relation similarity of the relation pair to be classified does not exceed the preset threshold, the server determines that the relation pair to be classified is not similar.

208. And if the similar results of the relations of the relation pairs to be classified are similar and the relation pairs to be classified are any two relations, bisecting the relations to be classified into a cluster.

And if the relationship similarity results of the relationship pairs to be classified are similar and the relationship pairs to be classified are any two relationships, the server divides the relationship pairs to be classified into a cluster. In the same cluster, the relational clustering vectors of each relation are the same. And if the similar results of the relations of the relation pairs to be classified are not similar, the server keeps the original cluster of each relation in the relation pairs to be classified.

It should be noted that, after determining the cluster of the relationship, the server determines the subclass of the relationship according to whether the same relationship exists in the cluster and the original subclass of the same relationship. If the same relationship of a relationship does not exist in the knowledge graph, the server divides the relationship into a subclass separately.

It can be understood that when a new relationship is added to the knowledge graph, if the server determines that there is no similarity relationship of the relationship in the knowledge graph, the server treats the relationship alone as a cluster.

According to the embodiment of the invention, the relations are vectorized according to the upper and lower hierarchical structure information of the relations, whether the relations represent the same meaning or not is determined by calculating the similarity between the relation vectors, and the relations are classified, so that the efficiency of relation alignment in the knowledge graph is improved.

In the above description of the relation alignment method of the knowledge graph in the embodiment of the present invention, referring to fig. 3, the relation alignment apparatus of the knowledge graph in the embodiment of the present invention is described below, and an embodiment of the relation alignment apparatus of the knowledge graph in the embodiment of the present invention includes:

an obtaining unit 301, configured to obtain triple data in a knowledge graph, where the triple includes a main body, a relationship, and an attribute;

a conversion unit 302, configured to convert the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;

a first clustering unit 303, configured to cluster all the relationship initial vectors through a preset clustering algorithm to obtain a relationship clustering vector of each relationship;

a second clustering unit 304, configured to cluster the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship;

a calculating unit 305, configured to calculate a relationship similarity between relationships in any two triples based on the relationship initial vector, the relationship cluster vector, and the relationship sub-vector;

a classifying unit 306, configured to classify the relationship according to the relationship similarity.

According to the embodiment of the invention, the relations are vectorized and the similarity among the relation vectors is calculated through the upper and lower hierarchical structure information based on the relations, whether the relations represent the same meaning or not is determined, and the relations are classified, so that the relation alignment efficiency in the knowledge graph is improved.

Referring to fig. 4, an embodiment of a relationship alignment apparatus for a knowledge graph according to an embodiment of the present invention includes:

Optionally, the conversion unit 302 is specifically configured to:

mapping subjects and objects of a target triplet into low-dimensional vectors

And

And

up to

At a minimum, wherein

When said

At the minimum, will

Said target triplets arranged as vector type, said

Is the relationship initial vector.

Optionally, the first clustering unit 303 is specifically configured to:

selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.

Optionally, the second clustering unit 304 is specifically configured to:

selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.

Optionally, the calculating unit 305 includes:

a generating module 3051, configured to generate a relationship vector of the relationship in each triplet according to a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of the relationship,

representing the set of relational cluster vectors in a cluster of,

the initial vector of said relationship is represented by,

representing the relationship subvectors; 3052, calculating a relationship similarity by cosine distance:

wherein

And

the relationship vector representing any two relationships.

Optionally, the classifying unit 306 includes:

a determining module 3061, configured to determine relationship similarity results among all relationships according to the relationship similarity, where the relationship similarity results include similarity and dissimilarity; the classification module 3062, configured to divide the relationship pair to be classified into two clusters if the relationship similarity results of the relationship pair to be classified are similar and the relationship pair to be classified is any two relationships; the classifying module 3062 is further configured to keep the original cluster of each relationship in the pair of relationships to be classified if the results of similarity of the relationship of the pair of relationships to be classified are not similar.

Optionally, the determining module 3061 is specifically configured to:

judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.

Fig. 3 to 4 describe the relation alignment apparatus of the knowledge graph in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the relation alignment apparatus of the knowledge graph in the embodiment of the present invention is described in detail from the perspective of hardware processing.

Fig. 5 is a schematic structural diagram of a relation alignment apparatus of a knowledge graph 500 according to an embodiment of the present invention, which may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 501 (e.g., one or more processors) and a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing applications 507 or data 506. Memory 509 and storage medium 508 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations in a relationship alignment facility for a knowledge graph. Still further, the processor 501 may be configured to communicate with the storage medium 508 to execute a series of instruction operations in the storage medium 508 on the knowledge-graph relationship alignment apparatus 500.

The relation alignment apparatus 500 of the knowledge graph may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input-output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the knowledge-graph relationship alignment apparatus configuration shown in fig. 5 does not constitute a limitation of knowledge-graph relationship alignment apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. The processor 501 may perform the functions of the acquisition unit 301, the conversion unit 302, the first clustering unit 303, the second clustering unit 304, the calculation unit 305, and the classification unit 306 in the above-described embodiments.

The following describes the components of the relation alignment apparatus of the knowledge-graph in detail with reference to fig. 5:

the processor 501 is a control center of the relation alignment apparatus of the knowledge map, and may perform processing according to a set relation alignment method. The processor 501 connects various portions of the overall knowledge-graph relationship alignment apparatus using various interfaces and lines to perform the knowledge-graph relationship alignment by running or executing software programs and/or modules stored in the memory 509 and invoking data stored in the memory 509 to perform various functions of the knowledge-graph relationship alignment apparatus and process the data. The storage medium 508 and the memory 509 are carriers for storing data, the storage medium 508 may be an internal memory with a small storage capacity but a high storage speed, and the memory 509 may be an external memory with a large storage capacity but a low storage speed.

The memory 509 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing of the knowledge-graph relationship alignment apparatus 500 by executing the software programs and modules stored in the memory 509. The memory 509 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (classifying a relationship, etc.) required for at least one function, and the like; the storage data area may store data created from use of the relation alignment apparatus of the knowledge map (such as relation similarity, etc.), and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The relation alignment method program of the knowledge graph provided in the embodiment of the present invention and the received data stream are stored in the memory, and when they need to be used, the processor 501 calls from the memory 509.

When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, twisted pair) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that a computer can store or a data storage device, such as a server, a data center, etc., that is integrated with one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., compact disk), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A relation alignment method of a knowledge graph is characterized by comprising the following steps:

acquiring triple data in a knowledge graph, wherein the triple comprises a main body, a relation and an attribute;

converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;

clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation;

clustering the initial relation vectors of the same relation in the triple data through the preset clustering algorithm to obtain a relation sub-vector of each relation;

calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector;

and classifying the relation according to the relation similarity.

2. The relation alignment method of the knowledge-graph according to claim 1, wherein the converting the triple data by a preset vector conversion model to obtain a relation initial vector of each relation comprises:

mapping subjects and objects of a target triplet into low-dimensional vectors

And

the target triple is any triple in the triple data, and i is the total number of the triples in the triple data;

adjustment by preset loss function

And

up to

At a minimum, wherein

When said

At the minimum, will

Said target triplets arranged as vector type, said

Is the relationship initial vector.

3. The method for aligning relationships of knowledge-graphs according to claim 1, wherein said clustering all of said relationship initial vectors by a preset clustering algorithm to obtain a relationship clustering vector of each relationship comprises:

selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1;

calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector;

classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes;

calculating a second centroid vector of each first cluster class based on a mean principle;

calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector;

classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters;

calculating a third centroid vector of each second cluster class based on a mean principle;

and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.

4. The method of aligning relationships of a knowledge-graph according to claim 1, wherein the clustering the initial relationship vectors of the same relationship in the triplet data by the preset clustering algorithm to obtain a relationship sub-vector of each relationship comprises:

selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1;

calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector;

taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes;

calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle;

calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector;

classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes;

calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle;

repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.

5. The method of knowledge-graph relationship alignment of claim 1, wherein said calculating a relationship similarity between any two relationships based on said relationship initial vector, said relationship cluster vector and said relationship sub-vector comprises:

generating a relationship vector of the relationship in each triplet through a preset formula

Wherein the content of the first and second substances,

the relationship vector is represented by a vector of the relationship,

representing the set of relational cluster vectors in a cluster of,

the initial vector of said relationship is represented by,

representing the relationship subvectors;

by passingCosine distance calculation relationship similarity:

wherein

And

the relationship vector representing any two relationships.

6. The relation alignment method of the knowledge-graph according to any one of claims 1-5, wherein the classifying the relation according to the relation similarity comprises:

determining relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similarity and dissimilarity;

if the relationship similarity results of the relationship pairs to be classified are similar, and the relationship pairs to be classified are any two relationships, bisecting the relationship pairs to be classified into a cluster;

and if the similar results of the relations of the relation pairs to be classified are not similar, keeping the original clustering of each relation in the relation pairs to be classified.

7. The relation alignment method of knowledge-graph according to claim 6, wherein said determining relation similarity results between all relations according to said relation similarity comprises:

judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value or not;

if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar;

and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.

8. A relationship alignment apparatus for a knowledge graph, comprising:

the acquisition unit is used for acquiring triple data in the knowledge graph, wherein the triple comprises a main body, a relation and an attribute;

the conversion unit is used for converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;

the first clustering unit is used for clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation;

the second clustering unit is used for clustering the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship;

the calculating unit is used for calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector;

and the classification unit is used for classifying the relationship according to the relationship similarity.

9. A relation alignment apparatus of a knowledge graph, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the relation alignment method of a knowledge graph according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium comprising instructions that when executed on a computer cause the computer to perform a method of relationship alignment of a knowledge-graph as claimed in any one of claims 1 to 7.