CN112417159A - Cross-language entity alignment method of context alignment enhanced graph attention network - Google Patents

Cross-language entity alignment method of context alignment enhanced graph attention network

Info

Publication number
CN112417159A
CN112417159A (application CN202011201832.1A)
Authority
CN
China
Prior art keywords
entity
graph
knowledge
entities
aligned
Prior art date
Legal status
Granted
Application number
CN202011201832.1A
Other languages
Chinese (zh)
Other versions
CN112417159B
Inventor
刘进
赵焜松
谢志文
崔晓晖
黄勃
周光有
匡秋明
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202011201832.1A
Publication of CN112417159A
Application granted
Publication of CN112417159B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a cross-language entity alignment method based on a context alignment enhanced graph attention network. A first knowledge graph and a second knowledge graph are introduced, an aligned seed entity set is screened, and each entity name is translated into English; a training set and a test set are constructed from the translated aligned seed set, and each entity name is converted into word vectors using the word2vec algorithm; the initial features of the two graphs are constructed separately by summing the word vectors of each entity name. The training set is divided into a context alignment seed set and a target alignment seed set, which together with the initial features serve as the input data; the features of each entity, containing cross-graph fusion information and multi-hop neighbor information, are obtained through a cross-knowledge-graph aggregation layer and an attention-based graph neural network. The method makes full use of the context alignment seed set and propagates information between the graphs through the cross-knowledge-graph aggregation layer; entity neighbor information and cross-graph entity alignment information are collected by the attention-based graph neural network.

Description

Cross-language entity alignment method of context alignment enhanced graph attention network
Technical Field
The invention relates to a cross-language entity alignment method, and in particular to a method that collects and propagates heterogeneous knowledge graph information with a cross-knowledge-graph neural network model, comprising a cross-knowledge-graph aggregation layer and an attention-based information propagation layer, to solve the cross-language knowledge graph entity alignment problem. Pre-aligned seed entity pairs are regarded as the medium for information transfer between two heterogeneous knowledge graphs of different languages; information is propagated between the two graphs so that equivalent entities in different knowledge graphs obtain more aligned neighbor features, and equivalent entities in different knowledge graphs are then predicted from the learned feature representations.
Background
In recent years, knowledge graphs have shown great potential in many natural language processing tasks, such as language modeling and question answering. With the rapid growth of multi-language knowledge graphs (e.g., DBpedia, YAGO), cross-language entity alignment has attracted much research attention due to the lack of links between cross-language entities. The cross-language entity alignment task aims to automatically find equivalent entities in different monolingual knowledge graphs, so as to bridge the gap between languages.
Recently, many methods based on graph neural networks (GNNs) have been proposed for the entity alignment task. GNN-based methods achieve good performance because a GNN can learn representations of graph-structured data by aggregating neighborhood information. However, existing GNN-based approaches model the two cross-language knowledge graphs separately, ignoring the useful pre-aligned connections (seed entities) between them. These methods use the seed entities only to optimize the objective function during training, and do not fully exploit the context alignment information the seed entities provide, leading to suboptimal results.
Disclosure of Invention
Aiming at the defects of the prior art in cross-language entity alignment, the invention provides a cross-language entity alignment method based on a context alignment enhanced graph attention network, comprising a cross-knowledge-graph aggregation layer and an attention-based cross-knowledge-graph propagation layer. Information is propagated across the knowledge graphs through the pre-aligned seed entity pairs, so that feature representations of the different knowledge graphs are obtained, and equivalent entities in different knowledge graphs are then predicted through the learned feature representations (embeddings).
In order to achieve the above object, the invention is conceived as follows: firstly, the seed entity pairs are regarded as the medium for information transfer between the two heterogeneous knowledge graphs, and cross-knowledge-graph information is collected by a cross-knowledge-graph aggregation layer; the neighborhood information of each entity is then collected by an attention-based graph neural network. Multi-hop neighbor information is learned by stacking several of these two layers. Finally, the model parameters are optimized with a margin-based loss function.
According to the conception, the invention adopts a technical scheme that: a cross-language entity alignment method of a context alignment enhanced graph attention network is provided, which comprises the following steps:
step 1: introducing a first knowledge graph and a second knowledge graph, screening an aligned seed entity set according to the first knowledge graph and the second knowledge graph, translating the name of each entity in each aligned entity pair in the aligned seed entity set into English, defining the translated name of each entity in each aligned entity pair in the aligned seed entity set, and constructing a training set and a test set from the translated aligned seed entity set;
step 2: converting the translated name of each entity in the aligned entity pairs of the aligned seed entity set into word vectors of the entity name using the word2vec algorithm; summing the word vectors of each entity name as the initialization feature of the entity; and constructing the initialization features of the first knowledge graph and of the second knowledge graph respectively;
step 3: randomly dividing the training set into a context alignment seed set and a target alignment seed set, and constructing the input data of the neural network from the context alignment seed set, the target alignment seed set, the initialization features of the first knowledge graph, and the initialization features of the second knowledge graph;
step 4: propagating information between the different knowledge graphs through a cross-knowledge-graph aggregation layer;
step 5: collecting the neighbor information of each entity in the first knowledge graph and the second knowledge graph through the attention-based graph neural network;
step 6: inputting the context alignment seed set and the initialized entity features into the model, and obtaining through steps 4 and 5 the features of each entity containing cross-graph fusion information and multi-hop neighbor information;
preferably, the first knowledge-graph in step 1 is:
G1=(E1,R1,T1)
step 1 the second knowledge-graph is:
G2=(E2,R2,T2)
wherein E1 denotes the entity set of the first knowledge graph, R1 the relation set of the first knowledge graph, and T1 the triple set of the first knowledge graph; E2 denotes the entity set of the second knowledge graph, R2 the relation set of the second knowledge graph, and T2 the triple set of the second knowledge graph,
step 1, aligning the seed entity sets:
A = {a_k = (e_{k,1,i}, e_{k,2,j}) | e_{k,1,i} ∈ E1, e_{k,2,j} ∈ E2}
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein A denotes the aligned seed entity set, a_k the k-th aligned entity pair, and K the number of aligned entity pairs in the aligned seed entity set; e_{k,1,i} denotes the i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and e_{k,2,j} the j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; e_{k,1,i} and e_{k,2,j} have the same Chinese meaning; M denotes the number of entities in A from the first knowledge graph, and N the number of entities in A from the second knowledge graph;
step 1, translating the name of each entity in each aligned entity pair of the aligned seed entity set into English is as follows:
the names of the entities in each aligned entity pair of A, i.e. e_{k,1,i} and e_{k,2,j}, are all translated into English, and the translated aligned seed entity set is denoted A*, specifically defined as:
A* = {a*_k = (e*_{k,1,i}, e*_{k,2,j})}
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein A* denotes the translated aligned seed entity set, a*_k the k-th aligned entity pair after translation, and K the number of aligned entity pairs in the translated aligned seed entity set; e*_{k,1,i} denotes the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and e*_{k,2,j} the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; M denotes the number of entities in A* from the first knowledge graph, and N the number of entities in A* from the second knowledge graph;
step 1, defining the name of each translated entity as:
the translated name of each entity consists of several English words, specifically:
e*_{k,1,i} = (word_{k,1,i,1}, word_{k,1,i,2}, …, word_{k,1,i,n})
e*_{k,2,j} = (word_{k,2,j,1}, word_{k,2,j,2}, …, word_{k,2,j,m})
wherein word_{k,1,i,t} denotes the t-th word of the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair; word_{k,2,j,t} denotes the t-th word of the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; n is the total number of words of the translated i-th entity from the first knowledge graph in the k-th aligned entity pair, and m is the total number of words of the translated j-th entity from the second knowledge graph in the k-th aligned entity pair;
step 1, constructing a training set and a test set from the translated aligned seed entity set is as follows:
from the translated aligned seed entity set A*, P of the K aligned entity pairs are randomly selected as the training set, denoted A_train; the remaining K − P aligned entity pairs of A* serve as the test set, denoted A_test;
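As an illustrative sketch (the function and variable names are assumptions of this sketch, not the patent's notation), the random train/test split of the translated aligned seed pairs can be written as:

```python
import random

def split_seed_pairs(aligned_pairs, train_size, seed=0):
    # Shuffle a copy of the translated aligned seed pairs and cut it into
    # a training set of `train_size` pairs and a test set of the rest.
    rng = random.Random(seed)
    shuffled = list(aligned_pairs)
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

# Hypothetical seed set: K = 15000 pairs, P = 4500 of them used for training.
pairs = [("e1_%d" % k, "e2_%d" % k) for k in range(15000)]
A_train, A_test = split_seed_pairs(pairs, train_size=4500)
```

Fixing the random seed is only for reproducibility of the sketch; the patent simply requires a random selection.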
preferably, the word vector of each entity name in step 2 is:
v_{k,1,i,t} = word2vec(word_{k,1,i,t})
v_{k,2,j,t} = word2vec(word_{k,2,j,t})
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein v_{k,1,i,t} denotes the word vector of the t-th word of the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair; v_{k,2,j,t} denotes the word vector of the t-th word of the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; K denotes the number of aligned entity pairs in the translated aligned seed entity set, M the number of entities in A* from the first knowledge graph, and N the number of entities in A* from the second knowledge graph;
step 2, summing the word vectors of each entity name as the initialization feature of the entity is specifically as follows:
h^0_{k,1,i} = Σ_{t=1..n} v_{k,1,i,t}
h^0_{k,2,j} = Σ_{t=1..m} v_{k,2,j,t}
wherein n is the total number of words of the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and m is the total number of words of the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; h^0_{k,1,i} denotes the initialization feature of the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and h^0_{k,2,j} the initialization feature of the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair;
step 2, constructing the initialization features of the first knowledge graph and of the second knowledge graph respectively is as follows:
the initialization feature matrix of the first knowledge graph is denoted X^0_1 and gathers the initialization features h^0_{k,1,i} of the entities of E1;
the initialization feature matrix of the second knowledge graph is denoted X^0_2 and gathers the initialization features h^0_{k,2,j} of the entities of E2;
wherein E1 denotes the entity set of the first knowledge graph and E2 the entity set of the second knowledge graph; e_{k,1,i} denotes the i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and e_{k,2,j} the j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; A* denotes the translated aligned seed entity set; e*_{k,1,i} and e*_{k,2,j} denote the translated i-th and j-th entities of the k-th aligned entity pair;
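A minimal sketch of the step-2 initialization, assuming a word2vec lookup table is already available (the dictionary form of the table and the fallback to zeros for out-of-vocabulary words are assumptions of this sketch, not details of the patent):

```python
import numpy as np

def entity_init_feature(name_words, word_vectors):
    # Initialization feature of an entity: the sum of the word2vec vectors
    # of the words in its translated English name.
    dim = len(next(iter(word_vectors.values())))
    total = np.zeros(dim)
    for w in name_words:
        total += word_vectors.get(w, np.zeros(dim))  # OOV words contribute zero
    return total

# Toy 2-dimensional "word2vec" table for a hypothetical name "wuhan university".
vecs = {"wuhan": np.array([1.0, 0.0]), "university": np.array([0.0, 2.0])}
h0 = entity_init_feature(["wuhan", "university"], vecs)  # → array([1., 2.])
```

Stacking these per-entity vectors for all entities of E1 and E2 yields the matrices X^0_1 and X^0_2 used as model input.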
preferably, the training set in step 3 is A_train;
step 3, the context alignment seed set is A_ctx, which serves as input data of the model and is used to propagate information between the first knowledge graph and the second knowledge graph;
step 3, the target alignment seed set is A_obj, which is used to calculate the loss function;
step 3, the initialization feature of the first knowledge graph is X^0_1, which serves as input data of the model;
step 3, the initialization feature of the second knowledge graph is X^0_2, which serves as input data of the model;
preferably, the information propagation between the different knowledge graphs through the cross-knowledge-graph aggregation layer in step 4 is as follows:
step 4.1: according to the features of the two entities of each aligned entity pair in the context alignment seed set, one from the entity set of the first knowledge graph and one from the entity set of the second knowledge graph, the weights for fusing the information of the first and second knowledge graphs are calculated with a gate mechanism, specifically:
g^l_{k,1,i} = σ(W [h^l_{k,1,i} ∥ h^l_{k,2,j}] + b)
g^l_{k,2,j} = σ(W [h^l_{k,2,j} ∥ h^l_{k,1,i}] + b)
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein ∥ denotes the concatenation operation of vectors, W is a weight matrix learned during training, b is a bias, and σ is the sigmoid activation function; h^l_{k,1,i} is the embedding of the i-th entity from knowledge graph G1 in the k-th entity pair of A*, and h^l_{k,2,j} is the embedding of the j-th entity from knowledge graph G2 in the k-th entity pair of A*; l refers to the l-th layer of the network; K denotes the number of aligned entity pairs in the translated aligned seed entity set, M the number of entities in A* from the first knowledge graph G1, and N the number of entities in A* from the second knowledge graph G2; g^l_{k,1,i} denotes the fusion weight of the i-th entity from knowledge graph G1 in the k-th entity pair of A*, and g^l_{k,2,j} the fusion weight of the j-th entity from knowledge graph G2 in the k-th entity pair of A*;
step 4.2: the information of the first and second knowledge graphs is fused with the gate mechanism, yielding the fused representations of the two knowledge graphs, specifically:
h̃^l_{k,1,i} = g^l_{k,1,i} ⊙ h^l_{k,1,i} + (1 − g^l_{k,1,i}) ⊙ h^l_{k,2,j}
h̃^l_{k,2,j} = g^l_{k,2,j} ⊙ h^l_{k,2,j} + (1 − g^l_{k,2,j}) ⊙ h^l_{k,1,i}
wherein g^l_{k,1,i} denotes the fusion weight of the i-th entity from knowledge graph G1 in the k-th entity pair of A*, and g^l_{k,2,j} the fusion weight of the j-th entity from knowledge graph G2 in the k-th entity pair of A*; h^l_{k,1,i} is the embedding of the i-th entity from knowledge graph G1 in the k-th entity pair of A*, and h^l_{k,2,j} the embedding of the j-th entity from knowledge graph G2 in the k-th entity pair of A*; h̃^l_{k,1,i} denotes the embedding, after the gate mechanism, of the i-th entity from knowledge graph G1 in the k-th entity pair of A*, and h̃^l_{k,2,j} the embedding, after the gate mechanism, of the j-th entity from knowledge graph G2 in the k-th entity pair of A*,
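The gate mechanism of steps 4.1 and 4.2 can be sketched as follows; the exact shapes of W and b, and the sharing of the same parameters for both gate directions, are assumptions of this sketch rather than verified details of the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_fuse(h1, h2, W, b):
    # Step 4.1: fusion weights from the concatenated pair embeddings.
    g1 = sigmoid(W @ np.concatenate([h1, h2]) + b)
    g2 = sigmoid(W @ np.concatenate([h2, h1]) + b)
    # Step 4.2: gated fusion of the two sides of the aligned seed pair.
    h1_fused = g1 * h1 + (1.0 - g1) * h2
    h2_fused = g2 * h2 + (1.0 - g2) * h1
    return h1_fused, h2_fused

d = 2
h1, h2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
W, b = np.zeros((d, 2 * d)), np.zeros(d)   # untrained parameters give g = 0.5
f1, f2 = gate_fuse(h1, h2, W, b)           # both gates 0.5 → simple average
```

With zero (untrained) parameters the gate outputs 0.5 everywhere, so each fused embedding is the average of the pair; training W and b lets the layer decide how much cross-graph information to absorb per dimension.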
step 4.3: the feature matrices after the cross-knowledge-graph aggregation layer are calculated for all entities of the first and second knowledge graphs;
the feature matrices after the cross-knowledge-graph aggregation layer are:
X̃^l_1 = crossAggr(X^l_1, X^l_2, A_ctx)
X̃^l_2 = crossAggr(X^l_2, X^l_1, A_ctx)
wherein crossAggr denotes the cross-knowledge-graph aggregation layer, which applies the gate mechanism of steps 4.1 and 4.2 to the entities appearing in the context alignment seed set; h̃^l_{k,1,i} denotes the resulting feature representation of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set, and h̃^l_{k,2,j} the resulting feature representation of the j-th entity from the second knowledge graph in the k-th entity pair of A*;
preferably, step 5, collecting the neighbor information of each entity in the first and second knowledge graphs with the attention-based graph neural network, is as follows:
step 5.1: the weights, i.e. the attention, are calculated from the neighbor entities of the entity, with the specific calculation formula:
α_{k,1,i,p} = exp(c_{k,1,i,p}) / Σ_{p′ ∈ N_{k,1,i}} exp(c_{k,1,i,p′})
wherein N_{k,1,i} denotes the set of neighbor entities linked to entity e_{k,1,i} in the first knowledge graph, and exp is the exponential function; h̃^l_{k,1,i} denotes the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set; α_{k,1,i,p} denotes the weight of the p-th neighbor of the i-th entity from the first knowledge graph G1 in the k-th entity pair of A*;
the score c_{k,1,i,p} is calculated as:
c_{k,1,i,p} = LeakyReLU(V [h̃^l_{k,1,i} ∥ h̃^l_{k,1,i,p}])
wherein LeakyReLU is the activation function and V is a network parameter matrix to be learned; [h̃^l_{k,1,i} ∥ h̃^l_{k,1,i,p}] denotes the concatenation of the two feature representations; h̃^l_{k,1,i,p} is the feature representation of the p-th neighbor entity of entity e_{k,1,i}, and h̃^l_{k,1,i} the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set;
step 5.2: the neighbor information is fused according to the weights calculated in step 5.1, so as to collect the neighbor information of the entity, with the calculation formula:
h^{l+1}_{k,1,i} = ReLU(Σ_{p ∈ N_{k,1,i}} α_{k,1,i,p} h̃^l_{k,1,i,p})
wherein ReLU is the activation function and α_{k,1,i,p} is the weight of the p-th neighbor of the i-th entity in the first knowledge graph G1; crossAtt is the attention-based graph neural network layer and crossAggr is the cross-knowledge-graph aggregation layer; h^{l+1}_{k,1,i} denotes the feature representation of the i-th entity from knowledge graph G1 in the k-th entity pair of A*; h̃^l_{k,1,i} denotes the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of A*; h̃^l_{k,1,i,p} denotes the feature representation of the p-th neighbor of the i-th entity from the first knowledge graph in the k-th entity pair of A*;
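Steps 5.1 and 5.2 for a single entity can be sketched as below; treating V as a single score vector and passing the neighbors as a plain list are simplifying assumptions of this sketch:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def attend_neighbors(h_self, h_neighbors, V):
    # Step 5.1: the score of each neighbor p is
    # LeakyReLU(V · [h_self || h_p]), softmax-normalized over all neighbors.
    scores = np.array([float(leaky_relu(V @ np.concatenate([h_self, h_p])))
                       for h_p in h_neighbors])
    alphas = np.exp(scores) / np.exp(scores).sum()
    # Step 5.2: ReLU of the attention-weighted sum of neighbor features.
    weighted = np.zeros_like(h_self)
    for a, h_p in zip(alphas, h_neighbors):
        weighted = weighted + a * h_p
    return alphas, np.maximum(0.0, weighted)

h_self = np.array([1.0, 0.0])
neighbors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
V = np.zeros(4)                        # untrained scores → uniform attention
alphas, h_next = attend_neighbors(h_self, neighbors, V)
```

With an untrained V all scores are equal, so the softmax yields uniform weights over the neighbors; training V makes informative neighbors dominate the aggregation.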
step 5.3: the weights of all entities of the first and second knowledge graphs are calculated from their neighbor entities, and the neighbor information is fused according to the calculated weights so as to collect the neighbor information of each entity, i.e. steps 5.1 and 5.2; the feature matrices after the attention-based graph neural network layer has collected the neighbor information are then obtained, namely:
X^{l+1}_1 = crossAtt(X̃^l_1)
X^{l+1}_2 = crossAtt(X̃^l_2)
wherein crossAtt is the attention-based graph neural network layer; X̃^l_1 is the output feature representation of the first knowledge graph after the cross-knowledge-graph aggregation layer, and X̃^l_2 the output feature representation of the second knowledge graph after the cross-knowledge-graph aggregation layer;
preferably, the features of each entity containing cross-graph fusion information and multi-hop neighbor information in step 6 are:
X^{l+1}_1 = crossAtt(crossAggr(X^l_1))
X^{l+1}_2 = crossAtt(crossAggr(X^l_2))
wherein X^{l+1}_1 is the output of the first knowledge graph through the l-th CGAT layer, and X^{l+1}_2 the output of the second knowledge graph through the l-th CGAT layer; X^0_1 is the initialization feature of the first knowledge graph and X^0_2 the initialization feature of the second knowledge graph; crossAtt is the attention-based graph neural network layer and crossAggr is the cross-knowledge-graph aggregation layer. By stacking L CGAT layers, the features of the first and second knowledge graphs are updated L times, finally outputting the updated features X^L_1 and X^L_2.
The target loss function is calculated on the target alignment seed set, with the formula:
L(φ) = Σ_{(e1, e2) ∈ A_obj} max(0, d(h^L_{e1}, h^L_{e2}) + γ − d(h^L_{e1}, h^L_{e−}))
wherein h^L_e denotes the final embedding of entity e; d(·, ·) denotes the L1 distance between two entities; γ is the margin; φ denotes the trainable model parameters; e− refers to a negative entity sampled for a given entity; A_obj is the data used to optimize the model parameters.
The model parameters φ are optimized and updated with the Adam algorithm, wherein the model parameters φ comprise the trainable parameters of the cross-knowledge-graph aggregation layer and of the attention-based graph neural network layer, and the context alignment enhanced graph attention network model is constructed from the optimized parameters φ.
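A sketch of the margin-based loss over the target alignment seed set; the margin value and the pairing of each positive pair with a single sampled negative are assumptions of this sketch:

```python
import numpy as np

def l1(a, b):
    # L1 distance between two entity embeddings.
    return float(np.abs(a - b).sum())

def margin_loss(pos_pairs, neg_pairs, margin=1.0):
    # Hinge on the gap between the L1 distance of an aligned pair and the
    # L1 distance of its sampled negative pair: aligned pairs are pushed
    # closer than negatives by at least `margin`.
    loss = 0.0
    for (e1, e2), (n1, n2) in zip(pos_pairs, neg_pairs):
        loss += max(0.0, l1(e1, e2) + margin - l1(n1, n2))
    return loss

anchor = np.array([0.0, 0.0])
pos = [(anchor, anchor)]                       # aligned pair, distance 0
neg_far = [(anchor, np.array([5.0, 0.0]))]     # easy negative, distance 5
neg_near = [(anchor, np.array([0.5, 0.0]))]    # hard negative, distance 0.5
margin_loss(pos, neg_far)    # → 0.0 (negative already beyond the margin)
margin_loss(pos, neg_near)   # → 0.5 (negative still inside the margin)
```

In practice the embeddings would be produced by the stacked CGAT layers and this loss differentiated through them, with Adam updating the parameters φ; any automatic-differentiation framework can perform that step.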
The method has the advantages that the context alignment seed set is fully utilized and information is propagated between the graphs through the cross-knowledge-graph aggregation layer; entity neighbor information and cross-graph entity alignment information are collected by the attention-based graph neural network.
Drawings
FIG. 1 is a schematic diagram of the context alignment enhanced graph attention network of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples for the purpose of facilitating understanding and practice of the invention by those of ordinary skill in the art, and it is to be understood that the embodiments described herein are merely illustrative and explanatory of the invention and are not restrictive thereof.
The following describes an embodiment of the present invention with reference to fig. 1:
step 1, introducing a first knowledge graph and a second knowledge graph, screening an aligned seed entity set according to the first knowledge graph and the second knowledge graph, translating the name of each entity in each aligned entity pair in the aligned seed entity set into English, defining the translated name of each entity in each aligned entity pair in the aligned seed entity set, and constructing a training set and a test set from the translated aligned seed entity set;
step 1 the first knowledge-graph is:
G1=(E1,R1,T1)
step 1 the second knowledge-graph is:
G2=(E2,R2,T2)
wherein E1 denotes the entity set of the first knowledge graph, R1 the relation set of the first knowledge graph, and T1 the triple set of the first knowledge graph; E2 denotes the entity set of the second knowledge graph, R2 the relation set of the second knowledge graph, and T2 the triple set of the second knowledge graph,
step 1, the aligned seed entity set is:
A = {a_k = (e_{k,1,i}, e_{k,2,j}) | e_{k,1,i} ∈ E1, e_{k,2,j} ∈ E2}
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein A denotes the aligned seed entity set and a_k the k-th aligned entity pair; K = 15000 denotes the number of aligned entity pairs in the aligned seed entity set; e_{k,1,i} denotes the i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and e_{k,2,j} the j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; e_{k,1,i} and e_{k,2,j} both belong to the k-th aligned entity pair and have the same Chinese meaning; M = 66469 denotes the number of entities in A from the first knowledge graph, and N = 98125 the number of entities in A from the second knowledge graph;
step 1, translating the name of each entity in each aligned entity pair of the aligned seed entity set into English is as follows:
the names of the entities in each aligned entity pair of A, i.e. e_{k,1,i} and e_{k,2,j}, are all translated into English, and the translated aligned seed entity set is denoted A*, specifically defined as:
A* = {a*_k = (e*_{k,1,i}, e*_{k,2,j})}
i ∈ [1, M], j ∈ [1, N]
k ∈ [1, K]
wherein A* denotes the translated aligned seed entity set and a*_k the k-th aligned entity pair after translation; K = 15000 denotes the number of aligned entity pairs in the translated aligned seed entity set; e*_{k,1,i} denotes the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, and e*_{k,2,j} the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; M = 66469 denotes the number of entities in A* from the first knowledge graph, and N = 98125 the number of entities in A* from the second knowledge graph;
step 1, defining the name of each translated entity as:
the translated name of each entity consists of several English words, specifically:
e*_{k,1,i} = (word_{k,1,i,1}, word_{k,1,i,2}, …, word_{k,1,i,n})
e*_{k,2,j} = (word_{k,2,j,1}, word_{k,2,j,2}, …, word_{k,2,j,m})
wherein word_{k,1,i,t} denotes the t-th word of the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair; word_{k,2,j,t} denotes the t-th word of the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair; n is the total number of words of the translated i-th entity from the first knowledge graph in the k-th aligned entity pair, and m is the total number of words of the translated j-th entity from the second knowledge graph in the k-th aligned entity pair;
step 1, the construction of a training set and a test set from the set of translated aligned seed entities is as follows:
from the K = 15000 aligned entity pairs of the translated aligned seed entity set A*, P = 4500 pairs are randomly selected as the training set, denoted A_train; the remaining K − P = 10500 aligned entity pairs of A* are used as the test set, denoted A_test;
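A minimal sketch of this random split (the pair tuples and the `split_seed_alignments` helper are illustrative, not part of the patent):

```python
import random

def split_seed_alignments(aligned_pairs, train_size=4500, seed=0):
    """Randomly split translated aligned entity pairs into a training set
    A_train and a test set A_test (the remaining pairs)."""
    rng = random.Random(seed)
    pairs = list(aligned_pairs)
    rng.shuffle(pairs)
    return pairs[:train_size], pairs[train_size:]

# 15000 dummy aligned pairs (e*_{k,1,i}, e*_{k,2,j}) stand in for A*
a_star = [(f"e1_{k}", f"e2_{k}") for k in range(15000)]
a_train, a_test = split_seed_alignments(a_star)
```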
step 2, convert the translated name of each entity in each aligned entity pair of the aligned seed entity set into word vectors using the word2vec algorithm, sum the word vectors of each entity name as the initialization feature of that entity, and respectively construct the initialization features of the first knowledge graph and of the second knowledge graph;
step 2, the word vectors of each entity name are:

vec_{k,1,i,t} = word2vec(word_{k,1,i,t})

vec_{k,2,j,t} = word2vec(word_{k,2,j,t})

i∈[1,M], j∈[1,N]

k∈[1,K]

wherein vec_{k,1,i,t} is the word vector of the t-th word of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, vec_{k,2,j,t} is the word vector of the t-th word of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, K = 4500 represents the number of aligned entity pairs in the translated aligned seed entity set, M = 4500 represents the number of entities from the first knowledge graph in A*, and N = 4500 represents the number of entities from the second knowledge graph in A*;
step 2, the word vectors of each entity name are summed as the initialization feature of the entity, specifically:

x_{k,1,i} = Σ_{t=1..n} vec_{k,1,i,t}

x_{k,2,j} = Σ_{t=1..m} vec_{k,2,j,t}

wherein n is the total number of words of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, m is the total number of words of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, x_{k,1,i} represents the initialization feature of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, and x_{k,2,j} represents the initialization feature of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair;
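The summation of word vectors into an entity initialization feature can be sketched as follows (the toy word-vector table and the `init_entity_feature` helper are hypothetical stand-ins for trained word2vec embeddings):

```python
import numpy as np

def init_entity_feature(translated_name, word_vectors, dim=300):
    """Sum the word vectors of an entity's translated name to obtain its
    initialization feature x_{k,1,i}; unknown words contribute zeros."""
    vecs = [word_vectors.get(w, np.zeros(dim)) for w in translated_name.split()]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# toy 4-dimensional word-vector table standing in for trained word2vec output
dim = 4
wv = {"united": np.array([1., 0., 0., 0.]),
      "kingdom": np.array([0., 1., 0., 0.])}
x = init_entity_feature("united kingdom", wv, dim)  # elementwise sum of the two vectors
```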
step 2, the initialization features of the first knowledge graph and of the second knowledge graph are respectively constructed as follows:

the initialization features of the first knowledge graph are denoted X_1^{(0)} and specifically defined as:

X_1^{(0)} = {x_{k,1,i} | e_{k,1,i} ∈ E_1, e*_{k,1,i} ∈ A*}

the initialization features of the second knowledge graph are denoted X_2^{(0)} and specifically defined as:

X_2^{(0)} = {x_{k,2,j} | e_{k,2,j} ∈ E_2, e*_{k,2,j} ∈ A*}

wherein E_1 represents the entity set of the first knowledge graph, E_2 represents the entity set of the second knowledge graph, e_{k,1,i} represents the i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, e_{k,2,j} represents the j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, A* represents the set of aligned seed entities after translation, e*_{k,1,i} represents the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, and e*_{k,2,j} represents the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair;
step 3, the context alignment enhanced graph attention network (CGAT) mainly comprises a cross-knowledge-graph aggregation layer and an attention-based graph neural network layer. The cross-knowledge-graph aggregation layer propagates cross-graph information between the two knowledge graphs, while the attention-based graph neural network layer collects the neighbor information of each entity within a knowledge graph. By stacking a plurality of CGAT layers, multi-hop cross-knowledge-graph information and neighbor information are propagated through the knowledge graphs. The construction and training process of the model is described in detail below.
Step 4: randomly divide the training set into a context alignment seed set and a target alignment seed set, and construct the input data of the neural network from the context alignment seed set, the target alignment seed set, the initialization features of the first knowledge graph, and the initialization features of the second knowledge graph;

step 4, the training set is A_train;

step 4, the context alignment seed set is A_ctx, used as input data of the model for transmitting information between the first knowledge graph and the second knowledge graph;

step 4, the target alignment seed set is A_obj, used for calculating the loss function;

step 4, the initialization features of the first knowledge graph are X_1^{(0)}, used as input data of the model;

step 4, the initialization features of the second knowledge graph are X_2^{(0)}, used as input data of the model;
Step 5: propagate information between the different knowledge graphs through the cross-knowledge-graph aggregation layer;

step 5, the information propagation between the different knowledge graphs through the cross-knowledge-graph aggregation layer is as follows:

step 5.1: according to the features of the translated entity from the entity set of the first knowledge graph and of the translated entity from the entity set of the second knowledge graph in each aligned entity pair of the context alignment seed set, calculate the weights for fusing the information of the first and second knowledge graphs using a gate mechanism, specifically:

g^{(l)}_{k,1,i} = σ(W [h^{(l)}_{k,1,i} || h^{(l)}_{k,2,j}] + b)

g^{(l)}_{k,2,j} = σ(W [h^{(l)}_{k,2,j} || h^{(l)}_{k,1,i}] + b)

i∈[1,M], j∈[1,N]

k∈[1,K]

wherein || represents the concatenation operation of vectors, W is a weight matrix learned during training, b is a bias, and σ is the sigmoid activation function; h^{(l)}_{k,1,i} is the embedding of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and h^{(l)}_{k,2,j} is the embedding of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; l refers to the l-th layer of the network; K = 4500 denotes the number of aligned entity pairs in the translated aligned seed entity set, M = 4500 represents the number of entities from the first knowledge graph G_1 in A*, and N = 4500 represents the number of entities from the second knowledge graph G_2 in A*; g^{(l)}_{k,1,i} represents the fusion weight of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and g^{(l)}_{k,2,j} represents the fusion weight of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*;
step 5.2: fuse the information of the first and second knowledge graphs with the gate mechanism to obtain the fused representations of the two knowledge graphs, specifically:

ĥ^{(l)}_{k,1,i} = g^{(l)}_{k,1,i} ⊙ h^{(l)}_{k,1,i} + (1 − g^{(l)}_{k,1,i}) ⊙ h^{(l)}_{k,2,j}

ĥ^{(l)}_{k,2,j} = g^{(l)}_{k,2,j} ⊙ h^{(l)}_{k,2,j} + (1 − g^{(l)}_{k,2,j}) ⊙ h^{(l)}_{k,1,i}

wherein g^{(l)}_{k,1,i} represents the fusion weight of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and g^{(l)}_{k,2,j} represents the fusion weight of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; h^{(l)}_{k,1,i} is the embedding of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and h^{(l)}_{k,2,j} is the embedding of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; ĥ^{(l)}_{k,1,i} is the embedding, after the gate mechanism, of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and ĥ^{(l)}_{k,2,j} is the embedding, after the gate mechanism, of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*;
step 5.3: compute the feature matrices after the cross-knowledge-graph aggregation layer for all entities in the first and second knowledge graphs;

the feature matrices after the cross-knowledge-graph aggregation layer are:

X̂_1^{(l)} = crossAggr(X_1^{(l)}, X_2^{(l)}, A_ctx)

X̂_2^{(l)} = crossAggr(X_2^{(l)}, X_1^{(l)}, A_ctx)

wherein crossAggr applies the gate-based fusion of step 5.1 and step 5.2 to each aligned entity pair of the context alignment seed set, i.e.:

ĥ^{(l)}_{k,1,i} = g^{(l)}_{k,1,i} ⊙ h^{(l)}_{k,1,i} + (1 − g^{(l)}_{k,1,i}) ⊙ h^{(l)}_{k,2,j}

wherein h^{(l)}_{k,1,i} represents the feature representation of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set, and h^{(l)}_{k,2,j} represents the feature representation of the j-th entity from the second knowledge graph in the k-th entity pair of A*;
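Steps 5.1 and 5.2 together amount to a gated, elementwise convex combination of the two graphs' embeddings. A minimal sketch, assuming a shared weight matrix W and bias b as described above (the `gate_fuse` helper and the tensor shapes are illustrative, not the patent's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_fuse(h1, h2, W, b):
    """Gate mechanism of steps 5.1-5.2: a fusion weight g in (0,1) is computed
    from the concatenated embeddings, then the embeddings of the two graphs
    are mixed as g*h1 + (1-g)*h2."""
    g = sigmoid(W @ np.concatenate([h1, h2]) + b)  # fusion weight, step 5.1
    return g * h1 + (1.0 - g) * h2                 # fused embedding, step 5.2

d = 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d, 2 * d))   # learned weight matrix (shape assumed)
b = np.zeros(d)                    # bias
h1, h2 = rng.normal(size=d), rng.normal(size=d)
fused = gate_fuse(h1, h2, W, b)
```

Because each component of g lies strictly between 0 and 1, the fused embedding lies elementwise between the two input embeddings.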
Step 6: collect the neighbor information of each entity in the first and second knowledge graphs through the attention-based graph neural network;

step 6, collecting the neighbor information of each entity in the first and second knowledge graphs through the attention-based graph neural network is as follows:

step 6.1: calculate the weights, i.e. the attention, according to the neighbor entities of each entity, with the specific formula:

α_{k,1,i,p} = exp(c_{k,1,i,p}) / Σ_{q ∈ N_{k,1,i}} exp(c_{k,1,i,q})

wherein N_{k,1,i} represents the set of neighbor entities linked with entity e_{k,1,i} in the first knowledge graph, and exp is the exponential function; ĥ^{(l)}_{k,1,i} represents the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set; c_{k,1,i,p} and c_{k,1,i,q} respectively represent the attention scores of the p-th and q-th neighbors of the i-th entity from the first knowledge graph in the k-th entity pair of A*; α_{k,1,i,p} represents the attention weight of the p-th neighbor of the i-th entity from the first knowledge graph G_1 in the k-th entity pair;

the attention score c_{k,1,i,p} is calculated as:

c_{k,1,i,p} = LeakyReLU(V [ĥ^{(l)}_{k,1,i} || ĥ^{(l)}_{k,1,p}])

wherein LeakyReLU is an activation function, V is a learnable network parameter matrix, || indicates that the two feature representations ĥ^{(l)}_{k,1,i} and ĥ^{(l)}_{k,1,p} are concatenated, e_{k,1,p} is a neighbor entity of entity e_{k,1,i}, and ĥ^{(l)}_{k,1,p} represents the feature representation, after the gate mechanism, of that neighbor entity;
step 6.2: fuse the neighbor information according to the weights calculated in step 6.1 so as to collect the neighbor information of the entity, with the formula:

h^{(l+1)}_{k,1,i} = ReLU(Σ_{p ∈ N_{k,1,i}} α_{k,1,i,p} ĥ^{(l)}_{k,1,p})

wherein ReLU is the activation function, and α_{k,1,i,p} is the weight of the p-th neighbor of the i-th entity in the first knowledge graph G_1; crossAtt is the attention-based graph neural network layer, and crossAggr is the cross-knowledge-graph aggregation layer; h^{(l+1)}_{k,1,i} represents the updated feature representation of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, ĥ^{(l)}_{k,1,i} represents the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of A*, and ĥ^{(l)}_{k,1,p} represents the feature representation of the p-th neighbor of the i-th entity from the first knowledge graph in the k-th entity pair of A*;
step 6.3: for all entities of the first and second knowledge graphs, calculate the weights according to the neighbor entities of each entity and fuse the neighbor information according to the calculated weights so as to collect the neighbor information of the entities, i.e. apply step 6.1 and step 6.2, obtaining the feature matrices after the attention-based graph neural network layer collects the neighbor information, i.e.:

X_1^{(l+1)} = crossAtt(X̂_1^{(l)})

X_2^{(l+1)} = crossAtt(X̂_2^{(l)})

wherein crossAtt is the attention-based graph neural network layer, X̂_1^{(l)} is the output feature representation of the first knowledge graph after the cross-knowledge-graph aggregation layer, and X̂_2^{(l)} is the output feature representation of the second knowledge graph after the cross-knowledge-graph aggregation layer;
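Steps 6.1 and 6.2 can be sketched for a single entity as follows (a minimal illustration; the score parameter V is assumed to map a concatenated feature pair to a scalar, and the helper names are hypothetical):

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def attention_aggregate(h_i, neighbor_feats, V):
    """Attention-based neighbor aggregation of steps 6.1-6.2: scores
    c_p = LeakyReLU(V [h_i || h_p]) are softmax-normalised into weights
    alpha_p, and the neighbor features are summed with those weights."""
    scores = np.array([leaky_relu(V @ np.concatenate([h_i, h_p]))
                       for h_p in neighbor_feats])        # c_{k,1,i,p}
    scores = scores - scores.max()                        # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()         # softmax, step 6.1
    agg = sum(a * h_p for a, h_p in zip(alpha, neighbor_feats))
    return np.maximum(agg, 0.0)                           # ReLU, step 6.2

d = 4
rng = np.random.default_rng(1)
V = rng.normal(size=2 * d)                  # learnable score vector (shape assumed)
h_i = rng.normal(size=d)
neighbors = [rng.normal(size=d) for _ in range(3)]
h_new = attention_aggregate(h_i, neighbors, V)
```

With a single neighbor, the softmax weight is 1 and the result reduces to the ReLU of that neighbor's feature.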
Step 7: input the context alignment seed set and the initialization features of the entities into the model; through steps 5 and 6, the feature representation of each entity, containing the cross-graph fusion information and the multi-hop neighbor information, is obtained, with the formulas:

X_1^{(l+1)} = crossAtt(crossAggr(X_1^{(l)}, X_2^{(l)}, A_ctx))

X_2^{(l+1)} = crossAtt(crossAggr(X_2^{(l)}, X_1^{(l)}, A_ctx))

wherein X_1^{(l+1)} is the output of the first knowledge graph through the l-th CGAT layer, and X_2^{(l+1)} is the output of the second knowledge graph through the l-th CGAT layer; X_1^{(0)} is the initialization feature matrix of the first knowledge graph, and X_2^{(0)} is the initialization feature matrix of the second knowledge graph; crossAtt is the attention-based graph neural network layer, and crossAggr is the cross-knowledge-graph aggregation layer. By stacking L CGAT layers, the features of the first and second knowledge graphs are updated L times, finally outputting the updated features X_1^{(L)} and X_2^{(L)};
the target loss function is calculated on the target alignment seed set, with the formula:

Loss = Σ_{(e_1, e_2) ∈ A_obj} max(0, φ + d(z_{e_1}, z_{e_2}) − d(z_{e_1}, z_{e_2^−}))

wherein z_e represents the final embedding of entity e, d(·,·) represents the L1 distance between two entity embeddings, φ is a trainable model parameter, e^− refers to a negative entity corresponding to a certain entity, and A_obj is the data used to optimize the model parameters.
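One plausible reading of this margin-style objective can be sketched as below, with the trainable parameter φ replaced by a fixed margin for illustration (the `alignment_loss` helper and the batch shapes are hypothetical, not the patent's implementation):

```python
import numpy as np

def alignment_loss(z1, z2, z1_neg, z2_neg, margin=1.0):
    """Margin-based loss over target alignment pairs: the L1 distance of an
    aligned pair should be smaller than that of a negative pair by at least
    `margin` (standing in for the trainable parameter phi)."""
    pos = np.abs(z1 - z2).sum(axis=-1)           # L1 distance, aligned pair
    neg = np.abs(z1_neg - z2_neg).sum(axis=-1)   # L1 distance, negative pair
    return float(np.maximum(0.0, margin + pos - neg).mean())

rng = np.random.default_rng(2)
z1, z2 = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
# negative pairs here reuse the head entity with a corrupted tail entity
loss = alignment_loss(z1, z2, z1, rng.normal(size=(8, 4)))
```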
It should be understood that parts of the application not described in detail are prior art.
It should be understood that the above description of the preferred embodiments is given for clearness of understanding and no unnecessary limitations should be understood therefrom, and all changes and modifications may be made by those skilled in the art without departing from the scope of the invention as defined by the appended claims.

Claims (7)

1. A cross-language entity alignment method of a context alignment enhanced graph attention network is characterized by comprising the following steps:
step 1: introducing a first knowledge graph and a second knowledge graph, screening an aligned seed entity set according to the first knowledge graph and the second knowledge graph, translating the name of each entity in each aligned entity pair in the aligned seed entity set into English, defining the translated name of each entity in each aligned entity pair in the aligned seed entity set, and constructing a training set and a test set from the translated aligned seed entity set;
step 2: converting the translated name of each entity in each aligned entity pair of the aligned seed entity set into word vectors of the entity name using the word2vec algorithm, summing the word vectors of each entity name as the initialization feature of the entity, and respectively constructing the initialization features of the first knowledge graph and of the second knowledge graph;
step 3: randomly dividing the training set into a context alignment seed set and a target alignment seed set, and constructing the input data of the neural network from the context alignment seed set, the target alignment seed set, the initialization features of the first knowledge graph, and the initialization features of the second knowledge graph;

step 4: propagating information between the different knowledge graphs through the cross-knowledge-graph aggregation layer;

step 5: collecting the neighbor information of each entity in the first and second knowledge graphs through the attention-based graph neural network;

step 6: inputting the context alignment seed set and the initialization features of the entities into the model, and obtaining, through steps 4 and 5, the features of each entity containing the cross-graph fusion information and the multi-hop neighbor information.
2. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
step 1 the first knowledge-graph is:
G_1 = (E_1, R_1, T_1)
step 1 the second knowledge-graph is:
G_2 = (E_2, R_2, T_2)
wherein E_1 represents the entity set of the first knowledge graph, R_1 represents the relation set of the first knowledge graph, T_1 represents the triple set of the first knowledge graph, E_2 represents the entity set of the second knowledge graph, R_2 represents the relation set of the second knowledge graph, and T_2 represents the triple set of the second knowledge graph,
step 1, aligning the seed entity sets:
A = {a_{k} = (e_{k,1,i}, e_{k,2,j}) | e_{k,1,i} ∈ E_1, e_{k,2,j} ∈ E_2}
i∈[1,M],j∈[1,N]
k∈[1,K]
wherein A represents the aligned seed entity set, a_{k} denotes the k-th aligned entity pair, K denotes the number of aligned entity pairs in the aligned seed entity set, e_{k,1,i} represents the i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, e_{k,2,j} represents the j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair, e_{k,1,i} and e_{k,2,j} have the same Chinese meaning, M represents the number of entities from the first knowledge graph in A, and N represents the number of entities from the second knowledge graph in A;
step 1, translating the name of each entity in each aligned entity pair in the aligned seed entity set into English as follows:
translate the names of the entities in each aligned entity pair in A, i.e. e_{k,1,i} and e_{k,2,j}, into English; the set of aligned seed entities after translation is denoted A*, specifically defined as:

A* = {a*_{k} = (e*_{k,1,i}, e*_{k,2,j})}
i∈[1,M],j∈[1,N]
k∈[1,K]
wherein A* represents the set of aligned seed entities after translation, a*_{k} denotes the k-th aligned entity pair after translation, K denotes the number of aligned entity pairs in the translated set, e*_{k,1,i} represents the translated i-th entity, from the entity set of the first knowledge graph, in the k-th aligned entity pair, e*_{k,2,j} represents the translated j-th entity, from the entity set of the second knowledge graph, in the k-th aligned entity pair, M represents the number of entities from the first knowledge graph in A*, and N represents the number of entities from the second knowledge graph in A*;
step 1, defining the name of each translated entity as:
the translated name of each entity consists of a plurality of English words, specifically represented as:

e*_{k,1,i} = (word_{k,1,i,1}, word_{k,1,i,2}, ..., word_{k,1,i,n})

e*_{k,2,j} = (word_{k,2,j,1}, word_{k,2,j,2}, ..., word_{k,2,j,m})

wherein word_{k,1,i,t} represents the t-th word of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, word_{k,2,j,t} represents the t-th word of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, n is the total number of words of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, and m is the total number of words of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair;

step 1, the construction of the training set and the test set from the translated aligned seed entity set is as follows: from the K aligned entity pairs of the translated aligned seed entity set A*, P pairs are randomly selected as the training set, denoted A_train; the remaining K − P aligned entity pairs of A* are used as the test set, denoted A_test.
3. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
step 2, the word vectors of each entity name are:

vec_{k,1,i,t} = word2vec(word_{k,1,i,t})

vec_{k,2,j,t} = word2vec(word_{k,2,j,t})

i∈[1,M], j∈[1,N]

k∈[1,K]

wherein vec_{k,1,i,t} is the word vector of the t-th word of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, vec_{k,2,j,t} is the word vector of the t-th word of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, K represents the number of aligned entity pairs in the translated aligned seed entity set, M represents the number of entities from the first knowledge graph in A*, and N represents the number of entities from the second knowledge graph in A*;
step 2, the word vectors of each entity name are summed as the initialization feature of the entity, specifically:

x_{k,1,i} = Σ_{t=1..n} vec_{k,1,i,t}

x_{k,2,j} = Σ_{t=1..m} vec_{k,2,j,t}

wherein n is the total number of words of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, m is the total number of words of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, x_{k,1,i} represents the initialization feature of the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, and x_{k,2,j} represents the initialization feature of the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair;
step 2, the initialization features of the first knowledge graph and of the second knowledge graph are respectively constructed as follows:

the initialization features of the first knowledge graph are denoted X_1^{(0)} and specifically defined as:

X_1^{(0)} = {x_{k,1,i} | e_{k,1,i} ∈ E_1, e*_{k,1,i} ∈ A*}

the initialization features of the second knowledge graph are denoted X_2^{(0)} and specifically defined as:

X_2^{(0)} = {x_{k,2,j} | e_{k,2,j} ∈ E_2, e*_{k,2,j} ∈ A*}

wherein E_1 represents the entity set of the first knowledge graph, E_2 represents the entity set of the second knowledge graph, e_{k,1,i} represents the i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, e_{k,2,j} represents the j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair, A* represents the set of aligned seed entities after translation, e*_{k,1,i} represents the translated i-th entity from the entity set of the first knowledge graph in the k-th aligned entity pair, and e*_{k,2,j} represents the translated j-th entity from the entity set of the second knowledge graph in the k-th aligned entity pair.
4. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
step 3, the training set is A_train;

step 3, the context alignment seed set is A_ctx, used as input data of the model for transmitting information between the first knowledge graph and the second knowledge graph;

step 3, the target alignment seed set is A_obj, used for calculating the loss function;

step 3, the initialization features of the first knowledge graph are X_1^{(0)}, used as input data of the model;

step 3, the initialization features of the second knowledge graph are X_2^{(0)}, used as input data of the model.
5. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
step 4, the information propagation between the different knowledge graphs through the cross-knowledge-graph aggregation layer is as follows:

step 4.1: according to the features of the translated entity from the entity set of the first knowledge graph and of the translated entity from the entity set of the second knowledge graph in each aligned entity pair of the context alignment seed set, calculate the weights for fusing the information of the first and second knowledge graphs using a gate mechanism, specifically:

g^{(l)}_{k,1,i} = σ(W [h^{(l)}_{k,1,i} || h^{(l)}_{k,2,j}] + b)

g^{(l)}_{k,2,j} = σ(W [h^{(l)}_{k,2,j} || h^{(l)}_{k,1,i}] + b)

i∈[1,M], j∈[1,N]

k∈[1,K]

wherein || represents the concatenation operation of vectors, W is a weight matrix learned during training, b is a bias, and σ is the sigmoid activation function; h^{(l)}_{k,1,i} is the feature representation of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and h^{(l)}_{k,2,j} is the feature representation of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; l refers to the l-th layer of the network; K represents the number of aligned entity pairs in the translated aligned seed entity set, M represents the number of entities from the first knowledge graph G_1 in A*, and N represents the number of entities from the second knowledge graph G_2 in A*; g^{(l)}_{k,1,i} represents the fusion weight of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and g^{(l)}_{k,2,j} represents the fusion weight of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*;
step 4.2: fuse the information of the first and second knowledge graphs with the gate mechanism to obtain the fused representations of the two knowledge graphs, specifically:

ĥ^{(l)}_{k,1,i} = g^{(l)}_{k,1,i} ⊙ h^{(l)}_{k,1,i} + (1 − g^{(l)}_{k,1,i}) ⊙ h^{(l)}_{k,2,j}

ĥ^{(l)}_{k,2,j} = g^{(l)}_{k,2,j} ⊙ h^{(l)}_{k,2,j} + (1 − g^{(l)}_{k,2,j}) ⊙ h^{(l)}_{k,1,i}

wherein g^{(l)}_{k,1,i} represents the fusion weight of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and g^{(l)}_{k,2,j} represents the fusion weight of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; h^{(l)}_{k,1,i} is the feature representation of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and h^{(l)}_{k,2,j} is the feature representation of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*; ĥ^{(l)}_{k,1,i} is the feature representation, after the gate mechanism, of the i-th entity from knowledge graph G_1 in the k-th entity pair of A*, and ĥ^{(l)}_{k,2,j} is the feature representation, after the gate mechanism, of the j-th entity from knowledge graph G_2 in the k-th entity pair of A*;
step 4.3: compute the feature matrices after the cross-knowledge-graph aggregation layer for all entities in the first and second knowledge graphs;

the feature matrices after the cross-knowledge-graph aggregation layer are:

X̂_1^{(l)} = crossAggr(X_1^{(l)}, X_2^{(l)}, A_ctx)

X̂_2^{(l)} = crossAggr(X_2^{(l)}, X_1^{(l)}, A_ctx)

wherein crossAggr applies the gate-based fusion of step 4.1 and step 4.2 to each aligned entity pair of the context alignment seed set, i.e.:

ĥ^{(l)}_{k,1,i} = g^{(l)}_{k,1,i} ⊙ h^{(l)}_{k,1,i} + (1 − g^{(l)}_{k,1,i}) ⊙ h^{(l)}_{k,2,j}

wherein h^{(l)}_{k,1,i} represents the feature representation of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set, and h^{(l)}_{k,2,j} represents the feature representation of the j-th entity from the second knowledge graph in the k-th entity pair of A*.
6. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
step 5, collecting the neighbor information of each entity in the first and second knowledge graphs through the attention-based graph neural network is as follows:

step 5.1: calculate the weights, i.e. the attention, according to the neighbor entities of each entity, with the specific formula:

α_{k,1,i,p} = exp(c_{k,1,i,p}) / Σ_{q ∈ N_{k,1,i}} exp(c_{k,1,i,q})

wherein N_{k,1,i} represents the set of neighbor entities linked with entity e_{k,1,i} in the first knowledge graph, and exp is the exponential function; ĥ^{(l)}_{k,1,i} represents the feature representation, after the gate mechanism, of the i-th entity from the first knowledge graph in the k-th entity pair of the translated aligned seed entity set; c_{k,1,i,p} and c_{k,1,i,q} respectively represent the attention scores of the p-th and q-th neighbors of the i-th entity from the first knowledge graph in the k-th entity pair of A*; α_{k,1,i,p} represents the attention weight of the p-th neighbor of the i-th entity from the first knowledge graph G_1 in the k-th entity pair;

the attention score c_{k,1,i,p} is calculated as:

c_{k,1,i,p} = LeakyReLU(V [ĥ^{(l)}_{k,1,i} || ĥ^{(l)}_{k,1,p}])

wherein LeakyReLU is an activation function, V is a learnable network parameter matrix, || indicates that the two feature representations ĥ^{(l)}_{k,1,i} and ĥ^{(l)}_{k,1,p} are concatenated, e_{k,1,p} is a neighbor entity of entity e_{k,1,i}, and ĥ^{(l)}_{k,1,p} represents the feature representation, after the gate mechanism, of that neighbor entity;
Step 5.2: and (3) fusing neighbor information according to the weight calculated in the step (5.1) to collect the neighbor information of the entity, wherein the calculation formula is as follows:
Figure FDA0002755608770000078
wherein Relu is the activation function, αk,1,i,pIs the first knowledge-graph G1The weight of the p-th neighbor of the ith entity in (1); crossAtt is the attention-based graph neural network layer, crosssaggr is the cross-knowledgegraph aggregation layer,
Figure FDA0002755608770000079
is represented by A*From knowledge-graph G in the k-th entity pair1Is characterized by the i-th entity of
Figure FDA00027556087700000710
Is represented by A*In the kth entity pair, the ith entity from the first knowledge graph is represented by the characteristics of the gate mechanism;
Figure FDA00027556087700000711
is represented by A*(ii) a characteristic representation of a p-th neighbor of the k-th entity pair from the i-th entity of the first knowledge-graph; alpha is alpha1,k,iRepresenting the weight of the ith entity from the first knowledge-graph G1 in the kth pair of entities in a;
Step 5.3: calculating, for all entities of the first knowledge graph and the second knowledge graph, the weights from their neighbour entities and fusing the neighbour information according to the calculated weights so as to collect the neighbour information of each entity (i.e. applying step 5.1 and step 5.2), and then obtaining the feature matrices after the neighbour information has been collected by the attention-based graph neural network layer, namely:

H̃_1 = crossAtt(H_1)
H̃_2 = crossAtt(H_2)

wherein crossAtt is the attention-based graph neural network layer; H̃_1 is the output feature representation of the first knowledge graph through the cross-knowledge-graph aggregation layer; and H̃_2 is the output feature representation of the second knowledge graph through the cross-knowledge-graph aggregation layer.
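A hedged end-to-end sketch of the per-graph layer of step 5.3 — attention weights per entity (step 5.1) followed by weighted neighbour fusion (step 5.2), applied to every entity of one graph. The dot-product scoring, softmax normalization, and toy graph are assumptions for illustration, not details from the patent:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_att(H, adj):
    """Step 5.3 sketch: run steps 5.1 and 5.2 for every entity of one graph.

    H   : (N, d) entity feature matrix
    adj : (N, N) 0/1 adjacency matrix of the knowledge graph
    """
    H_out = np.zeros_like(H)
    for i in range(H.shape[0]):
        neigh = np.nonzero(adj[i])[0]                 # neighbour indices of entity i
        scores = H[neigh] @ H[i]                      # dot-product attention scores
        alpha = softmax(scores)                       # step 5.1: neighbour weights
        H_out[i] = np.maximum(alpha @ H[neigh], 0.0)  # step 5.2: fuse + Relu
    return H_out

# Toy graph: entity 0 is linked to entities 1 and 2.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
H_new = cross_att(H, adj)
```

Each output row is a convex combination of neighbour rows passed through ReLU, so the output matrix has the same shape as the input feature matrix.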
7. The method of cross-language entity alignment for a context alignment enhanced graph attention network of claim 1, wherein:
in step 6, the feature of each entity, containing the graph-fusion information and the multi-hop neighbour information, is obtained as:

H_1^{(l+1)} = CGAT(H_1^{(l)})
H_2^{(l+1)} = CGAT(H_2^{(l)})

wherein H_1^{(l)} is the output of the first knowledge graph through the l-th CGAT layer, and H_2^{(l)} is the output of the second knowledge graph through the l-th CGAT layer; H_1^{(0)} is the initialized feature of the first knowledge graph, and H_2^{(0)} is the initialized feature of the second knowledge graph; crossAtt is the attention-based graph neural network layer, and crossAggr is the cross-knowledge-graph aggregation layer. By stacking L CGAT layers, the features of the first knowledge graph and the second knowledge graph are updated L times, and the finally updated features H_1^{(L)} and H_2^{(L)} are output;
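The L-fold update of step 6 is plain layer stacking. Below is a minimal sketch in which the CGAT layer internals are simplified to a parameter-free mean-neighbour aggregation stand-in; the real layer combines the crossAtt and crossAggr sub-layers with trainable parameters, which are omitted here:

```python
import numpy as np

def cgat_layer(H, adj):
    """Simplified stand-in for one CGAT layer: ReLU of mean-neighbour features."""
    deg = np.clip(adj.sum(axis=1, keepdims=True), 1, None)  # avoid divide-by-zero
    return np.maximum(adj @ H / deg, 0.0)

def stack_cgat(H0, adj, L=3):
    """Step 6 sketch: update the features L times by stacking L CGAT layers."""
    H = H0
    for _ in range(L):
        H = cgat_layer(H, adj)
    return H

# Two mutually linked entities: each layer swaps their (non-negative) features,
# so an even number of layers returns the initial feature matrix.
H0 = np.array([[1.0, 0.0], [0.0, 1.0]])
adj = np.array([[0, 1], [1, 0]])
HL = stack_cgat(H0, adj, L=2)
```

Stacking L layers lets each entity's final feature draw on its L-hop neighbourhood, which is the point of the multi-hop update in step 6.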
calculating a target loss function on the target alignment seed set, with the formula:

L(φ) = Σ_{(e_1, e_2) ∈ A_obj} max(0, ‖h_{e_1} − h_{e_2}‖_1 − ‖h_{e_1} − h_{e^-}‖_1 + γ)

wherein h_e denotes the final feature representation of entity e; ‖·‖_1 denotes the L1 distance between two entities; γ is the margin; φ denotes the trainable model parameters; e^- refers to a negative entity corresponding to a given entity; and A_obj is the data used to optimize the model parameters;

and optimizing and updating the model parameters φ by using the Adam algorithm, wherein the model parameters φ comprise the trainable parameters of the cross-knowledge-graph aggregation layer and the attention-based graph neural network layer, and constructing the context-alignment-enhanced graph attention network model from the optimized parameters φ.
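A hedged numpy sketch of the training objective described in the claim: a margin-based ranking loss that pushes the L1 distance to the aligned entity below the L1 distance to a negative entity e^-. The margin value, the choice of one negative sample per pair, and all names are illustrative assumptions, since the patent's exact formula is given only as an image:

```python
import numpy as np

def l1(a, b):
    """L1 distance between two entity feature vectors."""
    return float(np.abs(a - b).sum())

def alignment_loss(H1, H2, pairs, negatives, margin=1.0):
    """Margin-based ranking loss over the target alignment seed set A_obj.

    pairs     : list of (i, j) aligned seed entity index pairs
    negatives : one negative entity index in graph 2 per pair (the e^- sample)
    """
    loss = 0.0
    for (i, j), n in zip(pairs, negatives):
        pos = l1(H1[i], H2[j])                # distance to the aligned entity
        neg = l1(H1[i], H2[n])                # distance to the negative entity
        loss += max(0.0, pos - neg + margin)  # hinge with margin
    return loss

H1 = np.array([[0.0, 0.0]])
H2 = np.array([[0.0, 0.0], [5.0, 5.0]])
# Aligned pair (0, 0) with negative entity 1: pos = 0, neg = 10, so the hinge
# is inactive and the loss is 0.
loss = alignment_loss(H1, H2, [(0, 0)], [1])
```

In practice the parameters φ would then be updated from this loss with an optimizer such as Adam, as the claim specifies.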
CN202011201832.1A 2020-11-02 2020-11-02 Cross-language entity alignment method of context alignment enhanced graph attention network Active CN112417159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011201832.1A CN112417159B (en) 2020-11-02 2020-11-02 Cross-language entity alignment method of context alignment enhanced graph attention network


Publications (2)

Publication Number Publication Date
CN112417159A true CN112417159A (en) 2021-02-26
CN112417159B CN112417159B (en) 2022-04-15

Family

ID=74827829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011201832.1A Active CN112417159B (en) 2020-11-02 2020-11-02 Cross-language entity alignment method of context alignment enhanced graph attention network

Country Status (1)

Country Link
CN (1) CN112417159B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929041A (en) * 2019-11-20 2020-03-27 北京邮电大学 Entity alignment method and system based on layered attention mechanism
CN111723587A (en) * 2020-06-23 2020-09-29 桂林电子科技大学 Chinese-Thai entity alignment method oriented to cross-language knowledge graph


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157935A (en) * 2021-03-16 2021-07-23 中国科学技术大学 Graph neural network model and method for entity alignment based on relationship context
CN113157935B (en) * 2021-03-16 2024-02-27 中国科学技术大学 Entity alignment based on relation context and graph neural network system and method
CN112905807A (en) * 2021-03-25 2021-06-04 北京邮电大学 Multi-source space-time knowledge fusion method
CN113392216A (en) * 2021-06-23 2021-09-14 武汉大学 Remote supervision relation extraction method and device based on consistency text enhancement
CN113392216B (en) * 2021-06-23 2022-06-17 武汉大学 Remote supervision relation extraction method and device based on consistency text enhancement
CN113761221A (en) * 2021-06-30 2021-12-07 中国人民解放军32801部队 Knowledge graph entity alignment method based on graph neural network
CN113407759A (en) * 2021-08-18 2021-09-17 中国人民解放军国防科技大学 Multi-modal entity alignment method based on adaptive feature fusion
CN113407759B (en) * 2021-08-18 2021-11-30 中国人民解放军国防科技大学 Multi-modal entity alignment method based on adaptive feature fusion
CN113704495A (en) * 2021-08-30 2021-11-26 合肥智能语音创新发展有限公司 Entity alignment method and device, electronic equipment and storage medium
CN113704495B (en) * 2021-08-30 2024-05-28 合肥智能语音创新发展有限公司 Entity alignment method, device, electronic equipment and storage medium
CN116257643A (en) * 2023-05-09 2023-06-13 鹏城实验室 Cross-language entity alignment method, device, equipment and readable storage medium
CN116257643B (en) * 2023-05-09 2023-08-25 鹏城实验室 Cross-language entity alignment method, device, equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant