CN116842109A - Information retrieval knowledge graph embedding method, device and computer equipment - Google Patents

Information retrieval knowledge graph embedding method, device and computer equipment

Info

Publication number
CN116842109A
Authority
CN
China
Prior art keywords
entity
information
vector
sub
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310766394.0A
Other languages
Chinese (zh)
Inventor
Huang Yu
Zhu Huashi
Lei Ming
Xu Dexuan
Li Hang
Jin Zhi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202310766394.0A
Publication of CN116842109A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an information retrieval knowledge graph embedding method, an information retrieval knowledge graph embedding device and computer equipment. The method comprises the following steps: acquiring an information retrieval knowledge graph, and collecting each piece of sub-graph information of the information retrieval knowledge graph; converting each piece of sub-graph information into a corresponding text sequence, and performing encoding conversion processing on each text sequence to obtain a masked entity vector and a masked relation vector corresponding to each text sequence; updating the masked entity vector and the masked relation vector of each text sequence according to the graph structure information of the sub-graph information corresponding to the text sequence, and reconstructing the updated vectors to obtain the entity vectors and the relation vectors corresponding to the information retrieval knowledge graph. By adopting the method, the accuracy of the generated entity vectors and relation vectors of the information retrieval knowledge graph can be improved.

Description

Information retrieval knowledge graph embedding method, device and computer equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, and a computer device for embedding an information retrieval knowledge graph.
Background
In recent years, the rapid development of artificial intelligence and big data technology has given rise to domain knowledge graphs in various industries. In particular, application demand for knowledge graphs in the information retrieval field is increasing: researchers want to enable practical application systems by constructing information retrieval knowledge graphs and using the prior knowledge they contain. However, an information retrieval knowledge graph contains a large amount of graph structure information and text information, so it cannot be directly applied to determining retrieval tasks in the information retrieval process; therefore, knowledge graph embedding needs to be performed on the information retrieval knowledge graph in the information retrieval field, so as to improve the accuracy of determining retrieval tasks. The knowledge graph embedding task is to generate accurate vector representations for the entities and relations in a knowledge graph, so as to enable downstream tasks and improve their effect.
Existing research related to information retrieval knowledge graph embedding mainly identifies the semantic information of the information retrieval knowledge graph through graph-structure-based methods and generates the entity and relation vectors of the information retrieval knowledge graph based on that semantic information. However, because this technical solution analyzes only the graph structure information of the information retrieval knowledge graph, the accuracy of the generated entity vectors and relation vectors is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an information retrieval knowledge graph embedding method, apparatus, computer device, computer readable storage medium, and computer program product.
In a first aspect, the present application provides an information retrieval knowledge graph embedding method. The method comprises the following steps:
acquiring an information retrieval knowledge graph, and collecting each piece of sub-graph information of the information retrieval knowledge graph through a sub-graph sampling algorithm;
converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, and performing encoding conversion processing on each text sequence through a text encoder to obtain a masked entity vector corresponding to each text sequence and a masked relation vector of each text sequence;
and updating, by a relation graph decoder, the masked entity vector of the text sequence and the masked relation vector of the text sequence according to the graph structure information of the sub-graph information corresponding to the text sequence, and reconstructing the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence, to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
Optionally, the collecting each piece of sub-graph information of the information retrieval knowledge graph includes:
acquiring a plurality of entity nodes of the information retrieval knowledge graph, and traversing the information retrieval knowledge graph from each entity node through the sub-graph sampling algorithm to obtain each piece of path information corresponding to each entity node;
and taking the intersection point of pairwise intersecting pieces of path information as a central entity, and taking the central entity and the path information intersecting with the central entity as one piece of sub-graph information.
Optionally, the entity nodes include text information, and the converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy includes:
identifying, for each piece of sub-graph information, the path information contained in the sub-graph information, and generating, for each piece of path information, sub-text sequence information corresponding to the path information based on the text information of each entity node contained in the path information;
and based on the central entity of the sub-graph information, splicing the sub-text sequence information corresponding to each piece of path information of the sub-graph information through a text splicing strategy to obtain the text sequence corresponding to the sub-graph information.
Optionally, the text encoder includes an entity masking sub-encoder and a relation masking sub-encoder, and the performing encoding conversion processing on each text sequence through the text encoder to obtain a masked entity vector corresponding to each text sequence and a masked relation vector of each text sequence includes:
for each text sequence, performing entity masking reconstruction processing on the text sequence through the entity masking sub-encoder to obtain the masked entity vector corresponding to the text sequence, and performing relation masking reconstruction processing on the text sequence through the relation masking sub-encoder to obtain the masked relation vector corresponding to the text sequence.
Optionally, the updating, by the relation graph decoder, the masked entity vector of the text sequence and the masked relation vector of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence includes:
determining the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and performing pooling processing on the sub-graph structure information of each entity node to obtain a graph structure vector of each entity node;
for each entity node of the text sequence, performing fusion processing on the sub-masked entity vector of the entity node in the masked entity vector and the graph structure vector of the entity node to obtain a sub-structure masked entity vector of the entity node, and performing fusion processing on the sub-masked relation vector of the entity node in the masked relation vector and the graph structure vector of the entity node to obtain a sub-structure masked relation vector of the entity node;
and performing convolution processing on the sub-structure masked entity vector and the sub-structure masked relation vector of each entity node through a graph convolutional neural network of the relation graph decoder to obtain an updated sub-masked entity vector and an updated sub-masked relation vector of the entity node, taking the masked entity vector containing the updated sub-masked entity vectors of all entity nodes as the updated masked entity vector of the text sequence, and taking the masked relation vector containing the updated sub-masked relation vectors of all entity nodes as the updated masked relation vector of the text sequence.
Optionally, the reconstructing the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph includes:
identifying a first masked vector in the updated masked entity vector of the text sequence and a second masked vector in the updated masked relation vector of the text sequence, reconstructing the first masked vector through a linear layer of the relation graph decoder to obtain the entity vector corresponding to the text sequence, and reconstructing the second masked vector through the linear layer to obtain the relation vector corresponding to the text sequence;
and taking the entity vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and taking the relation vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the relation vectors of the information retrieval knowledge graph.
In a second aspect, the application further provides an information retrieval knowledge graph embedding device. The device comprises:
the acquisition module is used for acquiring an information retrieval knowledge graph and collecting each piece of sub-graph information of the information retrieval knowledge graph through a sub-graph sampling algorithm;
the encoding module is used for converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, and performing encoding conversion processing on each text sequence through a text encoder to obtain a masked entity vector corresponding to each text sequence and a masked relation vector of each text sequence;
and the generation module is used for updating, by a relation graph decoder, the masked entity vector of the text sequence and the masked relation vector of the text sequence according to the graph structure information of the sub-graph information corresponding to the text sequence, and reconstructing the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence, to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
Optionally, the acquiring module is specifically configured to:
acquiring a plurality of entity nodes of the information retrieval knowledge graph, and traversing the information retrieval knowledge graph from each entity node through the sub-graph sampling algorithm to obtain each piece of path information corresponding to each entity node;
and taking the intersection point of pairwise intersecting pieces of path information as a central entity, and taking the central entity and the path information intersecting with the central entity as one piece of sub-graph information.
Optionally, the encoding module is specifically configured to:
identifying, for each piece of sub-graph information, the path information contained in the sub-graph information, and generating, for each piece of path information, sub-text sequence information corresponding to the path information based on the text information of each entity node contained in the path information;
and based on the central entity of the sub-graph information, splicing the sub-text sequence information corresponding to each piece of path information of the sub-graph information through a text splicing strategy to obtain the text sequence corresponding to the sub-graph information.
Optionally, the encoding module is specifically configured to:
for each text sequence, performing entity masking reconstruction processing on the text sequence through the entity masking sub-encoder to obtain the masked entity vector corresponding to the text sequence, and performing relation masking reconstruction processing on the text sequence through the relation masking sub-encoder to obtain the masked relation vector corresponding to the text sequence.
Optionally, the generating module is specifically configured to:
determining the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and performing pooling processing on the sub-graph structure information of each entity node to obtain a graph structure vector of each entity node;
for each entity node of the text sequence, performing fusion processing on the sub-masked entity vector of the entity node in the masked entity vector and the graph structure vector of the entity node to obtain a sub-structure masked entity vector of the entity node, and performing fusion processing on the sub-masked relation vector of the entity node in the masked relation vector and the graph structure vector of the entity node to obtain a sub-structure masked relation vector of the entity node;
and performing convolution processing on the sub-structure masked entity vector and the sub-structure masked relation vector of each entity node through a graph convolutional neural network of the relation graph decoder to obtain an updated sub-masked entity vector and an updated sub-masked relation vector of the entity node, taking the masked entity vector containing the updated sub-masked entity vectors of all entity nodes as the updated masked entity vector of the text sequence, and taking the masked relation vector containing the updated sub-masked relation vectors of all entity nodes as the updated masked relation vector of the text sequence.
Optionally, the generating module is specifically configured to:
identifying a first masked vector in the updated masked entity vector of the text sequence and a second masked vector in the updated masked relation vector of the text sequence, reconstructing the first masked vector through a linear layer of the relation graph decoder to obtain the entity vector corresponding to the text sequence, and reconstructing the second masked vector through the linear layer to obtain the relation vector corresponding to the text sequence;
and taking the entity vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and taking the relation vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the relation vectors of the information retrieval knowledge graph.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the method of any of the first aspects.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
According to the information retrieval knowledge graph embedding method, apparatus, computer device, storage medium and computer program product, an information retrieval knowledge graph is acquired, and each piece of sub-graph information of the information retrieval knowledge graph is collected through a sub-graph sampling algorithm; each piece of sub-graph information is converted into a corresponding text sequence through a graph-to-text conversion strategy, and encoding conversion processing is performed on each text sequence through a text encoder to obtain a masked entity vector and a masked relation vector corresponding to each text sequence; the masked entity vector and the masked relation vector of each text sequence are updated by a relation graph decoder according to the graph structure information of the corresponding sub-graph information, and the updated vectors are reconstructed to obtain each entity vector and each relation vector corresponding to the information retrieval knowledge graph. Dividing the information retrieval knowledge graph into a plurality of pieces of sub-graph information through the sub-graph sampling algorithm improves the accuracy of analyzing different parts of the information retrieval knowledge graph. The text sequence corresponding to each piece of sub-graph information is then extracted through the graph-to-text conversion strategy, determining the text information corresponding to each piece of sub-graph information; the text encoder encodes the masked entity vector and the masked relation vector corresponding to each text sequence; the relation graph decoder fuses the masked entity vector and the masked relation vector of each text sequence with the graph structure information of the sub-graph information and finally decodes them to obtain each entity vector corresponding to the information retrieval knowledge graph. In this way, the information retrieval knowledge graph is accurately divided into a plurality of pieces of sub-graph information, and the entity vectors and relation vectors corresponding to the sub-graph information are analyzed by combining the graph structure information and the text information of the sub-graph information, thereby improving the accuracy of the generated entity vectors and relation vectors of the information retrieval knowledge graph.
Drawings
FIG. 1 is a flow chart of a knowledge graph embedding method for information retrieval in one embodiment;
FIG. 2 is a flow diagram of an example of information retrieval knowledge graph embedding in one embodiment;
FIG. 3 is a block diagram of an information retrieval knowledge graph embedding apparatus in one embodiment;
FIG. 4 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The information retrieval knowledge graph embedding method provided by the embodiments of the application can be applied to a terminal, a server, or a system comprising a terminal and a server, and is realized through interaction between the terminal and the server. The terminal may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and the like. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. The terminal divides the information retrieval knowledge graph into a plurality of pieces of sub-graph information through a sub-graph sampling algorithm, which improves the accuracy of analyzing different parts of the information retrieval knowledge graph; extracts the text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, determining the text information corresponding to each piece of sub-graph information; encodes the masked entity vector and the masked relation vector corresponding to each text sequence; fuses, through a relation graph decoder, the masked entity vector and the masked relation vector of each text sequence with the graph structure information of the sub-graph information; and decodes the text sequences to obtain the entity vectors corresponding to the information retrieval knowledge graph. The information retrieval knowledge graph is thus accurately divided into a plurality of pieces of sub-graph information, and the entity vectors and relation vectors corresponding to the sub-graph information are analyzed by combining the graph structure information and the text information of the sub-graph information, improving the accuracy of the generated entity vectors and relation vectors of the information retrieval knowledge graph.
In one embodiment, as shown in FIG. 1, an information retrieval knowledge graph embedding method is provided. The method is described here as applied to a terminal by way of illustration, and includes the following steps:
step S101, an information retrieval knowledge graph is obtained, and each piece of sub-graph information of the information retrieval knowledge graph is collected through a sub-graph sampling algorithm.
In this embodiment, in response to an information retrieval knowledge graph embedding operation on the terminal, the terminal obtains the information retrieval knowledge graph to be embedded. Then, the terminal performs a sub-graph sampling operation on the information retrieval knowledge graph through a sub-graph sampling algorithm to obtain a plurality of pieces of sub-graph information corresponding to the information retrieval knowledge graph. Each piece of sub-graph information is graph information in which a plurality of pieces of path information centered on one entity node are combined, the path information including forward paths (forward-path) and backward paths (backward-path). The information retrieval knowledge graph is graph information formed by a plurality of entity nodes containing text information, together with graph structure information. The specific process of collecting the plurality of pieces of sub-graph information of the information retrieval knowledge graph will be described in detail later. The sub-graph sampling algorithm is a multi-path sampling algorithm based on a central entity (center-based path sampling). The information retrieval knowledge graph may be a paper retrieval knowledge graph, where each piece of sub-graph information of the paper retrieval knowledge graph is a sub-knowledge-graph corresponding to one piece of paper information in the paper retrieval knowledge graph; the sub-knowledge-graph corresponding to the paper information comprises a plurality of pieces of sub-information content of the paper information, which may be, but are not limited to, abstract information, name information (paper name, author name, source name and the like), paper content information, numbering information and the like. Each entity node of the paper retrieval knowledge graph corresponds to one piece of paper information.
Step S102, converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, and performing encoding conversion processing on each text sequence through a text encoder to obtain a masked entity vector corresponding to each text sequence and a masked relation vector of each text sequence.
In this embodiment, the terminal converts each piece of path information in the sub-graph information, through the graph-to-text conversion strategy, into a text sequence corresponding to the arrangement of the entity nodes of the path information. Then, the terminal performs encoding conversion processing on each text sequence through the text encoder to obtain the masked entity vector corresponding to each text sequence and the masked relation vector corresponding to each text sequence. The graph-to-text conversion strategy is a graph-to-sequence (Graph2Seq) strategy based on a pre-trained natural language model. The specific conversion process will be described in detail later. The text encoder is a text vectorization network trained on the masked modeling idea (Masked Modeling) with the corresponding tasks of entity masking reconstruction (Masked Entity Modeling, MEM) and relation masking reconstruction (Masked Relation Modeling, MRM). The specific encoding conversion process will be described in detail later.
Specifically, during training the terminal obtains a plurality of sample text sequences, a sample masked entity vector for each sample text sequence, and a sample masked relation vector for each sample text sequence. The terminal then performs encoding conversion processing on each sample text sequence through an initial text encoder to obtain a test masked entity vector and a test masked relation vector corresponding to the sample text sequence, and calculates, through a vector similarity algorithm, a first similarity between the test masked entity vector and the sample masked entity vector of the sample text sequence, and a second similarity between the test masked relation vector and the sample masked relation vector of the sample text sequence. The terminal presets a similarity threshold. When the first similarity is lower than the similarity threshold, the terminal deletes the entity masking parameters of the initial text encoder corresponding to the first similarity; when the second similarity is lower than the similarity threshold, the terminal deletes the relation masking parameters of the initial text encoder corresponding to the second similarity, and returns to the step of performing encoding conversion processing on the sample text sequences through the initial text encoder, until both the first similarity and the second similarity are higher than the similarity threshold, at which point the initial text encoder obtained in the last iteration is taken as the text encoder. The MEM masks one first-order neighbor entity of the central entity at a time (the corresponding entity description is also masked, avoiding information leakage).
Specifically, the masked entity may be either the tail or the head of the triple it forms with the central entity, giving two masking forms, namely:
<c, r, [MASK]>, <[MASK], r, c>
where c represents the central entity and [MASK] is a special token in the BERT model. Similarly, the MRM masks the relation between the central entity and one first-order neighbor entity node at a time:
<c, [MASK], t>, <h, [MASK], c>
where t is a first-order forward neighboring entity node of the central entity and h is a first-order reverse neighboring entity node of the central entity.
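As an illustrative sketch (not part of the claimed method), the construction of these MEM/MRM masked triples around a central entity could look as follows in Python; the Triple type and the function names are hypothetical:

from dataclasses import dataclass

MASK = "[MASK]"  # special token of the BERT-style encoder

@dataclass
class Triple:
    head: str
    relation: str
    tail: str

def mask_entity(triple: Triple, central: str) -> Triple:
    """MEM: mask the first-order neighbor entity of the central entity c,
    yielding <c, r, [MASK]> when c is the head and <[MASK], r, c> otherwise."""
    if triple.head == central:
        return Triple(triple.head, triple.relation, MASK)
    return Triple(MASK, triple.relation, triple.tail)

def mask_relation(triple: Triple) -> Triple:
    """MRM: mask the relation between the central entity and a first-order
    neighbor, yielding <c, [MASK], t> or <h, [MASK], c>."""
    return Triple(triple.head, MASK, triple.tail)

t = Triple("paper_A", "cites", "paper_B")  # central entity: paper_A
print(mask_entity(t, "paper_A"))  # Triple(head='paper_A', relation='cites', tail='[MASK]')
print(mask_relation(t))           # Triple(head='paper_A', relation='[MASK]', tail='paper_B')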
Step S103, updating, by a relation graph decoder, the masked entity vector of the text sequence and the masked relation vector of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and reconstructing the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence, to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
In this embodiment, for each text sequence, the terminal performs, through the relation graph decoder, fusion processing on the graph structure information of the sub-graph information corresponding to the text sequence, the masked entity vector of the text sequence, and the masked relation vector of the text sequence, to obtain the updated masked entity vector and the updated masked relation vector of the text sequence. Then, based on a vector reconstruction network of the relation graph decoder, the terminal reconstructs the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence to obtain all entity vectors and all relation vectors corresponding to the information retrieval knowledge graph. The relation graph decoder is a graph neural network model capable of processing multidimensional relations. The specific fusion process and the reconstruction process will be described in detail later.
Based on the scheme, the information retrieval knowledge graph is divided into a plurality of pieces of sub-graph information through the sub-graph sampling algorithm, which improves the accuracy of analyzing different parts of the information retrieval knowledge graph; the text sequence corresponding to each piece of sub-graph information is extracted through the graph-to-text conversion strategy, determining the text information corresponding to each piece of sub-graph information; the text encoder then encodes the masked entity vector and the masked relation vector corresponding to each text sequence; the relation graph decoder fuses the masked entity vector and the masked relation vector of each text sequence with the graph structure information of the sub-graph information, and finally decodes them to obtain each entity vector corresponding to the information retrieval knowledge graph. The information retrieval knowledge graph is thus accurately divided into a plurality of pieces of sub-graph information, and the entity vectors and relation vectors corresponding to the sub-graph information are analyzed by combining the graph structure information and the text information of the sub-graph information, improving the accuracy of the generated entity vectors and relation vectors of the information retrieval knowledge graph.
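A minimal sketch of this overall pipeline, assuming hypothetical callables for each component (all names are illustrative, not from the patent):

def embed_knowledge_graph(kg, sampler, to_text, text_encoder, graph_decoder):
    """Sample sub-graphs, convert each to text, encode with masking, fuse
    graph structure in the decoder, and collect the reconstructed vectors."""
    entity_vectors, relation_vectors = {}, {}
    for subgraph in sampler(kg):                 # sub-graph sampling algorithm
        sequence = to_text(subgraph)             # graph-to-text conversion
        masked_ent, masked_rel = text_encoder(sequence)
        ent_vecs, rel_vecs = graph_decoder(subgraph, masked_ent, masked_rel)
        entity_vectors.update(ent_vecs)          # entity vectors of this sub-graph
        relation_vectors.update(rel_vecs)        # relation vectors of this sub-graph
    return entity_vectors, relation_vectors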
Optionally, the collecting each piece of sub-graph information of the information retrieval knowledge graph includes: acquiring a plurality of entity nodes of the information retrieval knowledge graph, and traversing the information retrieval knowledge graph from each entity node through the sub-graph sampling algorithm to obtain each piece of path information corresponding to each entity node; and taking the intersection point of pairwise intersecting pieces of path information as a central entity, and taking the central entity and the path information intersecting with the central entity as sub-graph information.
In this embodiment, the terminal acquires each entity node of the information retrieval knowledge graph and randomly selects one entity node as a target entity node. The terminal traverses all entity nodes connected with this entity node through the sub-graph sampling algorithm until the traversed entity node has no adjacent entity nodes other than the entity node visited last, and the terminal takes all the traversed entity nodes as one piece of path information. The path information includes forward paths and reverse paths. The terminal takes the intersection point of pairwise intersecting paths as the central entity. The terminal then builds one piece of sub-graph information based on the central entity and all path information intersecting the central entity. In this way, the terminal obtains all the sub-graph information of the information retrieval knowledge graph.
Specifically, the terminal takes one of the entity nodes as the central entity and then takes this entity as the current entity. The terminal randomly selects one triple from the triples taking the current entity as the head entity (or tail entity) and walks to the tail entity (or head entity) of that triple. The terminal repeats the above step until the number of visited entities reaches the maximum path length or there is no triple with the current entity as the head entity (or tail entity), at which point the terminal has obtained one more forward path (or reverse path). Finally, the terminal constructs sub-graph information centered on the central entity using the generated forward and reverse paths.
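A minimal Python sketch of this center-based path sampling, assuming the knowledge graph is given as forward and backward adjacency dictionaries (the interface and all names are assumptions):

import random

def sample_path(adj, start, max_entities):
    """Random walk from `start`: repeatedly pick a random triple continuing
    the walk (adj maps an entity to its (relation, next_entity) pairs)."""
    path, current, n_entities = [start], start, 1
    while n_entities < max_entities:
        candidates = adj.get(current, [])
        if not candidates:          # no triple with the current entity as head/tail
            break
        relation, nxt = random.choice(candidates)
        path.extend([relation, nxt])
        current, n_entities = nxt, n_entities + 1
    return path

def sample_subgraph(fwd_adj, bwd_adj, center, n_paths=4, max_entities=4):
    """Combine several forward and reverse paths centered on one entity node
    into one piece of sub-graph information."""
    return {
        "center": center,
        "forward_paths": [sample_path(fwd_adj, center, max_entities) for _ in range(n_paths)],
        "backward_paths": [sample_path(bwd_adj, center, max_entities) for _ in range(n_paths)],
    }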
Based on the scheme, sub-graph information centered on a single entity node is obtained through the sub-graph sampling algorithm, which enriches the topological structure information in each piece of sub-graph information; the embedding operation of the information retrieval knowledge graph is then completed based on each piece of sub-graph information, improving the accuracy of the subsequently acquired entity vectors and relation vectors.
Optionally, the entity nodes include text information, and converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through the graph-to-text conversion strategy includes: identifying, for each piece of sub-graph information, the path information contained in the sub-graph information, and generating, for each piece of path information, sub-text sequence information corresponding to the path information based on the text information of each entity node contained in the path information; and based on the central entity of the sub-graph information, splicing the sub-text sequence information corresponding to each piece of path information of the sub-graph information through a text splicing strategy to obtain the text sequence corresponding to the sub-graph information.
In this embodiment, for each piece of obtained sub-graph information, the terminal identifies each piece of path information included in the sub-graph information based on the central entity of the sub-graph information. Then, for each piece of path information, the terminal generates the sub-text sequence information of the path information based on the text information of each entity node included in the path information. Finally, based on the central entity of the sub-graph information, the terminal performs splicing processing on the sub-text sequence information corresponding to each piece of path information of the sub-graph information through the text splicing strategy to obtain the text sequence corresponding to the sub-graph information.
Suppose that the sub-graph G' obtained by each sampling is composed of a plurality of paths, where each path P_i can be expressed as:

P_i = {E_i0, R_i0, E_i1, R_i1, …, E_iN, R_iN, E_iN+1}

where E_ij represents the entity information of each entity node in the text sequence, R_ij represents the relation information of each entity node in the text sequence, and <E_ij, R_ij, E_ij+1> represents the j-th triple in the path. For each path P_i, the terminal generates a text sequence T_i using the text descriptions of the entities in the path, thereby enriching the contextual semantic information. The text sequence is expressed as:

T_i = E_i0 ⊕ D_i0 ⊕ R_i0 ⊕ E_i1 ⊕ D_i1 ⊕ R_i1 ⊕ … ⊕ E_iN+1 ⊕ D_iN+1

where D_ij represents the text description of E_ij, ⊕ represents the text splicing operation, and E_ij, D_ij and R_ij are separated by commas.

Subsequently, two special tokens of the natural language model, namely [CLS] and [SEP], are used to splice the texts T_i corresponding to the paths, generating a text sequence S corresponding to the sub-graph:

S = [CLS] ⊕ T_1 ⊕ [SEP] ⊕ T_2 ⊕ [SEP] ⊕ … ⊕ T_n ⊕ [SEP]
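A sketch of this conversion in Python, assuming entity descriptions are available in a dictionary (helper names and the example data are hypothetical):

def path_to_text(path, descriptions):
    """Render a path [E_0, R_0, E_1, ...] as the comma-separated sequence
    E_0, D_0, R_0, E_1, D_1, R_1, ..."""
    parts = []
    for i, item in enumerate(path):
        parts.append(item)
        if i % 2 == 0:                      # even positions hold entities E_ij
            parts.append(descriptions.get(item, ""))
    return ", ".join(p for p in parts if p)

def subgraph_to_sequence(paths, descriptions):
    """Splice path texts into S = [CLS] T_1 [SEP] T_2 [SEP] ... [SEP]."""
    texts = [path_to_text(p, descriptions) for p in paths]
    return "[CLS] " + " [SEP] ".join(texts) + " [SEP]"

paths = [["paper_A", "cites", "paper_B"]]
descriptions = {"paper_A": "a survey of KG embedding", "paper_B": "the TransE paper"}
print(subgraph_to_sequence(paths, descriptions))
# [CLS] paper_A, a survey of KG embedding, cites, paper_B, the TransE paper [SEP]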
based on the scheme, each path information is converted into a text sequence through a picture-text conversion strategy, sub-text sequences are spliced to obtain a text sequence corresponding to a sub-picture, and the content of picture structure information contained in the text sequence is improved on the basis of semantic information of the text sequence.
Optionally, the text encoder includes an entity masking sub-encoder and a relation masking sub-encoder, and the performing encoding conversion processing on each text sequence through the text encoder to obtain a masked entity vector corresponding to each text sequence and a masked relation vector of each text sequence includes: for each text sequence, performing entity masking reconstruction processing on the text sequence through the entity masking sub-encoder to obtain the masked entity vector corresponding to the text sequence, and performing relation masking reconstruction processing on the text sequence through the relation masking sub-encoder to obtain the masked relation vector corresponding to the text sequence.
In this embodiment, for each text sequence, the terminal performs entity masking reconstruction processing on the entity information of each entity node in the text sequence through the entity masking sub-encoder to obtain the masked entity vector corresponding to the text sequence, and performs relation masking reconstruction processing on the relation information of each entity node in the text sequence through the relation masking sub-encoder to obtain the masked relation vector corresponding to the text sequence. Both the entity masking reconstruction processing and the relation masking reconstruction processing include a pooling operation, so that the text sequence is converted into a vector. The entity masking sub-encoder is a text encoder trained with a training task designed on the masked modeling idea (Masked Modeling), namely entity masking reconstruction (Masked Entity Modeling, MEM); the relation masking sub-encoder is a text encoder trained with the corresponding task of relation masking reconstruction (Masked Relation Modeling, MRM).
Based on the scheme, the entity information and the relation information of each entity node in the text sequence are masked and encoded through the trained entity masking sub-encoder and relation masking sub-encoder to obtain the masked entity vector and the masked relation vector corresponding to the text sequence, improving the accuracy of acquiring the entity information and relation information of the entity nodes.
Optionally, the updating, by the relation graph decoder, the masked entity vector of the text sequence and the masked relation vector of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence includes: determining the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and performing pooling processing on the sub-graph structure information of each entity node to obtain a graph structure vector of each entity node; for each entity node of the text sequence, performing fusion processing on the sub-masked entity vector of the entity node in the masked entity vector and the graph structure vector of the entity node to obtain a sub-structure masked entity vector of the entity node, and performing fusion processing on the sub-masked relation vector of the entity node in the masked relation vector and the graph structure vector of the entity node to obtain a sub-structure masked relation vector of the entity node; and performing convolution processing on the sub-structure masked entity vector and the sub-structure masked relation vector of each entity node through a graph convolutional neural network of the relation graph decoder to obtain an updated sub-masked entity vector and an updated sub-masked relation vector of the entity node, taking the masked entity vector containing the updated sub-masked entity vectors of all entity nodes as the updated masked entity vector of the text sequence, and taking the masked relation vector containing the updated sub-masked relation vectors of all entity nodes as the updated masked relation vector of the text sequence.
In this embodiment, the terminal determines the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and then performs pooling processing on the sub-graph structure information of each entity node to obtain the graph structure vector of each entity node. For each entity node of the text sequence, the terminal performs fusion processing on the sub-masked entity vector of the entity node in the masked entity vector and the graph structure vector of the entity node to obtain the sub-structure masked entity vector of the entity node, and performs fusion processing on the sub-masked relation vector of the entity node in the masked relation vector and the graph structure vector of the entity node to obtain the sub-structure masked relation vector of the entity node. The terminal then performs convolution processing on the sub-structure masked entity vector and the sub-structure masked relation vector of each entity node through the graph convolutional neural network of the relation graph decoder to obtain the updated sub-masked entity vector and the updated sub-masked relation vector of the entity node. The terminal takes the masked entity vector containing the updated sub-masked entity vectors of all entity nodes as the updated masked entity vector of the text sequence, and the masked relation vector containing the updated sub-masked relation vectors of all entity nodes as the updated masked relation vector of the text sequence.
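Before the convolution update formalized below, the pooling-and-fusion step can be sketched as follows; mean pooling and additive fusion are assumed choices, since the text does not fix these operators:

import torch

def fuse_structure(masked_vecs, struct_info):
    """Pool each entity node's sub-graph structure information into a graph
    structure vector, then fuse it with the node's masked vector."""
    fused = {}
    for node, vec in masked_vecs.items():
        struct_vec = torch.stack(struct_info[node]).mean(dim=0)  # pooling (assumed: mean)
        fused[node] = vec + struct_vec                           # fusion (assumed: addition)
    return fused

Applied once to the masked entity vectors and once to the masked relation vectors of a text sequence, this yields the sub-structure masked entity vectors and sub-structure masked relation vectors fed to the graph convolutional neural network.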
Specifically, each entity or relation in the sub-graph may be represented as a vector of dimension d, and the initial value of the vector is derived from the text encoder. The vector update procedure is as follows:

e_t = φ(e_h, e_r)

where φ indicates the combination operation, h, r, t respectively indicate the head entity information, the relation information and the tail entity information in the triple of one entity node, and e_h, e_r, e_t represent their vectors. For a K-layer graph decoder, let h_v^(k+1) represent the vector that node v obtains after the (k+1)-th convolution layer; then:

h_v^(k+1) = Σ_{(u,r)∈N(v)} W_ent^k φ(h_u^k, h_r^k)

where N(v) represents the set of outgoing edges of node v together with the corresponding first-order forward neighboring entity nodes (u, r), W_ent^k represents a learnable parameter matrix associated with entities, and h_u^k and h_r^k represent the vectors of the neighbor u and the outgoing edge r at the k-th layer, respectively. After the entity vector update is completed, the relation vector is also updated:

h_r^(k+1) = W_rel^k h_r^k

where W_rel^k represents a learnable parameter matrix associated with relations. It should be noted that h_v^0 and h_r^0 are the initialization vectors generated by the text encoder (i.e., the masked entity vector and the masked relation vector).
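A minimal PyTorch sketch of one such decoder layer; the composition φ (subtraction, as in CompGCN) and the parameter shapes are assumptions, since the text leaves them unspecified:

import torch
import torch.nn as nn

class RelGraphLayer(nn.Module):
    """One layer implementing h_v^(k+1) = sum over (u, r) in N(v) of
    W_ent^k phi(h_u^k, h_r^k), and h_r^(k+1) = W_rel^k h_r^k."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_ent = nn.Linear(dim, dim, bias=False)  # W_ent^k
        self.w_rel = nn.Linear(dim, dim, bias=False)  # W_rel^k

    @staticmethod
    def phi(h_u, h_r):
        return h_u - h_r  # assumed composition of neighbor and edge vectors

    def forward(self, h_ent, h_rel, edges):
        # edges: list of (v, u, r) index triples, i.e. node v has an outgoing
        # edge r to its first-order forward neighbor u.
        messages = torch.stack(
            [self.w_ent(self.phi(h_ent[u], h_rel[r])) for _, u, r in edges])
        targets = torch.tensor([v for v, _, _ in edges])
        h_ent_new = torch.index_add(torch.zeros_like(h_ent), 0, targets, messages)
        return h_ent_new, self.w_rel(h_rel)

layer = RelGraphLayer(dim=64)
h_ent, h_rel = torch.randn(3, 64), torch.randn(2, 64)  # initial vectors from the text encoder
h_ent, h_rel = layer(h_ent, h_rel, edges=[(0, 1, 0), (0, 2, 1)])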
Based on the scheme, the relation graph decoder fuses the graph structure information in the sub-graph information with the masked entity vectors and masked relation vectors of the entity nodes, so that graph structure information is incorporated into the relation vectors and entity vectors, improving the accuracy of the finally obtained relation vectors and entity vectors.
Optionally, the reconstructing the entity vector corresponding to the updated masked entity vector of the text sequence and the relation vector corresponding to the updated masked relation vector of the text sequence to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph includes: identifying a first masked vector in the updated masked entity vector of the text sequence and a second masked vector in the updated masked relation vector of the text sequence, reconstructing the first masked vector through a linear layer of the relation graph decoder to obtain the entity vector corresponding to the text sequence, and reconstructing the second masked vector through the linear layer to obtain the relation vector corresponding to the text sequence; and taking the entity vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and taking the relation vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the relation vectors of the information retrieval knowledge graph.
In this embodiment, the terminal identifies the first masked vector in the updated masked entity vector of the text sequence and the second masked vector in the updated masked relation vector of the text sequence. Then, the terminal reconstructs the first masked vector through the linear layer of the relation graph decoder to obtain the entity vector corresponding to the text sequence, and reconstructs the second masked vector through the linear layer to obtain the relation vector corresponding to the text sequence. The terminal takes the entity vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and the relation vectors of the text sequences corresponding to all the sub-graph information as the relation vectors of the information retrieval knowledge graph. The first masked vector and the second masked vector are, respectively, the sub-vector corresponding to the masked entity node in the masked entity vector and the sub-vector corresponding to the masked relation in the masked relation vector.
Specifically, the masked parts are reconstructed from the masked entity and relation vectors h_mask^K, where the reconstruction function is:

z = W_m h_mask^K + b_m

where W_m and b_m are learnable parameters.

The losses of MEM (entity masking reconstruction) and MRM (relation masking reconstruction) are calculated by a cross-entropy function:

L = −Σ_{i=1}^{N} t_i log(softmax(z)_i)

where z represents the output result of the linear layer and t represents the real label to be predicted. N represents the total number of labels, equal to the number of entities in the MEM task and equal to the number of relations in the MRM task.
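A short PyTorch sketch of this reconstruction head and loss, assuming F.cross_entropy as the cross-entropy implementation (all sizes and names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskReconstructor(nn.Module):
    """Linear reconstruction z = W_m h_mask^K + b_m; n_labels equals the
    number of entities for MEM and the number of relations for MRM."""
    def __init__(self, dim: int, n_labels: int):
        super().__init__()
        self.linear = nn.Linear(dim, n_labels)  # W_m, b_m

    def forward(self, h_mask):
        return self.linear(h_mask)

dim, n_entities = 64, 1000
mem_head = MaskReconstructor(dim, n_entities)
h_mask = torch.randn(8, dim)            # decoder outputs at the masked positions
t = torch.randint(0, n_entities, (8,))  # real labels to be predicted
z = mem_head(h_mask)
loss = F.cross_entropy(z, t)            # L = -sum_i t_i * log softmax(z)_i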
Based on the scheme, the masked entity vectors and the masked relation vectors are reconstructed through the relation graph decoder, so that the entity vectors and relation vectors fully utilize the graph structure information contained in the information retrieval knowledge graph, improving the accuracy of each obtained entity vector and relation vector.
In one embodiment, as shown in FIG. 2, an information retrieval knowledge graph embedding example is provided, the example comprising the following steps:
step S201, an information retrieval knowledge graph and a plurality of entity nodes of the information retrieval knowledge graph are obtained, and each entity node traverses the information retrieval knowledge graph through a sub-sampling algorithm to obtain each path information corresponding to each entity node.
Step S202, the intersecting point of the two-by-two intersecting path information is taken as a central entity, and the central entity and the path information intersecting the central entity are taken as sub-image information.
Step S203, for each piece of sub-graph information, the path information contained in the sub-graph information is identified, and for each piece of path information, sub-text sequence information corresponding to the path information is generated based on the text information of each entity node contained in the path information.
Step S204, based on the central entity of the sub-graph information, the sub-text sequence information corresponding to each piece of path information of the sub-graph information is spliced through a text splicing strategy to obtain the text sequence corresponding to the sub-graph information.
Step S205, for each text sequence, entity masking reconstruction processing is performed on the text sequence through an entity masking sub-encoder to obtain the masked entity vector corresponding to the text sequence, and relation masking reconstruction processing is performed on the text sequence through a relation masking sub-encoder to obtain the masked relation vector corresponding to the text sequence.
Step S206, the sub-graph structure information of each entity node is pooled to obtain the graph structure vector of each entity node.
Step S207, for each entity node of the text sequence, the sub-masked entity vector of the entity node in the masked entity vector is fused with the graph structure vector of the entity node to obtain the sub-structure masked entity vector of the entity node, and the sub-masked relation vector of the entity node in the masked relation vector is fused with the graph structure vector of the entity node to obtain the sub-structure masked relation vector of the entity node.
Step S208, the sub-structure masked entity vector and the sub-structure masked relation vector of each entity node are convolved through a graph convolutional neural network of the relation graph decoder to obtain the updated sub-masked entity vector and the updated sub-masked relation vector of the entity node; the masked entity vector containing the updated sub-masked entity vectors of all entity nodes is taken as the updated masked entity vector of the text sequence, and the masked relation vector containing the updated sub-masked relation vectors of all entity nodes is taken as the updated masked relation vector of the text sequence.
Step S209, a first masked vector in the updated masked entity vector of the text sequence and a second masked vector in the updated masked relation vector of the text sequence are identified; the first masked vector is reconstructed through a linear layer of the relation graph decoder to obtain the entity vector corresponding to the text sequence, and the second masked vector is reconstructed through the linear layer to obtain the relation vector corresponding to the text sequence.
Step S210, the entity vectors of the text sequences corresponding to all the sub-graph information of the information retrieval knowledge graph are taken as the entity vectors corresponding to the information retrieval knowledge graph, and the relation vectors of the text sequences corresponding to all the sub-graph information are taken as the relation vectors of the information retrieval knowledge graph.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an information retrieval knowledge graph embedding device for realizing the information retrieval knowledge graph embedding method. The implementation scheme of the solution provided by the device is similar to the implementation scheme described in the above method, so the specific limitation in the embodiment of one or more information retrieval knowledge graph embedding devices provided below may refer to the limitation of the information retrieval knowledge graph embedding method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 3, there is provided an information retrieval knowledge graph embedding apparatus, including: an acquisition module 310, an encoding module 320, and a generation module 330, wherein:
an obtaining module 310, configured to obtain an information retrieval knowledge graph, and collect each piece of sub-graph information of the information retrieval knowledge graph through a sub-graph sampling algorithm;
the encoding module 320 is configured to convert each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graphics context conversion policy, and perform encoding conversion processing on each text sequence through a text encoder to obtain a masking entity vector corresponding to each text sequence and a masking relation vector of each text sequence;
the generation module 330 is configured to, for each text sequence, update the masking entity vector of the text sequence and the masking relation vector of the text sequence through a relational graph decoder based on the graph structure information of the sub-graph information corresponding to the text sequence, and to reconstruct the entity vector corresponding to the updated masking entity vector of the text sequence and the relation vector corresponding to the updated masking relation vector of the text sequence, so as to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
Optionally, the acquisition module 310 is specifically configured to:
acquire a plurality of entity nodes of the information retrieval knowledge graph, and traverse the information retrieval knowledge graph from each entity node through the sub-graph sampling algorithm to obtain each piece of path information corresponding to each entity node;
and take the intersection point of each pair of intersecting pieces of path information as a central entity, and take the central entity together with the path information intersecting at the central entity as one piece of sub-graph information (a non-limiting sketch of this sampling follows).
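The toy sketch below assumes a random-walk traversal; the walk length, data layout, and function names are illustrative assumptions rather than the patent's exact sampling algorithm.

```python
# Hedged sketch of the sub-graph sampling: a random walk collects one path
# per entity node, and any node shared by two paths becomes a central entity
# whose intersecting paths form one piece of sub-graph information.
import random
from itertools import combinations


def random_walk(graph, start, length=3):
    """graph: dict mapping a node to its neighbour list."""
    path, node = [start], start
    for _ in range(length):
        neighbours = graph.get(node, [])
        if not neighbours:
            break
        node = random.choice(neighbours)
        path.append(node)
    return path


def sample_subgraphs(graph):
    paths = [random_walk(graph, node) for node in graph]
    subgraphs = []
    for p1, p2 in combinations(paths, 2):
        shared = set(p1) & set(p2)  # intersection points of two paths
        for center in shared:
            subgraphs.append({"center": center, "paths": [p1, p2]})
    return subgraphs


toy_graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(sample_subgraphs(toy_graph))
```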
Optionally, the encoding module 320 is specifically configured to:
identify, for each piece of sub-graph information, the path information contained in the sub-graph information, and generate, for each piece of path information, the sub-text sequence information corresponding to the path information based on the text information of each entity node contained in the path information;
and splice, based on the central entity of the sub-graph information, the sub-text sequence information corresponding to each piece of path information of the sub-graph information through a text splicing strategy, to obtain the text sequence corresponding to the sub-graph information (a sketch follows).
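A minimal sketch of this graph-to-text conversion and splicing is given below; the separator tokens ([CENTER], [SEP]) and the verbalization format are illustrative assumptions, not the patent's fixed strategy.

```python
# Sketch: each path is verbalized from the text of its entity nodes, and the
# per-path sequences are spliced around the central entity.
def path_to_text(path, node_text):
    return " ".join(node_text[node] for node in path)


def subgraph_to_sequence(subgraph, node_text):
    pieces = [path_to_text(p, node_text) for p in subgraph["paths"]]
    return f"[CENTER] {node_text[subgraph['center']]} [SEP] " + " [SEP] ".join(pieces)


node_text = {"A": "Alice", "B": "Bob", "C": "Carol"}
seq = subgraph_to_sequence({"center": "B", "paths": [["A", "B"], ["C", "B"]]}, node_text)
print(seq)  # [CENTER] Bob [SEP] Alice Bob [SEP] Carol Bob
```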
Optionally, the encoding module 320 is specifically configured to:
for each text sequence, perform entity masking reconstruction processing on the text sequence through the entity masking sub-encoder to obtain the masking entity vector corresponding to the text sequence, and perform relation masking reconstruction processing on the text sequence through the relation masking sub-encoder to obtain the masking relation vector corresponding to the text sequence (one possible realization is sketched below).
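The sketch below assumes BERT-style masked encoding with the Hugging Face transformers library; the model name bert-base-uncased and the masking scheme are illustrative choices, not specified by the patent, and loading the model requires network access or a local cache.

```python
# Hedged sketch of the two masking sub-encoders: the same encoder is run
# twice, masking entity positions for the entity view and relation positions
# for the relation view.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


def masked_encode(sequence: str, positions_to_mask):
    tokens = tokenizer.tokenize(sequence)
    for i in positions_to_mask:
        tokens[i] = tokenizer.mask_token  # replace the chosen positions
    ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    return encoder(input_ids=ids).last_hidden_state[0]


masking_entity_vector = masked_encode("Alice knows Bob", positions_to_mask=[0])
masking_relation_vector = masked_encode("Alice knows Bob", positions_to_mask=[1])
```

Running one encoder twice over the same sequence, masking different positions each time, lets the entity view and the relation view share the underlying text while reconstructing different targets.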
Optionally, the generation module 330 is specifically configured to:
determine the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and pool the sub-graph structure information of each entity node to obtain a graph structure vector of each entity node;
for each entity node of the text sequence, fuse the sub-masking entity vector of the entity node in the masking entity vector with the graph structure vector of the entity node to obtain the sub-structure masking entity vector of the entity node, and fuse the sub-masking relation vector of the entity node in the masking relation vector with the graph structure vector of the entity node to obtain the sub-structure masking relation vector of the entity node (a sketch of the pooling and fusion follows this module description);
and perform convolution processing on the sub-structure masking entity vector of each entity node and the sub-structure masking relation vector of each entity node through the graph convolutional neural network of the relational graph decoder, to obtain the updated sub-masking entity vector and the updated sub-masking relation vector of the entity node; take the vector containing the updated sub-masking entity vectors of all entity nodes as the updated masking entity vector of the text sequence, and take the vector containing the updated sub-masking relation vectors of all entity nodes as the updated masking relation vector of the text sequence.
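A hedged sketch of the pooling and fusion steps above is given below (the convolution itself was sketched under Step S208); mean pooling and concatenation-plus-projection are assumptions, as the patent does not fix the pooling or fusion operators.

```python
# Sketch: each node's local sub-graph structure is mean-pooled into a
# structure vector, which is then fused with the node's masking vector by
# concatenation followed by a linear projection.
import torch
import torch.nn as nn

dim = 768
fuse = nn.Linear(2 * dim, dim)  # fusion by concat-then-project (assumption)


def structure_vector(neighbour_vecs: torch.Tensor) -> torch.Tensor:
    return neighbour_vecs.mean(dim=0)  # pooling over the node's sub-structure


def fused(masking_vec: torch.Tensor, struct_vec: torch.Tensor) -> torch.Tensor:
    return fuse(torch.cat([masking_vec, struct_vec], dim=-1))


struct = structure_vector(torch.randn(4, dim))  # 4 neighbouring nodes
sub_structure_masking_entity = fused(torch.randn(dim), struct)
sub_structure_masking_relation = fused(torch.randn(dim), struct)
```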
Optionally, the generation module 330 is further specifically configured to:
identify a first mask vector in the updated masking entity vector of the text sequence and a second mask vector in the updated masking relation vector of the text sequence, reconstruct the first mask vector through a linear layer of the relational graph decoder to obtain the entity vector corresponding to the text sequence, and reconstruct the second mask vector through the same linear layer to obtain the relation vector corresponding to the text sequence;
and take the entity vectors of the text sequences corresponding to all sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and the relation vectors of those text sequences as the relation vectors of the information retrieval knowledge graph.
Each of the above modules in the information retrieval knowledge graph embedding device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form within, or be independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an information retrieval knowledge graph embedding method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 4 is merely a block diagram of the portion of the structure relevant to the solution of the application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method embodiments described above.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The user information (including but not limited to user equipment information, user personal information, and the like) and data (including but not limited to data for analysis, stored data, displayed data, and the like) involved in the application are information and data authorized by the user or fully authorized by all parties.
Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by instructing relevant hardware through a computer program, which may be stored in a non-transitory computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, and data processing logic devices based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above examples merely illustrate several embodiments of the application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the application. It should be noted that several variations and improvements may be made by those skilled in the art without departing from the concept of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (10)

1. An information retrieval knowledge graph embedding method, characterized in that the method comprises:
acquiring an information retrieval knowledge graph, and acquiring each piece of sub-graph information of the information retrieval knowledge graph through a sub-graph sampling algorithm;
converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, and performing encoding conversion processing on each text sequence through a text encoder, to obtain a masking entity vector corresponding to each text sequence and a masking relation vector of each text sequence;
and for each text sequence, updating the masking entity vector of the text sequence and the masking relation vector of the text sequence through a relational graph decoder based on the graph structure information of the sub-graph information corresponding to the text sequence, and reconstructing the entity vector corresponding to the updated masking entity vector of the text sequence and the relation vector corresponding to the updated masking relation vector of the text sequence, to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
2. The method according to claim 1, wherein the acquiring each piece of sub-graph information of the information retrieval knowledge graph comprises:
acquiring a plurality of entity nodes of the information retrieval knowledge graph, and traversing the information retrieval knowledge graph from each entity node through the sub-graph sampling algorithm to obtain each piece of path information corresponding to each entity node;
and taking the intersection point of each pair of intersecting pieces of path information as a central entity, and taking the central entity together with the path information intersecting at the central entity as one piece of sub-graph information.
3. The method according to claim 2, wherein each entity node contains text information, and the converting each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through the graph-to-text conversion strategy comprises:
identifying, for each piece of sub-graph information, the path information contained in the sub-graph information, and generating, for each piece of path information, the sub-text sequence information corresponding to the path information based on the text information of each entity node contained in the path information;
and splicing, based on the central entity of the sub-graph information, the sub-text sequence information corresponding to each piece of path information of the sub-graph information through a text splicing strategy, to obtain the text sequence corresponding to the sub-graph information.
4. The method according to claim 1, wherein the text encoder comprises an entity masking sub-encoder and a relation masking sub-encoder, and the performing encoding conversion processing on each text sequence through the text encoder to obtain a masking entity vector corresponding to each text sequence and a masking relation vector of each text sequence comprises:
for each text sequence, performing entity masking reconstruction processing on the text sequence through the entity masking sub-encoder to obtain the masking entity vector corresponding to the text sequence, and performing relation masking reconstruction processing on the text sequence through the relation masking sub-encoder to obtain the masking relation vector corresponding to the text sequence.
5. The method according to claim 1, wherein the updating the masking entity vector of the text sequence and the masking relation vector of the text sequence through the relational graph decoder based on the graph structure information of the sub-graph information corresponding to the text sequence comprises:
determining the sub-graph structure information of each entity node of the text sequence based on the graph structure information of the sub-graph information corresponding to the text sequence, and pooling the sub-graph structure information of each entity node to obtain a graph structure vector of each entity node;
for each entity node of the text sequence, fusing the sub-masking entity vector of the entity node in the masking entity vector with the graph structure vector of the entity node to obtain a sub-structure masking entity vector of the entity node, and fusing the sub-masking relation vector of the entity node in the masking relation vector with the graph structure vector of the entity node to obtain a sub-structure masking relation vector of the entity node;
and performing convolution processing on the sub-structure masking entity vector of each entity node and the sub-structure masking relation vector of each entity node through a graph convolutional neural network of the relational graph decoder, to obtain an updated sub-masking entity vector and an updated sub-masking relation vector of the entity node; taking the vector containing the updated sub-masking entity vectors of all entity nodes as the updated masking entity vector of the text sequence, and taking the vector containing the updated sub-masking relation vectors of all entity nodes as the updated masking relation vector of the text sequence.
6. The method according to claim 1, wherein the reconstructing the entity vector corresponding to the updated masking entity vector of the text sequence and the relation vector corresponding to the updated masking relation vector of the text sequence to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph comprises:
identifying a first mask vector in the updated masking entity vector of the text sequence and a second mask vector in the updated masking relation vector of the text sequence, reconstructing the first mask vector through a linear layer of the relational graph decoder to obtain the entity vector corresponding to the text sequence, and reconstructing the second mask vector through the linear layer to obtain the relation vector corresponding to the text sequence;
and taking the entity vectors of the text sequences corresponding to all sub-graph information of the information retrieval knowledge graph as the entity vectors corresponding to the information retrieval knowledge graph, and taking the relation vectors of those text sequences as the relation vectors of the information retrieval knowledge graph.
7. An information retrieval knowledge graph embedding device, characterized in that the device comprises:
the acquisition module is used for acquiring an information retrieval knowledge graph and acquiring each piece of sub-graph information of the information retrieval knowledge graph through a sub-graph sampling algorithm;
an encoding module, configured to convert each piece of sub-graph information into a text sequence corresponding to each piece of sub-graph information through a graph-to-text conversion strategy, and to perform encoding conversion processing on each text sequence through a text encoder, to obtain a masking entity vector corresponding to each text sequence and a masking relation vector of each text sequence;
and a generation module, configured to, for each text sequence, update the masking entity vector of the text sequence and the masking relation vector of the text sequence through a relational graph decoder based on the graph structure information of the sub-graph information corresponding to the text sequence, and to reconstruct the entity vector corresponding to the updated masking entity vector of the text sequence and the relation vector corresponding to the updated masking relation vector of the text sequence, to obtain each entity vector corresponding to the information retrieval knowledge graph and each relation vector corresponding to the information retrieval knowledge graph.
8. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310766394.0A 2023-06-27 2023-06-27 Information retrieval knowledge graph embedding method, device and computer equipment Pending CN116842109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310766394.0A CN116842109A (en) 2023-06-27 2023-06-27 Information retrieval knowledge graph embedding method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN116842109A true CN116842109A (en) 2023-10-03

Family

ID=88166300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310766394.0A Pending CN116842109A (en) 2023-06-27 2023-06-27 Information retrieval knowledge graph embedding method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN116842109A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871158A (en) * 2016-09-26 2018-04-03 清华大学 A kind of knowledge mapping of binding sequence text message represents learning method and device
CN110609902A (en) * 2018-05-28 2019-12-24 华为技术有限公司 Text processing method and device based on fusion knowledge graph
EP3968241A1 (en) * 2020-06-12 2022-03-16 Beijing Baidu Netcom Science And Technology Co. Ltd. Method, apparatus, device, storage medium and program for learning knowledge representation
CN111475658A (en) * 2020-06-12 2020-07-31 北京百度网讯科技有限公司 Knowledge representation learning method, device, equipment and storage medium
US20210390257A1 (en) * 2020-06-12 2021-12-16 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, device, and storage medium for learning knowledge representation
WO2021189971A1 (en) * 2020-10-26 2021-09-30 平安科技(深圳)有限公司 Medical plan recommendation system and method based on knowledge graph representation learning
CN113673244A (en) * 2021-01-04 2021-11-19 腾讯科技(深圳)有限公司 Medical text processing method and device, computer equipment and storage medium
CN113111190A (en) * 2021-04-16 2021-07-13 清华大学 Knowledge-driven dialog generation method and device
CN114281956A (en) * 2021-09-30 2022-04-05 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN113987201A (en) * 2021-10-20 2022-01-28 浙江大学 Zero-sample knowledge graph completion method based on ontology adapter
US20230132545A1 (en) * 2021-10-29 2023-05-04 Accenture Global Solutions Limited Methods and Systems for Approximating Embeddings of Out-Of-Knowledge-Graph Entities for Link Prediction in Knowledge Graph
CN115357728A (en) * 2022-08-22 2022-11-18 浙江大学 Large model knowledge graph representation method based on Transformer
CN115759254A (en) * 2022-11-14 2023-03-07 中山大学 Question-answering method, system and medium based on knowledge-enhanced generative language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Hong; Lin Haizhou; Lu Linyan: "Knowledge graph reasoning algorithm based on the Att_GCN model", Computer Engineering and Applications, No. 09 *

Similar Documents

Publication Publication Date Title
Yu et al. A category-aware deep model for successive POI recommendation on sparse check-in data
CN111541570B (en) Cloud service QoS prediction method based on multi-source feature learning
CN111241298B (en) Information processing method, apparatus, and computer-readable storage medium
Shi et al. Network embedding via community based variational autoencoder
CN116126341A (en) Model compiling method, device, computer equipment and computer readable storage medium
Sun et al. Fusing attributed and topological global-relations for network embedding
Yu et al. Hierarchical recovery of missing air pollution data via improved long-short term context encoder network
Qian et al. Vehicle trajectory modelling with consideration of distant neighbouring dependencies for destination prediction
CN111159424B (en) Method and device for labeling knowledge graph entity, storage medium and electronic equipment
Zheng et al. Kernelized deep learning for matrix factorization recommendation system using explicit and implicit information
CN116975651A (en) Similarity determination model processing method, target object searching method and device
CN116842109A (en) Information retrieval knowledge graph embedding method, device and computer equipment
Luo et al. Timestamps as Prompts for Geography-Aware Location Recommendation
CN115994541B (en) Interface semantic data generation method, device, computer equipment and storage medium
Gao et al. GeoAI Methodological Foundations: Deep Neural Networks and Knowledge Graphs
Yuan Big data mining and artificial intelligence based classification algorithm
Han et al. Adaptive regularised l2‐boosting on clustered sparse coefficients for single image super‐resolution
CN117938951B (en) Information pushing method, device, computer equipment and storage medium
Li et al. Hisrect: Features from historical visits and recent tweet for co-location judgement
CN113642716B (en) Depth variation self-encoder model training method, device, equipment and storage medium
CN117150500A (en) Code vulnerability classification method, device, computer equipment and storage medium
Yang et al. Subgraph Reconstruction via Reversible Subgraph Embedding
CN117827273A (en) Code identification method, apparatus, computer device and storage medium
Zhang et al. A Survey of Generative Techniques for Spatial-Temporal Data Mining
CN116383660A (en) Component classification model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination