CN116821291A - Question-answering method and system based on knowledge graph embedding and language model alternate learning - Google Patents

Question-answering method and system based on knowledge graph embedding and language model alternate learning

Info

Publication number
CN116821291A
Authority
CN
China
Prior art keywords
embedding
question
entity
cross
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310716940.XA
Other languages
Chinese (zh)
Inventor
谭启涛
王开业
周家樑
敬龙儿
黄梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aerospace Science And Industry Big Data Research Institute Co ltd
Original Assignee
Chengdu Aerospace Science And Industry Big Data Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aerospace Science And Industry Big Data Research Institute Co ltd filed Critical Chengdu Aerospace Science And Industry Big Data Research Institute Co ltd
Priority to CN202310716940.XA priority Critical patent/CN116821291A/en
Publication of CN116821291A publication Critical patent/CN116821291A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G06F 16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a question-answering method and system based on knowledge graph embedding and language model alternate learning, belonging to the field of deep learning, and comprising the following steps: a knowledge graph embedding model and cross-sharing units are introduced into the question-answering system; the cross-sharing units serve as the tie between the knowledge graph embedding model and the question-answering model, connecting the two models so that they exchange information and each can additionally acquire information from the other, thereby compensating for the information sparsity of both models. The invention enables a better understanding of the meaning and structure of language.

Description

Question-answering method and system based on knowledge graph embedding and language model alternate learning
Technical Field
The invention relates to the field of deep learning, and in particular to a question-answering method and system based on knowledge graph embedding and language model alternate learning.
Background
A Knowledge Graph (KG) is a directed graph whose nodes are real-world entities and whose edges are the relations between them; each directed edge, together with its head and tail entities, forms a triple (head entity, relation, tail entity), and the triples together constitute a huge semantic network. A knowledge graph contains rich semantic associations between entities and, combined with the excellent characteristics of the graph data structure, is commonly used in recommendation systems. However, although the triple representation is very effective for structuring data, the limitations of the triple form make KGs difficult to use directly. Knowledge Graph Embedding (KGE) learning solves this problem: it learns a low-dimensional feature vector for each entity and relation in the knowledge graph while preserving the original structural and semantic information of the graph.
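For illustration, the following minimal Python sketch shows the shape of such triples and the entity/relation dictionaries from which embedding learning starts; the entity and relation names are invented for this sketch.

```python
# A toy knowledge graph as (head entity, relation, tail entity) triples.
triples = [
    ("aeroengine", "has_defect", "process crack"),
    ("process crack", "includes", "casting crack"),
    ("process crack", "includes", "forging crack"),
]

# Dictionaries over all entities (heads and tails) and all relations.
entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
relations = sorted({r for _, r, _ in triples})
entity_index = {e: i for i, e in enumerate(entities)}
relation_index = {r: i for i, r in enumerate(relations)}

# KGE learning then assigns one low-dimensional vector to every entry of
# these dictionaries, chosen so that the graph structure is preserved
# (e.g., head + relation ≈ tail in TransE-style models).
```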
A question answering (Question Answering, QA) system performs the task of automatically answering questions posed by a user with a computer, to meet the user's knowledge needs. Unlike a search engine, a question-answering system no longer returns documents ranked by keyword matching, but an exact answer. Question-answering techniques are divided into retrieval-based and generative according to how the answer is produced. Retrieval-based answering corresponds to a discriminative model: each question has a single fixed answer that already exists in a database, as in FAQ (Frequently Asked Questions)-based and knowledge-graph-based question-answering systems. Generative answering has no fixed answer: the answer is generated from the contextual semantics of the question, and the same question may receive different answers in different scenes.
The prior art has the following technical problem: both the knowledge graph and the question-answering system suffer from information sparsity, and the meaning and structure of language cannot be well understood.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a question-answering method and system based on knowledge graph embedding and language model alternate learning, which can compensate for the information sparsity of the models and enable a better understanding of the meaning and structure of language.
The aim of the invention is achieved by the following scheme:
a question-answering method based on knowledge graph embedding and language model alternate learning comprises the following steps:
introducing, for the question-answering system, a knowledge graph embedding model and cross-sharing units, with the cross-sharing units serving as the tie between the knowledge graph embedding model and the question-answering model, connecting the two models so that they exchange information and each can additionally acquire information from the other, thereby compensating for the information sparsity of both models.
Further, introducing the knowledge graph embedding model and the cross-sharing units, using the cross-sharing units as the tie between the knowledge graph embedding model and the question-answering model, and connecting the two models so that they exchange information comprises the following sub-steps:
S1, organizing the corpus data into "question-answer" pairs, generating a corpus dictionary, and generating the corpus embedding (word embedding);
S2, constructing knowledge graph triples of the form (head, relation, tail), where head denotes the head entity, relation denotes the relation, and tail denotes the tail entity; constructing a graph entity dictionary from all heads and tails, and constructing the graph entity embedding (KG entity embedding) from the graph entity dictionary; constructing a relation dictionary from all relations, and constructing the relation embedding from the relation dictionary;
S3, querying the corpus dictionary with the input question (query) to obtain the question index (query index); querying the word embedding with the query index to obtain the question embedding; inputting the question embedding into a pre-trained large language model and decoding the output;
S4, querying the graph entity dictionary with the question entity (query entity) to obtain the question entity index; querying the KG entity embedding with the question entity index to obtain the question entity embedding;
S5, querying the graph entity dictionary with the graph head entity (KG head) to obtain the graph head entity index (KG head index); querying the KG entity embedding with the KG head index to obtain the graph head entity embedding (KG head embedding);
S6, querying the relation dictionary with the graph relation (KG relation) to obtain the graph relation index (KG relation index); querying the relation embedding with the KG relation index to obtain the graph relation embedding (KG relation embedding);
S7, connecting the question entity embedding and the graph head entity embedding through two layers of cross-feature-sharing units, finally obtaining the question entity cross-sharing output (question entity cross output-2) and the graph head entity cross-sharing output (KG head cross output-2);
S8, performing a concatenation-layer (Concat) operation on the decoded output of step S3 and the question entity cross output-2 of step S7, then passing the result through multiple DNN layers and a Softmax layer in turn to obtain the prediction probability of QA;
S9, performing a Concat operation on the graph relation embedding output in step S6 and the KG head cross output-2 of step S7, then applying one DNN layer to obtain the predicted graph tail embedding (prediction KG tail embedding);
S10, querying the KG dictionary with the graph tail entity (KG tail) to obtain the KG tail index; querying the KG entity embedding with the KG tail index to obtain the graph tail entity embedding (KG tail embedding).
Further, in step S1, the "question-answer" pairs have the form { "Q": "question", "A": "answer" }; generating the corpus dictionary means establishing the correspondence between characters and their indices; and generating the corpus embedding means representing each corpus character by a vector.
Further, in step S1, the dimension of the embedding is taken as a multiple of 12.
Further, in step S7, the cross-feature-sharing unit structure comprises an input Layer l, an outer-product operation layer (Cross), a feature matrix layer (Cross feature matrix), a feature fusion layer (Compress) and an output Layer l+1;
E(l) and V(l) are input at the input Layer l, the outer product is taken at the Cross layer, C(l) and C(l,t) are obtained at the feature matrix layer, W(l,VV), W(l,EV), W(l,VE) and W(l,EE) are applied at the feature fusion layer, and E(l+1) and V(l+1) are output at the output Layer l+1; the intermediate calculation is as follows:
C(l) = V(l) E(l,t)
V(l+1) = C(l) W(l,VV) + C(l,t) W(l,EV) + b(l,V)
E(l+1) = C(l) W(l,VE) + C(l,t) W(l,EE) + b(l,E)
where l denotes the l-th cross-sharing unit, E(l) and V(l) denote the input features of the l-th cross-sharing unit, C(l) denotes the outer product of V(l) and E(l), C(l,t) and E(l,t) denote the transposes of C(l) and E(l) respectively, W(l,VV), W(l,EV), W(l,EE) and W(l,VE) denote weight parameters, b(l,V) and b(l,E) denote bias parameters, and E(l+1) and V(l+1) denote the output features of the l-th cross-sharing unit.
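To make the calculation concrete, the following is a minimal PyTorch sketch of one cross-feature-sharing unit. It assumes that E(l) and V(l) are batches of d-dimensional feature vectors and that each W(l,·) is a d-dimensional weight vector, as in cross&compress-style units; these shapes and the initialization are assumptions of the sketch, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class CrossSharingUnit(nn.Module):
    """One cross-feature-sharing layer implementing the equations above."""

    def __init__(self, dim: int):
        super().__init__()
        # weight vectors W(l,VV), W(l,EV), W(l,VE), W(l,EE); biases b(l,V), b(l,E)
        self.w_vv = nn.Parameter(torch.randn(dim, 1) * 0.01)
        self.w_ev = nn.Parameter(torch.randn(dim, 1) * 0.01)
        self.w_ve = nn.Parameter(torch.randn(dim, 1) * 0.01)
        self.w_ee = nn.Parameter(torch.randn(dim, 1) * 0.01)
        self.b_v = nn.Parameter(torch.zeros(dim))
        self.b_e = nn.Parameter(torch.zeros(dim))

    def forward(self, v: torch.Tensor, e: torch.Tensor):
        # C(l) = V(l) E(l,t): outer product, shape (batch, d, d)
        c = v.unsqueeze(2) * e.unsqueeze(1)
        c_t = c.transpose(1, 2)  # C(l,t)
        # V(l+1) = C(l) W(l,VV) + C(l,t) W(l,EV) + b(l,V)
        v_next = (c @ self.w_vv + c_t @ self.w_ev).squeeze(2) + self.b_v
        # E(l+1) = C(l) W(l,VE) + C(l,t) W(l,EE) + b(l,E)
        e_next = (c @ self.w_ve + c_t @ self.w_ee).squeeze(2) + self.b_e
        return v_next, e_next
```

Stacking two such units, as in step S7, yields the question entity cross output-2 and the KG head cross output-2.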
Further, in step S8, the loss function of the QA task is the cross entropy loss plus an L2 regularization term, i.e., a loss of the standard form
L(QA) = -(1/N) Σ_i y_i log(p_i) + λ Σ_l ||w_l||^2
where N denotes the number of samples, y_i the class label, p_i the prediction probability, w_l the l-th weight parameter, and λ the regularization rate.
Further, in step S10, the loss function of the KGE task is the mean squared error plus an L2 regularization term, i.e., a loss of the standard form
L(KGE) = (1/N) Σ_i ||prediction KG tail embedding_i - KG tail embedding_i||^2 + λ Σ_l ||w_l||^2
with the remaining symbols as above.
further, the training method comprises the following steps: and adopting an alternate learning mode, and performing alternate training by using an end-to-end method.
Further, in step S3, the pre-trained large language model includes Bert and GPT.
A question-answering system based on knowledge graph embedding and language model alternate learning, comprising a computer device, wherein a program is stored in a memory of the computer device, and the question-answering method based on knowledge graph embedding and language model alternate learning as set forth in any one of the above is executed when the program is loaded by a processor.
The beneficial effects of the invention include:
the invention adopts a multi-task learning framework, uses the cross interest units as the links to connect the question-answering system and the knowledge graph feature learning task, and uses the question-answering system and the knowledge graph feature learning as two separate but related tasks, so that the two models can acquire information from the other party additionally, thereby overcoming the defect of information sparsity. Based on the pre-training large predictive model, the dependency relationship between the context information is fully learned, and the model can better understand the meaning and structure of the language.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a question-answering system structure based on knowledge graph embedding and language model alternate learning in an embodiment of the invention;
FIG. 2 is a cross-feature sharing unit structure according to an embodiment of the present invention.
Detailed Description
All of the features disclosed in all of the embodiments of this specification, and all of the steps in any method or process disclosed explicitly or implicitly, may be combined, expanded and substituted in any way, except for mutually exclusive features and/or steps.
In view of the problems in the background, the inventors of the present invention, after further consideration and analysis, found the existing related schemes to be as follows:
scheme one: faq+bert protocol. In the early stage, a plurality of companies or institutions can set up own frequently asked question list, and the FAQ-based question-answering system has the advantages of large data volume, high question quality and the like, but has fewer questions in a specific field, and limits the application range of the FAQ-based question-answering system. The FAQ converts the user question into a sentence vector through algorithms such as TF-IDF and the like, calculates the similarity with the question vector of the database, and finally returns an answer corresponding to the question with the maximum similarity. In order to enhance the understanding of user questions, students have also used the Bert model to convert user questions into sentence vectors in recent years.
Scheme 2: knowledge graph + Cypher scheme. In a specific field, a question-answering system is commonly built on a knowledge graph, mainly in three steps. First, intention recognition: recognizing the user's intention and understanding the question the user poses. Second, question parsing: converting the user's question into a Cypher statement. Third, knowledge retrieval and answer generation: querying the Neo4j database with the Cypher statement, forming the query result into a natural-language sentence according to a template, and returning it to the user.
Scheme 3: a fine-tuning scheme based on a language model. The language model is the core: for any sequence of words, the language model can compute the probability that the sequence forms a sentence, so that of two candidate word sequences it assigns a high probability to the one reading as a grammatical sentence and a low probability to a scrambled one. Currently successful pre-trained language models include Bert and GPT. A question-answering model is typically fine-tuned from an existing pre-trained language model, i.e., training is continued on domain-specific data for the downstream question-answering task.
The inventors of the present invention further analyzed the above schemes and found that they have the following problems:
the first scheme has the characteristics of large quantity, multiple problems and the like, but the calculated quantity is increased when the similarity is calculated, the memory overhead is increased, and the number of problems in a specific field is also small;
the second scheme has the problem that only the knowledge in the map can be answered, and when the knowledge of the map is exceeded, the answer cannot be returned;
the problem of the scheme I and the scheme II is that the problem in the mode has fixed answers in the system, belongs to a discriminant model, and can not obtain different answers according to the semantic information of the context for the same problem in different scenes;
the third scheme is that the method can extract semantic information of the context, but does not combine the knowledge graph with the language model;
the problem of combining the scheme II and the scheme III is that only a knowledge graph or a language model is singly used, and in practice, the entity related to the language model is overlapped with the entity in the knowledge graph and is not mutually independent.
Therefore, in order to solve the above technical problems, the invention provides a question-answering method and system based on knowledge graph embedding and language model alternate learning. The embodiments of the invention relate to a knowledge graph embedding model, a language model, and a method of alternate learning between the two.
The technical conception of the invention is as follows: in the embodiments, a knowledge graph embedding model (Knowledge Graph Embedding, KGE) and cross-feature-sharing units are introduced into the question-answering system. The cross-sharing units let the two models exchange information: serving as the tie between the KGE model and the question-answering model, they connect the two models so that each can additionally acquire information from the other, compensating for the information sparsity of both. Alternate training is performed with an end-to-end method in an alternate learning mode. The language model does not need to be trained from scratch; an open-source pre-trained large language model such as Bert or GPT can be adopted.
On this conception, as shown in FIG. 1, a question-answering method based on knowledge graph embedding and language model alternate learning is provided, which specifically comprises the following steps:
step 1: the corpus data is organized into "question-answer" pairs of the following form: { "Q": "what are the process cracks of an aeroengine? "A": includes casting and forging cracks "}; generating a corpus dictionary, namely establishing indexes of characters and character correspondence, for example: { "UNK":0, "navigate" 1, "null" 2,. }; generating corpus embedding word embedding, namely, each corpus character adopts a vector to express, and the dimension of the embedding is a multiple of 12;
step 2: constructing a knowledge graph triplet in the form of (head, relation, tail), wherein the head represents a head entity, the relation represents a relation, and the tail represents a tail entity; constructing a map entity dictionary { "entity 1":0, "entity 2": building a map entity insert KG entity embedding from a map entity dictionary; constructing a relation dictionary { "relation 1":0, "relationship 2": building a relationship embedding relation embedding according to the relationship dictionary;
step 3: according to the input problem query, inquiring in a corpus dictionary to obtain a problem index query index; inquiring in the word embedding according to the query index to obtain a question embedding question embedding; inputting question embedding into a pre-trained large language model, decoding the output;
step 4: inquiring in a map entity dictionary according to the query entity of the problem entity to obtain a problem entity index question entity index; according to question entity index, querying in the map entity embedding KG entity embedding to obtain a problem entity embedding question entity embedding;
step 5: inquiring in a map entity dictionary according to the map head entity KG head to obtain a map head entity index KG head index; inquiring in the map entity embedding KG entity embedding according to the KG head index to obtain a map head entity embedding KG head embedding;
step 6: inquiring in a relation dictionary according to the graph relation KG relation to obtain a graph relation index KG relation index; according to KG relation index, querying in relation embedding relation embedding to obtain a map relation embedding KG relation embedding;
step 7: the problem entity embedding question entity embedding and the map header entity embedding KG head embedding are connected through two layers of Cross sharing units, namely Cross sharing output question entity Cross output-2 and map header entity Cross sharing output KG head Cross output-2 are finally obtained, the Cross sharing units are as shown in fig. 2, E (l) and V (l) are input from an input Layer l, an outer product operation is carried out on the outer product operation Layer Cross, C (l) and C (l, t) are obtained in a feature matrix Layer Cross feature matrix, W (l, VV), W (l, EV), W (l, EE) and W (l, EE) are obtained in a feature fusion Layer compression, and E (l+1) and V (l+1) are output in an output Layer l+1; the intermediate calculation process is as follows:
C(l) = V(l) E(l,t)
V(l+1) = C(l) W(l,VV) + C(l,t) W(l,EV) + b(l,V)
E(l+1) = C(l) W(l,VE) + C(l,t) W(l,EE) + b(l,E)
where l denotes the l-th cross-sharing unit, E(l) and V(l) denote the input features of the l-th cross-sharing unit, C(l) denotes the outer product of V(l) and E(l), C(l,t) and E(l,t) denote the transposes of C(l) and E(l) respectively, W(l,VV), W(l,EV), W(l,EE) and W(l,VE) denote weight parameters, b(l,V) and b(l,E) denote bias parameters, and E(l+1) and V(l+1) denote the output features of the l-th cross-sharing unit.
Step 8: perform a concatenation-layer (Concat) operation on the decoded output of step 3 and the question entity cross output-2 of step 7, then pass the result through multiple DNN layers and a Softmax layer in turn to obtain the prediction probability of QA. The loss function of the QA task is the cross entropy loss plus an L2 regularization term, i.e., a loss of the standard form
L(QA) = -(1/N) Σ_i y_i log(p_i) + λ Σ_l ||w_l||^2
where N denotes the number of samples, y_i the class label, p_i the prediction probability, w_l the l-th weight parameter, and λ the regularization rate. A sketch follows.
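A minimal sketch of step 8 under assumed layer sizes (the hidden width and the class count are illustrative; only the Concat, multiple-DNN-plus-Softmax structure and the cross-entropy-plus-L2 loss come from the text above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QAHead(nn.Module):
    """Concat(LM decoding, question entity cross output-2) -> DNNs -> Softmax."""

    def __init__(self, lm_dim: int, cross_dim: int, num_classes: int):
        super().__init__()
        self.dnn = nn.Sequential(                 # "multiple DNN layers"
            nn.Linear(lm_dim + cross_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, decode_output, question_entity_cross):
        x = torch.cat([decode_output, question_entity_cross], dim=-1)  # Concat
        return F.softmax(self.dnn(x), dim=-1)     # prediction probability of QA

def qa_loss(probs, labels, weights, lam=1e-4):
    ce = F.nll_loss(torch.log(probs + 1e-9), labels)   # cross entropy
    l2 = sum(w.pow(2).sum() for w in weights)          # L2 over the w_l
    return ce + lam * l2
```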
Step 9: perform a Concat operation on the graph relation embedding output of step 6 and the KG head cross output-2 of step 7, then apply one DNN layer to obtain the predicted graph tail embedding (prediction KG tail embedding).
Step 10: query the KG dictionary with the graph tail entity (KG tail) to obtain the KG tail index; query the KG entity embedding with the KG tail index to obtain the graph tail entity embedding (KG tail embedding). The loss function of the KGE task is the mean squared error plus an L2 regularization term, i.e., a loss of the standard form
L(KGE) = (1/N) Σ_i ||prediction KG tail embedding_i - KG tail embedding_i||^2 + λ Σ_l ||w_l||^2
with the remaining symbols as above; a sketch of steps 9-10 and the alternate training follows.
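A minimal sketch of steps 9-10 and of the alternate, end-to-end training; the optimizer wiring is an assumption of the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KGEHead(nn.Module):
    """Concat(KG relation embedding, KG head cross output-2) -> one DNN layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.dnn = nn.Linear(2 * dim, dim)        # one DNN layer, per step 9

    def forward(self, kg_relation_emb, kg_head_cross):
        x = torch.cat([kg_relation_emb, kg_head_cross], dim=-1)
        return self.dnn(x)                        # prediction KG tail embedding

def kge_loss(pred_tail, true_tail, weights, lam=1e-4):
    mse = F.mse_loss(pred_tail, true_tail)        # mean squared error
    l2 = sum(w.pow(2).sum() for w in weights)     # L2 over the w_l
    return mse + lam * l2

# Alternate learning: in each round take one gradient step on the QA task,
# then one on the KGE task, end to end through the shared cross units:
#
# for qa_batch, kg_batch in zip(qa_loader, kg_loader):
#     optimizer.zero_grad(); loss_qa(qa_batch).backward(); optimizer.step()
#     optimizer.zero_grad(); loss_kge(kg_batch).backward(); optimizer.step()
```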
It should be noted that, within the scope of protection defined by the claims of the present invention, the following embodiments may be combined, expanded and substituted in any logically consistent manner on the basis of the above specific implementation, for example with respect to the disclosed technical principles and the disclosed or implicitly disclosed technical features.
Example 1
A question-answering method based on knowledge graph embedding and language model alternate learning comprises the following steps:
introducing, for the question-answering system, a knowledge graph embedding model and cross-sharing units, with the cross-sharing units serving as the tie between the knowledge graph embedding model and the question-answering model, connecting the two models so that they exchange information and each can additionally acquire information from the other, thereby compensating for the information sparsity of both models.
Example 2
On the basis of embodiment 1, introducing the knowledge graph embedding model and the cross-sharing units, using the cross-sharing units as the tie between the knowledge graph embedding model and the question-answering model, and connecting the two models so that they exchange information comprises the following sub-steps:
S1, organizing the corpus data into "question-answer" pairs, generating a corpus dictionary, and generating the corpus embedding (word embedding);
S2, constructing knowledge graph triples of the form (head, relation, tail), where head denotes the head entity, relation denotes the relation, and tail denotes the tail entity; constructing a graph entity dictionary from all heads and tails, and constructing the graph entity embedding (KG entity embedding) from the graph entity dictionary; constructing a relation dictionary from all relations, and constructing the relation embedding from the relation dictionary;
S3, querying the corpus dictionary with the input question (query) to obtain the question index (query index); querying the word embedding with the query index to obtain the question embedding; inputting the question embedding into a pre-trained large language model and decoding the output;
S4, querying the graph entity dictionary with the question entity (query entity) to obtain the question entity index; querying the KG entity embedding with the question entity index to obtain the question entity embedding;
S5, querying the graph entity dictionary with the graph head entity (KG head) to obtain the graph head entity index (KG head index); querying the KG entity embedding with the KG head index to obtain the graph head entity embedding (KG head embedding);
S6, querying the relation dictionary with the graph relation (KG relation) to obtain the graph relation index (KG relation index); querying the relation embedding with the KG relation index to obtain the graph relation embedding (KG relation embedding);
S7, connecting the question entity embedding and the graph head entity embedding through two layers of cross-feature-sharing units, finally obtaining the question entity cross-sharing output (question entity cross output-2) and the graph head entity cross-sharing output (KG head cross output-2);
S8, performing a concatenation-layer (Concat) operation on the decoded output of step S3 and the question entity cross output-2 of step S7, then passing the result through multiple DNN layers and a Softmax layer in turn to obtain the prediction probability of QA;
S9, performing a Concat operation on the graph relation embedding output in step S6 and the KG head cross output-2 of step S7, then applying one DNN layer to obtain the predicted graph tail embedding (prediction KG tail embedding);
S10, querying the KG dictionary with the graph tail entity (KG tail) to obtain the KG tail index; querying the KG entity embedding with the KG tail index to obtain the graph tail entity embedding (KG tail embedding).
Example 3
On the basis of embodiment 1, in step S1, the "question-answer" pairs have the form { "Q": "question", "A": "answer" }; generating the corpus dictionary means establishing the correspondence between characters and their indices; and generating the corpus embedding means representing each corpus character by a vector.
Example 4
On the basis of embodiment 3, in step S1, the dimension of the embedding is taken as a multiple of 12.
Example 5
On the basis of embodiment 1, in step S7, the cross-feature-sharing unit structure comprises an input Layer l, an outer-product operation layer (Cross), a feature matrix layer (Cross feature matrix), a feature fusion layer (Compress) and an output Layer l+1;
E(l) and V(l) are input at the input Layer l, the outer product is taken at the Cross layer, C(l) and C(l,t) are obtained at the feature matrix layer, W(l,VV), W(l,EV), W(l,VE) and W(l,EE) are applied at the feature fusion layer, and E(l+1) and V(l+1) are output at the output Layer l+1; the intermediate calculation is as follows:
C(l) = V(l) E(l,t)
V(l+1) = C(l) W(l,VV) + C(l,t) W(l,EV) + b(l,V)
E(l+1) = C(l) W(l,VE) + C(l,t) W(l,EE) + b(l,E)
where l denotes the l-th cross-sharing unit, E(l) and V(l) denote the input features of the l-th cross-sharing unit, C(l) denotes the outer product of V(l) and E(l), C(l,t) and E(l,t) denote the transposes of C(l) and E(l) respectively, W(l,VV), W(l,EV), W(l,EE) and W(l,VE) denote weight parameters, b(l,V) and b(l,E) denote bias parameters, and E(l+1) and V(l+1) denote the output features of the l-th cross-sharing unit.
Example 6
On the basis of embodiment 1, in step S8, the loss function of the QA task is the cross entropy loss plus an L2 regularization term, i.e., a loss of the standard form
L(QA) = -(1/N) Σ_i y_i log(p_i) + λ Σ_l ||w_l||^2
where N denotes the number of samples, y_i the class label, p_i the prediction probability, w_l the l-th weight parameter, and λ the regularization rate.
Example 7
On the basis of embodiment 6, in step S10, the loss function of the KGE task is the mean squared error plus an L2 regularization term, i.e., a loss of the standard form
L(KGE) = (1/N) Σ_i ||prediction KG tail embedding_i - KG tail embedding_i||^2 + λ Σ_l ||w_l||^2
with the remaining symbols as above.
example 8
On the basis of embodiment 1, the method comprises the training step of: adopting an alternate learning mode and performing alternate training with an end-to-end method.
Example 9
On the basis of embodiment 2, in step S3, the open-source pre-trained large language model includes Bert and GPT.
Example 10
A question-answering system based on knowledge graph embedding and language model alternate learning, comprising a computer device, wherein a program is stored in a memory of the computer device, and the question-answering method based on knowledge graph embedding and language model alternate learning according to any one of embodiments 1 to 9 is executed when the program is loaded by a processor.
The units involved in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be provided in a processor; in some cases, the names of the units do not constitute a limitation of the units themselves.
According to an aspect of embodiments of the present invention, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
As another aspect, the embodiment of the present invention also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.

Claims (10)

1. A question-answering method based on knowledge graph embedding and language model alternate learning is characterized by comprising the following steps:
introducing, for the question-answering system, a knowledge graph embedding model and cross-sharing units, with the cross-sharing units serving as the tie between the knowledge graph embedding model and the question-answering model, connecting the two models so that they exchange information and each can additionally acquire information from the other, thereby compensating for the information sparsity of both models.
2. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 1, wherein introducing the knowledge graph embedding model and the cross-sharing units, using the cross-sharing units as the tie between the knowledge graph embedding model and the question-answering model, and connecting the two models so that they exchange information comprises the following sub-steps:
s1, arranging corpus data into question-answer pairs, generating a corpus dictionary and generating corpus embedding word compressing;
s2, constructing a knowledge graph triplet in the form of head, relation and tail, wherein the head represents a head entity, the relation represents a relation, and the tail represents a tail entity; constructing a map entity dictionary according to all the heads and the tails, and constructing map entity embedding KG entityembedding according to the map entity dictionary; constructing a relation dictionary according to all the relation, and constructing relation embedding relation embedding according to the relation dictionary;
s3, inquiring in a corpus dictionary according to the input problem query to obtain a problem index; inquiring in the word embedding according to the query index to obtain a question embedding question embedding; inputting question embedding into a pre-trained large language model, decoding the output;
s4, inquiring in a map entity dictionary according to the query entity of the problem entity to obtain a problem entity index question entity index; according to question entity index, inquiring in a map entity ebadd to obtain a problem entity embedding question entity embedding;
s5, inquiring in a map entity dictionary according to the map head entity KG head to obtain a map head entity index KG head index; inquiring in the map entity embedding KG entityembedding according to the KG head index to obtain a map head entity embedding KG head embedding;
s6, inquiring in a relation dictionary according to the graph relation KG relation to obtain a graph relation index KG relation index; inquiring in relation embedding according to KG relation index to obtain a map relation embedding KG relation embedding;
s7, connecting the problem entity embedding question entity embedding and the map head entity embedding KGhead embedding through two layers of cross sharing units, namely cross-feature-sharing units, and finally obtaining a problem entity cross sharing output question entity cross output-2 and a map head entity cross sharing output KG head cross output-2;
s8: the decoding output of the step S3 and question entity cross output-2 of the step S7 are subjected to a connection layer Concat operation, and then sequentially pass through a multiple DNN layer and a Softmax layer to obtain the predictive probability of QA;
s9: the output map relation embedding KG relation embedding of the step S6 and the map head entity cross sharing output KG head cross output-2 of the step S7 are subjected to a connection layer Concat operation, and then a DNN layer is adopted to obtain a map tail embedding predicted value prediction KG tail embedding;
step 10: inquiring in a KG dictionary according to the KG tail of the map to obtain a KG tail index; and according to KG tail index, inquiring in the map entity embedding KG entity embedding to obtain a map tail entity embedding KG tail embedding.
3. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 1, wherein in step S1, the "question-answer" pairs have the form { "Q": "question", "A": "answer" }; generating the corpus dictionary means establishing the correspondence between characters and their indices; and generating the corpus embedding means representing each corpus character by a vector.
4. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 3, wherein in step S1, the dimension of the embedding is taken as a multiple of 12.
5. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 1, wherein in step S7, the cross-feature-sharing unit structure comprises an input Layer l, an outer-product operation layer (Cross), a feature matrix layer (Cross feature matrix), a feature fusion layer (Compress) and an output Layer l+1;
E(l) and V(l) are input at the input Layer l, the outer product is taken at the Cross layer, C(l) and C(l,t) are obtained at the feature matrix layer, W(l,VV), W(l,EV), W(l,VE) and W(l,EE) are applied at the feature fusion layer, and E(l+1) and V(l+1) are output at the output Layer l+1; the intermediate calculation is as follows:
C(l) = V(l) E(l,t)
V(l+1) = C(l) W(l,VV) + C(l,t) W(l,EV) + b(l,V)
E(l+1) = C(l) W(l,VE) + C(l,t) W(l,EE) + b(l,E)
where l denotes the l-th cross-sharing unit, E(l) and V(l) denote the input features of the l-th cross-sharing unit, C(l) denotes the outer product of V(l) and E(l), C(l,t) and E(l,t) denote the transposes of C(l) and E(l) respectively, W(l,VV), W(l,EV), W(l,EE) and W(l,VE) denote weight parameters, b(l,V) and b(l,E) denote bias parameters, and E(l+1) and V(l+1) denote the output features of the l-th cross-sharing unit.
6. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 1, wherein in step S8, the loss function of the QA task is the cross entropy loss plus an L2 regularization term, i.e., a loss of the standard form
L(QA) = -(1/N) Σ_i y_i log(p_i) + λ Σ_l ||w_l||^2
where N denotes the number of samples, y_i the class label, p_i the prediction probability, w_l the l-th weight parameter, and λ the regularization rate.
7. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 6, wherein in step S10, the loss function of the KGE task is the mean squared error plus an L2 regularization term, i.e., a loss of the standard form
L(KGE) = (1/N) Σ_i ||prediction KG tail embedding_i - KG tail embedding_i||^2 + λ Σ_l ||w_l||^2
with the remaining symbols as above.
8. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 1, comprising the training step of: adopting an alternate learning mode and performing alternate training with an end-to-end method.
9. The question-answering method based on knowledge graph embedding and language model alternate learning according to claim 2, wherein in step S3, the pre-trained large language model includes Bert and GPT.
10. A question-answering system based on knowledge graph embedding and language model alternate learning, characterized by comprising a computer device, wherein a program is stored in a memory of the computer device, and the question-answering method based on knowledge graph embedding and language model alternate learning according to any one of claims 1 to 9 is executed when the program is loaded by a processor.
CN202310716940.XA 2023-06-16 2023-06-16 Question-answering method and system based on knowledge graph embedding and language model alternate learning Pending CN116821291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310716940.XA CN116821291A (en) 2023-06-16 2023-06-16 Question-answering method and system based on knowledge graph embedding and language model alternate learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310716940.XA CN116821291A (en) 2023-06-16 2023-06-16 Question-answering method and system based on knowledge graph embedding and language model alternate learning

Publications (1)

Publication Number Publication Date
CN116821291A true CN116821291A (en) 2023-09-29

Family

ID=88115096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310716940.XA Pending CN116821291A (en) 2023-06-16 2023-06-16 Question-answering method and system based on knowledge graph embedding and language model alternate learning

Country Status (1)

Country Link
CN (1) CN116821291A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609444A (en) * 2023-11-08 2024-02-27 天讯瑞达通信技术有限公司 Searching question-answering method based on large model
CN117390169A (en) * 2023-12-11 2024-01-12 季华实验室 Form data question-answering method, device, equipment and storage medium
CN117390169B (en) * 2023-12-11 2024-04-12 季华实验室 Form data question-answering method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111554268B (en) Language identification method based on language model, text classification method and device
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN116821291A (en) Question-answering method and system based on knowledge graph embedding and language model alternate learning
CN113515951B (en) Story description generation method based on knowledge enhanced attention network and group-level semantics
CN108628935A (en) A kind of answering method based on end-to-end memory network
CN113297364B (en) Natural language understanding method and device in dialogue-oriented system
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN113204633B (en) Semantic matching distillation method and device
CN111524593A (en) Medical question-answering method and system based on context language model and knowledge embedding
CN113806554B (en) Knowledge graph construction method for massive conference texts
CN113392265A (en) Multimedia processing method, device and equipment
CN116662500A (en) Method for constructing question-answering system based on BERT model and external knowledge graph
CN114398505A (en) Target word determining method, model training method and device and electronic equipment
CN113011184A (en) Training method and device for language representation model
CN116186350B (en) Power transmission line engineering searching method and device based on knowledge graph and topic text
CN117131933A (en) Multi-mode knowledge graph establishing method and application
CN110826341A (en) Semantic similarity calculation method based on seq2seq model
CN116401353A (en) Safe multi-hop question-answering method and system combining internal knowledge patterns and external knowledge patterns
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network
CN114579605B (en) Table question-answer data processing method, electronic equipment and computer storage medium
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN115168678A (en) Time sequence perception heterogeneous graph nerve rumor detection model
CN113705197A (en) Fine-grained emotion analysis method based on position enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination