CN117688189A - Knowledge graph, knowledge base and large language model fused question-answering system construction method - Google Patents

Knowledge graph, knowledge base and large language model fused question-answering system construction method

Info

Publication number
CN117688189A
CN117688189A (application CN202311821070.9A)
Authority
CN
China
Prior art keywords
question
model
knowledge
entity
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311821070.9A
Other languages
Chinese (zh)
Other versions
CN117688189B (en)
Inventor
田茂春
李镇江
蓝日成
杨跃
甘郝新
范光伟
赵平
王清正
刘怡心
刘斌
张水平
赖杭
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Datengxia Water Control Project Development Co ltd
Pearl River Hydraulic Research Institute of PRWRC
Original Assignee
Guangxi Datengxia Water Control Project Development Co ltd
Pearl River Hydraulic Research Institute of PRWRC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Datengxia Water Control Project Development Co ltd and Pearl River Hydraulic Research Institute of PRWRC
Priority to CN202311821070.9A
Publication of CN117688189A
Application granted
Publication of CN117688189B
Active legal status
Anticipated expiration legal status


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing a question-answering system that fuses a knowledge graph, a knowledge base and a large language model, belonging to the technical field of natural language processing, and provides a complete method for building such a system. Tailored to the data characteristics of the water conservancy industry, the question-answering system is customized along multiple dimensions, and a complete set of construction methods is provided, covering model selection, training strategies and dataset construction. A pipeline of natural language processing models forms a complete question-processing architecture, and all required datasets are constructed from the target knowledge graph without large amounts of manual labeling. The framework ensures the accuracy and comprehensiveness of the knowledge graph question answering while coupling the knowledge base, the knowledge graph and the large language model, so that their strengths complement one another and the user experience is improved.

Description

Knowledge graph, knowledge base and large language model fused question-answering system construction method
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model.
Background
A question-answering system is an important means for humans to obtain information from large-scale data. A question-answering system based on natural language processing technology allows users to pose questions in an intuitive, natural way and thereby obtain the information they need. In general, knowledge sources can be divided into knowledge graphs, knowledge bases and general documents according to the storage medium. A knowledge base is a broad concept covering any form of knowledge storage, whereas a knowledge graph is a specific form of knowledge base that emphasizes semantic relations among entities and is easy to query and reason over. Different vertical domains maintain their own independent knowledge graphs and document libraries; building question-answering systems on top of them is of great significance for science popularization, knowledge learning and decision support, and helps users quickly retrieve knowledge sources and discover potential relations among different knowledge objects.
However, existing question-answering systems have the following problems. Knowledge graphs in the water conservancy industry are characterized by complex data structures and entities of varying length, and multi-hop queries under specific conditions account for a large proportion of usage scenarios, which places higher demands on entity extraction, entity linking and inference rule design. Furthermore, deep-learning natural language processing models rely heavily on high-quality manually labeled datasets, which is the greatest difficulty in constructing a question-answering system. Finally, most existing question-answering systems focus on breakthroughs in a single technology, such as entity extraction, graph reasoning or overall modeling, and tend to ignore the completeness of the system itself. Knowledge graphs are limited in size, and manually constructed rules usually cover only the most common question types, so question-answering systems built in this way are always limited and difficult to apply in real life and work.
Disclosure of Invention
The aim of the invention is to solve the above problems by providing a method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model.
The technical scheme adopted by the invention is as follows: a method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model comprises the following steps:
S1: acquiring the question input by the user, and extracting all entity mentions present in the question using a deep learning model;
S2: retrieving potential link entities in the specified knowledge graph for each entity mention using a candidate entity ranking algorithm;
S3: classifying the question posed by the user using a preset set of question templates, and selectively returning one of a knowledge graph answer, a large language model answer or a knowledge base answer according to the classification result.
In a preferred embodiment, the construction method includes acquiring the question input by the user, i.e. extracting the natural language question passed from the interface as a character string. All entity mentions present in the question are extracted using a deep learning model; the specific steps include model training and model prediction.
In a preferred embodiment, in the step S1, the model training includes the steps of:
S1.1: design question seed templates according to the service requirements, and set a category label for each template. Retrieve all entities in the knowledge graph, randomly fill them into the seed templates according to their entity types, and record the filled position indexes to form the entity extraction dataset D. The filled template sentences serve as the dataset samples, and the recorded position indexes serve as the labels.
S1.2: apply EDA data augmentation to the entity extraction dataset D, specifically Random Mask (RM), Random Deletion (RD), Random Insertion (RI) and Synonym Replacement (SR), to obtain the augmented dataset D_e-ner.
S1.3: using enhanced dataset D e-ner A named entity recognition model is trained, the named entity recognition model uses a PRGC (PotentialRelationandGlobalCorrespondence) architecture, and an Encoder part uses a Hadamard large open source Chinese pre-training Roberta-wwm model.
In a preferred embodiment, in the step S1, model prediction uses the trained PRGC model to extract named entities from the question input by the user, obtaining all entity mentions.
In a preferred embodiment, in the step S2, the specific steps are as follows:
S2.1: traverse all entities in the knowledge graph to form a candidate entity list, store the list with a Faiss vector index, and use the m3e-base model as the text vectorization model.
S2.2: for each entity mention, use Faiss vector library as L 2 Top-5 entities most relevant to similarity retrieval as candidate link entities E sim And obtains a normalized similarity value as a vector similarity score S sim
S2.3: for E sim The popularity score of each candidate link entity is calculated, and the specific calculation formula is as follows:
wherein: in-deg (e) is the sum of the outbound and inbound degrees of entity e, and α is a hyper-parameter (typically a positive integer, which varies according to the complexity of the knowledge-graph).
S2.4: the search score is the sum of the vector similarity score and the candidate entity popularity score, namely:
reorder candidate link entity E based on search score sim And obtaining the most relevant entity mentioned by each entity, and completing entity linking.
In a preferred embodiment, in the step S3, the model training includes the steps of:
S3.1: use the augmented dataset D_e-ner obtained in S1.2 to further construct a question classification dataset:
s3.2: classifying data sets D using questions e-cls And training a question classification model. The question classification model uses a Bert-FC architecture, wherein the Bert model uses a Hadamard large open source Chinese pre-trained Roberta-wwm model.
In a preferred embodiment, in the step S3.1, the model training includes the steps of:
S3.1.1: denote the question as Sentence, extract all entities in the question from the recorded entity filling positions, and denote them e_1, e_2, …, e_n.
S3.1.2: using special tags [ CLS ] and [ SEP ], the question is spliced with all entities into the following form:
Q=[CLS],Sentence,[SEP],e 1 ,[SEP],e 2 ,[SEP]……
wherein Q is taken as the sample and the category of the seed template as the label, yielding the question classification dataset D_e-cls.
In a preferred embodiment, in the step S3, the step of model prediction includes:
S3.3: according to the question input by the user and the obtained candidate link entities, classify further with the question classification model to obtain the category the question belongs to, and correspondingly return a knowledge graph answer, a large model answer or a knowledge base answer. The details are as follows:
S3.3.1: knowledge graph answer: after the linked entity is structurally mapped, a Cypher query statement is invoked for querying and reasoning, and a specific entity or path is returned as the knowledge graph answer.
S3.3.2: large model answer: construct a Prompt from the question and the attribute information of the linked entities, and input the Prompt into the large model to obtain the large model answer. The large model may use an open API or a localized private deployment.
S3.3.3: knowledge base answer: the knowledge base answers contain the summarized results of the large model, as well as the background knowledge about the questions and the sources of the background knowledge. The method specifically comprises the following three steps:
S3.3.3.1: split local files (e.g. PDF documents) into a private knowledge base using the Python docx library or pdfplumber library; there are four splitting rules (implemented with regular expressions):
(1) Chinese and English periods that are not preceded and followed by digits (i.e. not decimal points) are replaced with the newline character (\n).
(2) Where digits matching (\d\d) appear immediately after a punctuation mark, a newline (\n) is inserted between the punctuation and the digits.
(3) Chinese and English semicolons are replaced by \n.
(4) The sentence preceding a Chinese or English colon (:) is added as a shared sentence to each subsequent clause.
Finally, the documents are split on the newline character (\n) to form the knowledge base.
S3.3.3.2: use Faiss together with the text embedding model m3e-base to retrieve knowledge from the knowledge base according to the user's question as background knowledge, while recording the source file of the knowledge.
S3.3.3.3: form a Prompt from the user's question and the knowledge base background knowledge, input the Prompt into the large model, and use the large model's summarized answer as the knowledge base answer. The large model used is the same as in S3.3.2.
In a preferred embodiment, in the step S1, extraction refers to extracting possible entity mentions from the natural language text and identifying the key objects queried by the question text, which is designed to further explore a new paradigm for question-answering systems. On the one hand, entity mentions that may exist in the text are extracted by a natural language processing model, so keyword extraction is realized intelligently and adapts to constantly changing question contexts; on the other hand, a text similarity algorithm is used to retrieve related knowledge from the knowledge base, which enhances the expertise of the large language model answers and greatly alleviates the hallucination problem of the large language model. In addition, the system also provides a user-friendly interaction service.
In a preferred embodiment, in the step S3, the system interaction flow is as follows: (1) after preprocessing, the question is fed into the named entity recognition module and the entity linking module to link the question to candidate entities in the knowledge graph; (2) the question, combined with the successfully linked candidate entities, is passed through the text classification model to complete intention recognition and template matching; (3) according to the recognized intention and template, the system automatically selects an answer mode for the question, including knowledge graph query, knowledge graph reasoning, querying the large language model with the knowledge base and prompt words, and querying the large language model alone; (4) different types of answers are returned according to the answer mode and presented to the user through interfaces of different styles.
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
1. The invention provides a complete method for constructing a question-answering system. Tailored to the data characteristics of the water conservancy industry, the question-answering system is customized along multiple dimensions, and a complete set of construction methods is provided, covering model selection, training strategies and dataset construction. While ensuring functional accuracy, the capabilities of deep learning techniques in the question-answering system are strengthened, and the roles of the knowledge graph and other technologies in the question-answering system are fully exploited.
2. In the invention, for important data in the knowledge graph, the question-answering system gives knowledge graph answers using rule templates and Cypher query statements, so that users can intuitively examine the knowledge context and inspect relations between knowledge entities through operations such as double-click expansion; for general data in the knowledge base, the question-answering system gives summarized answers through the large language model, sparing users the tedious work of reading large numbers of files while providing the knowledge text and source files, achieving a well-grounded question-answering effect; for other, broader data, the question-answering system answers through the large language model, so that every question asked by the user can be answered and the user experience is improved.
Drawings
Fig. 1 is a schematic flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1:
examples:
A method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model comprises the following steps:
S1: acquiring the question input by the user, and extracting all entity mentions present in the question using a deep learning model;
S2: retrieving potential link entities in the specified knowledge graph for each entity mention using a candidate entity ranking algorithm;
S3: classifying the question posed by the user using a preset set of question templates, and selectively returning one of a knowledge graph answer, a large language model answer or a knowledge base answer according to the classification result.
The construction method includes acquiring the question input by the user, i.e. extracting the natural language question passed from the interface as a character string. All entity mentions present in the question are extracted using a deep learning model; the specific steps include model training and model prediction.
In step S1, the model training includes the steps of:
S1.1: design question seed templates according to the service requirements, and set a category label for each template. Retrieve all entities in the knowledge graph, randomly fill them into the seed templates according to their entity types, and record the filled position indexes to form the entity extraction dataset D. The filled template sentences serve as the dataset samples, and the recorded position indexes serve as the labels.
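A minimal Python sketch of S1.1 is given below: seed templates are filled with knowledge graph entities by type, and the filled position indexes are recorded as labels. The template syntax, category labels and sample entity list are illustrative assumptions, not the patent's actual data.

```python
# Sketch of S1.1: build the entity extraction dataset D from seed templates.
import random

seed_templates = [
    # (template text with a typed slot, category label of the template) - assumed format
    ("{reservoir}的总库容是多少？", "attribute_query"),
    ("{reservoir}位于哪条河流上？", "single_hop_relation"),
]

entities_by_type = {  # hypothetical entities retrieved from the knowledge graph
    "reservoir": ["大藤峡水利枢纽", "三峡水库"],
}

def build_entity_extraction_dataset(n_samples: int = 1000, seed: int = 42):
    """Return a list of (sentence, [(start, end, entity)], category) triples (dataset D)."""
    random.seed(seed)
    dataset = []
    for _ in range(n_samples):
        template, category = random.choice(seed_templates)
        slot_type = template[template.index("{") + 1 : template.index("}")]
        entity = random.choice(entities_by_type[slot_type])
        prefix = template[: template.index("{")]
        sentence = template.replace("{" + slot_type + "}", entity)
        start = len(prefix)                       # position index of the filled entity
        dataset.append((sentence, [(start, start + len(entity), entity)], category))
    return dataset

if __name__ == "__main__":
    print(build_entity_extraction_dataset(n_samples=3))
```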
S1.2: apply EDA data augmentation to the entity extraction dataset D, specifically Random Mask (RM), Random Deletion (RD), Random Insertion (RI) and Synonym Replacement (SR), to obtain the augmented dataset D_e-ner.
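A minimal sketch of the four EDA operations in S1.2, applied at character level to a Chinese question. The mask token, probabilities and synonym dictionary are assumptions; in practice the recorded entity position indexes must be re-aligned after augmentation.

```python
# Sketch of S1.2: random mask, random deletion, random insertion, synonym replacement.
import random

SYNONYMS = {"库容": ["蓄水量"], "位于": ["地处"]}  # hypothetical synonym dictionary

def random_mask(text, p=0.1, mask="[MASK]"):
    return "".join(mask if random.random() < p else ch for ch in text)

def random_delete(text, p=0.1):
    kept = [ch for ch in text if random.random() >= p]
    return "".join(kept) if kept else text

def random_insert(text, n=1):
    chars = list(text)
    for _ in range(n):
        pos = random.randrange(len(chars) + 1)
        chars.insert(pos, random.choice(text))   # insert a character drawn from the text itself
    return "".join(chars)

def synonym_replace(text):
    for word, syns in SYNONYMS.items():
        if word in text:
            text = text.replace(word, random.choice(syns), 1)
    return text

def augment(sentence):
    """Return one augmented variant per EDA operation."""
    return [op(sentence) for op in (random_mask, random_delete, random_insert, synonym_replace)]
```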
S1.3: using enhanced dataset D e-ner A named entity recognition model is trained, the named entity recognition model uses a PRGC (PotentialRelationandGlobalCorrespondence) architecture, and an Encoder part uses a Hadamard large open source Chinese pre-training Roberta-wwm model.
In step S1, model prediction uses the trained PRGC model to extract named entities from the question input by the user, obtaining all entity mentions.
In step S2, the specific steps are as follows:
S2.1: traverse all entities in the knowledge graph to form a candidate entity list, store the list with a Faiss vector index, and use the m3e-base model as the text vectorization model.
S2.2: for each entity mention, use Faiss vector library as L 2 Top-5 entities most relevant to similarity retrieval as candidate link entities E sim And obtains a normalized similarity value as a vector similarity score S sim
S2.3: for E sim The popularity score of each candidate link entity is calculated, and the specific calculation formula is as follows:
wherein: in-deg (e) is the sum of the outbound and inbound degrees of entity e, and α is a hyper-parameter (typically a positive integer, which varies according to the complexity of the knowledge-graph).
S2.4: the search score is the sum of the vector similarity score and the candidate entity popularity score, namely:
reorder candidate link entity E based on search score sim And obtaining the most relevant entity mentioned by each entity, and completing entity linking.
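A minimal sketch of S2.3–S2.4. The exact popularity formula is not reproduced in this text, so the log-damped degree form below (with hyper-parameter α) is an assumption made for illustration; only the search score being the sum of the vector similarity score and the popularity score follows the description above.

```python
# Sketch of S2.3-S2.4: degree-based popularity score and re-ranking of candidates.
import math

def popularity_score(degree: int, alpha: int = 2) -> float:
    """Assumed popularity: log-damped total degree, scaled by hyper-parameter alpha."""
    return math.log(1 + degree) / alpha

def rerank(candidates, degrees, alpha: int = 2):
    """candidates: [(entity, vector_similarity)]; degrees: {entity: in-degree + out-degree}."""
    scored = [
        (entity, sim + popularity_score(degrees.get(entity, 0), alpha))  # search score
        for entity, sim in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

best_entity, _ = rerank(
    [("大藤峡水利枢纽", 0.92), ("三峡水库", 0.40)],
    degrees={"大藤峡水利枢纽": 35, "三峡水库": 80},
)[0]
```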
In step S3, the model training includes the steps of:
S3.1: use the augmented dataset D_e-ner obtained in S1.2 to further construct a question classification dataset:
s3.2: classifying data sets D using questions e-cls And training a question classification model. The question classification model uses a Bert-FC architecture, wherein the Bert model uses a Hadamard large open source Chinese pre-trained Roberta-wwm model.
In step S3.1, the model training comprises the steps of:
S3.1.1: denote the question as Sentence, extract all entities in the question from the recorded entity filling positions, and denote them e_1, e_2, …, e_n.
S3.1.2: using special tags [ CLS ] and [ SEP ], the question is spliced with all entities into the following form:
Q=[CLS],Sentence,[SEP],e 1 ,[SEP],e 2 ,[SEP]……
wherein Q is taken as the sample and the category of the seed template as the label, yielding the question classification dataset D_e-cls.
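A minimal sketch of building one classification sample in the Q format of S3.1.2; whether the tokenizer's own special tokens are reused instead of literal strings is an implementation detail not fixed by the text.

```python
# Sketch of S3.1.2: splice the question and its entities into a [CLS]/[SEP] sample.
def build_classification_sample(sentence: str, entities: list[str], label: str):
    parts = ["[CLS]", sentence, "[SEP]"]
    for entity in entities:                       # e_1 [SEP] e_2 [SEP] ...
        parts.extend([entity, "[SEP]"])
    return "".join(parts), label                  # (sample Q, seed-template category)

sample, label = build_classification_sample(
    "大藤峡水利枢纽位于哪条河流上？", ["大藤峡水利枢纽"], "single_hop_relation"
)
```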
In step S3, the step of model prediction includes:
S3.3: according to the question input by the user and the obtained candidate link entities, classify further with the question classification model to obtain the category the question belongs to, and correspondingly return a knowledge graph answer, a large model answer or a knowledge base answer. The details are as follows:
S3.3.1: knowledge graph answer: after the linked entity is structurally mapped, a Cypher query statement is invoked for querying and reasoning, and a specific entity or path is returned as the knowledge graph answer.
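A minimal sketch of S3.3.1 using the Neo4j Python driver; the connection settings, node label, property names and relationship filtering are assumptions about the graph schema.

```python
# Sketch of S3.3.1: run a Cypher query for the linked entity and return related nodes.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # assumed

def kg_answer(entity_name: str, relation: str = "位于"):
    cypher = (
        "MATCH (e:Entity {name: $name})-[r]->(t) "
        "WHERE type(r) = $rel RETURN t.name AS answer"
    )
    with driver.session() as session:
        records = session.run(cypher, name=entity_name, rel=relation)
        return [record["answer"] for record in records]

print(kg_answer("大藤峡水利枢纽"))
```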
S3.3.2: large model answer: construct a Prompt from the question and the attribute information of the linked entities, and input the Prompt into the large model to obtain the large model answer. The large model may use an open API or a localized private deployment.
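A minimal sketch of S3.3.2: the question and the linked entity's attributes are assembled into a Prompt and sent to a large model through an OpenAI-compatible client. The base URL, model name and prompt wording are assumptions; a locally deployed model exposing the same API could be substituted.

```python
# Sketch of S3.3.2: build a Prompt from entity attributes and query the large model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local deployment

def llm_answer(question: str, entity_attributes: dict) -> str:
    facts = "；".join(f"{k}：{v}" for k, v in entity_attributes.items())
    prompt = f"已知实体属性：{facts}。请根据以上信息回答：{question}"
    response = client.chat.completions.create(
        model="local-llm",                                  # hypothetical model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```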
S3.3.3: knowledge base answer: the knowledge base answers contain the summarized results of the large model, as well as the background knowledge about the questions and the sources of the background knowledge. The method specifically comprises the following three steps:
S3.3.3.1: split local files (e.g. PDF documents) into a private knowledge base using the Python docx library or pdfplumber library; there are four splitting rules (implemented with regular expressions):
(1) Chinese and English periods that are not preceded and followed by digits (i.e. not decimal points) are replaced with the newline character (\n).
(2) Where digits matching (\d\d) appear immediately after a punctuation mark, a newline (\n) is inserted between the punctuation and the digits.
(3) Chinese and English semicolons are replaced by \n.
(4) The sentence preceding a Chinese or English colon (:) is added as a shared sentence to each subsequent clause.
Finally, the documents are split on the newline character (\n) to form the knowledge base, as sketched below.
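A minimal sketch of the splitting rules above; the concrete regular expressions are one interpretation of rules (1)–(4), written for illustration only.

```python
# Sketch of S3.3.3.1: split extracted document text into knowledge base chunks.
import re

def split_into_chunks(text: str) -> list[str]:
    # (4) share the sentence before a Chinese/English colon with each clause that follows it
    def share_colon_prefix(line: str) -> str:
        if "：" not in line and ":" not in line:
            return line
        head, tail = re.split(r"[：:]", line, maxsplit=1)
        clauses = [c for c in re.split(r"[；;]", tail) if c.strip()]
        return "\n".join(f"{head}：{c.strip()}" for c in clauses) if clauses else line

    text = "\n".join(share_colon_prefix(l) for l in text.split("\n"))
    # (1) Chinese/English periods not surrounded by digits -> newline
    text = re.sub(r"(?<!\d)[。.](?!\d)", "\n", text)
    # (2) digits right after punctuation (e.g. list numbering) -> newline before the digits
    text = re.sub(r"([，,；;：:])(\d)", r"\1\n\2", text)
    # (3) remaining Chinese/English semicolons -> newline
    text = re.sub(r"[；;]", "\n", text)
    return [chunk.strip() for chunk in text.split("\n") if chunk.strip()]

print(split_into_chunks("水库调度规程：汛期控制水位；枯期保障供水。"))
```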
S3.3.3.2: use Faiss together with the text embedding model m3e-base to retrieve knowledge from the knowledge base according to the user's question as background knowledge, while recording the source file of the knowledge.
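A minimal sketch of S3.3.3.2, pairing each indexed chunk with its source file so that both the background knowledge and its provenance can be returned; the checkpoint name and the in-memory bookkeeping are assumptions.

```python
# Sketch of S3.3.3.2: retrieve background knowledge and its source file from the knowledge base.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("moka-ai/m3e-base")               # assumed m3e-base checkpoint
chunks = [("水库调度规程：汛期控制水位", "调度规程.docx"),       # (chunk text, source file)
          ("大藤峡工程位于珠江流域黔江河段", "工程简介.pdf")]

embeddings = model.encode([text for text, _ in chunks], convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

def retrieve_background(question: str, k: int = 2):
    """Return [(chunk text, source file)] for the Top-k most relevant chunks."""
    query = model.encode([question], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query, min(k, len(chunks)))
    return [chunks[i] for i in ids[0]]
```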
S3.3.3.3: form a Prompt from the user's question and the knowledge base background knowledge, input the Prompt into the large model, and use the large model's summarized answer as the knowledge base answer. The large model used is the same as in S3.3.2.
In step S1, extraction refers to extracting possible entity mentions from the natural language text and identifying the key objects queried by the question text, which is designed to further explore a new paradigm for question-answering systems. On the one hand, entity mentions that may exist in the text are extracted by a natural language processing model, so keyword extraction is realized intelligently and adapts to constantly changing question contexts; on the other hand, a text similarity algorithm is used to retrieve related knowledge from the knowledge base, which enhances the expertise of the large language model answers and greatly alleviates the hallucination problem of the large language model. In addition, the system also provides a user-friendly interaction service.
In step S3, the system interaction flow is as follows: (1) after preprocessing, the question is fed into the named entity recognition module and the entity linking module to link the question to candidate entities in the knowledge graph; (2) the question, combined with the successfully linked candidate entities, is passed through the text classification model to complete intention recognition and template matching; (3) according to the recognized intention and template, the system automatically selects an answer mode for the question, including knowledge graph query, knowledge graph reasoning, querying the large language model with the knowledge base and prompt words, and querying the large language model alone; (4) different types of answers are returned according to the answer mode and presented to the user through interfaces of different styles.
The invention provides a complete method for constructing a question-answering system. Tailored to the data characteristics of the water conservancy industry, the question-answering system is customized along multiple dimensions, and a complete set of construction methods is provided, covering model selection, training strategies and dataset construction. While ensuring functional accuracy, the capabilities of deep learning techniques in the question-answering system are strengthened, and the roles of the knowledge graph and other technologies in the question-answering system are fully exploited.
In the invention, for important data in the knowledge graph, the question-answering system gives knowledge graph answers using rule templates and Cypher query statements, so that users can intuitively examine the knowledge context and inspect relations between knowledge entities through operations such as double-click expansion; for general data in the knowledge base, the question-answering system gives summarized answers through the large language model, sparing users the tedious work of reading large numbers of files while providing the knowledge text and source files, achieving a well-grounded question-answering effect; for other, broader data, the question-answering system answers through the large language model, so that every question asked by the user can be answered and the user experience is improved. The framework ensures the accuracy and comprehensiveness of the knowledge graph question answering while coupling the knowledge base, the knowledge graph and the large language model, so that their strengths complement one another and the user experience is improved.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The previous description is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model, characterized in that the construction method comprises the following steps:
S1: acquiring the question input by the user, and extracting all entity mentions present in the question using a deep learning model;
S2: retrieving potential link entities in the specified knowledge graph for each entity mention using a candidate entity ranking algorithm;
S3: classifying the question posed by the user using a preset set of question templates, and selectively returning one of a knowledge graph answer, a large language model answer or a knowledge base answer according to the classification result.
2. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 1, wherein: the construction method comprises acquiring the question input by the user, i.e. extracting the natural language question passed from the interface as a character string, and extracting all entity mentions present in the question using a deep learning model; the specific steps include model training and model prediction.
3. The method for constructing a question-answering system integrating knowledge graph, knowledge base and large language model as claimed in claim 1, wherein: in the step S1, the model training includes the following steps:
S1.1: designing question seed templates according to the service requirements, and setting a category label for each template; retrieving all entities in the knowledge graph, randomly filling them into the seed templates according to their entity types, and recording the filled position indexes to form the entity extraction dataset D; the filled template sentences serve as the dataset samples, and the recorded position indexes serve as the labels;
S1.2: applying EDA data augmentation to the entity extraction dataset D, specifically random masking, random deletion, random insertion and synonym replacement, to obtain the augmented dataset D_e-ner;
S1.3: using enhanced dataset D e-ner Training a named entity recognition model, wherein the named entity recognition model uses a PRGC architecture, and an Encoder part uses a Hadamard large open source Chinese pre-training Roberta-wwm model.
4. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 1, wherein: in the step S1, model prediction uses the trained PRGC model to extract named entities from the question input by the user, obtaining all entity mentions.
5. The method for constructing a question-answering system integrating knowledge graph, knowledge base and large language model as claimed in claim 1, wherein: in the step S2, the specific steps are as follows:
S2.1: traversing all entities in the knowledge graph to form a candidate entity list, storing the list with a Faiss vector index, and using the m3e-base model as the text vectorization model;
S2.2: for each entity mention, using the Faiss vector index to retrieve the Top-5 most relevant entities by L2 similarity as candidate link entities E_sim, and obtaining the normalized similarity values as the vector similarity scores S_sim;
S2.3: for E sim The popularity score of each candidate link entity is calculated, and the specific calculation formula is as follows:
wherein: in-deg (e) is the sum of the outbound and inbound degrees of entity e, and α is a hyper-parameter;
S2.4: the search score of each candidate link entity is the sum of its vector similarity score and its popularity score;
re-ranking the candidate link entities E_sim by the search score, taking the most relevant entity for each entity mention, and completing the entity linking.
6. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 1, wherein: in the step S3, a preset set of question templates is used to classify the questions posed by the user, and a deep learning model is required in the process of selectively returning one of a knowledge graph answer, a large language model answer or a knowledge base answer according to the classification result; using the deep learning model involves model training and model prediction, and the model training comprises the following steps:
S3.1: using the obtained augmented dataset D_e-ner to further construct a question classification dataset:
S3.2: training a question classification model on the question classification dataset D_e-cls; the question classification model uses a BERT-FC architecture, wherein the BERT model is the HIT open-source Chinese pre-trained RoBERTa-wwm model.
7. The method for constructing a question-answering system integrating knowledge graph, knowledge base and large language model as claimed in claim 6, wherein: in the step S3.1, the model training includes the following steps:
S3.1.1: denoting the question as Sentence, extracting all entities in the question from the recorded entity filling positions, and denoting them e_1, e_2, …, e_n;
S3.1.2: using special tags [ CLS ] and [ SEP ], the question is spliced with all entities into the following form:
Q=[CLS],Sentence,[SEP],e 1 ,[SEP],e 2 ,[SEP]……
wherein Q is taken as the sample and the category of the seed template as the label, yielding the question classification dataset D_e-cls.
8. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 1, wherein: in the step S3, a preset set of question templates is used to classify the questions posed by the user, and a deep learning model is required in the process of selectively returning one of a knowledge graph answer, a large language model answer or a knowledge base answer according to the classification result; using the deep learning model involves model training and model prediction, and the step of model prediction comprises:
S3.3: according to the question input by the user and the obtained candidate link entities, classifying further with the question classification model to obtain the category the question belongs to, and correspondingly returning a knowledge graph answer, a large model answer or a knowledge base answer; the details are as follows:
S3.3.1: knowledge graph answer: after the linked entity is structurally mapped, invoking a Cypher query statement for querying and reasoning, and returning a specific entity or path as the knowledge graph answer;
S3.3.2: large model answer: constructing a Prompt from the question and the attribute information of the linked entities, and inputting the Prompt into the large model to obtain the large model answer; the large model may use an open API or a localized private deployment;
s3.3.3: knowledge base answer: the knowledge base answers contain the summarized results of the large model, and also the related background knowledge of the questions and the sources of the background knowledge; the method specifically comprises the following three steps:
S3.3.3.1: splitting local files into a private knowledge base using the Python-based docx library or pdfplumber library, with four splitting rules implemented by regular expressions; finally, splitting the documents on the newline character (\n) to form the knowledge base;
S3.3.3.2: using Faiss together with the text embedding model m3e-base to retrieve knowledge from the knowledge base according to the user's question as background knowledge, while recording the source file of the knowledge;
S3.3.3.3: forming a Prompt from the user's question and the knowledge base background knowledge, inputting the Prompt into the large model, and using the large model's summarized answer as the knowledge base answer; the large model used is the same as in S3.3.2.
9. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 1, wherein: in the step S1, extraction refers to extracting possible entity mentions from the natural language text and identifying the key objects queried by the question text, which is designed to further explore a new paradigm for question-answering systems; deterministic short answers are extracted based on the large language model, while the knowledge graph and the knowledge base are combined to jointly empower the question-answering system.
10. The method for constructing a question-answering system integrating a knowledge graph, a knowledge base and a large language model as claimed in claim 9, wherein: in the step S3, the system interaction flow is as follows: (1) after preprocessing, the question is fed into the named entity recognition module and the entity linking module to link the question to candidate entities in the knowledge graph; (2) the question, combined with the successfully linked candidate entities, is passed through the text classification model to complete intention recognition and template matching; (3) according to the recognized intention and template, the system automatically selects an answer mode for the question, including knowledge graph query, knowledge graph reasoning, querying the large language model with the knowledge base and prompt words, and querying the large language model alone; (4) different types of answers are returned according to the answer mode and presented to the user through interfaces of different styles.
CN202311821070.9A 2023-12-27 2023-12-27 Knowledge graph, knowledge base and large language model fused question-answering system construction method Active CN117688189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311821070.9A CN117688189B (en) 2023-12-27 2023-12-27 Knowledge graph, knowledge base and large language model fused question-answering system construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311821070.9A CN117688189B (en) 2023-12-27 2023-12-27 Knowledge graph, knowledge base and large language model fused question-answering system construction method

Publications (2)

Publication Number Publication Date
CN117688189A true CN117688189A (en) 2024-03-12
CN117688189B CN117688189B (en) 2024-06-14

Family

ID=90126446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311821070.9A Active CN117688189B (en) 2023-12-27 2023-12-27 Knowledge graph, knowledge base and large language model fused question-answering system construction method

Country Status (1)

Country Link
CN (1) CN117688189B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180144051A1 (en) * 2016-11-18 2018-05-24 Facebook, Inc. Entity Linking to Query Terms on Online Social Networks
US20200218988A1 (en) * 2019-01-08 2020-07-09 International Business Machines Corporation Generating free text representing semantic relationships between linked entities in a knowledge graph
WO2021017290A1 (en) * 2019-07-31 2021-02-04 平安科技(深圳)有限公司 Knowledge graph-based entity identification data enhancement method and system
WO2021139283A1 (en) * 2020-06-16 2021-07-15 平安科技(深圳)有限公司 Knowledge graph question-answer method and apparatus based on deep learning technology, and device
CN112100351A (en) * 2020-09-11 2020-12-18 陕西师范大学 Method and equipment for constructing intelligent question-answering system through question generation data set
CN112328773A (en) * 2020-11-26 2021-02-05 四川长虹电器股份有限公司 Knowledge graph-based question and answer implementation method and system
CN113515613A (en) * 2021-06-25 2021-10-19 华中科技大学 Intelligent robot integrating chatting, knowledge and task question answering
CN113934831A (en) * 2021-10-19 2022-01-14 中电积至(海南)信息技术有限公司 Knowledge graph question-answering method based on deep learning
CN115858799A (en) * 2022-06-29 2023-03-28 齐鲁工业大学 Knowledge representation learning method integrating ordered relationship path and entity description information
CN115658845A (en) * 2022-09-30 2023-01-31 中国科学院软件研究所 Intelligent question-answering method and device suitable for open-source software supply chain
CN117149974A (en) * 2023-08-31 2023-12-01 东南大学 Knowledge graph question-answering method for sub-graph retrieval optimization
CN117171329A (en) * 2023-09-28 2023-12-05 浙大城市学院 Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AN BO et al., "Knowledge base question answering *** with fused knowledge representation", Scientia Sinica Informationis, No. 11, 21 November 2018 (2018-11-21) *
ZHANG SEN et al., "Entity linking for knowledge base question answering based on multi-dimensional matching", Research and Development, 31 December 2021 (2021-12-31) *
ZHANG FANGRONG et al., "Research on entity-relation extraction methods in knowledge base question answering ***", Computer Engineering and Applications, No. 11, 31 December 2020 (2020-12-31) *
ZHANG HEYI et al., "Research on question answering *** fusing large language models and knowledge graphs", Journal of Frontiers of Computer Science and Technology, Vol. 17, No. 10, 8 December 2023 (2023-12-08) *
FAN JUNJIE et al., "Research on intelligent military knowledge graph question answering services for open-source intelligence in the digital intelligence era", Data Analysis and Knowledge Discovery, 26 October 2023 (2023-10-26) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118132729A (en) * 2024-04-28 2024-06-04 支付宝(杭州)信息技术有限公司 Answer generation method and device based on medical knowledge graph
CN118152547A (en) * 2024-05-11 2024-06-07 青岛网信信息科技有限公司 Robot answer method, medium and system according to understanding capability of questioner

Also Published As

Publication number Publication date
CN117688189B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN110399457B (en) Intelligent question answering method and system
US10678816B2 (en) Single-entity-single-relation question answering systems, and methods
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111475623B (en) Case Information Semantic Retrieval Method and Device Based on Knowledge Graph
CN117688189B (en) Knowledge graph, knowledge base and large language model fused question-answering system construction method
US8818789B2 (en) Knowledge system method and apparatus
KR100533810B1 (en) Semi-Automatic Construction Method for Knowledge of Encyclopedia Question Answering System
US8874431B2 (en) Knowledge system method and apparatus
CN103488724B (en) A kind of reading domain knowledge map construction method towards books
CN110321432A (en) Textual event information extracting method, electronic device and non-volatile memory medium
CN115599902B (en) Oil-gas encyclopedia question-answering method and system based on knowledge graph
Dobson Interpretable Outputs: Criteria for Machine Learning in the Humanities.
CN115982338A (en) Query path ordering-based domain knowledge graph question-answering method and system
Säily et al. Explorations into the social contexts of neologism use in early English correspondence
CN111666374A (en) Method for integrating additional knowledge information into deep language model
Gammack et al. Semantic knowledge management system for design documentation with heterogeneous data using machine learning
CN114996455A (en) News title short text classification method based on double knowledge maps
CN111858885B (en) Keyword separation user question intention identification method
KR20220015129A (en) Method and Apparatus for Providing Book Recommendation Service Based on Interactive Form
Chen et al. FAQ system in specific domain based on concept hierarchy and question type
Ceylan Application of Natural Language Processing to Unstructured Data: A Case Study of Climate Change
Almotairi et al. A review on question answering systems: Domains, modules, techniques and challenges
Meguellati et al. Feature selection for location metonymy using augmented bag-of-words
Sergeev An Application of Semantic Relation Extraction Models
Jain Representation and curation of knowledge graphs with embeddings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant