CN112328766B - Knowledge graph question-answering method and device based on path search - Google Patents

Knowledge graph question-answering method and device based on path search

Info

Publication number
CN112328766B
Authority
CN
China
Prior art keywords
question
constraint
path
search
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011249330.6A
Other languages
Chinese (zh)
Other versions
CN112328766A (en
Inventor
骆敏
展华益
王欣
杨兰
蒋伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202011249330.6A priority Critical patent/CN112328766B/en
Publication of CN112328766A publication Critical patent/CN112328766A/en
Application granted granted Critical
Publication of CN112328766B publication Critical patent/CN112328766B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph question-answering method and device based on path search. The method comprises the following steps: acquiring a question input by a user and performing basic natural language processing on it; analyzing the question and identifying its search points and constraint points; constructing a target point identification model and identifying the target points of the question; for each target point of the question, searching the knowledge graph with the search points of the question as centers to obtain a plurality of candidate paths that satisfy the target point; adding constraint conditions to the candidate paths according to the constraint points of the question; screening the candidate paths to obtain one or more candidate paths that meet the conditions; converting the resulting path into an executable query statement according to the storage mode of the knowledge graph, and executing it to obtain the final answer; and returning the final answer to the user. The invention achieves semantic understanding of natural language questions, analyzes each question in depth, and retrieves relevant, accurate answers using a path search algorithm.

Description

Knowledge graph question-answering method and device based on path search
Technical Field
The invention relates to the technical field of computers, in particular to a knowledge graph question-answering method and a knowledge graph question-answering device based on path search.
Background
As a research direction that attracts wide attention and has broad prospects in artificial intelligence and natural language processing, the question-answering system is an advanced form of information retrieval system: it answers questions posed by users in natural language with accurate and concise natural language. Knowledge-graph-based question answering introduces open-domain or domain-specific background knowledge when answering, which brings it closer to human reasoning and makes it accurate and fast. In recent years, such systems have served daily life in fields such as encyclopedic question answering, finance, and medicine, in the form of customer service robots, search engines, and the like.
In the existing related art, the mainstream approach is to semantically parse the question, generate candidates from the knowledge graph (candidate logical expressions, candidate relations, candidate answer entities, and so on), and then select the correct answer from the candidates through methods such as feature extraction, semantic matching, or vector modeling. The core of these methods is candidate screening. Because the candidates are numerous, structurally complex, and lack distinctive features, these methods require complex network structures, and most of them can only handle questions with a single subject entity; they do not address complex questions or multiple questions.
Disclosure of Invention
The invention provides a knowledge graph question-answering method and device based on path search, to solve the problem in the prior art that, because candidates are numerous, structurally complex, and lack distinctive features, only questions with a single subject entity can be answered, while complex questions and multiple questions are not addressed.
The technical scheme adopted by the invention is as follows: a knowledge graph question-answering method based on path search is provided, which comprises the following steps:
S1: acquiring a question input by a user and performing basic natural language processing on it;
S2: analyzing the question and identifying the search points and constraint points of the question;
S3: constructing a target point identification model and identifying the target points of the question;
S4: for each target point of the question, searching the knowledge graph with the search points of the question as centers to obtain a plurality of candidate paths that satisfy the target point;
S5: adding constraint conditions to the candidate paths according to the constraint points of the question;
S6: screening the candidate paths to obtain one or more candidate paths that meet the conditions;
S7: converting the resulting path into an executable query statement according to the storage mode of the knowledge graph, and executing it to obtain the final answer;
S8: returning the final answer to the user.
Preferably, S2 comprises:
S21: training a sequence labeling model on training samples;
S22: identifying the search terms and constraint terms of the question with the trained sequence labeling model;
S23: for each search term and constraint term identified in the question, finding the corresponding search point or constraint point in the knowledge graph according to the correspondence between search terms and search points and between constraint terms and constraint points, thereby obtaining a search set and a constraint set.
Preferably, S23 is followed by:
S24: where some search terms or constraint terms in the question cannot be linked to the knowledge graph through S23, building a synonym list and using words or phrases with meanings similar to those terms to map them to search points or constraint points in the knowledge graph.
Preferably, the sequence labeling model comprises an embedding layer, a bidirectional long short-term memory (Bi-LSTM) neural network layer, and a conditional random field (CRF), connected in sequence;
training the sequence labeling model on training samples comprises:
S211: converting each word into an input vector through an established dictionary;
S212: for each training batch, keeping the length of every input vector consistent and setting a maximum length;
S213: setting the dimension of each word vector in the embedding layer, the hidden layer size of the Bi-LSTM layer, and the number of labels of the conditional random field, and obtaining the final output vector of the sequence labeling model after the conditional random field;
S214: taking the labeling error rate of each question as the loss function, and training the sequence labeling model using the user questions together with the search term and constraint term labels of each question.
Preferably, S3 comprises:
S31: constructing training samples for question target points;
S32: building a target point identification model from the labeled training samples;
S33: identifying the set of target points of the question with the target point identification model.
Preferably, the target point identification model comprises an embedding layer, a bidirectional long short-term memory (Bi-LSTM) neural network layer, and a neural network layer with a softmax activation function, connected in sequence;
building the target point identification model comprises:
S321: converting each word into an input vector through an established dictionary;
S322: for each training batch, keeping the length of every input vector consistent and setting a maximum length;
S323: setting the dimension of each word vector in the embedding layer, the hidden layer size of the Bi-LSTM layer, and the output size of the neural network layer, and obtaining the final output vector;
S324: training the model with the classification error rate of each question as the loss function; sorting the mispredicted training samples in ascending order of loss value, selecting the first n for manual labeling while leaving the other data unchanged, and retraining the model with the relabeled data; repeating S32 until the accuracy of the target point identification model no longer increases or the number of repetitions reaches a set limit.
Preferably, S4 comprises:
S41: for each target point of the question, searching the knowledge graph with a search point of the question as the center; if one or more paths satisfy the condition, returning the found paths as candidate paths;
S42: if S41 finds no candidate path that satisfies the condition, retaining the paths searched so far and extending them: every end node of those paths is taken as a new search center, and S41 and S42 are repeated until a newly found path satisfies the condition or the set maximum number of searches is reached.
Preferably, S5 comprises:
during path search, matching every traversed node against the constraint points of the question; if a constraint point of the question appears during the search and its distance to the search point on the path is the smallest, adding the constraint point to the candidate path as a path constraint item; if a constraint point of the question does not appear during the search but its distance to the target point is the smallest, adding the constraint point to the candidate path as a target constraint item.
The invention also provides a knowledge graph question-answering device based on path search, which comprises:
a user interaction module, used to interact with the user: it receives the question input by the user, passes it to the following modules, obtains the result they return, and outputs that result to the user as the answer;
a search point and constraint point identification module, used to train the sequence labeling model for search terms and constraint terms, identify the search points and constraint points of the question with that model, and map them into the knowledge graph;
a target point identification module, used to train the target point identification model and identify the target points of the question with it;
a path search module, used to search the knowledge graph for candidate paths that satisfy the conditions according to the search points and target points of the question, add path constraint conditions according to the constraint points of the question, and then screen the paths according to a set screening mode;
an answer retrieval module, used to convert the final path into an executable query statement matching the storage mode of the knowledge graph and retrieve the answer.
Preferably, the path search module further comprises:
a target point search submodule, used to search the knowledge graph, for each target point of the question, with the search points of the question as centers, to obtain a plurality of candidate paths that satisfy the target point;
a path constraint submodule, used to add constraint conditions to the candidate paths according to the constraint points of the question;
a path screening submodule, used to screen the candidate paths to obtain one or more candidate paths that meet the conditions.
The beneficial effects of the invention are as follows: it achieves semantic understanding of natural language questions, analyzes each question in depth, and retrieves relevant, accurate answers using a path search algorithm; it requires no complex candidate screening process and can handle complex questions and multiple questions.
Drawings
FIG. 1 is a flow chart of a knowledge-graph question-answering method based on path search according to the present invention;
FIG. 2 is a schematic structural diagram of a knowledge-graph question-answering device based on path search.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, but embodiments of the present invention are not limited thereto.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Example 1:
Referring to FIG. 1, the knowledge graph question-answering method based on path search of the invention is mainly applied to knowledge-graph-based question answering. It obtains the answer to a question by analyzing the question in combination with the content of the knowledge graph, and is implemented through the following steps:
S1: acquiring a question input by a user and performing basic natural language processing on it.
S2: analyzing the question and identifying the search points and constraint points of the question.
S3: constructing a target point identification model, which is a multi-class classification model, and identifying the target points of the question.
S4: for each target point of the question, searching the knowledge graph with the search points of the question as centers to obtain a plurality of candidate paths that satisfy the target point.
S5: adding constraint conditions to the candidate paths according to the constraint points of the question.
S6: screening the candidate paths to obtain one or more candidate paths that meet the conditions.
S7: converting the resulting path into an executable query statement according to the storage mode of the knowledge graph, and executing it to obtain the final answer.
S8: returning the final answer to the user.
Further, in S1, the user input is acquired and segmented into words. Specifically, word segmentation is performed with an existing Chinese word segmentation tool; since such tools are prior art, they are not described here.
Further, S2 comprises:
S21: training a sequence labeling model on training samples. Taking the following model as an example, the segmented sentence is first converted into an input vector and fed into the model. The first layer of the model is an embedding layer (Embedding), which converts each word into a word vector; the second layer is a bidirectional long short-term memory neural network layer (Bi-LSTM); the third layer is a conditional random field (CRF), which outputs the sequence labeling result. Specifically, taking 50,000 diet health training questions as an example, each word of each sentence is first converted into a vector through an established dictionary, and for each training batch the length of every input vector is made consistent through operations such as padding, with the maximum length set to 100. The embedding layer sets the dimension of each word vector to 300, the hidden layer size of the Bi-LSTM is 256, and the number of labels of the CRF layer is 6, so the final output of the model after the CRF layer has shape 50000 × 100 × 6. The sequence labeling model takes the labeling error rate of each sentence as the loss function and is trained on the 50,000 user questions together with the search term and constraint term labels of each question.
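The dictionary lookup and padding described in this step can be sketched as follows. This is a minimal illustration only: the reserved ids `PAD_ID` and `UNK_ID`, the helper names, and the two English sample questions are assumptions, not part of the patent; only the maximum length of 100 comes from the text.

```python
from typing import Dict, List

PAD_ID = 0      # assumed id reserved for padding
UNK_ID = 1      # assumed id for words missing from the dictionary
MAX_LEN = 100   # maximum input length stated in the text


def build_dictionary(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct word, in first-seen order."""
    vocab: Dict[str, int] = {}
    for words in sentences:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(words: List[str], vocab: Dict[str, int],
           max_len: int = MAX_LEN) -> List[int]:
    """Map words to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(w, UNK_ID) for w in words[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


# Two already-segmented sample questions (hypothetical data).
corpus = [["apple", "can", "eat", "with", "milk"], ["milk", "calories"]]
vocab = build_dictionary(corpus)
batch = [encode(sentence, vocab) for sentence in corpus]
```

Every row of `batch` has length 100, so a batch of N questions forms an N × 100 integer matrix. With 6 labels per position, this is consistent with the 50000 × 100 × 6 output shape quoted above.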
s 22: the method comprises the steps of carrying out search term recognition and constraint term recognition on the question through a trained sequence labeling model, specifically, showing the use process of an actual model by taking 3 ten thousand diet health test questions as an example, carrying out word segmentation on each question by using a Chinese word segmentation, converting each word into a vector through an established dictionary, inputting the vector into the model to obtain a sequence labeling result of each question, and finally converting the sequence labeling result into a search term and a constraint term of each question.
S23: for each search term and constraint term identified in the question, finding the corresponding search point or constraint point in the knowledge graph according to the correspondence between search terms and search points and between constraint terms and constraint points (matching methods include, but are not limited to, string matching and edit distance matching), thereby obtaining a search set E and a constraint set C. Specifically, taking the search term and constraint term labels of the 50,000 diet health training questions and a diet health knowledge graph as an example, the edit distance between each search term or constraint term of a question and every entity in the knowledge graph is computed, and the entity with the smallest edit distance is taken as the search point or constraint point of the question. After all search terms and constraint terms have been matched, the search set E and the constraint set C are obtained, each containing one or more items.
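The edit distance matching in this step can be illustrated with a standard Levenshtein distance. The sketch below is one possible reading, not the patent's implementation: the function names and the sample entity list are hypothetical, and returning every entity tied at the minimum distance is an assumption.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def link_term(term: str, entities: List[str]) -> List[str]:
    """Return the entities closest to the term (entities must be non-empty)."""
    scored = [(edit_distance(term, entity), entity) for entity in entities]
    best = min(distance for distance, _ in scored)
    return [entity for distance, entity in scored if distance == best]


entities = ["tomato", "potato", "tofu"]  # hypothetical knowledge-graph entities
search_set = link_term("tomatos", entities)
```

Comparing a term against every entity is linear in the size of the graph, which matches the description in the text; a production system would likely add an index or a distance threshold.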
S24: where some search terms or constraint terms in the question cannot be linked to the knowledge graph through S23, building a synonym list and using words or phrases with meanings similar to those terms to map them to search points or constraint points in the knowledge graph. Specifically, taking 1,000 diet health training questions whose terms could not be matched to a search point or constraint point as an example, a synonym list covering 315 entities is built from encyclopedia entries and manual labeling; the search terms or constraint terms of those 1,000 questions are then matched to entities in the knowledge graph through their synonyms and thereby mapped into the knowledge graph.
Further, S3 comprises:
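The synonym fallback can be sketched as a lookup that is tried only after direct matching fails. The entity set, the synonym entries, and the use of exact matching as a stand-in for S23 are all assumptions for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical knowledge-graph entities and synonym list.
ENTITIES: Set[str] = {"tomato", "vitamin C"}
SYNONYMS: Dict[str, str] = {
    "love apple": "tomato",
    "ascorbic acid": "vitamin C",
}


def link_with_synonyms(term: str) -> Optional[str]:
    """Try a direct match first; fall back to the synonym list if it fails."""
    if term in ENTITIES:          # stand-in for the S23 matching step
        return term
    return SYNONYMS.get(term)     # None when the term stays unlinked


linked = [link_with_synonyms(t) for t in ["tomato", "ascorbic acid", "granite"]]
```

Terms that remain unlinked after both steps return `None` here, leaving the caller to decide how to handle them.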
S31: constructing training samples for question target points. If the questions used for training have not been labeled with their target points, the target points are roughly labeled through semantic understanding of the questions to obtain the training samples. Specifically, taking the 50,000 diet health training questions as an example, whose training data carries no target point labels: 5 related keywords are first labeled manually for each type of target point, and the target points of each question are then obtained by keyword matching.
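The keyword-based rough labeling can be sketched as below. The two target point classes and their five keywords each are invented for illustration; the patent does not list the actual classes or keywords.

```python
from typing import Dict, List

# Hypothetical target point classes, each with 5 manually chosen keywords.
TARGET_KEYWORDS: Dict[str, List[str]] = {
    "calories": ["calorie", "kcal", "energy", "fattening", "heat"],
    "efficacy": ["efficacy", "benefit", "effect", "good for", "function"],
}


def rough_label(question: str) -> List[str]:
    """Return every target point whose keywords occur in the question."""
    text = question.lower()
    return [target for target, keywords in TARGET_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]


labels = rough_label("What are the benefits and calories of milk?")
```

Labels produced this way are noisy by design, which is why the next step retrains the model after manually correcting a subset of misclassified samples.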
S32: building a target point identification model from the labeled training samples, selecting part of the samples misclassified by the model for manual labeling, retraining the model with the relabeled training samples, and repeating this step until the accuracy of the model no longer improves or the number of repetitions reaches the upper limit. Taking the following model as an example, the segmented sentence is first converted into an input vector and fed into the model. The first layer of the model is an embedding layer (Embedding), which converts each word into a word vector; the second layer is a bidirectional long short-term memory neural network layer (Bi-LSTM); the third layer is a neural network layer with a softmax activation function, which outputs the target point classification result. Specifically, taking the training questions roughly labeled in the previous step as an example, each word of each sentence is first converted into a vector through an established dictionary, and for each training batch the length of every input vector is made consistent through operations such as padding, with the maximum length set to 100. The embedding layer sets the dimension of each word vector to 300, the hidden layer size of the Bi-LSTM is 256, and the output size of the neural network layer is 10, so the final output of the model has shape 50000 × 10. The model is trained with the classification error rate of each question as the loss function. The mispredicted training samples are sorted in ascending order of loss value, the first 100 are selected for manual labeling, and the other data is left unchanged. The model is then retrained with the relabeled data, and S32 is repeated until the accuracy of the model no longer increases or the number of repetitions reaches 10.
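The sample selection and stopping rule of this relabeling loop can be sketched without any model code. The `Sample` record and both helper names are hypothetical; the defaults of 100 samples and 10 rounds come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    question: str
    label: int       # current (possibly noisy) target point label
    predicted: int   # label predicted by the model
    loss: float      # per-sample loss value


def select_for_relabeling(samples: List[Sample], n: int = 100) -> List[Sample]:
    """Pick the first n mispredicted samples in ascending order of loss."""
    wrong = [s for s in samples if s.predicted != s.label]
    wrong.sort(key=lambda s: s.loss)
    return wrong[:n]


def should_stop(accuracies: List[float], max_rounds: int = 10) -> bool:
    """Stop once accuracy stops increasing or max_rounds rounds have run."""
    if len(accuracies) >= max_rounds:
        return True
    return len(accuracies) >= 2 and accuracies[-1] <= accuracies[-2]


samples = [
    Sample("q1", 0, 1, 0.9),
    Sample("q2", 1, 1, 0.1),   # predicted correctly, never selected
    Sample("q3", 2, 0, 0.4),
    Sample("q4", 1, 2, 0.7),
]
picked = [s.question for s in select_for_relabeling(samples, n=2)]
```

In a full loop, each round would train the model, append its accuracy to a list, call `should_stop`, and otherwise hand the selected samples to annotators before retraining.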
S33: identifying the set of target points of the question with the target point identification model. Specifically, taking 30,000 diet health test questions as an example of how the model is used in practice: each question is first segmented with the Chinese word segmentation tool, each word is converted into a vector through the established dictionary, the vectors are fed into the model to obtain the target point classification result of each question, and the classification results are finally combined into the question's target point set T.
Further, S4 comprises:
S41: for each target point of the question, searching the knowledge graph with a search point of the question as the center; if one or more paths satisfy the condition, returning the found paths as candidate paths. Specifically, taking the search points and target points identified in the above steps as an example: for each target point t_i of the question, each search point e_i in the question's search set is taken in turn as a search center, and the diet health knowledge graph is searched for all nodes connected to e_i. For each such node, it is judged whether the path from the search center to the node satisfies the target point t_i. If it does, the path from the search point e_i to the target point t_i is kept and the path search ends; otherwise, the paths from the search center e_i to all of its neighbor nodes are kept and the process proceeds to the next step.
S42: if S41 finds no candidate path that satisfies the condition, retaining the paths searched so far and extending them: every end node of those paths is taken as a new search center, and S41 and S42 are repeated until a newly found path satisfies the condition or the set maximum number of searches is reached. Specifically, taking the above search result as an example, the end node e_n of each path retained in the previous step is taken in turn as a new search center e_i, that path is removed from the candidate paths, and S41 and S42 are repeated to update the candidate path set until the path search ends or the number of repetitions reaches 5.
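The search-then-extend loop of S41 and S42 can be sketched as a bounded breadth-first expansion. Several things here are assumptions: the graph is a plain adjacency dictionary, a path is taken to "satisfy the target point" when its last relation equals the target name, and the sample graph is invented. Only the limit of 5 rounds comes from the text.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]   # node -> [(relation, neighbor)]
Path = List[Tuple[str, str, str]]          # [(head, relation, tail), ...]


def search_paths(graph: Graph, search_point: str, target: str,
                 max_rounds: int = 5) -> List[Path]:
    """Expand paths outward from search_point until one ends in `target`."""
    frontier: List[Path] = [[]]            # the empty path sits at the center
    for _ in range(max_rounds):
        hits: List[Path] = []
        extended: List[Path] = []
        for path in frontier:
            end = path[-1][2] if path else search_point
            visited = {search_point} | {tail for _, _, tail in path}
            for relation, neighbor in graph.get(end, []):
                if neighbor in visited:    # avoid walking in circles
                    continue
                new_path = path + [(end, relation, neighbor)]
                if relation == target:     # assumed satisfaction test
                    hits.append(new_path)
                else:
                    extended.append(new_path)
        if hits:
            return hits                    # S41: candidate paths found
        if not extended:
            break                          # nothing left to extend
        frontier = extended                # S42: end nodes become new centers
    return []


graph: Graph = {
    "tomato": [("contains", "vitamin C"), ("category", "vegetable")],
    "vitamin C": [("efficacy", "antioxidant")],
}
one_hop = search_paths(graph, "tomato", "contains")
two_hop = search_paths(graph, "tomato", "efficacy")
```

Replacing the frontier with the extended paths mirrors the text's instruction to remove a retained path from the candidates once its end node becomes a new search center.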
Further, in S5, during path search, every traversed node is matched against the constraint points of the question. If a constraint point of the question appears during the search and its distance to the search point on the path is the smallest, the constraint point is added to the candidate path as a path constraint item; if a constraint point of the question does not appear during the search but its distance to the target point is the smallest, the constraint point is added to the candidate path as a target constraint item. Specifically, taking the candidate paths and the constraint points of the question from the above steps as an example, all end nodes e_n encountered during the path search are compared with the constraint points. For each constraint point c_i, if it coincides with an end node e_n and the distance between c_i and the search center e_i is smaller than the distances between e_i and the other nodes, c_i is added to the path constraint items of the candidate path; otherwise, the distance between c_i and the target point t_i is calculated, and if that distance is smaller than the distances between the other end nodes and the target point, c_i is added to the target constraint items of the candidate path.
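A deliberately simplified sketch of the constraint step follows. It does not implement the minimum-distance comparisons described above: a constraint point that lies on the path becomes a path constraint item, and one that is a direct neighbor of the path's final node becomes a target constraint item. That substitution, the function name, and the sample data are all assumptions.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]   # node -> [(relation, neighbor)]
Path = List[Tuple[str, str, str]]          # [(head, relation, tail), ...]


def attach_constraints(path: Path, constraint_points: List[str],
                       graph: Graph) -> Dict[str, object]:
    """Split constraint points into path and target constraint items."""
    nodes = [path[0][0]] + [tail for _, _, tail in path]  # path is non-empty
    end_neighbors = {neighbor for _, neighbor in graph.get(nodes[-1], [])}
    path_items: List[str] = []
    target_items: List[str] = []
    for point in constraint_points:
        if point in nodes:                 # simplified "appears during search"
            path_items.append(point)
        elif point in end_neighbors:       # simplified "closest to the target"
            target_items.append(point)
    return {"path": path,
            "path_constraints": path_items,
            "target_constraints": target_items}


graph: Graph = {
    "tomato": [("contains", "vitamin C")],
    "vitamin C": [("efficacy", "antioxidant")],
    "antioxidant": [("suitable_for", "elderly")],
}
path: Path = [("tomato", "contains", "vitamin C"),
              ("vitamin C", "efficacy", "antioxidant")]
candidate = attach_constraints(path, ["vitamin C", "elderly", "children"], graph)
```

Constraint points that match neither rule, such as "children" in the sample, are left off the candidate path.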
Further, in S6, the candidate paths corresponding to each target point of the question are screened and deduplicated, and the result is taken as the final candidate paths of that target point. Specifically, taking the candidate paths obtained in the above steps as an example, the paths are screened either by setting a screening condition (for example, the number of search points and constraint points in the candidate path) or by training a screening model (for example, a ranking model), and the screened paths are then deduplicated by the sequence of their search points, constraint points, and target points to obtain the final result.
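One possible screening condition and the deduplication key can be sketched as follows. Keeping only the candidates that cover the most constraint points is an assumed rule chosen for illustration; the candidate record layout is hypothetical.

```python
from typing import Dict, List


def screen_and_deduplicate(candidates: List[Dict]) -> List[Dict]:
    """Keep the candidates covering the most constraints, then drop repeats."""
    if not candidates:
        return []
    best = max(len(c["constraints"]) for c in candidates)
    kept = [c for c in candidates if len(c["constraints"]) == best]
    seen = set()
    unique: List[Dict] = []
    for c in kept:
        # Two candidates are duplicates when search point, constraint
        # points, and target point all agree.
        key = (c["search_point"], tuple(sorted(c["constraints"])), c["target"])
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


candidates = [
    {"search_point": "tomato", "constraints": ["elderly"], "target": "efficacy"},
    {"search_point": "tomato", "constraints": ["elderly"], "target": "efficacy"},
    {"search_point": "tomato", "constraints": [], "target": "efficacy"},
]
final_paths = screen_and_deduplicate(candidates)
```

A trained ranking model, the alternative named in the text, would replace the `best` filter with a learned score.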
Further, in S7, the search points, constraint points, and target points on the final path are converted into query conditions according to the storage mode of the knowledge graph; the query conditions on the path are combined with the relevant syntax to form a final query statement matching that storage mode; and the query statement is executed to obtain the final answer. Specifically, taking the final path obtained in the above steps and the diet health knowledge graph as an example, the knowledge graph is stored in a Neo4j database. The search points and target points on the path are first converted into basic triple query statements, the constraint conditions are then converted into additional query conditions, and finally standard query keywords such as "match" and "return" are added; the query statement is executed to obtain the final result.
Further, in S8, the query result obtained in the above steps is returned to the user.
The knowledge graph question-answering method based on path search achieves semantic understanding of natural language questions, analyzes each question in depth, and retrieves relevant, accurate answers using a path search algorithm; it requires no complex candidate screening process and can handle complex questions and multiple questions.
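Turning a final path into a Cypher query for Neo4j can be sketched as plain string building. The schema is an assumption: nodes are taken to carry a `name` property and relation names are used directly as relationship types. The function name is hypothetical, and the generated query has not been run against a database.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail)


def path_to_cypher(path: List[Triple],
                   target_constraints: Sequence[str] = ()
                   ) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher query that follows the path's relations."""
    pattern = "(n0 {name: $start})"
    for i, (_, relation, _) in enumerate(path, start=1):
        # Relationship types cannot be parameters, so they are backtick-quoted.
        pattern += f"-[:`{relation}`]->(n{i})"
    last = f"n{len(path)}"
    params: Dict[str, str] = {"start": path[0][0]}
    conditions: List[str] = []
    for j, point in enumerate(target_constraints):
        # Each target constraint must be adjacent to the answer node.
        conditions.append(f"({last})--({{name: $c{j}}})")
        params[f"c{j}"] = point
    query = f"MATCH {pattern}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += f" RETURN {last}.name AS answer"
    return query, params


path = [("tomato", "contains", "vitamin C"),
        ("vitamin C", "efficacy", "antioxidant")]
query, params = path_to_cypher(path, target_constraints=["elderly"])
```

The query and its parameter dictionary would then be passed to a database client for execution. Keeping entity names in parameters rather than in the query string avoids quoting problems with user-supplied text.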
Example 2
Fig. 2 shows a knowledge-graph question-answering apparatus based on path search, comprising:
a user interaction module: used for interacting with the user; it first receives the question input by the user and passes it to the subsequent modules, then obtains the result returned by the subsequent modules and outputs it to the user as the answer.
A search point and constraint point identification module: used for training a sequence labeling model for search terms and constraint terms, identifying the search points and constraint points of the question through the sequence labeling model, and mapping them into the knowledge graph.
A target point identification module: used for training a question target point identification model and identifying the target points of the question through that model.
A path search module: used for searching the knowledge graph for candidate paths that meet the conditions according to the search points and target points of the question, adding path constraint conditions according to the constraint points of the question, and then screening the paths according to a set screening mode.
Specifically, the path search module further includes:
a target point search submodule: used for searching the knowledge graph, centered on the search point of the question, to obtain a plurality of candidate paths that satisfy the target points of the question;
a path constraint submodule: used for adding constraint conditions to the candidate paths according to the constraint points of the question;
a path screening submodule: used for screening the candidate paths to obtain one or more candidate paths that meet the conditions.
An answer retrieval module: used for converting the finally obtained path into an executable query statement that matches the storage format of the knowledge graph, and retrieving the answer.
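The way the modules above hand data to one another can be sketched as follows. The class `QAPipeline` and its field names are hypothetical; each field stands in for one module and is supplied as a plain callable, so the sketch shows only the data flow, not any model or search logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class QAPipeline:
    # search point and constraint point identification module
    identify_points: Callable[[str], Tuple[List[str], List[str]]]
    # target point identification module
    identify_targets: Callable[[str], List[str]]
    # path search module (search, constrain, screen)
    search_paths: Callable[[List[str], List[str], List[str]], List[list]]
    # answer retrieval module
    retrieve_answers: Callable[[List[list]], List[str]]

    def answer(self, question: str) -> List[str]:
        """User interaction module: take a question in, give answers back."""
        search_points, constraint_points = self.identify_points(question)
        target_points = self.identify_targets(question)
        paths = self.search_paths(search_points, target_points, constraint_points)
        return self.retrieve_answers(paths)
```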
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A knowledge graph question-answering method based on path search is characterized by comprising the following steps:
s1: acquiring a question input by a user and performing simple natural language processing on it;
s2: analyzing the question, and identifying the search points and constraint points of the question;
s3: constructing a target point identification model and identifying the target points of the question;
s4: for each target point of the question, searching in the knowledge graph with the search point of the question as the center to obtain a plurality of candidate paths that satisfy the target point of the question;
s5: adding constraint conditions to the candidate paths according to the constraint points of the question;
s6: screening the candidate paths to obtain one or more candidate paths that meet the conditions;
s7: converting the obtained path into an executable query statement according to the storage format of the knowledge graph, and obtaining a final answer;
s8: returning the obtained final result to the user as the answer;
wherein s4 comprises:
s41: for each target point of the question, searching in the knowledge graph with the search point of the question as the center, and if one or more paths meet the conditions, taking the found paths as candidate paths and returning them;
s42: if s41 finds no candidate path that meets the conditions, retaining the searched paths and extending them, with all end nodes on the paths used as new search centers, and repeating s41 and s42 until a newly found path meets the conditions or the set maximum number of searches is reached;
and s5 comprises:
during path retrieval, matching all traversed nodes against the constraint points of the question; if a constraint point of the question appears during the search and its distance to the search point on the path is minimal, adding the constraint point to the candidate path as a path constraint item; and if a constraint point of the question does not appear during the search but its distance to the target point is minimal, adding the constraint point to the candidate path as a target constraint item.
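For illustration only, the iterative search of s41 and s42 can be sketched as a bounded, level-by-level expansion. The adjacency format, the function name `search_paths` and the parameter `max_rounds` are hypothetical. The sketch also assumes that a path "meets the conditions" when its last relation or last node equals the target point, which the source does not define at this point.

```python
from typing import Dict, List, Optional, Tuple

Hop = Tuple[Optional[str], str]   # (relation leading to the node, node name)
Path = List[Hop]


def search_paths(graph: Dict[str, List[Tuple[str, str]]],
                 search_point: str,
                 target: str,
                 max_rounds: int = 3) -> List[Path]:
    """Expand outward from the search point until some path reaches the target."""
    frontier: List[Path] = [[(None, search_point)]]
    for _ in range(max_rounds):
        extended: List[Path] = []
        for path in frontier:
            end_node = path[-1][1]
            visited = {node for _rel, node in path}
            for relation, neighbour in graph.get(end_node, []):
                if neighbour in visited:      # do not walk in circles
                    continue
                extended.append(path + [(relation, neighbour)])
        # s41: return as soon as one or more paths meet the condition.
        hits = [p for p in extended if target in p[-1]]
        if hits:
            return hits
        if not extended:                      # nothing left to expand
            break
        # s42: keep the searched paths and search again from their end nodes.
        frontier = extended
    return []
```

With a toy graph `{"apple": [("contains", "vitamin C")], "vitamin C": [("effect", "antioxidant")]}`, searching from `"apple"` for the target `"effect"` finds nothing in the first round and succeeds in the second, after the path has been extended through `"vitamin C"`.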
2. The method of claim 1, wherein the s2 comprises:
s21: training a sequence labeling model on the training samples;
s22: identifying the search terms and constraint terms of the question through the trained sequence labeling model;
s23: for each search term and constraint term identified in the question, finding the corresponding search point and constraint point in the knowledge graph according to the correspondence between search terms and search points and between constraint terms and constraint points, and obtaining a search set and a constraint set.
3. The method of claim 2, further comprising, after s23:
s24: for the case where some of the search terms or constraint terms in the question cannot be linked to the knowledge graph through s23, creating a synonym list in which words or phrases with meanings similar to the search terms or constraint terms are mapped to search points or constraint points in the knowledge graph.
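For illustration only, the linking of s23 with the synonym fallback of s24 can be sketched as two dictionary lookups. The names `link_terms`, `graph_index` and `synonyms`, and the node identifiers in the usage example, are hypothetical.

```python
from typing import Dict, List, Tuple


def link_terms(terms: List[str],
               graph_index: Dict[str, str],
               synonyms: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Map recognised terms to knowledge-graph points.

    graph_index maps a surface term to a graph point (s23);
    synonyms maps an unlinked term to a term that graph_index knows (s24).
    Returns (linked graph points, terms that could not be linked).
    """
    linked: List[str] = []
    unlinked: List[str] = []
    for term in terms:
        if term in graph_index:                      # direct correspondence
            linked.append(graph_index[term])
        elif synonyms.get(term) in graph_index:      # fall back to the synonym list
            linked.append(graph_index[synonyms[term]])
        else:
            unlinked.append(term)
    return linked, unlinked
```

For example, with `graph_index = {"vitamin C": "node:vitamin_c"}` and `synonyms = {"ascorbic acid": "vitamin C"}`, the term `"ascorbic acid"` is linked to `"node:vitamin_c"` through the synonym list.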
4. The knowledge-graph question-answering method based on path search is characterized in that the sequence labeling model comprises an embedding layer, a bidirectional long short-term memory neural network layer and a conditional random field which are connected in sequence;
the method for training a sequence labeling model according to training samples comprises the following steps:
s211: converting each word into an input vector through the established dictionary;
s212: for each training batch, setting a maximum length and keeping the length of every input vector consistent;
s213: setting the dimension of each word vector in the embedding layer, setting the hidden layer size of the bidirectional long short-term memory neural network layer, setting the number of labels of the conditional random field, and obtaining the final output vector of the sequence labeling model after the conditional random field;
s214: taking the labeling error rate of each question as the loss function, and training the sequence labeling model using the user questions and the labels of the search terms and constraint terms of each question.
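For illustration only, the input preparation of s211 and s212 can be sketched as a dictionary lookup followed by truncation and padding to a fixed length. The reserved ids `PAD` and `UNK` and the function names are hypothetical; the model layers of s213 are not shown.

```python
from typing import Dict, List

PAD, UNK = 0, 1   # reserved ids for padding and out-of-dictionary words


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every word seen in the training questions."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tokens in sentences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_batch(sentences: List[List[str]],
                 vocab: Dict[str, int],
                 max_len: int) -> List[List[int]]:
    """Convert each question to ids, then truncate or pad it to max_len."""
    batch = []
    for tokens in sentences:
        ids = [vocab.get(token, UNK) for token in tokens][:max_len]
        ids += [PAD] * (max_len - len(ids))
        batch.append(ids)
    return batch
```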
5. The method of claim 1, wherein the s3 comprises:
s31: constructing training samples of question target points;
s32: establishing a target point identification model based on the labeled training samples;
s33: identifying the set of target points of the question based on the target point identification model.
6. The knowledge-graph question-answering method based on path search according to claim 5, characterized in that the target point identification model comprises an embedding layer, a bidirectional recurrent long short-term memory neural network layer and a neural network layer based on a softmax activation function which are connected in sequence;
the method for establishing the target point identification model comprises the following steps:
s321: converting each word into an input vector through the established dictionary;
s322: for each training batch, setting a maximum length and keeping the length of every input vector consistent;
s323: setting the dimension of each word vector in the embedding layer, setting the hidden layer size of the bidirectional long short-term memory neural network layer, and setting the output size of the neural network layer to obtain a final output vector;
s324: training the model with the classification error rate of each question as the loss function, sorting the training samples with prediction errors in ascending order of loss value, selecting the first n pieces of data for manual labeling while keeping the other data unchanged, retraining the model with the labeled data, and repeating s32 until the accuracy of the target point identification model no longer increases or the number of repetitions reaches the set number.
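For illustration only, the sample selection in s324 can be sketched as filtering the wrongly predicted samples, sorting them by loss in ascending order as the text states, and taking the first n for manual labeling. The tuple layout and the function name `select_for_relabeling` are hypothetical; model training and the outer repetition loop are not shown.

```python
from typing import List, Tuple

# (sample id, labeled target point, predicted target point, loss value)
Sample = Tuple[str, str, str, float]


def select_for_relabeling(samples: List[Sample], n: int) -> List[str]:
    """Return the ids of the first n mispredicted samples, ordered by ascending loss."""
    wrong = [s for s in samples if s[1] != s[2]]   # keep prediction errors only
    wrong.sort(key=lambda s: s[3])                 # small to large loss
    return [s[0] for s in wrong[:n]]
```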
7. A knowledge-graph question-answering device based on path search is characterized by comprising:
a user interaction module: used for interacting with the user, receiving the question input by the user and passing it to the subsequent modules, obtaining the result returned by the subsequent modules, and outputting the result to the user as the answer;
a search point and constraint point identification module: used for training a sequence labeling model for search terms and constraint terms, identifying the search points and constraint points of the question through the sequence labeling model, and mapping them into the knowledge graph;
a target point identification module: used for training a question target point identification model and identifying the target points of the question through that model;
a path search module: used for searching the knowledge graph for candidate paths that meet the conditions according to the search points and target points of the question, adding path constraint conditions according to the constraint points of the question, and then screening the paths according to a set screening mode;
an answer retrieval module: used for converting the finally obtained path into an executable query statement that matches the storage format of the knowledge graph, and retrieving the answer;
wherein searching the knowledge graph for candidate paths that meet the conditions according to the search points and target points of the question, and adding path constraint conditions according to the constraint points of the question, comprises the following steps:
s41: for each target point of the question, searching in the knowledge graph with the search point of the question as the center, and if one or more paths meet the conditions, taking the found paths as candidate paths and returning them;
s42: if s41 finds no candidate path that meets the conditions, retaining the searched paths and extending them, with all end nodes on the paths used as new search centers, and repeating s41 and s42 until a newly found path meets the conditions or the set maximum number of searches is reached;
during path retrieval, matching all traversed nodes against the constraint points of the question; if a constraint point of the question appears during the search and its distance to the search point on the path is minimal, adding the constraint point to the candidate path as a path constraint item; and if a constraint point of the question does not appear during the search but its distance to the target point is minimal, adding the constraint point to the candidate path as a target constraint item.
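For illustration only, the constraint-matching step can be sketched as follows. This is one possible reading of the text, not a definitive one: a constraint point seen during the search is attached, as a path constraint item, to the candidate path whose search point is nearest to it; a constraint point not seen during the search is attached, as a target constraint item, to the candidate path whose target point is nearest to it. The undirected adjacency map `adj`, the hop-count distance and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def bfs_distance(adj: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Number of hops between start and goal, or None if goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def add_constraint(candidates: List[List[str]],
                   constraint: str,
                   traversed: Set[str],
                   adj: Dict[str, List[str]]) -> Optional[Tuple[str, List[str]]]:
    """Pick the candidate path (search point first, target point last) for a constraint."""
    if constraint in traversed:
        kind, anchor = "path_constraint", 0       # measure from the search point
    else:
        kind, anchor = "target_constraint", -1    # measure from the target point
    best: Optional[Tuple[int, List[str]]] = None
    for path in candidates:
        dist = bfs_distance(adj, path[anchor], constraint)
        if dist is not None and (best is None or dist < best[0]):
            best = (dist, path)
    return None if best is None else (kind, best[1])
```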
8. The apparatus according to claim 7, wherein the path search module further comprises:
a target point search submodule: used for searching the knowledge graph, centered on the search point of the question, to obtain a plurality of candidate paths that satisfy the target points of the question;
a path constraint submodule: used for adding constraint conditions to the candidate paths according to the constraint points of the question;
a path screening submodule: used for screening the candidate paths to obtain one or more candidate paths that meet the conditions.
CN202011249330.6A 2020-11-10 2020-11-10 Knowledge graph question-answering method and device based on path search Active CN112328766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011249330.6A CN112328766B (en) 2020-11-10 2020-11-10 Knowledge graph question-answering method and device based on path search

Publications (2)

Publication Number Publication Date
CN112328766A CN112328766A (en) 2021-02-05
CN112328766B CN112328766B (en) 2022-05-03

Family

ID=74317734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011249330.6A Active CN112328766B (en) 2020-11-10 2020-11-10 Knowledge graph question-answering method and device based on path search

Country Status (1)

Country Link
CN (1) CN112328766B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989005B (en) * 2021-04-16 2022-07-12 重庆中国三峡博物馆 Knowledge graph common sense question-answering method and system based on staged query
CN113742447B (en) * 2021-07-19 2024-04-02 暨南大学 Knowledge graph question-answering method, medium and equipment based on query path generation
CN113468311B (en) * 2021-07-20 2023-09-19 四川启睿克科技有限公司 Knowledge graph-based complex question and answer method, device and storage medium
CN114996412B (en) * 2022-08-02 2022-11-15 医智生命科技(天津)有限公司 Medical question and answer method and device, electronic equipment and storage medium
CN115544106B (en) * 2022-12-01 2023-02-28 云南电网有限责任公司信息中心 Internal event retrieval method and system for call center platform and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684448A (en) * 2018-12-17 2019-04-26 北京北大软件工程股份有限公司 A kind of intelligent answer method
CN111091006A (en) * 2019-12-20 2020-05-01 北京百度网讯科技有限公司 Entity intention system establishing method, device, equipment and medium
CN111428009A (en) * 2020-06-12 2020-07-17 太平金融科技服务(上海)有限公司 Relationship query method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103871097B (en) * 2014-02-26 2017-01-04 南京航空航天大学 Data fusion technique fusion method based on dental preparations
CN106469169A (en) * 2015-08-19 2017-03-01 阿里巴巴集团控股有限公司 Information processing method and device
CN110019836A (en) * 2017-08-23 2019-07-16 中兴通讯股份有限公司 A kind of intelligent answer method and device
CN110147437B (en) * 2019-05-23 2022-09-02 北京金山数字娱乐科技有限公司 Knowledge graph-based searching method and device
CN110704434B (en) * 2019-09-24 2022-09-13 北京百度网讯科技有限公司 Method and device for inquiring shortest path of map, electronic equipment and storage medium
CN111611343B (en) * 2020-04-28 2023-06-16 北京智通云联科技有限公司 Searching system, method and equipment based on shortest path query of knowledge graph

Similar Documents

Publication Publication Date Title
CN112328766B (en) Knowledge graph question-answering method and device based on path search
CN107748757B (en) Question-answering method based on knowledge graph
CN109271505B (en) Question-answering system implementation method based on question-answer pairs
CN117033608B (en) Knowledge graph generation type question-answering method and system based on large language model
CN110399457B (en) Intelligent question answering method and system
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN110188147B (en) Knowledge graph-based document entity relationship discovery method and system
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN111522910A (en) Intelligent semantic retrieval method based on cultural relic knowledge graph
CN111831789A (en) Question-answer text matching method based on multilayer semantic feature extraction structure
CN115599902B (en) Oil-gas encyclopedia question-answering method and system based on knowledge graph
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN112328800A (en) System and method for automatically generating programming specification question answers
CN114238653B (en) Method for constructing programming education knowledge graph, completing and intelligently asking and answering
CN115599899B (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN114036281A (en) Citrus control question-answering module construction method based on knowledge graph and question-answering system
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN115982338A (en) Query path ordering-based domain knowledge graph question-answering method and system
CN111339777A (en) Medical related intention identification method and system based on neural network
CN117332789A (en) Semantic analysis method and system for dialogue scene
CN114647719A (en) Question-answering method and device based on knowledge graph
CN117474010A (en) Power grid language model-oriented power transmission and transformation equipment defect corpus construction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant