CN111368096A

CN111368096A - Knowledge graph-based information analysis method, device, equipment and storage medium

Info

Publication number: CN111368096A
Application number: CN202010156693.9A
Authority: CN
Inventors: 邹辉
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2020-07-03

Abstract

The application relates to the technical field of data analysis, and in particular to a knowledge graph-based information analysis method, device, equipment and storage medium. The knowledge graph-based information analysis method comprises the following steps: judging whether the feedback information of the client is objection information; comparing the text similarity of the objection information with a preset objection type to determine the objection type of the objection information; extracting a named entity in the objection information, constructing a target entity triple by taking the objection type as an entity, the named entity as an attribute and a numerical value corresponding to the named entity, traversing a preset knowledge graph by taking the target entity triple as a key element, and obtaining a plurality of candidate entity triples related to the objection information; and extracting candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result. This solves the problem that customer requirements cannot be acquired accurately and quickly because of the limits of what a deep learning model has learned.

Description

Knowledge graph-based information analysis method, device, equipment and storage medium

Technical Field

The present application relates to the field of data analysis technologies, and in particular, to a method, an apparatus, a device, and a storage medium for information analysis based on a knowledge graph.

Background

With the development of artificial intelligence technology, especially the rapid development of technologies such as deep learning and natural language processing, intelligent assistants such as Microsoft XiaoIce, Apple Siri and Alibaba's AliMe are widely used. Such intelligent assistant technology can answer questions posed by the user.

However, when the intelligent assistant technology is used to answer a question, the answer corresponding to the question cannot be given accurately and quickly due to the limitation of the learning content of deep learning.

Disclosure of Invention

Based on the above, aiming at the technical problem that the intelligent assistant cannot quickly and accurately answer the question due to the limited deep learning content at present, an information analysis method, device, equipment and storage medium based on the knowledge graph are provided.

A knowledge graph-based information analysis method comprises the following steps:

sending a question to a user side and receiving feedback information of the user side;

judging whether the feedback information is consistent with an expected result, if so, sending the next question to the user side, otherwise, marking the feedback information as objection information;

comparing the text similarity of the objection information with a preset objection type to determine the objection type of the objection information;

extracting a named entity in the objection information, constructing a target entity triple by taking the objection type as an entity, the named entity as an attribute and a numerical value corresponding to the named entity, and traversing a preset knowledge graph by taking the target entity triple as a key element to obtain a plurality of candidate entity triples related to the objection information;

and extracting candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result.

In one possible embodiment, the determining whether the feedback information is consistent with the expected result includes:

performing word vector conversion on the feedback information and the expected result to generate a feedback information word vector and an expected result word vector;

and calculating the similarity between the feedback word vector and the expected result word vector, if the similarity is greater than a preset similarity threshold, determining that the feedback information is consistent with the expected result, otherwise, determining that the feedback information is inconsistent with the expected result.

In one possible embodiment, the extracting named entities in the objection information includes:

performing word vector conversion on the objection information to obtain an objection information word vector;

inputting the objection information word vector into a preset conditional random field model to generate an initial recognition result;

and inputting the initial recognition result into a preset bidirectional recurrent neural network model for re-recognition to obtain the named entity.

In one possible embodiment, the traversing a preset knowledge-graph to obtain a plurality of candidate entity triples related to the objection information includes:

traversing the preset knowledge graph by taking the named entity as a query target to obtain all first entity triples containing the query target;

extracting attributes in the first entity triples, and if the attribute in a first entity triple is consistent with the attribute in the target entity triple, marking that first entity triple as a second entity triple;

and acquiring the position of each second entity triple in the knowledge graph, extracting an upstream entity triple and a downstream entity triple of the second entity triples according to the position, and summarizing the second entity triples, the upstream entity triples and the downstream entity triples to obtain a plurality of candidate entity triples.

In one possible embodiment, the extracting candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result includes:

extracting attributes and attribute values in the candidate entity triples, and acquiring value intervals of all historical attribute values corresponding to the attributes from a system log according to the attributes;

if the attribute value is not in the value range, deleting the candidate entity triple corresponding to the attribute value;

and obtaining weights corresponding to attributes in the rest candidate entity triples, scoring the candidate answers according to the weights, and taking the candidate answer with the highest score as an expected result corresponding to the objection information.

In one possible embodiment, after extracting the candidate answers in each candidate entity triplet, scoring each candidate answer to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result, the method further includes:

determining a follow-up question according to the expected result;

receiving response information of the user side to the follow-up question, and judging whether the response information contains a question word;

and if the response information contains a question word, re-determining the expected result, and if not, continuing to send questions to the user side.

An information analysis device based on knowledge graph comprises the following modules:

the information transceiving module is configured to send a question to the user side and receive feedback information of the user side;

the information identification module is configured to judge whether the feedback information is consistent with an expected result, if so, send the next question to the user side, otherwise, mark the feedback information as objection information;

the type determining module is used for comparing the text similarity of the objection information with a preset objection type to determine the objection type of the objection information;

the triple selecting module is arranged for extracting a named entity in the objection information, constructing a target entity triple by taking the objection type as an entity, the named entity as an attribute and a numerical value corresponding to the named entity, and traversing a preset knowledge graph by taking the target entity triple as a key element to obtain a plurality of candidate entity triples related to the objection information;

and the result generation module is used for extracting the candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result.

In one possible embodiment, the information identification module is further configured to:

A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the above-described knowledge-graph based information analysis method.

A storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above-described knowledge-graph based information analysis method.

Compared with existing mechanisms, the present application sends a question to the user side and receives feedback information of the user side; judges whether the feedback information is consistent with an expected result, and if so, sends the next question to the user side, otherwise marks the feedback information as objection information; compares the text similarity of the objection information with a preset objection type to determine the objection type of the objection information; extracts a named entity in the objection information, constructs a target entity triple by taking the objection type as an entity, the named entity as an attribute and a numerical value corresponding to the named entity, and traverses a preset knowledge graph by taking the target entity triple as a key element to obtain a plurality of candidate entity triples related to the objection information; and extracts candidate answers in the candidate entity triples, scores the candidate answers, and determines an expected result corresponding to the objection information according to the scoring result. This solves the technical problem that an intelligent assistant cannot answer questions quickly and accurately because of the limits of what a deep learning model has learned.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application.

FIG. 1 is an overall flow diagram of a knowledge-graph based information analysis method in one embodiment of the present application;

FIG. 2 is a schematic diagram of an information recognition process in a knowledge-graph based information analysis method according to an embodiment of the present application;

FIG. 3 is a block diagram of an apparatus for knowledge-graph based information analysis in one embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Fig. 1 is an overall flowchart of a method for analyzing knowledge-graph-based information according to an embodiment of the present application, and the method for analyzing knowledge-graph-based information includes the following steps:

S1, sending the question to the user side and receiving feedback information of the user side;

Specifically, the user is asked questions from a fixed template in order to obtain the user's needs. The server may sort the questions in the database by how frequently each one has been asked in past conversations with users, placing the most frequent questions first and the least frequent last. The server then obtains the IP address of the user side where the user is located and sends the questions to the user side in sequence.
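
The frequency-based ordering of questions in step S1 can be sketched as follows. This is a minimal illustration, assuming the history of asked questions is available as a plain list; the function name `order_questions_by_frequency` and the sample questions are hypothetical and are not taken from the application.

```python
from collections import Counter


def order_questions_by_frequency(asked_history):
    """Return the distinct questions, most frequently asked first.

    asked_history: list of question strings, one entry per time a
    question was asked in past conversations (hypothetical input format).
    """
    counts = Counter(asked_history)
    # Sort by descending count; ties are broken alphabetically so the
    # ordering is deterministic.
    return sorted(counts, key=lambda q: (-counts[q], q))


# Hypothetical history of questions asked in earlier conversations.
history = [
    "What is your income?",
    "What is your occupation?",
    "What is your income?",
    "How old are you?",
    "What is your income?",
    "What is your occupation?",
]
ordered = order_questions_by_frequency(history)
```

The server would then send the entries of `ordered` to the user side one after another.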

S2, judging whether the feedback information is consistent with the expected result, if so, sending the next question to the user side, otherwise, marking the feedback information as objection information;

Specifically, the expected results in this step come from a script obtained by statistically sorting the questions that the user side has asked users in the past, and the script holds an expected result for each question. For example, for the question "What is your income?", the expected result in the script is a specific value or interval, and an answer that does not conform to the script is objection information. If the script expects the specific amount of the monthly salary and the user side answers "My income is low", the answer falls outside the range of expected results and is marked as objection information.

S3, comparing the objection information with a preset objection type in a text similarity manner to determine the objection type of the objection information;

Specifically, objections can be divided into a plurality of types by traversing the work logs stored on the user side, and the division into objection types can be set according to the scenario. For example, in an insurance sales scenario, the types may be: premium, insurance already bought, good health, no need to buy insurance, and the like; or occupation, illness, age, income, family, insurance product, time, premium, and amount. Different types correspond to different expected results in the knowledge graph.

The text similarity calculation in this step may use a model such as an RNN or a CNN to determine which objection type the objection information belongs to.
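
The idea of assigning objection information to the most similar preset objection type can be sketched as follows. This is only an illustration: it substitutes a simple bag-of-words cosine similarity for the RNN or CNN model mentioned above, and the type names and their descriptions in `OBJECTION_TYPES` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical preset objection types, each with a short description.
OBJECTION_TYPES = {
    "premium": "the premium is too expensive",
    "already insured": "i have already bought insurance",
    "good health": "i am healthy and in good health",
}


def classify_objection(objection_text):
    """Return the preset objection type most similar to the text."""
    vec = bag_of_words(objection_text)
    return max(
        OBJECTION_TYPES,
        key=lambda t: cosine(vec, bag_of_words(OBJECTION_TYPES[t])),
    )


objection_type = classify_objection("I already bought insurance last year")
```

A trained neural model would replace `cosine` and `bag_of_words`, but the final selection of the highest-scoring type would look the same.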

S4, extracting a named entity in the objection information, constructing a target entity triple by taking the objection type as an entity, the named entity as an attribute and a numerical value corresponding to the named entity, and traversing a preset knowledge graph by taking the target entity triple as a key element to obtain a plurality of candidate entity triples related to the objection information;

Specifically, a named entity is a person name, an organization name, a place name, or any other entity identified by a name. The named entity in the objection information is the subject that performs the action, or a noun adjacent to an adjective. For example, in the objection "The child plays games for too long", the named entity is "child".

There are two general ways to construct triples in a knowledge graph: one is <entity, relationship, entity>, and the other is <entity, attribute, attribute value>, where the former represents the relationship between two entities and the latter represents an attribute relationship inside an entity. Triples may be stored in graph databases such as the open source Neo4j and Jena, which are queried using the Cypher and SPARQL languages, respectively. When the knowledge graph is traversed, either the attribute values or the attributes in the triples can be used as the connection points between triples.
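
A minimal in-memory sketch of <entity, attribute, attribute value> triples, indexed so that either the attribute or the attribute value can act as a connection point, is given below. The sample triples and the helper `connected` are hypothetical; a real deployment would use a graph database instead of Python dictionaries.

```python
from collections import defaultdict, namedtuple

# One <entity, attribute, attribute value> triple.
Triple = namedtuple("Triple", ["entity", "attribute", "value"])

# Hypothetical sample data for an insurance sales scenario.
triples = [
    Triple("income", "monthly salary", "low"),
    Triple("premium", "monthly salary", "low"),
    Triple("premium", "payment period", "monthly"),
]

# Index triples by attribute and by attribute value so that either can
# serve as a connection point during traversal.
by_attribute = defaultdict(list)
by_value = defaultdict(list)
for t in triples:
    by_attribute[t.attribute].append(t)
    by_value[t.value].append(t)


def connected(triple):
    """Return the other triples sharing an attribute or attribute value."""
    linked = by_attribute[triple.attribute] + by_value[triple.value]
    seen = []
    for t in linked:
        if t != triple and t not in seen:
            seen.append(t)
    return seen


neighbours = connected(triples[0])
```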

S5, extracting the candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result.

Specifically, scoring may use a machine learning regression prediction algorithm: a GBM training set is established, a GBM regression model is trained on the training set, and the trained GBM regression model is then used to classify and score the candidate answers. According to the scoring result, the candidate answer with the highest score is taken as the expected result corresponding to the objection information. The identified expected result may also be reviewed manually to check its accuracy.

In this embodiment, the feedback information of the user is analyzed using triples of the form <entity, attribute, attribute value>, which solves the technical problem that an intelligent assistant cannot answer questions quickly and accurately because of the limits of what a deep learning model has learned.

Fig. 2 is a schematic diagram of an information identification process in a knowledge-graph-based information analysis method according to an embodiment of the present application, where as shown in the figure, the determining whether the feedback information is consistent with an expected result includes:

S21, performing word vector conversion on the feedback information and the expected result to generate a feedback information word vector and an expected result word vector;

Specifically, Word2vec or a BERT model may be used as the word vector conversion tool. During conversion, a dimension reduction operation may be performed to convert multi-dimensional word vectors into two-dimensional word vectors so that their similarity can be compared.

S22, calculating the similarity between the feedback word vector and the expected result word vector, if the similarity is larger than a preset similarity threshold, determining that the feedback information is consistent with the expected result, otherwise, determining that the feedback information is inconsistent with the expected result.

Specifically, the similarity may be calculated by performing a product operation on the two word vectors. If the matrix eigenvalue obtained from the product operation is 0, the two word vectors are completely consistent; if the eigenvalue is less than 1, its actual value is used as the similarity; and if the eigenvalue is greater than 1, its fractional part is kept as the similarity.

According to the method and the device, the feedback information is effectively classified, so that the objection information is quickly obtained, analysis is conveniently performed on the objection information, and the data volume needing to be analyzed is reduced.
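
The threshold comparison in step S22 can be sketched as follows. This is an illustration under stated assumptions: the word vectors are assumed to be already available as lists of floats, cosine similarity is used as a common stand-in rather than the product and eigenvalue calculation described in this embodiment, and the threshold of 0.8 and the sample vectors are arbitrary.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_consistent(feedback_vec, expected_vec, threshold=0.8):
    """True if the feedback is treated as matching the expected result."""
    return cosine_similarity(feedback_vec, expected_vec) > threshold


# Hypothetical two-dimensional word vectors after dimension reduction.
expected = [1.0, 0.0]
close_feedback = [0.9, 0.1]
far_feedback = [0.0, 1.0]

consistent = is_consistent(close_feedback, expected)
# Feedback that is not consistent is marked as objection information.
objection = not is_consistent(far_feedback, expected)
```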

In one embodiment, the extracting named entities in the objection information includes:

Specifically, the word vector conversion may use Word2vec or another word vector conversion model to perform the embedding.

Conditional random fields (CRFs) are discriminative probabilistic models used to label or analyze sequence data, such as natural language text or biological sequences. A conditional random field is a conditional probability distribution model P(Y | X) of a set of output random variables Y given a set of input random variables X, characterized by the assumption that the output random variables form a Markov random field. Conditional random fields can be viewed as a generalization of the maximum entropy Markov model for labeling problems.

The bidirectional recurrent neural network has memory, parameter sharing and Turing completeness, so it can learn the nonlinear characteristics of a sequence efficiently. Recurrent neural networks are applied in natural language processing (NLP), for example in speech recognition, language modeling and machine translation.

In this embodiment, the named entity is extracted from the vectorized objection information by using the conditional random field model and the bidirectional recurrent neural network model, which improves the speed and accuracy of information identification.
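
How a linear-chain conditional random field picks a label for each token can be sketched with Viterbi decoding, shown below. This is a minimal illustration with hand-set scores and an assumed two-label tag set (`ENT` for a named entity, `O` for everything else); in practice the emission and transition scores would come from a trained model, and the second-stage neural network re-recognition is not shown.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list of dicts; emissions[i][label] is the score of
        assigning `label` to token i.
    transitions: dict; transitions[(prev, cur)] is the score of moving
        from label `prev` to label `cur`.
    """
    # best[i][label] = best score of any path ending in `label` at token i
    best = [{lab: emissions[0][lab] for lab in labels}]
    back = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions[(p, cur)],
            )
            scores[cur] = (
                best[-1][prev] + transitions[(prev, cur)] + emissions[i][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    last = max(labels, key=lambda lab: best[-1][lab])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


labels = ["ENT", "O"]
tokens = ["the", "child", "plays", "games"]
# Hypothetical hand-set scores; a trained CRF would supply these.
emissions = [
    {"ENT": 0.1, "O": 1.0},
    {"ENT": 2.0, "O": 0.2},
    {"ENT": 0.1, "O": 1.5},
    {"ENT": 0.6, "O": 0.8},
]
transitions = {
    ("ENT", "ENT"): -0.5, ("ENT", "O"): 0.2,
    ("O", "ENT"): 0.2, ("O", "O"): 0.3,
}
tags = viterbi(emissions, transitions, labels)
entities = [tok for tok, tag in zip(tokens, tags) if tag == "ENT"]
```

With these scores the only token tagged `ENT` is "child", matching the example objection given for step S4.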

In one embodiment, the traversing a preset knowledge-graph to obtain a plurality of candidate entity triples related to the objection information includes:

Specifically, if the attribute of a first entity triple is "officer" and the attribute of the target entity triple is also "officer", the two attributes are consistent. Only triples whose attribute is consistent with that of the target entity triple are the triples required in this embodiment, so that the expected result corresponding to the objection information can be obtained accurately.

Specifically, each entity or attribute in the knowledge graph may be connected to more than one triple. By obtaining the upstream and downstream triples, all candidate answers related to the objection information can be extracted, which avoids omissions.

In this embodiment, the candidate triples are obtained by using the knowledge graph, which reduces the amount of data that needs to be analyzed and improves the efficiency with which the intelligent assistant answers customer questions.
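
The three traversal stages of this embodiment (first entity triples, second entity triples, then upstream and downstream triples) can be sketched as follows. This is a simplified illustration: the graph is assumed to be a small list of triples plus directed edges between triple positions, and all sample data and names such as `GRAPH_EDGES` and `candidate_triples` are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["entity", "attribute", "value"])

# Hypothetical knowledge graph: triples plus directed edges between them,
# given as (upstream index, downstream index) pairs.
GRAPH_TRIPLES = [
    Triple("occupation", "employer", "government"),   # 0
    Triple("occupation", "officer", "level 3"),       # 1
    Triple("income", "officer", "stable"),            # 2
    Triple("premium", "budget", "medium"),            # 3
    Triple("family", "children", "2"),                # 4
]
GRAPH_EDGES = [(0, 1), (1, 3), (2, 3)]


def candidate_triples(target, named_entity):
    """Collect second triples plus their upstream and downstream triples."""
    # Stage 1: first triples are all triples that contain the named entity.
    first = [i for i, t in enumerate(GRAPH_TRIPLES) if named_entity in t]
    # Stage 2: second triples also share the target triple's attribute.
    second = [
        i for i in first if GRAPH_TRIPLES[i].attribute == target.attribute
    ]
    # Stage 3: add neighbours along incoming and outgoing edges.
    selected = set(second)
    for up, down in GRAPH_EDGES:
        if down in second:
            selected.add(up)
        if up in second:
            selected.add(down)
    return [GRAPH_TRIPLES[i] for i in sorted(selected)]


target = Triple("occupation", "officer", "level 3")
candidates = candidate_triples(target, "officer")
```

Here the unrelated "family" triple is left out, while the triples directly upstream and downstream of the matching triples are kept as candidates.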

In one embodiment, the extracting the candidate answers in each candidate entity triplet, scoring each candidate answer to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result includes:

Specifically, if the attribute is age, the value range of the attribute value is 20-80. That is, the attribute value in this step must fall within the normal value range of the attribute; for example, an age cannot be a negative number.

Different attributes correspond to different weights; for example, the weight corresponding to age is 0.8 and the weight corresponding to occupation is 0.6. Attribute weights are assigned according to the application scenario.

In the embodiment, the dimension of information identification is simplified by using the parameter of the attribute value, so that the efficiency of answering the client question by the intelligent assistant is improved.
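
The range filtering and weight-based selection of this embodiment can be sketched as follows. This is a minimal illustration: the value ranges are assumed to have been extracted from a system log already, the attribute weight itself is used directly as the score, and the income range, the income weight and the sample candidates are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["entity", "attribute", "value"])

# Hypothetical historical value ranges per attribute, as might be
# derived from a system log.
VALUE_RANGES = {"age": (20, 80), "income": (0, 100000)}
# The age and occupation weights follow the example in the text; the
# income weight is an assumption.
WEIGHTS = {"age": 0.8, "occupation": 0.6, "income": 0.5}


def in_range(triple):
    """Keep triples whose numeric value lies in the historical range.

    Attributes with no recorded range are kept unchanged.
    """
    if triple.attribute not in VALUE_RANGES:
        return True
    low, high = VALUE_RANGES[triple.attribute]
    return low <= triple.value <= high


def best_candidate(candidates):
    """Drop out-of-range triples, then pick the highest-weighted one."""
    remaining = [t for t in candidates if in_range(t)]
    if not remaining:
        return None
    return max(remaining, key=lambda t: WEIGHTS.get(t.attribute, 0.0))


candidates = [
    Triple("customer", "age", -5),          # outside 20-80, removed
    Triple("customer", "income", 8000),
    Triple("customer", "occupation", "teacher"),
]
expected_result = best_candidate(candidates)
```

Although age carries the highest weight, the age triple is discarded first because its value is outside the historical range, so the occupation triple is selected.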

In one embodiment, after extracting the candidate answers in each candidate entity triplet, scoring each candidate answer to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result, the method further includes:

determining a follow-up question according to the expected result;

The question words are preset according to the requirements of the scenario. For example, if the expected result is that the customer is a government employee, the follow-up question asks for the customer's level. If the user side answers "What is the level?", the response information contains a question word, which indicates that the customer is not a government employee, so the expected result needs to be re-determined.

In this embodiment, the result obtained by the method can be verified through the user's answer to the follow-up question, so that the parameters in the scheme can be corrected in time.
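
The check on the response to the follow-up question can be sketched as follows. This is a minimal illustration, assuming an English-language scenario; the contents of `QUESTION_WORDS` and the returned action strings are hypothetical placeholders.

```python
# Hypothetical preset question words for an English-language scenario.
QUESTION_WORDS = {"what", "why", "how", "which", "who", "when", "where"}


def contains_question_word(response):
    """True if any preset question word appears in the response."""
    # Replace punctuation with spaces so "level?" is split cleanly.
    cleaned = "".join(
        ch if ch.isalnum() else " " for ch in response.lower()
    )
    return any(tok in QUESTION_WORDS for tok in cleaned.split())


def next_action(response):
    """Decide what to do after the user answers a follow-up question."""
    if contains_question_word(response):
        return "redetermine expected result"
    return "send next question"


action_for_question = next_action("What is the level?")
action_for_answer = next_action("Level 3")
```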

The technical features mentioned in any of the above embodiments or implementations are also applicable to the embodiment corresponding to fig. 3 in the present application, and similar details are not repeated below.

In the above description, a method for analyzing information based on a knowledge graph according to the present application is described, and an apparatus for analyzing information based on a knowledge graph is described below.

A structure of an information analysis apparatus based on a knowledge-graph as shown in fig. 3 is applicable to information analysis based on a knowledge-graph. The knowledge-graph-based information analysis apparatus in the embodiment of the present application can implement the steps corresponding to the knowledge-graph-based information analysis method performed in the embodiment corresponding to fig. 1 described above. The functions realized by the knowledge graph-based information analysis device can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware.

In one embodiment, a knowledge-graph-based information analysis apparatus is provided, as shown in fig. 3, including the following modules:

the information transceiving module 10 is configured to send a question to the user side and receive feedback information of the user side;

the information identification module 20 is configured to determine whether the feedback information is consistent with an expected result, and if so, send the next question to the user side, otherwise, mark the feedback information as objection information;

a type determining module 30, configured to perform text similarity comparison between the objection information and a preset objection type to determine the objection type of the objection information;

the triple selecting module 40 is configured to extract a named entity in the objection information, construct a target entity triple by using the objection type as an entity, the named entity as an attribute, and a value corresponding to the named entity, and traverse a preset knowledge graph by using the target entity triple as a key element to obtain a plurality of candidate entity triples related to the objection information;

and the result generation module 50 is configured to extract candidate answers in the candidate entity triples, score the candidate answers to obtain a scoring result, and determine an expected result corresponding to the objection information according to the scoring result.

In one embodiment, the information identification module is further configured to:

In one embodiment, a computer device is provided, the computer device includes a memory and a processor, the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the knowledge-graph based information analysis method in the above embodiments.

In one embodiment, a storage medium storing computer-readable instructions is provided, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for knowledge-graph-based information analysis in the above embodiments. The storage medium may be a nonvolatile storage medium or a volatile storage medium, and the present application is not limited in particular.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-described embodiments illustrate only some embodiments of the present application. They are described in specific detail, but are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A knowledge-graph-based information analysis method is characterized by comprising the following steps:

2. The method of knowledge-graph based information analysis of claim 1, wherein said determining whether the feedback information is consistent with an expected result comprises:

3. The method of knowledge-graph-based information analysis according to claim 1, wherein said extracting named entities in said objection information comprises:

4. The method of knowledge-graph based information analysis of claim 3, wherein traversing a preset knowledge-graph to obtain a plurality of candidate entity triples related to the objection information comprises:

5. The method of any one of claims 1 to 4, wherein the extracting candidate answers in each candidate entity triplet, scoring each candidate answer to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result comprises:

6. The knowledge-graph-based information analysis method according to claim 5, wherein after extracting candidate answers in the candidate entity triples, scoring the candidate answers to obtain a scoring result, and determining an expected result corresponding to the objection information according to the scoring result, the method further comprises:

determining a follow-up question according to the expected result;

7. An information analysis device based on knowledge graph is characterized by comprising the following modules:

8. The apparatus of claim 7, wherein the information recognition module is further configured to:

9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the steps of the knowledge-graph based information analysis method of any one of claims 1 to 6.

10. A storage medium having stored thereon computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for knowledge-graph based information analysis of any one of claims 1 to 6.