CN110851610A

CN110851610A - Knowledge graph generation method and device, computer equipment and storage medium

Info

Publication number: CN110851610A
Application number: CN201810828187.2A
Authority: CN
Inventors: 许瑾; 刘文昱; 郝萌
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2020-02-28
Anticipated expiration: 2038-07-25
Also published as: CN110851610B

Abstract

The application provides a knowledge graph generation method, a knowledge graph generation device, computer equipment and a storage medium. The method comprises the following steps: for a search session containing multiple searches, obtaining each search term input by a user in the search session; determining the semantic inclusion relation among the search terms according to the text overlap among the search terms or the syntactic structures of the search terms; using the search terms as knowledge graph nodes and determining the parent-child relation among the nodes according to the semantic inclusion relation among the search terms; and finally generating a knowledge graph according to the parent-child relation. By determining the semantic inclusion relation from the search terms a user inputs in a search session, and generating the knowledge graph from that relation, the method addresses two technical problems in the prior art: semantic inclusion relations cannot be accurately identified, and constructing a knowledge graph manually is costly and time-consuming.

Description

Knowledge graph generation method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating a knowledge graph, a computer device, and a storage medium.

Background

With the continuous development of information technology, information on the internet is increasingly abundant. The application of knowledge graphs helps artificial intelligence develop further and lets people find the information they most want through search. A knowledge graph is essentially a semantic network: a graph-based data structure that describes entities existing in the world and the relationships among them. As knowledge graphs are applied ever more widely, their construction becomes extremely important.

In the related art, the construction of the knowledge graph is still in a manual stage, and the construction of a specific field needs a great amount of manual labeling work by experts in the field, so that the construction cost is high, and the time is long. In addition, the current machine learning and natural language processing technologies have great difficulty in semantic recognition, and cannot accurately present a structured knowledge graph according to semantic inclusion relations.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

To this end, the present application provides a knowledge graph generation method. From the search terms input by a user in a search session, the semantic inclusion relation among the search terms is determined according to the text overlap among the search terms or the syntactic structure of the search terms, and a knowledge graph is constructed according to the semantic inclusion relation, thereby solving the technical problems of high construction cost and long construction time caused by constructing knowledge graphs manually in the related art.

The application provides a knowledge graph generating device.

The application provides a computer device.

The present application proposes a non-transitory computer-readable storage medium.

The present application proposes a computer program product.

An embodiment of one aspect of the present application provides a method for generating a knowledge graph, including:

for a search session containing multiple searches, acquiring each search word input by a user in the search session;

determining semantic inclusion relation between the search words according to the text coincidence condition between the search words or the syntactic structure of the search words;

taking each search word as a knowledge graph node, and determining a parent-child relationship among the nodes according to a semantic inclusion relationship among the search words;

and generating the knowledge graph according to the parent-child relationship.

According to the knowledge graph generation method of the embodiment of the present application, for a search session containing multiple searches, each search term input by a user in the search session is obtained; the semantic inclusion relation between the search terms is determined according to the text overlap between the search terms or the syntactic structure of the search terms; each search term is taken as a knowledge graph node, and the parent-child relationship among the nodes is determined according to the semantic inclusion relation among the search terms; and the knowledge graph is generated according to the parent-child relationship. Because the semantic inclusion relation is determined from the search terms the user inputs in the search session and the knowledge graph is generated from it, the method addresses two technical problems in the prior art: semantic inclusion relations cannot be accurately identified, and constructing a knowledge graph manually is costly and time-consuming.

In another aspect, an embodiment of the present application provides a knowledge graph generating apparatus, including:

the acquisition module is used for acquiring, for a search session containing multiple searches, each search term input by a user in the search session;

the determining module is used for determining semantic inclusion relation among the search terms according to the text coincidence condition among the search terms or the syntactic structure of the search terms;

the generating module is used for taking the search terms as knowledge graph nodes and determining the parent-child relationship among the nodes according to the semantic inclusion relationship among the search terms; and generating the knowledge graph according to the parent-child relationship.

The knowledge graph generating device of the embodiment of the application acquires, for a search session containing multiple searches, each search term input by a user in the search session; determines the semantic inclusion relation between the search terms according to the text overlap between the search terms or the syntactic structure of the search terms; takes each search term as a knowledge graph node and determines the parent-child relationship among the nodes according to the semantic inclusion relation among the search terms; and generates the knowledge graph according to the parent-child relationship. Because the semantic inclusion relation is determined from the search terms the user inputs in the search session and the knowledge graph is generated from it, the device addresses two technical problems in the prior art: semantic inclusion relations cannot be accurately identified, and constructing a knowledge graph manually is costly and time-consuming.

In yet another aspect of the present application, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for generating a knowledge graph according to the foregoing embodiments is implemented.

A further aspect of the present application is to provide a non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program is configured to implement the method for generating a knowledge graph according to the foregoing embodiments when executed by a processor.

In yet another aspect, the present application provides a computer program product, wherein when the instructions in the computer program product are executed by a processor, the method for generating a knowledge graph according to the foregoing embodiments is performed.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart diagram illustrating a method for generating a knowledge graph according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart illustrating a process for determining semantic inclusion relationships between search terms according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart illustrating another process for determining semantic inclusion relationships between search terms according to an embodiment of the present application;

FIG. 4 is a representation of a dependency structure provided by an embodiment of the present application;

FIG. 5 is a schematic flow chart illustrating a method for determining confidence levels of parent-child relationships between child nodes and parent nodes according to an embodiment of the present application;

FIG. 6 is a schematic flow chart illustrating another method for determining confidence levels of parent-child relationships between child nodes and parent nodes according to an embodiment of the present application;

FIG. 7 is a diagram illustrating a user's one-time search behavior according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a knowledge-graph generating apparatus according to an embodiment of the present application;

FIG. 9 illustrates a block diagram of an exemplary computer device suitable for use to implement embodiments of the present application;

FIG. 10 is a schematic flow chart diagram of another method for generating a knowledge graph according to an embodiment of the present application;

FIG. 11 is a block diagram of a knowledge tree and associated documents provided by an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

The knowledge-graph generation method and apparatus of the embodiments of the present application are described below with reference to the drawings.

Fig. 1 is a schematic flow chart of a method for generating a knowledge graph according to an embodiment of the present disclosure.

As shown in fig. 1, the knowledge-graph generating method includes the following steps:

It should be noted that a knowledge graph is a structured semantic knowledge base used for describing, in symbolic form, concepts in the physical world and the relationships among them. Its constituent units are "entity-relationship-entity" triples and entities together with their related "attribute-value" pairs. Entities are connected to one another through relationships, forming a mesh knowledge structure.
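As a non-limiting illustration, the two kinds of constituent units described above can be represented as simple records. The following Python sketch is an assumption made for explanatory purposes only: the class names `RelationTriple` and `AttributePair`, the relation label `parent_of`, the helper `children_of`, and the sample attribute are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTriple:
    """An "entity-relationship-entity" unit of the graph."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class AttributePair:
    """An entity together with one related attribute and its value."""
    entity: str
    attribute: str
    value: str


def children_of(triples, entity, relation="parent_of"):
    """Return the tails reachable from `entity` through `relation`."""
    return [t.tail for t in triples if t.head == entity and t.relation == relation]


# Illustrative data only; the attribute below is invented for the example.
triples = [
    RelationTriple("psychology", "parent_of", "psychological activity"),
    RelationTriple("psychology", "parent_of", "psychological process"),
]
attributes = [AttributePair("psychology", "source", "encyclopedia")]

print(children_of(triples, "psychology"))
```

Frozen dataclasses are used here so that units can be stored in sets or used as dictionary keys; any equivalent record type would serve.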

Step 101, for a search session containing multiple searches, obtaining each search term input by a user in the search session.

In the embodiment of the application, a user queries for required information by inputting search terms. The required information often cannot be found by inputting a search term only once, so a search session of the user may include multiple searches: the user inputs progressively richer search terms and locates the required information through multiple searches, and a progressive relation exists between the search terms input each time.

Since the semantics of each search term input by the user are not easily recognized, in this embodiment, the search session of the user is used as the annotation data to recognize the semantic inclusion relationship between the search terms, so that the search session of the user needs to be acquired.

First, in this embodiment, search sessions are queried according to the knowledge points that the knowledge graph to be generated is to cover, so as to find the search sessions matching those knowledge points. For example, if the knowledge points covered by the knowledge graph to be generated are in the field of psychology, search sessions matching psychology need to be queried. In this embodiment, knowledge points can be acquired in various ways.

As a possible implementation, knowledge points are obtained from terms or words. The knowledge points that people must learn to master include professional terms, professional vocabularies and the like, such as a quadratic equation, an equilateral triangle, the diversity of organisms and the like.

As another possible implementation, most common knowledge points are obtained by aggregating structured data from encyclopedia sites and various vertical sites. Since encyclopedic knowledge contains most of the knowledge points in a given field, the knowledge points of that field can be obtained from encyclopedic data. For example, when the field is psychology, the knowledge points obtainable by querying encyclopedic data may be: psychology, psychological activity, psychological process, psychological dimension, internal activity, experimental psychology, psychosomatic relationship psychology activity, congenital theory, objective theory, activity theory, cognition, cognitive structure, cognitive disorder, cognitive skill, etc.

As another possible implementation, knowledge points are obtained by machine mining. Specifically, knowledge points such as person names and place names in a document can be identified by recognition techniques.

Specifically, to accurately know a certain field, a plurality of search terms may be input, for example, in the case where the knowledge point is a psychological field, the search terms may be psychology, psychological activity, psychological growth, mental activity, and the like. Therefore, for a search session containing a plurality of searches, each search word input by the user in the search session is further acquired.

Further, in the user's multiple search sessions, not every input search term matches the knowledge point; some input search terms may not match it. In this embodiment, according to the knowledge points matched with the search session, the search terms matching the knowledge points are screened out of the search terms input by the user and retained.
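As a non-limiting illustration of the screening described above, the following Python sketch retains only the search terms that mention at least one knowledge point. The matching criterion (case-insensitive substring containment) and the function name `filter_terms` are assumptions; the application does not specify how a search term is matched against a knowledge point.

```python
def filter_terms(search_terms, knowledge_points):
    """Keep the search terms that contain at least one knowledge point.

    Matching by case-insensitive substring is an assumption made for
    this sketch, not a rule stated in the application.
    """
    points = [p.lower() for p in knowledge_points]
    kept = []
    for term in search_terms:
        lowered = term.lower()
        if any(point in lowered for point in points):
            kept.append(term)
    return kept


# Illustrative session: one search is unrelated to the knowledge points.
session = [
    "psychology",
    "psychological activity",
    "weather tomorrow",
    "what is cognition",
]
knowledge_points = ["psychology", "psychological activity", "cognition"]

print(filter_terms(session, knowledge_points))
```

The original order of the session is preserved so that the progressive relation between successive searches remains available to later steps.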

Step 102, determining semantic inclusion relation among the search terms according to the text coincidence condition among the search terms or the syntactic structure of the search terms.

In this embodiment, text overlapping may occur between search terms input by the user, for example, text overlapping may occur between search terms in psychology, psychological activity, psychological dimensions, and the like.

As another possible implementation manner, syntactic structures may also exist among the search terms. The syntactic structures that may exist include the attributive-head relationship, quantity relationship, parallel relationship, appositive relationship, additional relationship, verb-object relationship, subject-predicate relationship, comparison relationship, time relationship, place relationship, the "de" (的) structure, and the like.

Further, the semantic inclusion relation among the search terms is determined according to the text overlap among the search terms or the syntactic structure among the search terms.

Step 103, taking the search terms as nodes of the knowledge graph, and determining the parent-child relationship among the nodes according to the semantic inclusion relationship among the search terms.

Specifically, each search word input by the user is used as a node of the knowledge graph, each node represents an entity existing in the real world, and the parent-child relationship among the nodes can be further determined according to the semantic inclusion relationship among the search words.

As an example, when the search terms input by the user are "psychology" and "psychological activity", the parent-child relationship between the nodes of the knowledge graph can be determined according to the semantic inclusion relationship between the search terms: the parent node is "psychology" and the child node is "psychological activity".

Step 104, generating a knowledge graph according to the parent-child relationship.

In this embodiment, each search term is used as a node of the knowledge graph, and a corresponding knowledge graph is generated according to the determined parent-child relationship between the nodes.
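As a non-limiting illustration of steps 103 and 104, the following Python sketch builds a graph from search terms and already-determined inclusion pairs. The adjacency-dictionary representation, the function name `build_graph`, and the (parent, child) tuple layout are assumptions made for this sketch; the application does not prescribe a storage format.

```python
from collections import defaultdict


def build_graph(terms, inclusion_pairs):
    """Build a parent -> children mapping and list the root nodes.

    `inclusion_pairs` holds (parent, child) tuples; pairs that mention a
    term outside `terms` are skipped.
    """
    known = set(terms)
    children = defaultdict(list)
    has_parent = set()
    for parent, child in inclusion_pairs:
        if parent not in known or child not in known:
            continue
        if child not in children[parent]:
            children[parent].append(child)
            has_parent.add(child)
    # Every term becomes a node, including terms without any edge.
    graph = {term: list(children.get(term, [])) for term in terms}
    roots = [term for term in terms if term not in has_parent]
    return graph, roots


terms = ["psychology", "psychological activity", "psychological process"]
pairs = [
    ("psychology", "psychological activity"),
    ("psychology", "psychological process"),
]

graph, roots = build_graph(terms, pairs)
print(graph)
print(roots)
```

Returning the roots alongside the mapping makes it straightforward to traverse the result as one or more knowledge trees.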

As a possible implementation manner, in step 102 the semantic inclusion relationship between the search terms is determined according to the text overlap between them: by finding the maximum substring within the text characters of each search term, a semantic inclusion relationship can be established between the search term corresponding to the maximum substring and the search term corresponding to those text characters. Referring to fig. 2, fig. 2 is a schematic flow chart of determining the semantic inclusion relationship between search terms according to the text overlap between them; step 102 may include the following sub-steps:

Sub-step 201, searching for the maximum substring of each text character in the obtained text characters of each search term.

Specifically, text characters of each search word are obtained according to the search word input by the user, and further, the maximum substring of each text character is searched in the obtained text characters of each search word.

Sub-step 202, determining that a semantic inclusion relationship exists between the search term corresponding to the maximum substring of each text character and the search term corresponding to that text character.

Specifically, according to the maximum substring of each text character, determining a search word corresponding to the maximum substring of each text character, and further determining a semantic inclusion relationship between the search word corresponding to the maximum substring and the search word corresponding to the corresponding text character.
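As a non-limiting illustration of sub-steps 201 and 202, the following Python sketch reads "maximum substring" as the longest other search term that occurs verbatim inside a given search term. That reading, the function name `max_substring_pairs`, and the (contained term, containing term) output layout are assumptions made for this sketch.

```python
def max_substring_pairs(terms):
    """Pair each term with the longest other term found inside it.

    Returns (shorter_term, longer_term) tuples; a term that contains no
    other term produces no pair.
    """
    pairs = []
    for term in terms:
        # Candidate substrings are the other search terms occurring in `term`.
        candidates = [other for other in terms if other != term and other in term]
        if candidates:
            longest = max(candidates, key=len)
            pairs.append((longest, term))
    return pairs


terms = ["habitat", "habitat loss", "habitat loss causes"]
print(max_substring_pairs(terms))
```

Choosing only the longest contained term links each search term to its closest match, so "habitat loss causes" is attached to "habitat loss" rather than directly to "habitat".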

As another possible implementation manner, in step 102, the semantic inclusion relationship between the search terms may also be determined according to the syntactic structure between the search terms. Referring to fig. 3, fig. 3 is a schematic flow chart of determining the semantic inclusion relationship between search terms according to the syntactic structure between them; step 102 may alternatively include the following sub-steps:

Step 301, determining, according to the syntactic structure between the search terms, search terms having an attributive-head structural relationship, search terms having a parallel relationship, or search terms having a modification relationship.

The syntactic structure of each search term may include the attributive-head structural relationship, quantity relationship, parallel relationship, appositive relationship, additional relationship, verb-object relationship, preposition-object relationship, subject-predicate relationship, comparison relationship, time relationship, place relationship, the "de" (的) structure, and the like.

It should be noted that the basic task of syntactic structure analysis is to determine the syntactic structure of a sentence or the dependencies between the words in the sentence. In dependency analysis, the syntactic relation between words is regarded as a dependency relation and is represented by a labeled directed arc. The syntactic structure of a sentence is a tree rooted at a virtual node ROOT, and each node in the tree is a word in the sentence.

As an example, referring to FIG. 4, FIG. 4 is a dependency structure representation produced by dependency analysis. The directed arcs in the figure are called dependency arcs. Here a dependency arc points from the dependent word to the governing word; the opposite direction is also possible, provided one convention is used consistently. The labels HED, SBV, VOB and DE on the directed arcs represent the core relationship, the subject-predicate relationship, the verb-object relationship, and the "de" (的) structure, respectively.

In this embodiment, at least one of the following is determined from the search terms: search terms having an attributive-head structural relationship, search terms having a parallel relationship, and search terms having a modification relationship.

Step 302, if search terms have an attributive-head structural relationship, determining that the search term serving as the head word semantically contains the search term serving as the attributive.

An attributive is used to modify, limit, or describe the quality and characteristics of a noun or pronoun. In addition, nouns, pronouns, numerals, prepositional phrases, infinitives (and infinitive phrases), participles, attributive clauses, or any word, phrase or sentence equivalent to an adjective can serve as an attributive. The relationship between the attributive and the head word is that of modifying and being modified, restricting and being restricted.

In this embodiment, for search terms having an attributive-head structural relationship, it is determined that the search term serving as the head word semantically contains the search term serving as the attributive.

Step 303, if search terms have a modification structural relationship, determining that the search term serving as the head word semantically contains the search term serving as the modifier.

A modifier is a word, phrase or clause used to modify other components in the sentence. Nouns, adjectives, adjective clauses, participles, etc. may be used as modifiers of nouns or pronouns.

In this embodiment, search terms having a modification structural relationship are determined from the search terms input by the user, and for those search terms it is determined that the search term serving as the head word semantically contains the search term serving as the modifier.

Step 304, if search terms have a parallel relationship, determining that a semantic inclusion relationship exists between the search terms having the parallel relationship.

Specifically, it is judged whether the search terms determined to be head words have a parallel relationship, and the search terms having a parallel relationship are determined to have a semantic inclusion relationship.

It should be noted that the execution procedure of the steps 302 to 304 is only an example, and the execution order of the steps 302 to 304 is not limited in this embodiment, and may be any execution order.
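As a non-limiting illustration of steps 302 to 304, the following Python sketch maps already-labeled term pairs to semantic inclusion pairs. The relation labels (`attributive_head`, `modifier_head`, `coordinate`), the input layout, and the function name `inclusion_from_syntax` are all hypothetical; the sketch assumes the syntactic analysis itself has been performed elsewhere and does not call any parser.

```python
# Hypothetical relation labels; the application does not prescribe an encoding.
HEAD_CONTAINS = {"attributive_head", "modifier_head"}
COORDINATE = "coordinate"


def inclusion_from_syntax(relations):
    """Map labeled term pairs to semantic inclusion pairs.

    Each relation is (label, first, second). For the head-type labels,
    `first` is the head word and `second` the attributive or modifier.
    """
    pairs = []
    for label, first, second in relations:
        if label in HEAD_CONTAINS:
            # Steps 302 and 303: the head word semantically contains the other term.
            pairs.append((first, second, label))
        elif label == COORDINATE:
            # Step 304: an inclusion relation is recorded. The text gives no
            # direction for this case, so the input order is kept as-is.
            pairs.append((first, second, label))
        # Any other relation type yields no inclusion pair in this sketch.
    return pairs


relations = [
    ("attributive_head", "habitat", "biological"),
    ("coordinate", "disappearance", "protection"),
    ("verb_object", "protect", "habitat"),
]
print(inclusion_from_syntax(relations))
```

Because each relation is handled independently, the three rules can be applied in any order, in line with the note that the execution order of steps 302 to 304 is not limited.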

According to the knowledge graph generation method of the embodiment of the present application, for a search session containing multiple searches, each search term input by a user in the search session is obtained; the semantic inclusion relation between the search terms is determined according to the text overlap between the search terms or the syntactic structure of the search terms; each search term is taken as a knowledge graph node, and the parent-child relationship among the nodes is determined according to the semantic inclusion relation among the search terms; and the knowledge graph is generated according to the parent-child relationship. Because the semantic inclusion relation is determined from the search terms the user inputs in the search session and the knowledge graph is generated from it, the method addresses two technical problems in the prior art: semantic inclusion relations cannot be accurately identified, and constructing a knowledge graph manually is costly and time-consuming.

As a possible implementation manner, after determining the parent-child relationship between the nodes according to the semantic inclusion relationship between the search terms, the confidence of the parent-child relationship between the child node and the parent node needs to be determined according to the difference degree between the information entropy of the search term corresponding to the child node and the information entropy of the search term corresponding to the parent node. The above process is described in detail below with reference to fig. 5.

As shown in fig. 5, after step 103, the knowledge-graph generating method may further include the steps of:

Step 401, determining the information entropy of the search term corresponding to the child node according to the importance of each word in the search term corresponding to the child node; wherein the importance indicates how important a word is to the intent expressed by the search term.

In this embodiment, the information entropy of the search term corresponding to the child node is calculated by a Natural Language Processing platform (NLPC) according to the importance of each word in that search term to the intent the search term expresses. Information entropy is a rather abstract mathematical concept; here it can be understood as the frequency of occurrence of the search term corresponding to the child node. For example, the information entropy of the search term corresponding to each child node can be calculated by the following formula (1).

In formula (1), Ck represents the search term corresponding to the child node; wordrank(Ck) represents the importance of each word in the search term corresponding to the child node; n is a positive integer; and Term(q1), on the left of the equals sign, represents the information entropy of the search term corresponding to the child node.

Step 402, determining the information entropy of the search term corresponding to the parent node according to the importance of each word in the search term corresponding to the parent node.

Similarly, the information entropy of the search word corresponding to the parent node is calculated by the NLPC according to the importance of each word in the search word corresponding to the parent node, for example, the information entropy of the search word corresponding to the parent node can be calculated by the following formula (2).

In formula (2), Ck represents the search term corresponding to the parent node; wordrank(Ck) represents the importance of each word in the search term corresponding to the parent node; n is a positive integer; and Term(q2), on the left of the equals sign, represents the information entropy of the search term corresponding to the parent node.

Step 403, determining the confidence of the parent-child relationship between the child node and the parent node according to the degree of difference between the information entropy of the search term corresponding to the child node and the information entropy of the search term corresponding to the parent node.

Specifically, since the child node has extra information entropy relative to the parent node, according to the information entropy of the search word corresponding to the child node determined in step 401 and the information entropy of the search word corresponding to the parent node determined in step 402, the degree of difference between the information entropy of the search word corresponding to the child node and the information entropy of the search word corresponding to the parent node is calculated, so that the confidence of the parent-child relationship between the child node and the parent node can be determined. For example, the degree of difference in information entropy can be calculated by the following formula (3).

In formula (3), im(q1, q2) represents the degree of difference between the information entropy of the search term corresponding to the child node and the information entropy of the search term corresponding to the parent node.

As another possible implementation manner, after determining the parent-child relationship between the nodes according to the semantic inclusion relationship between the search terms, the confidence level of the parent-child relationship between the child node and the parent node may also be determined according to the ratio of the co-occurrence frequency to the frequency of the search term corresponding to the parent node, and according to the ratio of the frequency of the search term corresponding to the parent node to the number of the plurality of search sessions. The above process is described in detail below with reference to fig. 6.

Step 501, according to each search term input by users in multiple search sessions, determining the frequency of occurrence of the search term corresponding to the parent node, and determining the frequency of co-occurrence of the search term corresponding to the parent node and the search term corresponding to the child node in the same search session.

Specifically, different search terms may be input by the user in multiple search sessions, the frequency of occurrence of each search term is different, and the frequency of occurrence of the search term corresponding to the parent node may be determined according to each search term input by the user in multiple search sessions.

Further, in the same search session, the user may input a search word that includes both a search word corresponding to the parent node and a search word corresponding to the child node, so that a frequency of co-occurrence, that is, a co-occurrence frequency, of the search word corresponding to the parent node and the search word corresponding to the child node in the same search session can be determined.

Step 502, determining the confidence of the parent-child relationship between the child node and the parent node according to the ratio of the co-occurrence frequency to the frequency of the search term corresponding to the parent node and the ratio of the frequency of the search term corresponding to the parent node to the number of the plurality of search sessions.

In this embodiment, according to the ratio of the frequency of co-occurrence of the search word corresponding to the parent node and the search word corresponding to the child node in the same search session to the frequency of occurrence of the search word corresponding to the determined corresponding parent node, and by taking the ratio of the frequency of occurrence of the search word corresponding to the parent node to the number of the plurality of search sessions, the confidence of the parent-child relationship between the child node and the parent node can be determined. For example, the confidence of the parent-child relationship between the child node and the parent node can be calculated by the following formula (4).

In formula (4), re(q1, q2) on the left of the equal sign represents the confidence of the parent-child relationship between the child node and the parent node; w1 represents the frequency of occurrence of the search word corresponding to the parent node; w2 represents the co-occurrence frequency of the search word corresponding to the parent node and the search word corresponding to the child node in the same search session; q represents the total number of search terms in the plurality of search sessions.
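Formula (4) is not reproduced in this text, so the following sketch only computes the two ratios named in step 502 and leaves their combination as a parameter. Multiplying the two ratios is an assumption made purely for illustration, and the denominator `total` is passed in by the caller because step 502 refers to the number of search sessions while the symbol list refers to the total number of search terms. The function name `relation_heat` is hypothetical.

```python
from typing import Callable


def relation_heat(
    parent_freq: int,
    co_freq: int,
    total: int,
    combine: Callable[[float, float], float] = lambda a, b: a * b,
) -> float:
    """Combine the two ratios named in step 502.

    The default `combine` (a product) is an assumption; the application's
    formula (4) may combine the ratios differently.
    """
    if parent_freq <= 0 or total <= 0:
        return 0.0
    co_ratio = co_freq / parent_freq   # w2 / w1
    parent_ratio = parent_freq / total  # w1 / q
    return combine(co_ratio, parent_ratio)


print(relation_heat(parent_freq=4, co_freq=2, total=8))  # 0.25
```

Passing a different `combine` callable, for example a weighted sum, lets the same two ratios be reused once the exact form of formula (4) is known.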

As an example, referring to fig. 7, fig. 7 shows the search behavior of a user, which may comprise multiple search sessions, involving the knowledge point "biological habitat". The search sessions matching the knowledge point are obtained by querying, and the search words are determined to be "biological habitat", "disappearance of biological habitat" and "protection measures for biological habitat".

Furthermore, each search word is used as a knowledge graph node, and the parent-child relationship among the nodes is determined according to the semantic inclusion relationship among the search words. In this example, the parent node of the knowledge graph is "biological habitat", and the child nodes are "disappearance of biological habitat" and "protection measures for biological habitat". Using the method of the above embodiment for calculating the confidence of the parent-child relationship between a child node and a parent node, the confidences of the parent-child relationships between the child nodes and the parent node shown in fig. 7 can be determined to be 0.95 and 0.98, respectively. For example, the confidence of the parent-child relationship between the child node and the parent node can be calculated by the following formula (5).

Q(q1, q2) = im(q1, q2) * re(q1, q2);    (5)

In formula (5), Q(q1, q2) on the left of the equal sign represents the confidence of the parent-child relationship between the child node and the parent node; im(q1, q2) is the difference degree between the information entropy of the search word corresponding to the child node and the information entropy of the search word corresponding to the parent node, calculated according to formula (1), formula (2) and formula (3); re(q1, q2) is the heat between the parent node and the child node, calculated according to the above formula (4). The heat measures how popular the edge connecting the parent node and the child node in fig. 7 is.
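As a worked illustration of formula (5), the sketch below multiplies an im value and a re value for each edge and sorts the edges by the result. The im and re inputs are hypothetical numbers chosen only to show the arithmetic; they are not the values behind fig. 7, and the helper names `combined_confidence` and `rank_edges` are not from the application.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent search word, child search word)


def combined_confidence(im_score: float, re_score: float) -> float:
    """Formula (5): Q(q1, q2) = im(q1, q2) * re(q1, q2)."""
    return im_score * re_score


def rank_edges(scores: Dict[Edge, Tuple[float, float]]) -> List[Tuple[Edge, float]]:
    """Return edges sorted by descending combined confidence."""
    ranked = [
        (edge, combined_confidence(im_score, re_score))
        for edge, (im_score, re_score) in scores.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical im and re values, used only to illustrate the arithmetic.
scores = {
    ("biological habitat", "disappearance of biological habitat"): (0.5, 0.25),
    ("biological habitat", "protection measures for biological habitat"): (0.75, 0.5),
}
for edge, q in rank_edges(scores):
    print(edge[1], q)
# protection measures for biological habitat 0.375
# disappearance of biological habitat 0.125
```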

According to the knowledge graph generation method of this embodiment, after the parent-child relationship among the nodes is determined according to the semantic inclusion relationship among the search words, the confidence of the parent-child relationship between the child node and the parent node is further determined in one of three ways: according to the difference degree between the information entropy of the search word corresponding to the child node and the information entropy of the search word corresponding to the parent node; or according to the ratio of the co-occurrence frequency to the frequency of occurrence of the search word corresponding to the parent node together with the ratio of the frequency of occurrence of the search word corresponding to the parent node to the number of the plurality of search sessions; or according to the product of the above difference degree and the heat between the child node and the parent node. By determining the confidence of the semantic inclusion relationship, the method identifies the semantic inclusion relationship more accurately.

In order to implement the above embodiments, the present application also provides a knowledge graph generating apparatus.

Fig. 8 is a schematic structural diagram of a knowledge graph generating apparatus according to an embodiment of the present application.

As shown in fig. 8, the knowledge graph generating apparatus 100 includes: an obtaining module 110, a determining module 120, and a generating module 130.

The obtaining module 110 is configured to obtain, for a search session including multiple searches, each search word input by a user in the search session.

The determining module 120 is configured to determine a semantic inclusion relationship between the search terms according to a text overlapping condition between the search terms or a syntax structure of each search term.

The generating module 130 is configured to use each search term as a knowledge graph node, and determine a parent-child relationship between nodes according to a semantic inclusion relationship between each search term; and generating the knowledge graph according to the parent-child relationship.
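The role of the generating module 130 can be sketched as follows. The sketch assumes that the determining module has already produced pairs in which the first search word semantically includes the second, and it represents the knowledge graph as a mapping from each node to its child nodes; the function name `build_graph` and this data layout are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def build_graph(
    search_words: Iterable[str],
    inclusions: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map every search word (node) to the list of its child nodes.

    Each pair in `inclusions` is (parent word, child word), meaning the
    parent word semantically includes the child word.
    """
    graph: Dict[str, List[str]] = {word: [] for word in search_words}
    for parent, child in inclusions:
        if parent in graph and child in graph and child not in graph[parent]:
            graph[parent].append(child)
    return graph


words = [
    "biological habitat",
    "disappearance of biological habitat",
    "protection measures for biological habitat",
]
pairs = [
    ("biological habitat", "disappearance of biological habitat"),
    ("biological habitat", "protection measures for biological habitat"),
]
graph = build_graph(words, pairs)
print(graph["biological habitat"])
# ['disappearance of biological habitat', 'protection measures for biological habitat']
```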

As a possible implementation, the determining module 120 includes:

A first determining unit, configured to search the text characters of the obtained search words for the maximum substring of the text characters of each search word; and to determine that a semantic inclusion relationship exists between the search word corresponding to the maximum substring and the search word corresponding to those text characters.

A second determining unit, configured to determine, according to the syntactic structure among the search words, at least one of: search words having an attributive-head structure relationship, search words having a parallel relationship, and search words having a modification relationship;

for the search words having the attributive-head structure relationship, determine that the search word serving as the head semantically contains the search word serving as the attributive;

for the search words having the modification relationship, determine that the search word serving as the head semantically contains the search word serving as the modifier;

and determine that a semantic inclusion relationship exists between the search words having the parallel relationship.
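The text-overlap check performed by the first determining unit can be illustrated with a longest-common-substring routine. The sketch below assumes that when the longest common substring of two search words is one of the two words in full, that shorter word is treated as semantically including the longer one; this decision rule and the function names are illustrative assumptions. The syntactic-structure check of the second determining unit would require a parser and is not sketched here.

```python
from typing import Optional, Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def text_inclusion(word_a: str, word_b: str) -> Optional[Tuple[str, str]]:
    """Return (parent, child) if one word is wholly contained in the other.

    Assumption: the contained (shorter) word is the broader concept.
    """
    common = longest_common_substring(word_a, word_b)
    if not common or word_a == word_b:
        return None
    if common == word_a:
        return word_a, word_b
    if common == word_b:
        return word_b, word_a
    return None


print(text_inclusion("disappearance of biological habitat", "biological habitat"))
# ('biological habitat', 'disappearance of biological habitat')
```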

As a possible implementation manner, the knowledge graph generating apparatus 100 further includes:

a first determining module, configured to determine the information entropy of the search word corresponding to the child node according to the importance of each word in the search word corresponding to the child node, wherein the importance indicates how important a word is to the intention expressed by the search word;

to determine the information entropy of the search word corresponding to the parent node according to the importance of each word in the search word corresponding to the parent node;

and to determine the confidence of the parent-child relationship between the child node and the parent node according to the difference degree between the information entropy of the search word corresponding to the child node and the information entropy of the search word corresponding to the parent node.

As a possible implementation manner, the knowledge graph generating apparatus further includes:

a second determining module, configured to determine the frequency of occurrence of the search word corresponding to the parent node according to the search words input by users in a plurality of search sessions, and to determine the co-occurrence frequency of the search word corresponding to the parent node and the search word corresponding to the child node in the same search session;

and to determine the confidence of the parent-child relationship between the child node and the parent node according to the ratio of the co-occurrence frequency to the frequency of occurrence of the search word corresponding to the parent node and the ratio of the frequency of occurrence of the search word corresponding to the parent node to the number of the plurality of search sessions;

a query module, configured to query the search sessions matching the knowledge points according to the knowledge points involved in the knowledge graph;

and a screening module, configured to screen and retain, from the search words input by the user, the search words matching the knowledge points according to the knowledge points matched by the search session.
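The query module and the screening module can be sketched together. This section does not state how a search word is matched to a knowledge point, so the sketch assumes plain substring matching; the function names `query_sessions` and `screen_search_words` and the sample data are hypothetical.

```python
from typing import List


def query_sessions(sessions: List[List[str]], knowledge_point: str) -> List[List[str]]:
    """Keep the sessions in which at least one search word mentions the knowledge point."""
    return [
        session
        for session in sessions
        if any(knowledge_point in word for word in session)
    ]


def screen_search_words(session: List[str], knowledge_point: str) -> List[str]:
    """Retain only the search words of a session that match the knowledge point."""
    return [word for word in session if knowledge_point in word]


# Hypothetical sessions used only for illustration.
sessions = [
    ["biological habitat", "weather today", "disappearance of biological habitat"],
    ["train tickets"],
]
matched = query_sessions(sessions, "biological habitat")
retained = [screen_search_words(s, "biological habitat") for s in matched]
print(retained)
# [['biological habitat', 'disappearance of biological habitat']]
```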

The knowledge graph generating apparatus of the embodiment of the application obtains, for a search session containing multiple searches, each search word input by a user in the search session; determines the semantic inclusion relationship between the search words according to the text overlap between the search words or the syntactic structure of the search words; takes each search word as a knowledge graph node and determines the parent-child relationship among the nodes according to the semantic inclusion relationship among the search words; and generates the knowledge graph according to the parent-child relationship. Therefore, from the search words input by the user in the search session, the semantic inclusion relationship among the search words is determined according to the text overlap among the search words or the syntactic structure of the search words, and the knowledge graph is then generated. This solves the technical problems in the prior art of high construction cost and long construction time, which arise because the semantic inclusion relationship cannot be accurately identified and the knowledge graph has to be constructed manually.

It should be noted that the explanation of the embodiment of the knowledge graph generation method is also applicable to the knowledge graph generation apparatus of this embodiment, and is not repeated here.

In order to implement the foregoing embodiments, the present application further provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for generating a knowledge graph according to the foregoing embodiments.

FIG. 9 illustrates a block diagram of an exemplary computer device suitable for implementing embodiments of the present application. The computer device 12 shown in FIG. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 9, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, to name a few.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, and commonly referred to as a "hard drive"). Although not shown in FIG. 9, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.

The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown in FIG. 9, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing, such as implementing the knowledge-graph generation method mentioned in the foregoing embodiments, by executing a program stored in the system memory 28.

To achieve the above embodiments, the present application also proposes a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for generating a knowledge graph according to the above embodiments.

To achieve the above embodiments, the present application also proposes a computer program product, which is characterized in that when the instructions in the computer program product are executed by a processor, the method for generating a knowledge graph according to the above embodiments is performed.

As an example, referring to FIG. 10, the computer program product includes a basic data layer 210, a relationship building layer 220, and an integration layer 230.

Specifically, the basic data layer 210 is used for acquiring knowledge points of the knowledge graph and retaining search terms matched with the knowledge points.

The relationship building layer 220 is configured to determine semantic inclusion relationships between the search terms according to text overlapping conditions between the search terms or syntax structures of the search terms.

The integration layer 230 is used for search word cleaning, question sentence filtering, document resource coverage evaluation, knowledge point distribution statistics, and the de-duplication and merging strategy for knowledge trees.

The knowledge tree is constructed by taking each search word as a knowledge graph node and determining the parent-child relationship among the nodes according to the semantic inclusion relationship among the search words.

Since the search words input by users are diverse and may contain pornography-related words, politically sensitive words, blank spaces, errors caused by incorrect word segmentation, question words, and the like, the search words need to be cleaned.
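A cleaning pass of this kind might look like the sketch below. It normalizes whitespace, drops empty entries, drops search words containing a blocked term, and drops question-style queries. The blocklist contents, the list of question words and the function name `clean_search_words` are all hypothetical; the application does not specify them in this passage.

```python
import re
from typing import Iterable, List, Set

# Hypothetical question words; a real system would use its own list.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who"}


def clean_search_words(words: Iterable[str], blocked: Set[str]) -> List[str]:
    """Return the search words that survive a simple cleaning pass."""
    cleaned: List[str] = []
    for raw in words:
        # Collapse runs of whitespace and trim the ends.
        word = re.sub(r"\s+", " ", raw).strip()
        if not word:
            continue
        lowered = word.lower()
        # Drop search words that contain any blocked term.
        if any(term in lowered for term in blocked):
            continue
        # Drop question-style queries.
        if lowered.endswith("?") or lowered.split(" ")[0] in QUESTION_WORDS:
            continue
        cleaned.append(word)
    return cleaned


raw_words = [
    "  biological   habitat ",
    "",
    "why do habitats disappear?",
    "blockedterm habitat",
    "protection measures for biological habitat",
]
print(clean_search_words(raw_words, blocked={"blockedterm"}))
# ['biological habitat', 'protection measures for biological habitat']
```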

The document resource coverage evaluation performs a correlation calculation between the knowledge tree and the document resource titles, both treated as topics, to evaluate the number of resources covered by the knowledge tree, and then removes the items whose resource coverage is 0. For example, FIG. 11 shows a structure diagram of a knowledge tree and its associated documents.

Further, the knowledge trees may contain the same knowledge points, and therefore, the same knowledge points need to be de-duplicated or combined to construct the knowledge graph.
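The de-duplication and merging step can be sketched as a union of knowledge trees. The sketch assumes each knowledge tree is a mapping from a parent knowledge point to its list of children and that two knowledge points are "the same" when their strings are identical; both assumptions, and the function name `merge_trees`, are illustrative only.

```python
from typing import Dict, Iterable, List

Tree = Dict[str, List[str]]  # parent knowledge point -> child knowledge points


def merge_trees(trees: Iterable[Tree]) -> Tree:
    """Merge knowledge trees, keeping each node and each edge only once."""
    merged: Tree = {}
    for tree in trees:
        for parent, children in tree.items():
            merged_children = merged.setdefault(parent, [])
            for child in children:
                if child not in merged_children:
                    merged_children.append(child)
                # Make sure every child also exists as a node.
                merged.setdefault(child, [])
    return merged


tree_a = {"biological habitat": ["disappearance of biological habitat"]}
tree_b = {
    "biological habitat": [
        "disappearance of biological habitat",
        "protection measures for biological habitat",
    ]
}
graph = merge_trees([tree_a, tree_b])
print(graph["biological habitat"])
# ['disappearance of biological habitat', 'protection measures for biological habitat']
```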

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A method of knowledge-graph generation, the method comprising the steps of:

and generating the knowledge graph according to the parent-child relationship.

2. The method for generating a knowledge graph according to claim 1, wherein the determining semantic inclusion relationship between search terms according to the text coincidence condition between the search terms or the syntactic structure of the search terms comprises:

searching the maximum substring of each text character in the obtained text characters of each search word;

and determining the search word corresponding to the maximum substring of each text character, wherein semantic inclusion relation exists between the search words corresponding to the corresponding text characters.

3. The method for generating a knowledge graph according to claim 1, wherein the determining semantic inclusion relationship between search terms according to the text coincidence condition between the search terms or the syntactic structure of the search terms comprises:

determining at least one of search words with an attributive-head structure relationship, search words with a parallel relationship and search words with a modification relationship from the search words according to a syntactic structure among the search words;

4. The method for generating a knowledge graph according to claim 1, wherein after determining the parent-child relationship between the nodes according to the semantic inclusion relationship between the search terms, the method further comprises:

determining the information entropy of the search word corresponding to the child node according to the importance of each word in the search word corresponding to the child node; wherein the importance is used for indicating the importance degree of a word to the expressed intention of the search word;

and determining the confidence of the parent-child relationship between the child node and the parent node according to the difference degree between the information entropy of the search word corresponding to the child node and the information entropy of the search word corresponding to the parent node.

5. The method for generating a knowledge graph according to claim 1, wherein after determining the parent-child relationship between the nodes according to the semantic inclusion relationship between the search terms, the method further comprises:

determining the frequency of occurrence of the search word corresponding to the parent node according to the search words input by users in a plurality of search sessions, and determining the co-occurrence frequency of the search word corresponding to the parent node and the search word corresponding to the child node in the same search session;

6. The method of generating a knowledge graph according to any one of claims 1 to 5, wherein the obtaining of each search term input by a user in a search session containing a plurality of searches further comprises:

and inquiring the search session matched with the knowledge points according to the knowledge points involved in the knowledge graph.

7. The method of generating a knowledge graph according to claim 6, wherein the step of obtaining each search term input by a user in a search session comprising a plurality of searches further comprises:

and screening and reserving the search words matched with the knowledge points from the search words input by the user according to the knowledge points matched with the search session.

8. An apparatus for knowledge-graph generation, the apparatus comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of knowledge-graph generation as claimed in any one of claims 1 to 7 when executing the program.

10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method of knowledge-graph generation as claimed in any one of claims 1 to 7.

11. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, perform the method of knowledge-graph generation according to any of claims 1-7.