CN110659368A - Knowledge graph construction method and device, electronic equipment and readable storage medium

Info

Publication number: CN110659368A
Application number: CN201910897883.3A
Authority: CN (China)
Family ID: 69038350
Prior art keywords: entity, program, result, identification result, message queue
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 齐云飞, 陈栋, 梁秀钦
Current Assignee: Beijing Mininglamp Software System Co., Ltd.
Original Assignee: Beijing Mininglamp Software System Co., Ltd.
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2020-01-07

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a knowledge graph construction method and apparatus, an electronic device, and a readable storage medium. The knowledge graph construction method includes: separating received text information into a plurality of clauses using an entity recognition program; identifying at least one entity in each clause with the entity recognition program and labeling each entity with a tag to obtain an entity recognition result; performing entity linking on the entity recognition result with an entity linking program and storing the corresponding processing result in an entity linking message queue; or performing entity disambiguation on the entity recognition result with an entity disambiguation program and storing the corresponding processing result in an entity disambiguation message queue; or performing relation extraction on the entity recognition result with a relation extraction program and storing the corresponding processing result in a relation extraction message queue. Because entity linking, entity disambiguation, and relation extraction are executed at the same time once the entity recognition result is obtained, no downstream program is left idle waiting for a preceding step that cannot yet be executed, as happens in sequential execution, and the response time of knowledge graph construction is shortened.

Description

Knowledge graph construction method and device, electronic equipment and readable storage medium
Technical Field
The application relates to the field of text processing, in particular to a knowledge graph construction method and device, electronic equipment and a readable storage medium.
Background
In the prior art, a knowledge graph is usually constructed by performing the four steps of entity recognition, entity disambiguation, relation extraction, and entity linking in sequence, each step being carried out by a corresponding execution program. If one or more of these four execution programs cannot run smoothly, the overall response time of knowledge graph construction becomes long.
Disclosure of Invention
An embodiment of the present application provides a knowledge graph construction method and apparatus, an electronic device, and a readable storage medium, so as to solve the prior-art problem that the overall response time of knowledge graph construction is long.
In a first aspect, an embodiment of the present application provides a knowledge graph construction method. The method includes: separating received text information into a plurality of clauses using an entity recognition program; identifying at least one entity in each of the clauses with the entity recognition program and labeling each of these entities with a tag to obtain an entity recognition result, where the entity recognition result includes the processed text information, the at least one entity, and the tag corresponding to each entity; performing entity linking on the entity recognition result with an entity linking program and storing the corresponding processing result in an entity linking message queue; or performing entity disambiguation on the entity recognition result with an entity disambiguation program and storing the corresponding processing result in an entity disambiguation message queue; or performing relation extraction on the entity recognition result with a relation extraction program and storing the corresponding processing result in a relation extraction message queue.
In the above embodiment, entity recognition is performed on the text information to obtain an entity recognition result, and entity linking, entity disambiguation, and relation extraction can then be performed on that result at the same time to obtain their respective processing results. Because these three actions are executed concurrently rather than in a fixed sequence, no downstream program sits idle waiting for a preceding step to finish, and the response time of knowledge graph construction is shortened compared with the prior art.
In one possible design, after the entity recognition result is obtained and before the entity linking program performs entity linking on it and stores the corresponding processing result in the entity linking message queue, the method further includes: storing the entity recognition result in an entity recognition message queue using the entity recognition program; and obtaining the entity recognition result from the entity recognition message queue with the entity linking program, the entity disambiguation program, or the relation extraction program.
In the above embodiment, once the entity recognition result is obtained it can be stored in the entity recognition message queue, from which the entity linking program, the entity disambiguation program, or the relation extraction program then reads it. Keeping the entity recognition result in a fixed storage location makes it easy for the various programs to fetch and process it quickly, further improving processing efficiency.
In one possible design, the method further includes: splicing the result information of at least two of the entity recognition message queue, the entity linking message queue, the entity disambiguation message queue, and the relation extraction message queue.
In the above embodiment, depending on what a downstream program needs, the result information in at least two of these four queues can be concatenated so that the downstream program can obtain and use it.
In one possible design, performing entity linking on the entity recognition result with the entity linking program includes: for each of a plurality of entities in the entity recognition result, obtaining a plurality of candidate words corresponding to that entity in the knowledge graph by similarity calculation; processing each entity and its candidate words with a first neural network and selecting, from the candidate words, the candidate word with the highest similarity to the entity as the recognition result of that entity; and establishing a link relationship between each entity and its corresponding recognition result.
In the above embodiment, a plurality of candidate words are first retrieved from the knowledge graph for each entity, and the candidate word with the highest similarity to the entity is then selected from them. Words that clearly do not meet the requirements are excluded by this preliminary screening, and an accurate recognition result is then chosen from the remaining candidates, which improves recognition speed while preserving accuracy.
In one possible design, performing entity disambiguation on the entity recognition result with the entity disambiguation program includes: inputting the entity recognition result into a second neural network; and replacing pronouns in the entity recognition result with the corresponding entities using the second neural network.
In the above embodiment, the second neural network may be a neural network pre-trained to perform pronoun replacement; the entity recognition result carrying pronouns is input into the second neural network, which replaces the pronouns with the corresponding entities.
In one possible design, performing relation extraction on the entity recognition result with the relation extraction program includes: inputting the entity recognition result into a third neural network, where the entity recognition result includes at least one entity labeled with a first type of tag and at least one entity labeled with a second type of tag; and establishing, with the third neural network, a matching relationship between the entities labeled with the first type of tag and the entities labeled with the second type of tag.
In the above embodiment, the third neural network may be a pre-trained neural network that, given an entity recognition result carrying entities with different types of tags, establishes matching relationships between the entities corresponding to those different tag types.
In a second aspect, an embodiment of the present application provides a knowledge graph construction apparatus, including: a separation processing module configured to separate received text information into a plurality of clauses using an entity recognition program; a tag labeling module configured to identify at least one entity in each of the clauses with the entity recognition program and label each of these entities with a tag to obtain an entity recognition result, where the entity recognition result includes the processed text information, the at least one entity, and the tag corresponding to each entity; and a queue storing module configured to perform entity linking on the entity recognition result with an entity linking program and store the corresponding processing result in an entity linking message queue; or perform entity disambiguation on the entity recognition result with an entity disambiguation program and store the corresponding processing result in an entity disambiguation message queue; or perform relation extraction on the entity recognition result with a relation extraction program and store the corresponding processing result in a relation extraction message queue.
In one possible design, the apparatus further includes: an entity recognition storage module configured to store the entity recognition result in an entity recognition message queue using the entity recognition program; and a recognition result obtaining module configured to obtain the entity recognition result from the entity recognition message queue with the entity linking program, the entity disambiguation program, or the relation extraction program.
In one possible design, the apparatus further includes an information splicing module configured to splice the result information of at least two of the entity recognition message queue, the entity linking message queue, the entity disambiguation message queue, and the relation extraction message queue.
In one possible design, the queue storing module is further configured to obtain, for each of a plurality of entities in the entity recognition result, a plurality of candidate words corresponding to that entity in the knowledge graph by similarity calculation; process each entity and its candidate words with a first neural network and select the candidate word with the highest similarity to the entity as the recognition result of that entity; and establish a link relationship between each entity and its corresponding recognition result.
In one possible design, the queue storing module is further configured to input the entity recognition result into a second neural network and replace pronouns in the entity recognition result with the corresponding entities using the second neural network.
In one possible design, the queue storing module is further configured to input the entity recognition result into a third neural network, where the entity recognition result includes at least one entity labeled with a first type of tag and at least one entity labeled with a second type of tag, and to establish, with the third neural network, a matching relationship between the entities labeled with the first type of tag and the entities labeled with the second type of tag.
In a third aspect, the present application provides an electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the first aspect or any of the alternative implementations of the first aspect.
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect or any of the optional implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation manner of the first aspect.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a flow chart diagram illustrating a method for constructing a knowledge graph according to an embodiment of the present application;
FIG. 2 is a flow chart diagram illustrating one embodiment of a method for knowledge graph construction provided by an embodiment of the present application;
FIG. 3 is a flow chart illustrating one embodiment of step S130;
FIG. 4 shows a schematic flow chart of another embodiment of step S130;
FIG. 5 shows a schematic flow chart of yet another embodiment of step S130;
fig. 6 shows a schematic structural block diagram of a knowledge graph constructing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Fig. 1 is a flowchart of a specific implementation of the knowledge graph construction method provided by an embodiment of the present application. The method may be executed by an electronic device, which may be a user terminal or a server, and specifically includes the following steps S110 to S130:
Step S110: separate the received text information with an entity recognition program to obtain a plurality of clauses.
The text information may be a discussion post or comment on a certain topic on the internet; for convenience of description, the text information is taken to be a comment on a cosmetic product on an e-commerce website.
When the text information is split, it may be split according to its punctuation marks. For example, one splitting method is to cut the text into a new clause whenever any punctuation mark appears; another is to split the text into clauses only at the commas it contains. The particular manner in which the text information is split should not be construed as limiting the application.
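As a minimal sketch of the punctuation-based splitting described above (the punctuation set, function name, and sample comment are illustrative and not taken from the patent):

    import re

    # Cut at common Chinese/ASCII punctuation marks; the patent leaves the exact
    # punctuation set (or the comma-only variant) open.
    CLAUSE_DELIMITERS = r"[，,。.！!？?；;\n]"

    def split_into_clauses(text: str) -> list[str]:
        """Split raw text information into clauses at punctuation marks."""
        parts = re.split(CLAUSE_DELIMITERS, text)
        return [p.strip() for p in parts if p.strip()]

    print(split_into_clauses("This serum works for dry skin, absorbs fast. Smells nice!"))
    # ['This serum works for dry skin', 'absorbs fast', 'Smells nice']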
Before step S110, the received text information may be cached, and once the cached text reaches a certain amount it is passed to the entity recognition program in a batch, which improves the processing efficiency of the text information.
Step S120: identify at least one entity in each of the clauses with the entity recognition program and label each of these entities with a tag, thereby obtaining an entity recognition result, where the entity recognition result includes the processed text information, the at least one entity, and the tag corresponding to each entity.
An entity is a real object or concept related to the topic. Continuing the example above, an entity may be the name of a specific cosmetic, a usage scenario of a cosmetic, an effect of a cosmetic, and so on.
The tag reflects the type of the entity, for example a product type representing a brand or class of cosmetics, a scene type representing the situation in which the cosmetics are used, an effect type representing the effect of using the cosmetics, and a flaw type representing a skin defect present when the cosmetics are not used.
Optionally, for each of the clauses, the entities in the clause may be identified with a BERT model and the corresponding tags added to them. The processed text information in the entity recognition result is the text information that has been split and whose entities have been recognized.
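The patent only states that a BERT model may be used; the sketch below uses the Hugging Face transformers token-classification pipeline with a publicly available BERT NER checkpoint as a stand-in. A real deployment would presumably use a model fine-tuned on the product/scene/effect/flaw tag set described above.

    from transformers import pipeline

    # dslim/bert-base-NER is a generic public checkpoint used only for illustration;
    # the patent's tag set (PRODUCT/SCENE/EFFECT/FLAW) would require its own fine-tuning.
    ner = pipeline("token-classification",
                   model="dslim/bert-base-NER",
                   aggregation_strategy="simple")

    def recognize_entities(clauses):
        """Build an entity recognition result: processed text, entities, and tags."""
        entities = []
        for clause in clauses:
            for hit in ner(clause):
                entities.append({"entity": hit["word"],
                                 "tag": hit["entity_group"],
                                 "clause": clause})
        return {"text": clauses, "entities": entities}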
Step S130: perform entity linking on the entity recognition result with an entity linking program and store the corresponding processing result in an entity linking message queue; or perform entity disambiguation on the entity recognition result with an entity disambiguation program and store the corresponding processing result in an entity disambiguation message queue; or perform relation extraction on the entity recognition result with a relation extraction program and store the corresponding processing result in a relation extraction message queue.
After the entity recognition result is obtained, the entity linking program, the entity disambiguation program, and the relation extraction program may be used to perform entity linking, entity disambiguation, and relation extraction on it, respectively, to obtain their respective processing results.
Entity linking may select, from the knowledge graph, the result word with the highest similarity to a given entity in the entity recognition result and then establish a link relationship between the entity and that result word. The entity disambiguation program may replace every pronoun in the text information with the entity it refers to. The relation extraction program may establish matching relationships between entities carrying different tags, according to those tags.
Because entity linking, entity disambiguation, and relation extraction are executed concurrently once the entity recognition result is available, rather than one after another, no downstream program is left idle waiting for a preceding step to finish, and the response time of knowledge graph construction is shortened compared with the prior art.
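A minimal sketch of this concurrent dispatch, assuming the three downstream programs are available as ordinary Python callables (their names and signatures are illustrative):

    from concurrent.futures import ThreadPoolExecutor

    def process_recognition_result(recognition_result,
                                   link_entities,       # entity linking program
                                   disambiguate,        # entity disambiguation program
                                   extract_relations):  # relation extraction program
        """Run the three downstream programs on the same recognition result at once."""
        with ThreadPoolExecutor(max_workers=3) as pool:
            link_future = pool.submit(link_entities, recognition_result)
            disamb_future = pool.submit(disambiguate, recognition_result)
            relation_future = pool.submit(extract_relations, recognition_result)
            # Each result would then be written to its own message queue.
            return (link_future.result(),
                    disamb_future.result(),
                    relation_future.result())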
Referring to fig. 2, a further specific implementation of the knowledge graph construction method provided by the embodiment of the present application includes the following steps S110 to S140:
Step S110: separate the received text information with an entity recognition program to obtain a plurality of clauses.
Step S120: identify at least one entity in each of the clauses with the entity recognition program and label each of these entities with a tag, thereby obtaining an entity recognition result, where the entity recognition result includes the processed text information, the at least one entity, and the tag corresponding to each entity.
Steps S110 to S120 in fig. 2 are the same as steps S110 to S120 in fig. 1 and are not repeated here.
Step S121: store the entity recognition result in an entity recognition message queue using the entity recognition program.
Step S122: obtain the entity recognition result from the entity recognition message queue with an entity linking program, an entity disambiguation program, or a relation extraction program.
Optionally, the entity recognition results may be written to Kafka under a topic named new-topic, where Kafka serves as the message queue middleware and new-topic is the entity recognition message queue. The entity linking program, the entity disambiguation program, and the relation extraction program may all pull the entity recognition results from new-topic.
After the entity recognition result is obtained it is stored in the entity recognition message queue, from which the entity linking program, the entity disambiguation program, or the relation extraction program then obtains it. Keeping the entity recognition result in a fixed storage location makes it convenient for the various programs to fetch and process it quickly, further improving processing efficiency.
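The patent names Kafka and the new-topic queue but gives no code; the sketch below uses the kafka-python client as one possible realization. The broker address, JSON serialization, and consumer-group names are assumptions.

    import json
    from kafka import KafkaProducer, KafkaConsumer  # kafka-python client

    BROKERS = "localhost:9092"          # assumed broker address
    RECOGNITION_TOPIC = "new-topic"     # entity recognition message queue named in the patent

    def publish_recognition_result(result: dict) -> None:
        """Entity recognition program: push one recognition result to new-topic."""
        producer = KafkaProducer(
            bootstrap_servers=BROKERS,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"))
        producer.send(RECOGNITION_TOPIC, result)
        producer.flush()

    def consume_recognition_results(group_id: str):
        """Each downstream program subscribes with its own consumer group
        (e.g. "entity-linking"), so all three receive every recognition result."""
        consumer = KafkaConsumer(
            RECOGNITION_TOPIC,
            bootstrap_servers=BROKERS,
            group_id=group_id,
            value_deserializer=lambda b: json.loads(b.decode("utf-8")))
        for message in consumer:
            yield message.value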
Step S130: perform entity linking on the entity recognition result with the entity linking program and store the corresponding processing result in an entity linking message queue; or perform entity disambiguation on the entity recognition result with the entity disambiguation program and store the corresponding processing result in an entity disambiguation message queue; or perform relation extraction on the entity recognition result with the relation extraction program and store the corresponding processing result in a relation extraction message queue.
Step S140: splice the result information of at least two of the entity recognition message queue, the entity linking message queue, the entity disambiguation message queue, and the relation extraction message queue.
These four message queues store different result information. Depending on what a downstream program needs, the result information in at least two of them can be spliced together so that the downstream program can obtain and use it.
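A sketch of what this splicing might look like, assuming each queue message carries a shared document identifier to join on (the patent does not specify the join key):

    from collections import defaultdict

    def splice_queue_results(*queue_messages):
        """Merge result messages from two or more queues into one record per document.

        Each argument is an iterable of dicts; a shared "doc_id" field is assumed
        as the join key, which is an illustrative choice.
        """
        spliced = defaultdict(dict)
        for messages in queue_messages:
            for msg in messages:
                spliced[msg["doc_id"]].update(msg)
        return dict(spliced)

    links = [{"doc_id": 1, "links": {"apple": "Apple Inc."}}]
    relations = [{"doc_id": 1, "relations": [("product a", "effect 1")]}]
    print(splice_queue_results(links, relations))
    # {1: {'doc_id': 1, 'links': {...}, 'relations': [...]}}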
Referring to fig. 3, the entity linking performed on the entity recognition result by the entity linking program in step S130 specifically includes the following steps S131 to S133:
Step S131: for each of a plurality of entities in the entity recognition result, obtain a plurality of candidate words corresponding to that entity in the knowledge graph by similarity calculation.
The similarity calculation may use TF-IDF, Jaccard similarity, and the like. Optionally, all of the entity information in the knowledge graph may be imported into Elasticsearch and indexed, and the similarity between each entity in the entity recognition result and the entity information in the knowledge graph may then be computed, so as to obtain, for each entity in the entity recognition result, a plurality of corresponding candidate words from the knowledge graph.
Suppose, for example, that the entity recognition result includes entity A, entity B, and entity C. For entity A, three candidate words A1, A2, and A3 corresponding to it in the knowledge graph may be obtained by similarity calculation; for entity B, two candidate words B1 and B2; and for entity C, four candidate words C1, C2, C3, and C4.
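A sketch of the candidate retrieval step using the Elasticsearch Python client (8.x style assumed); the index name, field name, cluster address, and candidate count are illustrative, and Elasticsearch's match scoring stands in here for the TF-IDF or Jaccard similarity mentioned above.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # assumed cluster address
    GRAPH_INDEX = "kg-entities"                   # illustrative index name

    def index_graph_entities(graph_entity_names):
        """Import the knowledge graph's entity information into Elasticsearch."""
        for name in graph_entity_names:
            es.index(index=GRAPH_INDEX, document={"name": name})

    def candidate_words(entity: str, size: int = 5) -> list[str]:
        """Step S131: retrieve candidate words for one recognized entity by text similarity."""
        resp = es.search(index=GRAPH_INDEX,
                         query={"match": {"name": entity}},
                         size=size)
        return [hit["_source"]["name"] for hit in resp["hits"]["hits"]]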
Step S132: process each entity and its candidate words with a first neural network, select from the candidate words the one with the highest similarity to the entity, and take that candidate word as the recognition result of the entity.
The first neural network is trained in advance. Optionally, an entity and its candidate words may be input into the first neural network, which outputs a similarity value between the entity and each candidate word, and the candidate word with the highest similarity is selected as the recognition result according to those values. Alternatively, the entity and its candidate words may be input into the first neural network and the network may directly output the recognition result corresponding to the entity.
Continuing the example above: for entity B, entity B and its candidate words B1 and B2 may both be input into the first neural network. If the network outputs a similarity of 70% between B1 and entity B and of 65% between B2 and entity B, the user can select the recognition result for entity B according to the presented similarity values; because B1's similarity is higher than B2's, B1 is selected as the recognition result corresponding to entity B.
Alternatively, entity B and the candidate words B1 and B2 may be input into the first neural network and the network may directly output the recognition result corresponding to entity B, which may be B1 or B2; suppose here that the output is B1.
In one embodiment, the entity may be input into the first neural network together with its context, and the first neural network selects the corresponding recognition result from the entity's candidate words by taking that context into account.
For example, the entity "apple" may refer to a fruit or to a technology company. The entity "apple" can therefore be input into the first neural network together with the context "apple recently released a new phone," from which the first neural network can determine that, here, "apple" denotes the technology company.
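The patent does not disclose the architecture of the first neural network; the sketch below substitutes a pre-trained sentence-embedding model (sentence-transformers) that scores each candidate against the entity together with its context, mirroring the "apple" example. The model name and the query format are assumptions.

    from sentence_transformers import SentenceTransformer, util

    # Illustrative stand-in for the "first neural network".
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def link_entity(entity: str, context: str, candidates: list[str]) -> str:
        """Step S132: pick the candidate word most similar to the entity in its context."""
        query_vec = encoder.encode(f"{entity}: {context}", convert_to_tensor=True)
        cand_vecs = encoder.encode(candidates, convert_to_tensor=True)
        scores = util.cos_sim(query_vec, cand_vecs)[0]   # one similarity per candidate
        return candidates[int(scores.argmax())]

    print(link_entity("apple",
                      "apple recently released a new phone",
                      ["Apple Inc. (technology company)", "apple (fruit)"]))
    # expected to print the technology-company candidate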
Step S133: establish a link relationship between each entity and its corresponding recognition result.
After entity B and its recognition result B1 are obtained through the first neural network, a link relationship between entity B and B1 is established; the link relationships between entity A, entity C, and their respective recognition results can be established in the same way.
In this way, a plurality of candidate words are first retrieved from the knowledge graph for each entity, and the candidate word with the highest similarity to the entity is then selected from them. The preliminary screening removes words that clearly do not meet the requirements, and an accurate recognition result is then chosen from the remaining candidates, which improves recognition speed while preserving accuracy.
Referring to fig. 4, the entity disambiguation performed on the entity recognition result by the entity disambiguation program in step S130 specifically includes the following steps S231 to S232:
Step S231: input the entity recognition result into a second neural network.
Step S232: replace the pronouns in the entity recognition result with the corresponding entities using the second neural network.
Optionally, the text information may be input into the second neural network. The pronouns contained in the text information include "it", "he", "she", "this", "the vehicle", and the like, and the second neural network may replace each of these pronouns with an explicit entity.
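A minimal sketch of the replacement step. The pronoun-to-entity mapping is assumed to come from the second neural network (for example a coreference-resolution model); here it is supplied by hand so that the replacement logic stays self-contained.

    import re

    def replace_pronouns(text: str, pronoun_to_entity: dict[str, str]) -> str:
        """Substitute each resolved pronoun with the explicit entity it refers to."""
        for pronoun, entity in pronoun_to_entity.items():
            # Whole-word replacement so that e.g. "it" does not match inside "with".
            text = re.sub(rf"\b{re.escape(pronoun)}\b", entity, text)
        return text

    print(replace_pronouns("I tried the serum before the wedding and it felt light.",
                           {"it": "the serum"}))
    # "I tried the serum before the wedding and the serum felt light."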
Referring to fig. 5, the relation extraction performed on the entity recognition result by the relation extraction program in step S130 specifically includes the following steps S331 to S332:
Step S331: input the entity recognition result into a third neural network, where the entity recognition result includes at least one entity labeled with a first type of tag and at least one entity labeled with a second type of tag.
Step S332: establish, with the third neural network, a matching relationship between the entities labeled with the first type of tag and the entities labeled with the second type of tag.
The third neural network may be a neural network pre-trained to match entities carrying different tags. Suppose, for example, that the first type of tag is the product type, the second type of tag is the effect type, the product type contains two products, product a and product b, and the effect type contains two effects, effect 1 and effect 2. The tagged text information containing product a, product b, effect 1, and effect 2 is input into the third neural network, which establishes the matching relationships between the products and the effects: possibly product a matches effect 1 and product b matches effect 2, or possibly product b matches effect 1 and product a matches effect 2.
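The structure of the third neural network is not disclosed either; the sketch below enumerates candidate (first-tag, second-tag) entity pairs and scores each with a pairwise classifier, which is one common way such a model is applied. The pair_score callable, tag names, and threshold are hypothetical.

    from itertools import product as cartesian_product

    def extract_relations(entities, pair_score, threshold: float = 0.5):
        """Step S332: match first-tag entities (e.g. PRODUCT) to second-tag entities (e.g. EFFECT).

        `entities` is a list of {"entity": ..., "tag": ...} dicts from step S120;
        `pair_score(head, tail)` stands in for the third neural network and returns
        a match probability (a hypothetical callable, not an API from the patent).
        """
        products = [e["entity"] for e in entities if e["tag"] == "PRODUCT"]
        effects = [e["entity"] for e in entities if e["tag"] == "EFFECT"]
        matches = []
        for head, tail in cartesian_product(products, effects):
            if pair_score(head, tail) >= threshold:
                matches.append((head, tail))   # e.g. ("product a", "effect 1")
        return matches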
In one embodiment, suppose the electronic device receives the following user input:
"I often stay up all night and my skin is yellow and greasy. I want to attend a friend's wedding soon, so please recommend cosmetics suitable for me, preferably ones that whiten and brighten." After this information is input, the entity recognition program can extract the scene type: wedding; the flaw types: staying up late, yellowing, greasiness; and the effect type: whitening. Based on the matching relationships between the different tags obtained by the relation extraction program, the electronic device can then obtain at least one cosmetic class name that matches the wedding scene type, the staying-up-late, yellowing, and greasiness flaw types, and the whitening effect type, and display the matched cosmetic class names to the user.
In the knowledge graph construction method provided by the embodiment of the present application, the entity linking program, the entity disambiguation program, and the relation extraction program can run in parallel, which greatly reduces the response time. If the response time or throughput of one of these sub-programs differs greatly from that of the others, only that sub-program's operating mode needs to be adjusted, or the hardware allocated to it increased, without modifying the whole system, which saves repair time and cost.
Referring to fig. 6, fig. 6 shows a specific implementation of the knowledge graph construction apparatus provided in an embodiment of the present application, where the apparatus 600 includes:
the separation processing module 610 is configured to perform separation processing on the received text information by using an entity identification program to obtain a plurality of clauses.
A tag labeling module 620, configured to identify at least one entity of each of the multiple clauses by using the entity identification program, and label each entity of the at least one entity, so as to obtain an entity identification result, where the entity identification result includes the processed text information, the at least one entity, and the tag corresponding to each entity of the at least one entity.
A queue storing module 630, configured to perform entity link processing on the entity identification result by using the entity link program, and store a corresponding processing result in an entity link message queue; or the entity disambiguation program is used for carrying out entity disambiguation on the entity identification result, and the corresponding processing result is stored in an entity disambiguation message queue; or the relationship extraction program is used for carrying out relationship extraction processing on the entity identification result and storing the corresponding processing result into a relationship extraction message queue.
The queue storage module 630 is further configured to obtain, for each entity of the multiple entities in the entity identification result, multiple candidate words corresponding to each entity in the knowledge graph in a similarity calculation manner; processing each entity and a plurality of candidate words corresponding to each entity by utilizing a first neural network, and screening out a candidate word with the highest similarity with the corresponding entity from the plurality of candidate words, wherein the candidate word with the highest similarity in the plurality of candidate words is used as the identification result of the entity; and establishing a link relation between each entity and the corresponding recognition result.
The queue storing module 630 is further configured to input the entity identification result to a second neural network; and replacing pronouns in the entity recognition result with corresponding entities by utilizing the second neural network.
The queue storing module 630 is further configured to input the entity identification result into a third neural network, where the entity identification result includes at least one entity labeled with a first type of label and at least one entity labeled with a second type of label; and establishing a matching relation between at least one entity marked with the first type of label and at least one entity marked with the second type of label by utilizing the third neural network.
The device further comprises:
and the entity identification storage module is used for storing the entity identification result to the entity identification message queue by using the entity identification program.
And the identification result acquisition module is used for acquiring an entity identification result from the entity identification message queue by utilizing an entity linking program, an entity disambiguation program or a relationship extraction program.
And the information splicing module is used for splicing the result information of at least two message queues in the entity identification message queue, the entity link message queue, the entity disambiguation message queue and the relationship extraction message queue.
The knowledge graph construction apparatus provided by the embodiment of the present application corresponds to the knowledge graph construction method described above, and the details are not repeated here.
The present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the method embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of knowledge graph construction, the method comprising:
carrying out separation processing on the received text information by using an entity identification program to obtain a plurality of clauses;
identifying at least one entity of each clause in a plurality of clauses by using the entity identification program, and labeling each entity in the at least one entity with a label so as to obtain an entity identification result, wherein the entity identification result comprises the processed text information, the at least one entity and the label corresponding to each entity in the at least one entity;
utilizing an entity link program to perform entity link processing on the entity identification result, and storing a corresponding processing result into an entity link message queue; or
Utilizing an entity disambiguation program to perform entity disambiguation on the entity identification result, and storing a corresponding processing result into an entity disambiguation message queue; or
And utilizing a relation extraction program to perform relation extraction processing on the entity identification result, and storing the corresponding processing result into a relation extraction message queue.
2. The method according to claim 1, wherein after obtaining the entity identification result, before performing entity link processing on the entity identification result by using the entity linking program and storing a corresponding processing result in an entity link message queue, the method further comprises:
storing the entity identification result to an entity identification message queue by using the entity identification program;
and obtaining an entity identification result from the entity identification message queue by utilizing an entity linking program, an entity disambiguation program or a relation extraction program.
3. The method of claim 2, further comprising:
and splicing the result information of at least two message queues in the entity identification message queue, the entity link message queue, the entity disambiguation message queue and the relationship extraction message queue.
4. The method according to claim 1, wherein the performing entity link processing on the entity identification result by using the entity link program includes:
for each entity of a plurality of entities in the entity identification result, a plurality of candidate words corresponding to each entity in the knowledge graph are obtained in a similarity calculation mode;
processing each entity and a plurality of candidate words corresponding to each entity by utilizing a first neural network, and screening out a candidate word with the highest similarity with the corresponding entity from the plurality of candidate words, wherein the candidate word with the highest similarity in the plurality of candidate words is used as the identification result of the entity;
and establishing a link relation between each entity and the corresponding recognition result.
5. The method according to claim 1, wherein the performing entity disambiguation on the entity identification result by using the entity disambiguation program comprises:
inputting the entity identification result to a second neural network;
and replacing pronouns in the entity recognition result with corresponding entities by utilizing the second neural network.
6. The method according to claim 1, wherein the performing relation extraction processing on the entity identification result by using the relation extraction program includes:
inputting the entity recognition result into a third neural network, wherein the entity recognition result comprises at least one entity labeled with a first type of label and at least one entity labeled with a second type of label;
and establishing a matching relation between at least one entity marked with the first type of label and at least one entity marked with the second type of label by utilizing the third neural network.
7. An apparatus for knowledge-graph construction, the apparatus comprising:
the separation processing module is used for separating the received text information by utilizing an entity identification program to obtain a plurality of clauses;
a tag labeling module, configured to identify at least one entity of each of multiple clauses by using the entity identification program, and label each entity of the at least one entity, so as to obtain an entity identification result, where the entity identification result includes the processed text information, the at least one entity, and the tag corresponding to each entity of the at least one entity;
the queue storing module is used for carrying out entity link processing on the entity identification result by utilizing an entity link program and storing a corresponding processing result into an entity link message queue; or
Utilizing an entity disambiguation program to perform entity disambiguation on the entity identification result, and storing a corresponding processing result into an entity disambiguation message queue; or
And utilizing a relation extraction program to perform relation extraction processing on the entity identification result, and storing the corresponding processing result into a relation extraction message queue.
8. The apparatus of claim 7, further comprising:
the entity identification storage module is used for storing an entity identification result to an entity identification message queue by using the entity identification program;
and the identification result acquisition module is used for acquiring an entity identification result from the entity identification message queue by utilizing an entity linking program, an entity disambiguation program or a relationship extraction program.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, and the processor executing the machine-readable instructions to perform the method of any one of claims 1-6.
10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-6.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2020-01-07)