CN113268607A

CN113268607A - Knowledge graph construction method and device

Info

Publication number: CN113268607A
Application number: CN202110586751.6A
Authority: CN
Inventors: 侯磊; 刘丁枭; 李涓子; 张鹏; 唐杰; 许斌
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2021-08-17

Abstract

The invention provides a knowledge graph construction method and device. The method comprises: performing entity linking, keyword extraction and named entity recognition on original data to obtain the corresponding results; merging the entities in those results to obtain an entity list; using each entity in the entity list as a keyword for text retrieval over the original data to obtain the texts containing the entity; processing all texts containing the entity to obtain first entity related information representing the entity; inputting the entities in the entity list into a background knowledge graph as keywords to obtain second entity related information of the entities in the background knowledge graph; and fusing the entity list, the first entity related information and the second entity related information to obtain a new knowledge graph. Through these steps the invention constructs a knowledge graph that is associated with the background knowledge graph and can be used to update it.

Description

Knowledge graph construction method and device

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for constructing a knowledge graph.

Background

The knowledge graph is a concept formally proposed in 2012. Its main purpose is to improve search efficiency and user experience in an era of rapid internet development and explosive growth of network data. With its semantic processing capability and interconnectivity, the knowledge graph lays a foundation for intelligent information applications, is widely used in search, question answering, information analysis and the like, and has advanced information technology from information services to knowledge services. In recent years, many industries have been researching and applying knowledge graphs in their professional fields to better serve those fields. At present, however, knowledge graphs are generally constructed once and then put into use; methods for updating them dynamically and expanding them during use are rarely adopted.

Disclosure of Invention

The invention provides a method and a device for constructing a knowledge graph, which address the defect in the prior art that a knowledge graph cannot be updated and expanded during use, and which yield a knowledge graph that can be used to update and expand a background knowledge graph.

In a first aspect, the present invention provides a method for constructing a knowledge graph, comprising:

acquiring original data and a background knowledge graph;

carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result;

entity combination is carried out on the entity link result, the keyword extraction result and the named entity identification result, and an entity list is obtained;

taking the entity in the entity list as a keyword to perform text retrieval on the original data to obtain a text containing the entity;

processing all texts containing the entity to obtain first entity related information representing the entity;

inputting the entities in the entity list into a background knowledge graph as key words to obtain second entity related information of the entities;

and fusing the entity list, the first entity related information and the second entity related information of the entities to obtain a new knowledge graph.

Further, the method for constructing a knowledge graph provided by the present invention, wherein the processing all texts containing the entity to obtain the first entity related information representing the entity specifically includes:

performing relation extraction on the text containing the entity to obtain a triple;

clustering the triples and then sorting them, and taking the top n triples after sorting as first entity attributes, wherein n is an integer greater than or equal to 1;

and clustering the texts containing the entity and then sorting them, and taking the top m texts after sorting as first text information representing the entity, wherein m is an integer greater than or equal to 1.

Further, the method for constructing a knowledge graph provided by the present invention, wherein the information related to the second entity includes:

entity classification information of the entity in the background knowledge graph, second entity attributes and their attribute values, and entity related information.

Further, in the method for constructing a knowledge graph provided by the invention, performing keyword extraction on the original data includes:

obtaining a keyword list, together with the position and importance of each word in the original data, by using the term frequency-inverse document frequency (TF-IDF) algorithm, TextRank, a grammar-rule-based method, latent semantic analysis (LSA) and latent semantic indexing (LSI).

Further, in the method for constructing a knowledge graph provided by the invention, performing entity merging on the entity linking result, the keyword extraction result and the named entity recognition result to obtain an entity list includes:

determining the union of the entity linking result, the keyword extraction result and the named entity recognition result as an entity merging result;

and determining the intersection of the entity linking result, the keyword extraction result and the named entity recognition result as an entity merging result.

Further, in the method for constructing a knowledge graph provided by the present invention, performing relation extraction on all texts containing the entity includes one or more of the following:

performing chapter-level relation extraction on the plurality of texts;

performing sentence-level relation extraction on the plurality of texts;

and performing entity relation expansion on the entity linking results obtained by entity linking of the plurality of texts.

Further, in the method for constructing a knowledge graph provided by the invention, clustering all texts containing the entity to obtain a plurality of clustered texts includes:

performing K-means clustering, density-based clustering, mean shift clustering or hierarchical clustering on all texts containing the entity to obtain the plurality of texts.

In a second aspect, the present invention provides a knowledge-graph constructing apparatus, comprising:

the first processing module is used for acquiring original data and a large-scale knowledge graph;

the second processing module is used for carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result;

the third processing module is used for carrying out entity combination on the entity link result, the keyword extraction result and the named entity identification result to obtain an entity list;

the fourth processing module is used for performing text retrieval on the original data by taking the entities in the entity list as keywords to obtain all texts containing the entities;

a fifth processing module, configured to process all texts including the entity to obtain first entity-related information representing the entity;

a sixth processing module, configured to input an entity in the entity list as a keyword into a background knowledge graph, so as to obtain second entity-related information of the entity in the background knowledge graph;

and the seventh processing module is used for fusing the entity list, the first entity related information and the second entity related information of the entity in the background knowledge graph to obtain a new knowledge graph.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above-mentioned knowledge graph building methods when executing the program.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a knowledge graph according to any one of the above.

According to the method and the device for constructing the knowledge graph, original data and a background knowledge graph are obtained; carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result; entity combination is carried out on the entity link result, the keyword extraction result and the named entity identification result, and an entity list is obtained; taking the entity in the entity list as a keyword to perform text retrieval on the original data to obtain a text containing the entity; processing all texts containing the entity to obtain first entity related information representing the entity; inputting the entities in the entity list into a background knowledge graph as key words to obtain second entity related information of the entities in the background knowledge graph; and fusing the entity list, the first entity related information and the second entity related information of the entity in the background knowledge graph to obtain a new knowledge graph. Because the knowledge graph is a database consisting of concepts and entities, a new knowledge graph can be obtained by fusing the data of the concept layer and the data of the entity layer. That is, the present invention obtains a new knowledge-graph through the above steps that can be used for dynamic updates to the knowledge-graph and extensions during use.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is one of the flow diagrams of the construction method of the knowledge-graph provided by the present invention;

FIG. 2 is a second schematic flow chart of the method for constructing a knowledge graph according to the present invention;

FIG. 3 is a schematic structural diagram of a knowledge graph constructing device provided by the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the following, a first aspect of an embodiment of the present invention is described with reference to fig. 1 to 2, and the present invention provides a method for constructing a knowledge graph, including:

Step 100, acquiring original data and a background knowledge graph;

The original data is the data used for constructing the knowledge graph and can be text, video, voice and the like, or a mixture of several formats. The background knowledge graph, namely the large-scale knowledge graph, refers to a large-scale monolingual or cross-lingually fused knowledge graph covering all fields or a certain field. A knowledge graph is a database for storing knowledge; it stores triples such as (player E, place of birth, city S), and each triple represents a fact. A knowledge graph can also be viewed as a graph: in the above triple, player E and city S are nodes, and place of birth is a labeled edge pointing from player E to city S. The existing large-scale knowledge graphs may be selected from XLORE, the CYC common sense knowledge base, multilingual Wikipedia, DBpedia, Freebase, YAGO, Wikidata, NELL, Probase, BabelNet, ConceptNet, schema. In the embodiment of the invention, the original data and the large-scale knowledge graph are obtained in preparation for constructing a new knowledge graph and updating the large-scale knowledge graph. For example, in constructing the BJ city tourism knowledge graph, the relevant documents of the "BJ city tourism network", including text, pictures, videos, etc., are selected as the original data, and XLORE is selected as the large-scale knowledge graph to be used.
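The two views of a knowledge graph described above, a store of triples and a directed labeled graph, can be illustrated with a minimal sketch. The function name `build_graph` and the in-memory layout are assumptions made for illustration only; the patent does not prescribe a storage format.

```python
from collections import defaultdict

# Each fact is a (head, relation, tail) triple, as in the description.
triples = [
    ("player E", "place of birth", "city S"),
]

def build_graph(triples):
    """Index triples as a directed, labeled graph: head -> [(relation, tail)]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph

graph = build_graph(triples)
# Nodes are all heads and tails; each triple contributes one labeled edge.
nodes = {h for h, _, _ in triples} | {t for _, _, t in triples}
print(sorted(nodes))
print(graph["player E"])
```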

Step 200, performing entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result;

Entity linking maps certain strings in a piece of text to the corresponding entities in a knowledge base. Given a document and a knowledge base, entity linking aims to identify all entity mentions in the text and to find the corresponding knowledge base entity for each mention; if the knowledge base does not contain the entity that a mention refers to, the mention is mapped to an empty entity. The entity linking task is generally divided into three steps: entity discovery, candidate entity generation and candidate entity disambiguation. Entity discovery identifies all entity mentions in a document. Candidate entity generation finds the knowledge base entities that each mention may refer to, called the candidate entity set. Candidate entity disambiguation then determines which knowledge base entity the mention actually refers to. For example, entity linking of the original text based on all the data in XLORE yields the key vocabulary in it. From the sentence "A certain C seabed world is an edutainment seabed theme park. It has 8000 tons of water, 6000 rare marine creatures, 10 huge and fierce sharks, and a 120-meter-long full-view underwater tunnel for a remarkable underwater journey!", the linking results for entities such as "a certain C seabed world", "theme park", "rare marine creatures" and "shark" can be obtained.
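The three steps of entity linking can be sketched as follows. This is a toy illustration under stated assumptions: the dictionaries `KB_ALIASES` and `KB_CONTEXT`, the entity identifiers, and the word-overlap disambiguation rule are all hypothetical stand-ins, not the linking method used with XLORE.

```python
import re

# Hypothetical toy knowledge base: surface form -> candidate entity ids.
KB_ALIASES = {
    "shark": ["kb:shark_animal", "kb:shark_film"],
    "theme park": ["kb:theme_park"],
}
# Hypothetical context words per entity, used only for disambiguation.
KB_CONTEXT = {
    "kb:shark_animal": {"marine", "water", "tunnel"},
    "kb:shark_film": {"movie", "director"},
    "kb:theme_park": {"park", "entertainment"},
}

def discover_mentions(text, surface_forms):
    """Entity discovery: keep the surface forms that occur in the text."""
    lowered = text.lower()
    return [form for form in surface_forms if form in lowered]

def link_entities(text, surface_forms):
    """Link each discovered mention to a knowledge base entity, or None."""
    words = set(re.findall(r"\w+", text.lower()))
    links = {}
    for mention in discover_mentions(text, surface_forms):
        candidates = KB_ALIASES.get(mention, [])  # candidate generation
        if not candidates:
            links[mention] = None  # mention maps to the empty entity
            continue
        # Disambiguation: pick the candidate sharing the most context words.
        links[mention] = max(candidates, key=lambda c: len(KB_CONTEXT[c] & words))
    return links

text = "The theme park keeps 10 sharks in 8000 tons of water behind a submarine tunnel."
links = link_entities(text, ["shark", "theme park", "submarine tunnel", "volcano"])
print(links)
```

A mention that has no entry in the toy knowledge base ("submarine tunnel") is linked to `None`, which plays the role of the empty entity in the description.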

Keyword extraction extracts from a text the words most relevant to its meaning, that is, the words that are important for understanding the core information of the document, so as to help users grasp the main information of the text. For example, extracting keywords from all the texts can yield keywords such as "a certain museum in palace A", "a certain bridge", and the like.
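As one possible illustration of keyword extraction, the sketch below scores words with plain TF-IDF, one of the algorithms the patent names. Whitespace tokenization, the function name `tfidf_keywords` and the sample documents are assumptions for illustration; a real pipeline for Chinese text would need a proper word segmenter.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k (word, score) pairs per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results

docs = [
    "the palace museum is a museum",
    "the bridge is old",
    "the palace is large",
]
keywords = tfidf_keywords(docs, top_k=1)
print(keywords)
```

Words that appear in every document ("the", "is") receive a score of zero, so the more distinctive words rise to the top.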

In addition, named entity recognition refers to extracting all entities in a document, where an entity represents a concrete thing in the real world or an abstract concept, such as a person, an institution, a place, or "machine learning", "artificial intelligence" and the like. Unlike the "named entities" of most research, the entities referred to herein include named entities (primarily people, organizations and places), common entities (e.g., movies, books, songs, cultural customs, food, materials, etc.) and abstract concepts (concepts without tangible form that arise from human abstract thinking). In the knowledge base, one entity may correspond to multiple concepts; for example, in Wikipedia, player A belongs to both the category "basketball player" and the category "winner of a certain medal". Entity recognition identifies entities with specific meanings in the text, mainly including person names, place names, organization names, proper nouns and the like. For example, performing entity recognition on the original documents by a rule-based method can yield organizations such as "Q university" and "R university", person names such as "historical figure A", "historical figure B" and "historical figure C", and locations such as "a certain street" and "a certain train station".
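A rule-based recognizer of the kind mentioned above might look like the following sketch. The suffix lists in `RULES` and the capitalization heuristic are hypothetical rules chosen for an English example; the patent does not disclose its actual rules.

```python
import re

# Hypothetical suffix rules: an entity type and the suffix words that signal it.
RULES = {
    "organization": "University|Institute|Museum",
    "location": "Street|Station|Bridge",
}

def rule_based_ner(text):
    """Return (surface, type) pairs for capitalized phrases ending in a rule suffix."""
    entities = []
    for entity_type, suffixes in RULES.items():
        # One or more capitalized words followed by one of the suffix words.
        pattern = rf"\b((?:[A-Z]\w*\s)+(?:{suffixes}))\b"
        for match in re.finditer(pattern, text):
            entities.append((match.group(1), entity_type))
    return entities

text = "She studied at Q University and later moved near R Street."
entities = rule_based_ner(text)
print(entities)
```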

Step 300, carrying out entity combination on the entity link result, the keyword extraction result and the named entity identification result to obtain an entity list;

Because the words obtained from the entity linking result, the keyword extraction result and the named entity recognition result may overlap, the entities are merged to obtain an entity list consisting of a plurality of entities.

Step 400, performing text retrieval on the original data by taking the entities in the entity list as keywords to obtain texts containing the entities;

The entities obtained in the preceding step are used as keywords to search the original data, yielding the texts that contain the searched entities.

For example, according to the entity list obtained by merging, the texts in the original data related to each entity in the list are retrieved. For "a certain C seabed world", related texts such as the following may be obtained: "A certain C seabed world is an edutainment seabed theme park.", "Drive west to reach a certain C seabed world." and "A certain C seabed world at the BJ city Workers' Stadium takes 'education, entertainment and environmental responsibility' as its operating principles and uses high technology to show tourists a magical and beautiful underwater world." For "a certain palace A in BJ city", related texts such as the following may be obtained: "A certain palace A in BJ city was begun in 1406 AD (the fourth year of Yongle) and basically completed in 1420 (the eighteenth year of Yongle), taking 14 years; it was built on the basis of the palace of the Yuan capital.", "The museum of a certain palace A in BJ city is a Chinese comprehensive museum, established on October 10, 1925 and located in a certain palace A in BJ city; its holdings include, but are not limited to, the palaces of the Ming and Qing dynasties and their collections." and "A certain palace A in BJ city was begun in the fourth year of Yongle of Ming Chengzu (1406), modeled on a certain palace A in Nanjing, and completed in the eighteenth year of Yongle (1420); it became the imperial palace of twenty-four emperors of the Ming and Qing dynasties."
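The retrieval step can be sketched as a keyword search over sentences. Case-insensitive substring matching, the regular-expression sentence splitter and the sample documents are simplifying assumptions; the patent does not specify the retrieval engine.

```python
import re

def split_sentences(document):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def retrieve_texts(entities, documents):
    """Map each entity to the sentences of the original data that contain it."""
    index = {entity: [] for entity in entities}
    for document in documents:
        for sentence in split_sentences(document):
            for entity in entities:
                if entity.lower() in sentence.lower():
                    index[entity].append(sentence)
    return index

documents = [
    "Palace A was begun in 1406. It was completed in 1420.",
    "Seabed world C is a theme park. Palace A became the imperial palace.",
]
texts_by_entity = retrieve_texts(["Palace A", "Seabed world C"], documents)
print(texts_by_entity)
```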

Step 500, processing all texts containing the entity to obtain first entity related information representing the entity;

Step 600, inputting the entities in the entity list into a background knowledge graph as keywords to obtain second entity related information of the entities;

Specifically, the background knowledge graph provides support for knowledge acquisition: each entity obtained in the entity acquisition stage is input into the background knowledge graph as a keyword, and the entity related information of that entity in the large-scale knowledge graph is obtained.

Step 700, the entity list, the first entity related information, and the second entity related information of the entity are fused to obtain a new knowledge graph.

Fusion means forming knowledge from different sources into a unified knowledge representation with associations. The new knowledge graph is obtained from the entity list, the first entity related information and the second entity related information obtained in the preceding steps.
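One way to picture the fusion step is to flatten both kinds of entity related information into a single set of triples. The dictionary layout and the relation names "description", "instance of" and "related to" are hypothetical choices for this sketch, not terms defined by the patent.

```python
def fuse(entity_list, first_info, second_info):
    """Merge text-derived and background-graph information into one triple set."""
    graph = set()
    for entity in entity_list:
        extracted = first_info.get(entity, {})
        # First entity related information: attribute triples and texts.
        for triple in extracted.get("attributes", []):
            graph.add(triple)
        for text in extracted.get("texts", []):
            graph.add((entity, "description", text))
        background = second_info.get(entity, {})
        # Second entity related information from the background graph.
        for cls in background.get("classes", []):
            graph.add((entity, "instance of", cls))
        for attribute, value in background.get("attributes", {}).items():
            graph.add((entity, attribute, value))
        for other in background.get("related", []):
            graph.add((entity, "related to", other))
    return graph

entity_list = ["palace A"]
first_info = {"palace A": {
    "attributes": [("palace A", "construction began in", "1406")],
    "texts": ["Palace A was begun in 1406."],
}}
second_info = {"palace A": {
    "classes": ["scenic spot"],
    "attributes": {"scenic spot level": "AAAAA"},
    "related": ["W gate"],
}}
fused_graph = fuse(entity_list, first_info, second_info)
print(len(fused_graph))
```

Because every triple keeps the entity as its head, the fused graph stays associated with the background graph through the shared entity names.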

The background knowledge graph is then updated with the new knowledge graph. Because the information in the new knowledge graph comes from two sources, the original data and the large-scale knowledge graph, some triples in the new knowledge graph may not exist in the large-scale knowledge graph, so the large-scale knowledge graph can be updated with them. For example, the concept layer and instance layer data obtained from the original data of the "BJ city tourism network" through knowledge modeling and knowledge acquisition tools are used to update the large-scale knowledge graph data selectively, according to their confidence.
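The selective update can be sketched as a filtered set difference. The confidence scores, the threshold `min_confidence` and the function name `select_updates` are assumptions for illustration; the patent states only that the update is selective according to confidence, not how confidence is computed.

```python
def select_updates(new_graph, background_triples, min_confidence=0.8):
    """Return triples absent from the background graph with enough confidence.

    new_graph maps each (head, relation, tail) triple to a hypothetical
    confidence score in [0, 1].
    """
    return sorted(
        triple
        for triple, confidence in new_graph.items()
        if triple not in background_triples and confidence >= min_confidence
    )

scored_graph = {
    ("palace A", "construction began in", "1406"): 0.95,
    ("palace A", "scenic spot level", "AAAAA"): 0.99,
    ("palace A", "built by", "unknown builder"): 0.40,
}
background = {("palace A", "scenic spot level", "AAAAA")}
updates = select_updates(scored_graph, background)
print(updates)
```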

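Further, in the embodiment provided by the present invention, in the method for constructing a knowledge graph, processing all texts containing the entity to obtain the first entity related information representing the entity specifically includes:

performing relation extraction on the texts containing the entity to obtain triples;

clustering the triples and then sorting them, and taking the top n triples after sorting as first entity attributes, wherein n is an integer greater than or equal to 1;

and clustering the texts containing the entity and then sorting them, and taking the top m texts after sorting as first text information representing the entity, wherein m is an integer greater than or equal to 1.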

In particular, relation extraction aims to automatically identify relations between entities from text. A triple represents a fact: in the triple (player E, place of birth, city S), player E and city S are nodes, and place of birth is a directed labeled edge pointing from player E to city S. Clustering refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects. Sorting refers to rearranging an unordered sequence of records into an ordered one. In the invention, the triples obtained by relation extraction from the texts are clustered, the clustered triples are sorted, and the top n triples in the sorted order are used as the attribute information of the entity.

Similarly, in the embodiment of the invention, the texts are clustered and then sorted, and the top m texts in the sorted order are used as the text information of the entity.
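The cluster-then-sort selection of texts can be illustrated with a deliberately simple stand-in: greedy single-pass clustering by Jaccard word overlap, with clusters ranked by size. This is not K-means or any of the other clustering algorithms the patent lists; the threshold value, the ranking criterion and the choice of the first member as cluster representative are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_texts(texts, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed text is similar enough."""
    token_sets = [set(text.lower().split()) for text in texts]
    clusters = []  # each cluster is a list of indices into texts
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(tokens, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def top_m_texts(texts, m=1, threshold=0.3):
    """Rank clusters by size and return one representative text per top cluster."""
    clusters = sorted(cluster_texts(texts, threshold), key=len, reverse=True)
    return [texts[cluster[0]] for cluster in clusters[:m]]

texts = [
    "palace a was built in 1406",
    "palace a was built in 1406 by the emperor",
    "the museum opened in 1925",
]
representative = top_m_texts(texts, m=1)
print(representative)
```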

For example, relation extraction on all the texts related to the entity "a certain palace A in BJ city" can yield triples such as (a certain palace A in BJ city, construction began in, 1406 AD), (a certain palace A in BJ city, built by, Ming Chengzu), (a certain palace A in BJ city, built on the basis of, the palace of the Yuan capital), (the museum of a certain palace A in BJ city, is a, comprehensive museum) and (a certain palace A in BJ city, construction began in, the fourth year of Yongle of Ming Chengzu). All the triples are then clustered and sorted, yielding attributes such as (a certain palace A in BJ city, construction began in, 1406 AD) and (the museum of a certain palace A in BJ city, is a, comprehensive museum).
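A minimal sketch of selecting the top n triples is given below. It assumes that triples sharing the same head and relation form one cluster, that clusters are ranked by how many extracted triples support them, and that the most frequent tail represents the cluster; the patent does not fix these criteria, and the shortened entity names are for readability only.

```python
from collections import Counter, defaultdict

def top_n_attributes(triples, n=2):
    """Cluster triples by (head, relation), rank by support, keep the top n."""
    clusters = defaultdict(list)
    for head, relation, tail in triples:
        clusters[(head, relation)].append(tail)
    # Larger clusters first; ties are broken by the (head, relation) key.
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        (head, relation, Counter(tails).most_common(1)[0][0])
        for (head, relation), tails in ranked[:n]
    ]

extracted = [
    ("palace A", "construction began in", "1406"),
    ("palace A", "construction began in", "1406"),
    ("palace A", "construction began in", "fourth year of Yongle"),
    ("palace A museum", "is a", "comprehensive museum"),
    ("palace A museum", "is a", "comprehensive museum"),
    ("palace A", "built by", "Ming Chengzu"),
]
attributes = top_n_attributes(extracted, n=2)
print(attributes)
```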

All the texts are likewise clustered and sorted, and the top-ranked texts are taken as the text information representing the entity. For "a certain palace A in BJ city", for example, the following text may be obtained: "A certain palace A in BJ city was begun in 1406 AD (the fourth year of Yongle) and basically completed in 1420 (the eighteenth year of Yongle); it was built by Ming Chengzu on the basis of the palace of the Yuan capital." In addition, the entity attributes and the text information can be expressed in the manner of Table 1:

TABLE 1

Further, in an embodiment provided in the present invention, the second entity related information includes:

entity classification information of the entity in the background knowledge graph, second entity attributes and their attribute values, and entity related information.

Specifically, the entities in the entity list are input into the background knowledge graph as keywords, and the information stored for each entity in the background knowledge graph is retrieved, namely its entity classification information, its second entity attributes and their attribute values, and its entity related information.

For example, by calling the resources of the large-scale knowledge graph with the entity list, the entity classification information, the instance attributes and their attribute values, the related entities and the like can be obtained. The attributes of "a certain palace A in BJ city", for instance, include an "address" of "No. 4 Jingshan Front Street, Dongcheng District, BJ city" and a "scenic spot level" of "AAAAA". Its related scenic spots include KN palace, W gate and the like.
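The lookup of second entity related information can be sketched with an in-memory dictionary standing in for the background knowledge graph. `BACKGROUND_KG`, its field names and the class label "scenic spot" are hypothetical; a real deployment would query a resource such as XLORE through whatever interface it offers.

```python
# Hypothetical in-memory stand-in for a background knowledge graph.
BACKGROUND_KG = {
    "palace A": {
        "classes": ["scenic spot"],
        "attributes": {"scenic spot level": "AAAAA"},
        "related": ["KN palace", "W gate"],
    },
}

def lookup_second_info(entities, kg):
    """Return classification, attributes and related entities for each entity."""
    info = {}
    for entity in entities:
        record = kg.get(entity)
        if record is None:
            # Entity unknown to the background graph: return an empty record.
            record = {"classes": [], "attributes": {}, "related": []}
        info[entity] = record
    return info

second_info = lookup_second_info(["palace A", "bridge B"], BACKGROUND_KG)
print(second_info["palace A"]["attributes"])
```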

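Further, in the embodiment of the present invention, performing keyword extraction on the original data includes:

obtaining a keyword list, together with the position and importance of each word in the original data, by using the term frequency-inverse document frequency (TF-IDF) algorithm, TextRank, a grammar-rule-based method, latent semantic analysis (LSA) and latent semantic indexing (LSI).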

Further, in the embodiment of the present invention, the method for constructing a knowledge graph, wherein the entity merging is performed on the entity link result, the keyword extraction result, and the named entity recognition result to obtain an entity list, includes:

Specifically, all the obtained entities are merged according to how the words extracted by entity linking, keyword extraction and named entity recognition overlap, and overlaps between words are processed according to rules. For example, suppose the original sentence is "ABCDEFGHIJ", where each letter represents a word, and the word obtained by entity linking is "DEFG". If keyword extraction or named entity recognition yields "CD", the result is revised to "CDEFG"; if it yields "CDEFG", the result remains "CDEFG"; if it yields "CDEFGH", the result is revised to "CDEFGH"; if it yields "DE", the result is revised to "DEFG"; and if it yields "DEFG", the result remains "DEFG". That is, the union of the vocabulary results obtained by entity linking, keyword extraction and named entity recognition is determined as the entity merging result.
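The "ABCDEFGHIJ" example can be reproduced by treating each extracted word as a span of positions in the sentence. The sketch below covers the union rule described above and also the intersection variant discussed in the following paragraph; the helper names `find_span`, `merge_spans` and `merged_text` are assumptions introduced for illustration.

```python
def find_span(sentence, phrase):
    """Return the (start, end) position span of phrase within sentence."""
    start = sentence.index(phrase)
    return start, start + len(phrase)

def merge_spans(link_span, other_span, mode="union"):
    """Merge two overlapping spans by union or intersection; None if disjoint."""
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    (a_start, a_end), (b_start, b_end) = link_span, other_span
    if a_end <= b_start or b_end <= a_start:
        return None  # the spans do not overlap, so there is nothing to merge
    if mode == "union":
        return min(a_start, b_start), max(a_end, b_end)
    return max(a_start, b_start), min(a_end, b_end)

def merged_text(sentence, linked, other, mode="union"):
    """Merge the entity linking result with a keyword or named entity result."""
    span = merge_spans(find_span(sentence, linked), find_span(sentence, other), mode)
    return None if span is None else sentence[span[0]:span[1]]

sentence = "ABCDEFGHIJ"  # each letter stands for one word
for other in ["CD", "CDEFG", "CDEFGH", "DE", "DEFG"]:
    print(other, "->", merged_text(sentence, "DEFG", other, "union"))
for other in ["DEFGH", "EF", "EFG"]:
    print(other, "->", merged_text(sentence, "DEFG", other, "intersection"))
```

Working on position spans rather than strings keeps the rule independent of the words themselves, so the same function applies to real tokenized sentences.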

Alternatively, if keyword extraction or named entity recognition yields "DEFGH", the result is "DEFG"; if it yields "EF", the result is revised to "EF"; and if it yields "EFG", the result is revised to "EFG". For example, suppose entity linking yields "a certain super league", which is retained as "a certain super league" according to the previously set rules; the result of keyword extraction is then corrected and supplemented with knowledge. Considering the context, entity linking can determine that "a certain super league" in the text is the "certain super league" of football, whereas the keyword may be only "a certain super", which could also refer to events such as "a certain volleyball super league" or "a certain badminton super league". The keyword is therefore changed to "a certain super league". In this way the entity linking function provides disambiguation for the keyword extraction function, and supplements background knowledge such as "A certain football association super league (for short, "a certain super" or "a certain super league") is the highest-level professional football league in a certain area. Its lower-level leagues are the football association class A league, the football association class B league and the football association member association champions league."

That is, through entity linking, "a certain super league" refers to "a certain football super league", while the keyword "a certain super" may refer to "a certain volleyball super league", "a certain badminton super league" and other events, so that a plurality of candidate subsets is presented. In this document the finally determined "a certain super" refers to "a certain football super league", which is the intersection of the vocabulary results obtained by entity linking, keyword extraction and named entity recognition, and this intersection is determined as the entity merging result.

Further, in the embodiment of the present invention, in the method for constructing a knowledge graph, performing relation extraction on all texts containing the entity includes one or more of the following:

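performing chapter-level relation extraction on the plurality of texts;

performing sentence-level relation extraction on the plurality of texts;

and performing entity relation expansion on the entity linking results obtained by entity linking of the plurality of texts.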

Specifically, chapter-level entity relation extraction refers to extracting the relation between two entities from the context of the event text. For example, from the event text "News of 10/12/2020: Team A are the champions! After ten years, team A has won the 17th title in its history, and B has been named Finals FMVP", it can be derived that the relation between the entity "player B" and the entity "team A" is that "player B" plays for "team A".

Sentence-level entity relation extraction refers to the relation between two entities that is expressed grammatically within a single sentence of the event text; that is, the relation between the two entities is determined from one sentence. For example, from the sentence "player B plays for team A", the two entities "player B" and "team A" can be obtained, and the relation between them is that "player B" plays for "team A".
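Sentence-level relation extraction can be illustrated with a single hand-written pattern. The pattern list `PATTERNS` and the English phrase "plays for" (used here as an assumed rendering of the relation in the example) are hypothetical; practical systems typically rely on trained extraction models rather than one regular expression.

```python
import re

# Hypothetical pattern list: a compiled pattern and the relation it expresses.
PATTERNS = [
    (re.compile(r"(?P<head>[\w ]+?) plays for (?P<tail>[\w ]+)"), "plays for"),
]

def extract_sentence_relations(sentence):
    """Return (head, relation, tail) triples matched within one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.search(sentence)
        if match:
            head = match.group("head").strip()
            tail = match.group("tail").strip()
            triples.append((head, relation, tail))
    return triples

relations = extract_sentence_relations("player B plays for team A.")
print(relations)
```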

Entity relation expansion is performed on the entity linking results obtained by entity linking of a plurality of news events. It applies when the relation between entities cannot be obtained from the event texts or sentences, but information about the relation between the two entities exists in the background knowledge of an entity. For example, the background knowledge of the entity "player B" states: "In 2011, relying on his reputation in international sports, player B reached a cooperation agreement with group A; player B became the exclusive global senior image representative of club A and received some of club A's equity in return." From this it can be known that the relation between the entity "player B" and the entity "club A" is that "player B" is a shareholder of "club A".

By adopting the three ways of judging the relationship between the entities, the relationship between different entities can be fully and comprehensively expressed.

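Further, in an embodiment provided by the present invention, in the method for constructing a knowledge graph, clustering all texts containing the entity to obtain a plurality of clustered texts includes:

performing K-means clustering, density-based clustering, mean shift clustering or hierarchical clustering on all texts containing the entity to obtain the plurality of texts.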

Taking the construction of the BJ city tourism knowledge graph as an example, the following explains how the above method is used to construct a new knowledge graph and how the knowledge graph is updated.

To construct the BJ city tourism knowledge graph, the original data is the related documents of the "BJ city tourism network". The existing large-scale knowledge graphs may be selected from the CYC common sense knowledge base, multilingual Wikipedia, DBpedia, Freebase, YAGO, Wikidata, NELL, Probase, BabelNet, ConceptNet, and the like; here the large-scale cross-lingual knowledge graph XLORE is selected. XLORE contains about 2.35 million concepts and 26 million entities, and is the knowledge graph containing the most structured Chinese knowledge.

First, the entity linking the original text based on all the data in XLORE can get the key vocabulary in it, e.g., "some C ocean bottom world is ocean bottom theme park educated in music. There are 8000 tons of water, 6000 tails of rare marine life, 10 huge and violent sharks, 120 meters of super-long full-sight submarine tunnel, and a remarkable submarine trip! The sentence can obtain the link results of entities such as a certain C seabed world, a theme park, a rare marine organism, shark and the like.

Next, keywords are extracted from all texts in the original documents using a TF-IDF-like method combined with grammar rules; keywords such as "a certain museum A" and "a certain bridge" can be obtained.
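A minimal sketch of the TF-IDF part of this step is given below. It uses plain TF-IDF over lowercase word tokens; the function name `tfidf_keywords`, the tokenizer and the `top_k` parameter are assumptions for illustration, and the grammar rules that the method combines with the TF-IDF-like score are not modeled.

```python
import math
import re
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            t: (c / len(tokens)) * math.log(n_docs / df[t])
            for t, c in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords
```

Terms that occur in every document receive a score of zero, so common words fall to the bottom of the ranking without a separate stop-word list.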

Then, named entity recognition is performed on the original documents using a rule-based method, yielding organizations such as "Tsinghua University" and "Renmin University of China", person names such as "historical character A", "historical character B" and "historical character C", and places such as "a certain street" and "a certain train station".
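The rule-based recognition can be sketched with a few suffix patterns. The rules in `RULES`, the labels `ORG` and `LOC` and the function `recognize_entities` are hypothetical and written for English text; the actual rules used by the method are not given in this document.

```python
import re

# Hypothetical suffix rules: a run of capitalized words ending in a cue word.
RULES = {
    "ORG": re.compile(r"\b(?:[A-Z][a-z]+ )+(?:University|Institute|Museum)\b"),
    "LOC": re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Street|Station|Bridge)\b"),
}

def recognize_entities(text):
    """Return a sorted list of (entity_text, label) pairs matched by the rules."""
    found = set()
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.add((match.group(0), label))
    return sorted(found)
```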

Then, the entity names obtained by entity linking, keyword extraction and named entity recognition are merged to obtain the merged entity list.

According to the merged entity list, the texts related to each entity in the list are retrieved from the original data. For example, for "a certain C seabed world", related texts such as the following may be obtained: "A certain C seabed world is an edutainment seabed theme park.", "Driving west from the hospital, one arrives at a certain C seabed world." and "A certain C seabed world in BJ city takes 'education, entertainment and environmental responsibility' as its management objective and shows tourists the magical and beautiful seabed world through high technology." Likewise, for "a certain A palace in BJ city", the following may be obtained: "A certain A palace in BJ city was first built in 1406 AD (the fourth year of Yongle) and was basically completed in 1420 (the eighteenth year of Yongle), taking 14 years; it was built by Emperor Chengzu of Ming on the basis of the palace of the Yuan capital Dadu.", "The museum of a certain A palace in BJ city is a comprehensive Chinese museum established on October 10, 1925; it is located in a certain A palace (the Forbidden City) in BJ city, and its collections include but are not limited to the imperial collections of the Ming and Qing dynasties." and "A certain A palace in BJ city began to be built in the fourth year of Yongle (1406) under Emperor Chengzu of Ming, was modeled on a certain A palace in Nanjing, was completed in the eighteenth year of Yongle (1420), and became the imperial palace of twenty-four emperors of the Ming and Qing dynasties."
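The retrieval step can be sketched as a sentence-level substring search. The function `retrieve_texts` and the punctuation-based sentence splitter are assumptions for illustration; a production system would more likely use a full-text index.

```python
import re

def retrieve_texts(entities, documents):
    """Map each entity to the sentences in `documents` that mention it."""
    sentences = []
    for doc in documents:
        # Split after sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", doc)
        sentences.extend(s.strip() for s in parts if s.strip())
    return {
        entity: [s for s in sentences if entity.lower() in s.lower()]
        for entity in entities
    }
```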

Relation extraction is then performed on the texts to obtain triples. For example, relation extraction on all texts related to the entity "a certain A palace in BJ city" yields triples such as "a certain A palace in BJ city - built in - 1406 AD", "a certain A palace in BJ city - built by - Emperor Chengzu of Ming", "a certain A palace in BJ city - built on the basis of - the palace of the Yuan capital Dadu", "a certain A palace museum in BJ city - is - a comprehensive museum" and "a certain A palace in BJ city - built in - the fourth year of Yongle under Emperor Chengzu of Ming". All the triples are then clustered and ranked, so that attributes such as "a certain A palace in BJ city - built in - 1406 AD" and "a certain A palace museum in BJ city - is - a comprehensive museum" can be obtained.
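One simple way to cluster and rank extracted triples is to group them by subject and relation and order the groups by how often they are supported. The sketch below does exactly that; grouping by exact normalized key is a stand-in for the clustering, and `rank_triples` and its support count are illustrative assumptions rather than the ranking used by the method.

```python
from collections import Counter, defaultdict

def rank_triples(triples):
    """Group triples by (subject, relation) and rank them by support count."""
    groups = defaultdict(Counter)
    for subj, rel, obj in triples:
        groups[(subj.strip().lower(), rel.strip().lower())][obj.strip()] += 1
    ranked = []
    for (subj, rel), objects in groups.items():
        # Keep the most frequently extracted object for this attribute.
        obj, support = objects.most_common(1)[0]
        ranked.append((subj, rel, obj, support))
    # Most strongly supported attribute first; ties broken alphabetically.
    ranked.sort(key=lambda t: (-t[3], t[0], t[1]))
    return ranked
```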

All the texts are likewise clustered and ranked, and the top-ranked text in the ordered sequence is taken as the text information representing the entity. For example, the representative text obtained for "a certain A palace in BJ city" is: "A certain A palace in BJ city was first built in 1406 AD (the fourth year of Yongle) and was basically completed in 1420 (the eighteenth year of Yongle); it was built by Emperor Chengzu of Ming on the basis of the palace of the Yuan capital Dadu."
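The selection of a representative text can be sketched as follows, assuming a greedy clustering by word overlap (Jaccard similarity) and a ranking by average similarity within the largest cluster. The similarity measure, the `threshold` value and the function names are assumptions made for illustration; the clustering and ranking criteria of the method are not detailed in this passage.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def representative_text(texts, threshold=0.3):
    """Greedily cluster texts by token overlap and return the text most
    similar to the other members of the largest cluster."""
    if not texts:
        return None
    token_sets = [set(t.lower().split()) for t in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            # Compare against the first member of the cluster (its seed).
            if jaccard(tokens, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    largest = max(clusters, key=len)

    def centrality(i):
        return sum(jaccard(token_sets[i], token_sets[j]) for j in largest if j != i)

    return texts[max(largest, key=centrality)]
```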

Then, the instance attributes and their attribute values, the related entities, and the like of each entity are retrieved from the background knowledge graph. For example, the attributes of "a certain A palace in BJ city" include an "address" of "No. 4 Jingshan Front Street, Dongcheng District, BJ city" and a "scenic spot level" of "AAAAA". Related scenic spots include KN palace, W gate and the like.
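The lookup in the background knowledge graph can be pictured with an in-memory dictionary. The structure of `BACKGROUND_KG`, its keys and the function `lookup_entities` are hypothetical; a real deployment would query the knowledge graph's own interface, which is not described here.

```python
# Hypothetical in-memory stand-in for a background knowledge graph.
BACKGROUND_KG = {
    "palace a": {
        "classes": ["scenic spot"],
        "attributes": {"scenic spot level": "AAAAA"},
        "related": ["KN palace", "W gate"],
    }
}

def lookup_entities(entity_list, kg=BACKGROUND_KG):
    """Return the stored information for every listed entity found in the graph."""
    found = {}
    for name in entity_list:
        record = kg.get(name.strip().lower())
        if record is not None:  # entities unknown to the graph are skipped
            found[name] = record
    return found
```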

Finally, the entities, the entity attributes and the entity texts are combined with the information on the corresponding entities in the background knowledge graph, such as their related attributes and texts, to form the complete knowledge graph. After the original data of the "BJ city tourism network" have been processed by functions such as entity extraction, attribute extraction and text association, the resulting data can be used to update and supplement the large-scale knowledge graph data according to confidence.
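A confidence-based fusion of this kind might look like the sketch below. The function `fuse`, the representation of extracted attributes as `(value, confidence)` pairs and the `threshold` of 0.5 are assumptions for illustration; how confidence is computed and applied is not specified in this passage.

```python
def fuse(entity, extracted_attrs, background_attrs, threshold=0.5):
    """Merge extracted attributes into the background attributes.

    `extracted_attrs` maps attribute -> (value, confidence). Extracted values
    below the threshold are ignored; the rest add to or overwrite the
    background values.
    """
    fused = dict(background_attrs)  # start from the background knowledge
    for attr, (value, confidence) in extracted_attrs.items():
        if confidence >= threshold:
            fused[attr] = value
    return {"entity": entity, "attributes": fused}
```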

Referring to fig. 3, an apparatus for constructing a knowledge graph according to an embodiment of the present invention includes:

a first processing module 31, configured to obtain raw data and a large-scale knowledge graph;

the second processing module 32 is configured to perform entity linking, keyword extraction, and named entity identification on the raw data to obtain an entity linking result, a keyword extraction result, and a named entity identification result;

a third processing module 33, configured to perform entity merging on the entity link result, the keyword extraction result, and the named entity identification result, so as to obtain an entity list;

a fourth processing module 34, configured to perform text retrieval on the original data by using the entity in the entity list as a keyword, so as to obtain all texts including the entity;

a fifth processing module 35, configured to process all texts including the entity to obtain first entity-related information representing the entity;

a sixth processing module 36, configured to input an entity in the entity list as a keyword into a background knowledge graph, so as to obtain second entity-related information of the entity in the background knowledge graph;

a seventh processing module 37, configured to fuse the entity list, the first entity-related information, and the second entity-related information of the entity in the background knowledge graph to obtain a new knowledge graph.
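The division into seven modules can be sketched as a single pipeline function. In this sketch the extraction and summarization steps are injected as callables, the background knowledge graph is a plain dictionary, and entity merging is simplified to an order-preserving union; all names are hypothetical and the code only illustrates how the modules hand results to one another.

```python
def build_knowledge_graph(raw_texts, background_kg, link, extract_keywords,
                          recognize, summarize):
    """Chain the seven processing steps; the four callables are injected."""
    # Modules 31-32: take the inputs and run the three extractors.
    linked = link(raw_texts)
    keywords = extract_keywords(raw_texts)
    named = recognize(raw_texts)
    # Module 33 (simplified): merge into one de-duplicated, ordered list.
    entities = list(dict.fromkeys([*linked, *keywords, *named]))
    graph = {}
    for entity in entities:
        # Module 34: retrieve the texts that mention the entity.
        texts = [t for t in raw_texts if entity in t]
        graph[entity] = {
            "first_info": summarize(entity, texts),        # module 35
            "second_info": background_kg.get(entity, {}),  # module 36
        }
    return graph  # module 37: the fused result
```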

Since the apparatus provided by the embodiment of the present invention can be used for executing the method described in the above embodiment, and the operation principle and the beneficial effect are similar, detailed descriptions are omitted here, and specific contents can be referred to the description of the above embodiment.

Further, in the embodiment provided by the present invention, the fifth processing module 35 is specifically configured to:

Further, in the embodiment provided by the present invention, the information about the second entity in the sixth processing module 36 includes:

Specifically, the sixth processing module 36 inputs the entities in the entity list into the background knowledge graph as keywords and retrieves the information on those entities stored in the background knowledge graph, namely the entity classification information, the second entity attributes and their attribute values, and the entity-related information.

Further, in this embodiment of the present invention, the method for extracting keywords from the original data in the second processing module 32 includes:

Further, in the embodiment of the present invention, the knowledge graph constructing apparatus, wherein the third processing module 33 is specifically configured to:

If the keyword or the named entity is identified as DEFGH, the result is DEFG; if it is identified as EF, the result is corrected to EF; and if it is identified as EFG, the result is corrected to EFG. For example, "a certain super league" is obtained from the entity linking result and, according to the previously set rules, is retained as "a certain super league"; it is then used to correct the keyword extraction result and to supplement knowledge. By taking the context into account, entity linking determines that "a certain super league" in the text is "a certain super league" in the football field, whereas the extracted keyword may only be "a certain super", which could also refer to events such as "a certain volleyball super league" or "a certain badminton super league". The keyword is therefore changed to "a certain super league". In this way the entity linking function provides disambiguation for the keyword extraction function and supplements background knowledge such as: "A certain football association super league (abbreviated as 'a certain super' or 'a certain super league') is the highest-level professional football league in a certain area. Its lower-level leagues are the football association class A league, the football association class B league and the football association member association champions league." In other words, through entity linking, "a certain super league" in the text refers to "a certain football super league", whereas the keyword "a certain super" alone may refer to several candidates, such as "a certain volleyball super league" or "a certain badminton super league". In this document, "a certain super" is finally determined to refer to "a certain football super league"; that is, the intersection of the vocabulary results obtained by entity linking, keyword extraction and named entity recognition is taken as the entity merging result.
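The correction of a partial keyword by an entity linking result, as in the "a certain super" example above, can be sketched as follows. The function `merge_entities` is hypothetical and covers only the case in which a keyword or named entity is contained in a linked entity name; the other correction rules of the method are not modeled.

```python
def merge_entities(linked, keywords, named):
    """Merge three result lists, letting entity-link results correct
    shorter keyword or named-entity strings that they contain."""
    merged = []
    for candidate in [*linked, *keywords, *named]:
        # Replace a partial mention by the first linked entity containing it.
        corrected = next((e for e in linked if candidate in e), candidate)
        if corrected not in merged:
            merged.append(corrected)
    return merged
```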

Further, in an embodiment of the present invention, the knowledge-graph constructing apparatus, wherein the fifth processing module is configured to perform one or more of the following operations:

performing chapter-level relation extraction on the plurality of texts;

performing sentence-level relation extraction on the plurality of texts;

Further, in the embodiment provided by the present invention, in the knowledge graph constructing apparatus, the fifth processing module 35 clusters all the texts including the entity by using the following method, to obtain a plurality of clustered texts, including:

Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a knowledge graph construction method, the method comprising: acquiring original data and a background knowledge graph; carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result; carrying out entity merging on the entity linking result, the keyword extraction result and the named entity identification result to obtain an entity list; taking each entity in the entity list as a keyword to perform text retrieval on the original data to obtain the texts containing the entity; processing all texts containing the entity to obtain first entity-related information representing the entity; inputting the entities in the entity list into the background knowledge graph as keywords to obtain second entity-related information of the entities; and fusing the entity list, the first entity-related information and the second entity-related information of the entities to obtain a new knowledge graph.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the knowledge graph construction method provided above, the method comprising: acquiring original data and a background knowledge graph; carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result; carrying out entity merging on the entity linking result, the keyword extraction result and the named entity identification result to obtain an entity list; taking each entity in the entity list as a keyword to perform text retrieval on the original data to obtain the texts containing the entity; processing all texts containing the entity to obtain first entity-related information representing the entity; inputting the entities in the entity list into the background knowledge graph as keywords to obtain second entity-related information of the entities; and fusing the entity list, the first entity-related information and the second entity-related information of the entities to obtain a new knowledge graph.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the knowledge graph construction method provided above, the method comprising: acquiring original data and a background knowledge graph; carrying out entity linking, keyword extraction and named entity identification on the original data to obtain an entity linking result, a keyword extraction result and a named entity identification result; carrying out entity merging on the entity linking result, the keyword extraction result and the named entity identification result to obtain an entity list; taking each entity in the entity list as a keyword to perform text retrieval on the original data to obtain the texts containing the entity; processing all texts containing the entity to obtain first entity-related information representing the entity; inputting the entities in the entity list into the background knowledge graph as keywords to obtain second entity-related information of the entities; and fusing the entity list, the first entity-related information and the second entity-related information of the entities to obtain a new knowledge graph.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A knowledge graph construction method is characterized by comprising the following steps:

acquiring original data and a background knowledge graph;

2. The method for constructing a knowledge graph according to claim 1, wherein the processing all texts containing the entity to obtain the first entity-related information representing the entity specifically comprises:

3. The method of knowledge-graph construction according to claim 1, wherein the second entity-related information comprises:

entity classification information, second entity attributes and their attribute values, and entity-related information of the entity in the background knowledge graph.

4. The knowledge graph construction method according to claim 1, wherein the method for extracting keywords from the raw data comprises:

5. The knowledge graph construction method of claim 1, wherein the entity combining the entity linking result, the keyword extraction result and the named entity recognition result to obtain an entity list comprises:

6. The method of knowledge-graph construction according to claim 2, wherein said extracting relationships from all text containing said entity comprises one or more of:

performing chapter-level relation extraction on the plurality of texts;

performing sentence-level relation extraction on the plurality of texts;

7. The method of constructing a knowledge graph according to claim 1, wherein clustering all texts containing the entities to obtain a plurality of clustered texts comprises:

8. A knowledge-graph building apparatus, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the knowledge-graph construction method according to any one of claims 1 to 7 when executing the program.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of knowledge-graph construction according to any one of claims 1 to 7.