CN114153983A - Multi-source construction method of industry knowledge graph - Google Patents
Multi-source construction method of industry knowledge graph
- Publication number
- CN114153983A
- Authority
- CN
- China
- Prior art keywords
- industry
- concepts
- entities
- entity
- extracted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Analysis (AREA)
- Animal Behavior & Ethology (AREA)
- Mathematical Optimization (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-source construction method for an industry knowledge graph, comprising the following steps: S1, extracting industry concepts and entities from four knowledge sources, namely open knowledge bases, online encyclopedias, industry text and structured industry data; S2, merging synonymous concepts and entities; S3, extracting hypernym-hyponym (superordinate-subordinate) relations between concepts; and S4, extracting non-hierarchical relations and attribute relations of concepts and entities. The method addresses the problems of existing construction methods: heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources in a differentiated manner. A target ontology is constructed, entities and attributes are extracted with a strategy tailored to each data source, the characteristics of knowledge from different sources are taken into account, and the knowledge graph is built semi-automatically with the help of machine learning, which greatly reduces the manual effort of building a large-scale knowledge graph while maintaining accuracy.
Description
Technical Field
The invention relates to the technical field of artificial intelligence text processing, in particular to a multi-source construction method of an industry knowledge graph.
Background
An industry knowledge graph contains a large amount of structured information and is typically used for analytical applications or decision support, so it must meet high accuracy requirements. Large-scale knowledge graphs are currently constructed in two ways: by synchronization with a database and by synchronization with an online encyclopedia. The first approach stores the knowledge graph in a specific structure, downloads a large amount of data, and, after manual integration, builds the graph by sub-graph fusion. This approach is labor intensive, consumes significant computing resources, and cannot guarantee data security during construction. The second approach uses a web crawler to collect data and extract information from related pages. Processing a large number of web pages produces excessive fragmented information, and most websites block crawlers, so the resulting data is incomplete. In a multi-source knowledge graph, knowledge from industry text, from open linked data sets and knowledge bases, and from encyclopedias has different characteristics, and existing construction methods have difficulty extracting and fusing knowledge from these sources in a differentiated manner.
Disclosure of Invention
In view of the above technical problems in the related art, the invention provides a multi-source construction method for an industry knowledge graph that overcomes the above defects of the prior art.
To achieve this technical purpose, the technical solution of the invention is as follows:
A multi-source construction method for an industry knowledge graph comprises the following steps:
S1: extracting industry concepts and entities from four knowledge sources, namely open knowledge bases, online encyclopedias, industry text and structured industry data;
S2: merging synonymous concepts and entities;
S3: extracting hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extracting non-hierarchical relations and attribute relations of concepts and entities.
Further, S1 comprises the following steps:
S11: collecting industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collecting the category labels of the classification systems of Wikipedia, Baidu Baike and Hudong Baike (Interactive Encyclopedia) as concepts, taking the titles of encyclopedia articles as entity candidates, and using the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, finding a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and preliminarily screening industry core concepts from the keyword set with the assistance of industry experts;
S14: for the structured industry data, using the D2R Server tool to map the relevant tables in the relational database to concepts or entities and the columns of those tables to attributes of the entities;
S15: integrating the industry concepts and entities obtained through the four routes S11-S14.
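The keyword step in S13 can be illustrated with a minimal sketch of TF-IDF ranking, one of the four methods named above. It assumes the corpus has already been tokenized; the function name `tfidf_keywords`, the `log(N/df)` weighting variant and the toy documents are illustrative assumptions, not details given in the patent.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank the terms of each tokenized (non-empty) document by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    ranked = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return ranked

# Toy corpus (hypothetical): three short, pre-tokenized industry sentences.
docs = [
    ["transformer", "substation", "voltage", "the", "transformer"],
    ["the", "relay", "protects", "the", "line"],
    ["the", "voltage", "of", "the", "line"],
]
candidates = tfidf_keywords(docs, top_k=2)
```

Terms that occur in every document, such as "the", receive a score of zero and drop out, which leaves a candidate list that industry experts can then screen.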
Further, S2 comprises the following steps:
S21: identifying the synonymy relations marked explicitly in the open linked data: DBPedia uses "owl:sameAs" to identify synonymous entities, YAGO uses "means" to identify synonymous entities, and Zhishi.me uses "pageRedirects" to identify the redirect pages of synonymous entities;
S22: for the encyclopedias, first merging synonymous concepts and entities within the same encyclopedia: traversing the entity pages of the encyclopedia, treating page titles that share the same redirect label as the same entity, and treating the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judging whether entities with the same name in different online encyclopedias are synonymous: for articles from different online encyclopedias, articles with the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts corresponding to the article titles are marked as synonyms;
S23: extracting synonymy relations from industry text: first, defining sentence-pattern rules that describe synonymy, such as "X is also named Y" and "X is also called Y", matching these patterns in the industry text and extracting synonymy relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted synonymy relations, training a BiLSTM-CRF model, and using the model to extract synonymy relations;
S24: merging the synonymy relations obtained through the three routes S21-S23; if synonymy relations obtained through different routes share a concept or entity, the two relations are merged.
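The merge rule in S24 (synonym groups that share a term become one group) can be sketched with a union-find structure. This is one possible realization under stated assumptions; the patent does not name a data structure, and the function name and example terms ("PC", "CPU", "processor") are hypothetical.

```python
def merge_synonym_sets(synonym_sets):
    """Merge synonym groups that share at least one term (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in synonym_sets:
        terms = list(group)
        for term in terms:
            find(term)  # register every term, including singletons
        for other in terms[1:]:
            union(terms[0], other)

    merged = {}
    for term in list(parent):
        merged.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in merged.values())

# Hypothetical groups from two routes; the first two share "computer".
groups = merge_synonym_sets([
    ["computer", "electronic computer"],  # e.g. from an encyclopedia
    ["computer", "PC"],                   # e.g. from industry text
    ["CPU", "processor"],
])
```

Because merging is transitive, groups from any number of routes collapse into one group as soon as a chain of shared terms connects them.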
Further, the article content similarity in S22 is obtained by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is used as the article similarity.
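The similarity computation described above can be sketched as follows. The word vectors here are hand-written two-dimensional toy values standing in for trained word2vec embeddings, and the TF-IDF weights are supplied directly; the function names and the `threshold=0.8` default (taken from the 80% rule in S22) are illustrative assumptions.

```python
import math

def article_vector(tokens, word_vectors, tfidf):
    """TF-IDF-weighted average of the word vectors of an article's words."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in dict.fromkeys(tokens):  # each distinct word once, in order
        if tok in word_vectors and tok in tfidf:
            weight = tfidf[tok]
            weight_sum += weight
            for i, value in enumerate(word_vectors[tok]):
                total[i] += weight * value
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def same_entity(title_a, vec_a, title_b, vec_b, threshold=0.8):
    """Same title and content similarity above the threshold."""
    return title_a == title_b and cosine(vec_a, vec_b) > threshold

# Toy embeddings (assumed values, not trained word2vec output).
word_vectors = {"apple": [1.0, 0.0], "fruit": [0.8, 0.6], "phone": [0.0, 1.0]}
vec_a = article_vector(["apple", "fruit"], word_vectors, {"apple": 0.5, "fruit": 0.5})
vec_b = article_vector(["apple", "phone"], word_vectors, {"apple": 0.5, "phone": 0.5})
vec_c = article_vector(["phone"], word_vectors, {"phone": 1.0})
```

In practice the embeddings would come from a word2vec model trained on the encyclopedia corpus, and the weights from TF-IDF statistics over the same corpus.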
Further, S3 comprises the following steps:
S31: extracting hypernym-hyponym (superordinate-subordinate) relations between industry core concepts from the open linked data sets and open knowledge bases according to corresponding rules;
S32: obtaining hypernym-hyponym relations between core concepts directly from the encyclopedia classification systems;
S33: extracting hypernym-hyponym relations from industry text: first, defining sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)", matching these patterns in the industry text and extracting hypernym-hyponym relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract hypernym-hyponym relation triples;
S34: integrating the hypernym-hyponym relations obtained through the three routes S31-S33 to construct a classification tree.
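The rule-matching half of S33 can be sketched with regular expressions. The original rules are Chinese sentence patterns; the English patterns below, the direction assigned to each pattern (which side is the hypernym) and the example sentences are all assumptions made for illustration.

```python
import re

# Hypothetical English renderings of "X includes/has/means Y, Z, etc.".
# Assumption: X is the hypernym and Y, Z are its hyponyms.
LIST_PATTERN = re.compile(
    r"^(?P<hyper>.+?) (?:includes|has|means) (?P<hypos>.+?)(?:,? etc)?\.?$"
)
# Hypothetical rendering of "X is Y". Assumption: X is the hyponym.
ISA_PATTERN = re.compile(r"^(?P<hypo>.+?) is an? (?P<hyper>.+?)\.?$")

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the rules above."""
    sentence = sentence.strip()
    match = LIST_PATTERN.match(sentence)
    if match:
        # Split the enumeration on commas and on "and".
        parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group("hypos"))
        return [(p.strip(), match.group("hyper")) for p in parts if p.strip()]
    match = ISA_PATTERN.match(sentence)
    if match:
        return [(match.group("hypo"), match.group("hyper"))]
    return []

list_pairs = extract_hypernym_pairs(
    "Switchgear includes circuit breakers, disconnectors, etc."
)
isa_pairs = extract_hypernym_pairs("SF6 breaker is a circuit breaker.")
```

Pairs found this way are high-precision but sparse, which is why the method uses them as training data for the BiLSTM-CRF model rather than as the final result.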
Further, S4 comprises the following steps:
S41: extracting the attribute relations of concepts directly from the information module of the open linked data;
S42: writing an adapter that parses encyclopedia pages and extracts the attribute relations of entities from the information module (infobox) of the online encyclopedia, then counting the attributes of the entities belonging to each concept; if more than 30% of the entities belonging to a concept have a given attribute, the attribute is considered common and becomes an attribute of the concept;
S43: extracting non-hierarchical (non-hypernym-hyponym) relations from industry text: first, defining, with the assistance of industry experts, common sentence-pattern rules that describe non-hierarchical relations, matching these rules in the industry text and extracting non-hierarchical relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract non-hierarchical relations;
S44: finally, merging the non-hierarchical and attribute relations obtained through the three routes S41-S43.
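The 30% rule in S42 amounts to a frequency count over the entities of one concept. A minimal sketch, assuming the infobox attributes have already been parsed into sets; the function name and the example entities and attributes are hypothetical.

```python
from collections import Counter

def concept_attributes(entity_attributes, threshold=0.30):
    """Promote attributes held by more than `threshold` of a concept's entities.

    entity_attributes maps each entity of ONE concept to its attribute names.
    """
    n_entities = len(entity_attributes)
    if n_entities == 0:
        return set()
    counts = Counter(
        attr for attrs in entity_attributes.values() for attr in set(attrs)
    )
    # "Exceeds 30%" is read as a strict inequality.
    return {attr for attr, c in counts.items() if c / n_entities > threshold}

# Hypothetical entities of the concept "transformer".
promoted = concept_attributes({
    "T1": {"rated voltage", "manufacturer"},
    "T2": {"rated voltage", "manufacturer"},
    "T3": {"rated voltage", "colour"},
    "T4": set(),
})
```

Here "rated voltage" (3 of 4 entities) and "manufacturer" (2 of 4) pass the threshold, while "colour" (1 of 4, or 25%) does not.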
The beneficial effects of the invention are as follows. The multi-source construction method addresses the problems of existing construction methods: heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources in a differentiated manner. A target ontology is constructed, entities and attributes are extracted with a strategy tailored to each data source, the characteristics of knowledge from different sources are taken into account, and the knowledge graph is built semi-automatically with the help of machine learning, which greatly reduces the manual effort of building a large-scale knowledge graph while maintaining accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of the multi-source construction method of an industry knowledge graph according to an embodiment of the invention;
FIG. 2 is a flowchart of determining whether entities in different online encyclopedias are synonymous, according to an embodiment of the invention;
FIG. 3 is a flowchart of extracting synonymy, hypernym-hyponym and non-hierarchical relations from industry text, according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
As shown in FIGS. 1 to 3, the multi-source construction method of an industry knowledge graph according to the embodiment of the present invention comprises the following steps:
S1: extracting industry concepts and entities from four knowledge sources, namely open knowledge bases, online encyclopedias, industry text and structured industry data;
S2: merging synonymous concepts and entities;
S3: extracting hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extracting non-hierarchical relations and attribute relations of concepts and entities.
S1 comprises the following steps:
S11: collecting industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collecting the category labels of the classification systems of Wikipedia, Baidu Baike and Hudong Baike (Interactive Encyclopedia) as concepts, taking the titles of encyclopedia articles as entity candidates, and using the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, finding a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and preliminarily screening industry core concepts from the keyword set with the assistance of industry experts;
S14: for the structured industry data, using the D2R Server tool to map the relevant tables in the relational database to concepts or entities and the columns of those tables to attributes of the entities;
S15: integrating the industry concepts and entities obtained through the four routes S11-S14.
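The table-to-graph mapping in S14 can be illustrated without D2R Server itself. The sketch below is not the D2R Server API; it is a hand-written stand-in, using Python's built-in `sqlite3`, that shows the kind of output such a mapping produces: the table name becomes the concept, each row becomes an entity, and each column becomes an attribute. The table, column names and the `rdf:type` label are hypothetical.

```python
import sqlite3

def table_to_triples(conn, table, key_column):
    """Map one relational table to (subject, predicate, object) triples."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [desc[0] for desc in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        entity = f"{table}/{record[key_column]}"
        # The table name serves as the concept of every row entity.
        triples.append((entity, "rdf:type", table))
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((entity, column, value))
    return triples

# Hypothetical in-memory industry table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transformer (id INTEGER PRIMARY KEY, name TEXT, rated_kv REAL)")
conn.execute("INSERT INTO transformer VALUES (1, 'T1', 110.0)")
triples = table_to_triples(conn, "transformer", "id")
```

A real deployment would describe the same correspondence declaratively in a mapping file rather than in application code.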
S2 comprises the following steps:
S21: identifying the synonymy relations marked explicitly in the open linked data: DBPedia uses "owl:sameAs" to identify synonymous entities, YAGO uses "means" to identify synonymous entities, and Zhishi.me uses "pageRedirects" to identify the redirect pages of synonymous entities;
S22: for the encyclopedias, first merging synonymous concepts and entities within the same encyclopedia: traversing the entity pages of the encyclopedia, treating page titles that share the same redirect label as the same entity, and treating the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judging whether entities with the same name in different online encyclopedias are synonymous: for articles from different online encyclopedias, articles with the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts corresponding to the article titles are marked as synonyms;
S23: extracting synonymy relations from industry text: first, defining sentence-pattern rules that describe synonymy, such as "X is also named Y" and "X is also called Y", matching these patterns in the industry text and extracting synonymy relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted synonymy relations, training a BiLSTM-CRF model, and using the model to extract synonymy relations;
S24: merging the synonymy relations obtained through the three routes S21-S23; if synonymy relations obtained through different routes share a concept or entity, the two relations are merged. For example, the synonym pair "computer, electronic computer" obtained from the encyclopedia and a synonym pair obtained from the industry text that shares the term "computer" are merged into a single three-term synonym set.
The article content similarity in S22 is obtained by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is then used as the article similarity.
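The step in S23 that turns rule-matched relations into training data for the BiLSTM-CRF model can be sketched as sequence labeling. The patent does not specify a tagging scheme; the BIO scheme and the labels "X" and "Y" below are assumptions, chosen because BIO tags are a common input format for BiLSTM-CRF taggers.

```python
def bio_tags(tokens, mentions):
    """Label the first free occurrence of each mention with B-/I- tags.

    mentions maps a label (e.g. "X") to the token list of the mention.
    """
    tags = ["O"] * len(tokens)
    for label, mention in mentions.items():
        n = len(mention)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(tag == "O" for tag in tags[i:i + n])
            if tokens[i:i + n] == mention and window_free:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                break
    return list(zip(tokens, tags))

# A sentence matched by a hypothetical rule "X is also called Y".
tokens = ["electronic", "computer", "is", "also", "called", "computer"]
labeled = bio_tags(tokens, {"X": ["electronic", "computer"], "Y": ["computer"]})
```

Sentences labeled this way form the supervised training set; the trained model can then tag sentences that the hand-written rules do not match.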
S3 comprises the following steps:
S31: extracting hypernym-hyponym (superordinate-subordinate) relations between industry core concepts from the open linked data sets and open knowledge bases according to corresponding rules;
S32: obtaining hypernym-hyponym relations between core concepts directly from the encyclopedia classification systems;
S33: extracting hypernym-hyponym relations from industry text: first, defining sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)", matching these patterns in the industry text and extracting hypernym-hyponym relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract hypernym-hyponym relation triples;
S34: integrating the hypernym-hyponym relations obtained through the three routes S31-S33 to construct a classification tree.
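The integration in S34 can be sketched as building a nested tree from (hyponym, hypernym) pairs pooled from the three routes. The function name, the cycle guard and the example pairs are illustrative assumptions; the patent does not describe how conflicts between sources are resolved.

```python
from collections import defaultdict

def build_taxonomy(pairs):
    """Build a nested dict tree from (hyponym, hypernym) pairs."""
    children = defaultdict(set)
    nodes, has_parent = set(), set()
    for hypo, hyper in pairs:
        children[hyper].add(hypo)  # duplicates across routes collapse here
        has_parent.add(hypo)
        nodes.update((hypo, hyper))
    roots = sorted(nodes - has_parent)

    def subtree(node, path):
        # `path` holds the ancestors of `node`; skipping them guards
        # against cycles introduced by conflicting sources.
        return {
            child: subtree(child, path | {child})
            for child in sorted(children.get(node, ()))
            if child not in path
        }

    return {root: subtree(root, {root}) for root in roots}

# Hypothetical pairs; the last one repeats a pair found via another route.
taxonomy = build_taxonomy([
    ("circuit breaker", "switchgear"),
    ("disconnector", "switchgear"),
    ("SF6 breaker", "circuit breaker"),
    ("circuit breaker", "switchgear"),
])
```

A concept with more than one hypernym appears under each of its parents in this representation, so the result is a tree rendering of what may be a directed acyclic graph.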
S4 comprises the following steps:
S41: extracting the attribute relations of concepts directly from the information module of the open linked data;
S42: writing an adapter that parses encyclopedia pages and extracts the attribute relations of entities from the information module (infobox) of the online encyclopedia, then counting the attributes of the entities belonging to each concept; if more than 30% of the entities belonging to a concept have a given attribute, the attribute is considered common and becomes an attribute of the concept;
S43: extracting non-hierarchical (non-hypernym-hyponym) relations from industry text: first, defining, with the assistance of industry experts, common sentence-pattern rules that describe non-hierarchical relations, matching these rules in the industry text and extracting non-hierarchical relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract non-hierarchical relations;
S44: finally, merging the non-hierarchical and attribute relations obtained through the three routes S41-S43.
To facilitate understanding, the above technical solution is described below in terms of its specific use.
In use, industry concepts and entities are extracted from four knowledge sources (open knowledge bases, online encyclopedias, industry text and structured industry data), synonymous concepts and entities are merged, hypernym-hyponym relations between concepts are extracted, and non-hierarchical and attribute relations of concepts and entities are extracted.
In conclusion, the above technical solution addresses the problems of existing construction methods: heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources in a differentiated manner. A target ontology is constructed, entities and attributes are extracted with a strategy tailored to each data source, the characteristics of knowledge from different sources are taken into account, and the knowledge graph is built semi-automatically with the help of machine learning, which greatly reduces the manual effort of building a large-scale knowledge graph while maintaining accuracy.
The above describes only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to fall within its scope of protection.
Claims (6)
1. A multi-source construction method of an industry knowledge graph, characterized by comprising the following steps:
S1: extracting industry concepts and entities from four knowledge sources, namely open knowledge bases, online encyclopedias, industry text and structured industry data;
S2: merging synonymous concepts and entities;
S3: extracting hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extracting non-hierarchical relations and attribute relations of concepts and entities.
2. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S1 comprises the following steps:
S11: collecting industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collecting the category labels of the classification systems of Wikipedia, Baidu Baike and Hudong Baike (Interactive Encyclopedia) as concepts, taking the titles of encyclopedia articles as entity candidates, and using the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, finding a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and preliminarily screening industry core concepts from the keyword set with the assistance of industry experts;
S14: for the structured industry data, using the D2R Server tool to map the relevant tables in the relational database to concepts or entities and the columns of those tables to attributes of the entities;
S15: integrating the industry concepts and entities obtained through the four routes S11-S14.
3. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S2 comprises the following steps:
S21: identifying the synonymy relations marked explicitly in the open linked data: DBPedia uses "owl:sameAs" to identify synonymous entities, YAGO uses "means" to identify synonymous entities, and Zhishi.me uses "pageRedirects" to identify the redirect pages of synonymous entities;
S22: for the encyclopedias, first merging synonymous concepts and entities within the same encyclopedia: traversing the entity pages of the encyclopedia, treating page titles that share the same redirect label as the same entity, and treating the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judging whether entities with the same name in different online encyclopedias are synonymous: for articles from different online encyclopedias, articles with the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts corresponding to the article titles are marked as synonyms;
S23: extracting synonymy relations from industry text: first, defining sentence-pattern rules that describe synonymy, such as "X is also named Y" and "X is also called Y", matching these patterns in the industry text and extracting synonymy relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted synonymy relations, training a BiLSTM-CRF model, and using the model to extract synonymy relations;
S24: merging the synonymy relations obtained through the three routes S21-S23; if synonymy relations obtained through different routes share a concept or entity, the two relations are merged.
4. The multi-source construction method of an industry knowledge graph according to claim 3, wherein the article content similarity in S22 is obtained by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is used as the article similarity.
5. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S3 comprises the following steps:
S31: extracting hypernym-hyponym (superordinate-subordinate) relations between industry core concepts from the open linked data sets and open knowledge bases according to corresponding rules;
S32: obtaining hypernym-hyponym relations between core concepts directly from the encyclopedia classification systems;
S33: extracting hypernym-hyponym relations from industry text: first, defining sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)", matching these patterns in the industry text and extracting hypernym-hyponym relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract hypernym-hyponym relation triples;
S34: integrating the hypernym-hyponym relations obtained through the three routes S31-S33 to construct a classification tree.
6. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S4 comprises the following steps:
S41: extracting the attribute relations of concepts directly from the information module of the open linked data;
S42: writing an adapter that parses encyclopedia pages and extracts the attribute relations of entities from the information module (infobox) of the online encyclopedia, then counting the attributes of the entities belonging to each concept; if more than 30% of the entities belonging to a concept have a given attribute, the attribute is considered common and becomes an attribute of the concept;
S43: extracting non-hierarchical (non-hypernym-hyponym) relations from industry text: first, defining, with the assistance of industry experts, common sentence-pattern rules that describe non-hierarchical relations, matching these rules in the industry text and extracting non-hierarchical relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, deriving training data from the extracted relations, training a BiLSTM-CRF model, and using the model to extract non-hierarchical relations;
S44: finally, merging the non-hierarchical and attribute relations obtained through the three routes S41-S43.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111353417.2A CN114153983A (en) | 2021-11-16 | 2021-11-16 | Multi-source construction method of industry knowledge graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111353417.2A CN114153983A (en) | 2021-11-16 | 2021-11-16 | Multi-source construction method of industry knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114153983A (en) | 2022-03-08 |
Family
ID=80456466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111353417.2A Pending CN114153983A (en) | 2021-11-16 | 2021-11-16 | Multi-source construction method of industry knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114153983A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116450856A (en) * | 2023-06-19 | 2023-07-18 | 航天宏图信息技术股份有限公司 | Meteorological ocean unstructured text knowledge construction method and device and electronic equipment |
CN116450856B (en) * | 2023-06-19 | 2023-09-12 | 航天宏图信息技术股份有限公司 | Meteorological ocean unstructured text knowledge construction method and device and electronic equipment |
CN117852637A (en) * | 2024-03-07 | 2024-04-09 | 南京师范大学 | Definition-based subject concept knowledge system automatic construction method and system |
CN117852637B (en) * | 2024-03-07 | 2024-05-24 | 南京师范大学 | Definition-based subject concept knowledge system automatic construction method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116628172B (en) | | Dialogue method for multi-strategy fusion in government service field based on knowledge graph |
CN108573411B (en) | | Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments |
Tang et al. | | Using Bayesian decision for ontology mapping |
Gaeta et al. | | Ontology extraction for knowledge reuse: The e-learning perspective |
CN110569369A (en) | | Generation method and device, application method and device of knowledge graph of bank financial system |
Xie et al. | | A novel text mining approach for scholar information extraction from web content in Chinese |
CN103500208A (en) | | Deep layer data processing method and system combined with knowledge base |
CN113962293B (en) | | LightGBM classification and representation learning-based name disambiguation method and system |
Yuan-jie et al. | | Web service classification based on automatic semantic annotation and ensemble learning |
CN114153983A (en) | | Multi-source construction method of industry knowledge graph |
CN110888991A (en) | | Sectional semantic annotation method in weak annotation environment |
CN114443855A (en) | | Knowledge graph cross-language alignment method based on graph representation learning |
Ramar et al. | | Technical review on ontology mapping techniques |
Qin et al. | | Agriculture knowledge graph construction and application |
CN116244446A (en) | | Social media cognitive threat detection method and system |
Hu et al. | | EGC: A novel event-oriented graph clustering framework for social media text |
Konys et al. | | Ontology learning approaches to provide domain-specific knowledge base |
CN113610626A (en) | | Bank credit risk identification knowledge graph construction method and device, computer equipment and computer readable storage medium |
CN113377739A (en) | | Knowledge graph application method, knowledge graph application platform, electronic equipment and storage medium |
CN117094390A (en) | | Knowledge graph construction and intelligent search method oriented to ocean engineering field |
Ciravegna et al. | | LODIE: Linked Open Data for Web-scale Information Extraction. |
Zhu et al. | | Construction of transformer substation fault knowledge graph based on a depth learning algorithm |
Maynard et al. | | Change management for metadata evolution |
Li et al. | | Research on optimization of knowledge graph construction flow chart |
Tang et al. | | Toward detecting mapping strategies for ontology interoperability |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||