CN114153983A - Multi-source construction method of industry knowledge graph - Google Patents

Multi-source construction method of industry knowledge graph

Info

Publication number
CN114153983A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111353417.2A
Other languages
Chinese (zh)
Inventor
何伟
李小超
谢水庚
冀天宇
郝志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Casicloud Co ltd
Original Assignee
Beijing Casicloud Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-16
Filing date
2021-11-16
Publication date
2022-03-08
Application filed by Beijing Casicloud Co ltd
Priority to CN202111353417.2A
Publication of CN114153983A
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
                • G06F16/367 Ontology
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
              • G06F16/35 Clustering; Classification
          • G06F40/00 Handling natural language data
            • G06F40/10 Text processing
              • G06F40/194 Calculation of difference between files
            • G06F40/20 Natural language analysis
              • G06F40/237 Lexical tools
                • G06F40/247 Thesauruses; Synonyms
              • G06F40/279 Recognition of textual entities
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
              • G06N3/08 Learning methods
                • G06N3/088 Non-supervised learning, e.g. competitive learning
          • G06N7/00 Computing arrangements based on specific mathematical models
            • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Mathematical Optimization (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-source construction method of an industry knowledge graph, comprising the following steps: S1, extracting industry concepts and entities from four knowledge sources: open knowledge bases, online encyclopedias, industry text, and structured industry data; S2, merging synonymous concepts and entities; S3, extracting the hypernym-hyponym (superordinate-subordinate) relations between concepts; S4, extracting the non-hypernym-hyponym relations and attribute relations of concepts and entities. The method addresses the problems of existing construction methods: heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources in a source-specific way. It builds a target ontology, extracts entities and attributes with a strategy tailored to each data source, takes the characteristics of knowledge from different sources into account, and constructs the knowledge graph semi-automatically with machine learning, which greatly reduces the manual effort of large-scale knowledge graph construction while maintaining accuracy.

Description

Multi-source construction method of industry knowledge graph
Technical Field
The invention relates to the technical field of artificial intelligence text processing, in particular to a multi-source construction method of an industry knowledge graph.
Background
An industry knowledge graph contains a large amount of structured information and is typically used for analytics or decision support, so it must be highly accurate. Large-scale knowledge graphs are currently built in two ways: by synchronizing with databases, or by synchronizing with online encyclopedias. The first approach stores the knowledge graph in a dedicated structure, downloads a large amount of data, and, after manual integration, builds the graph by fusing subgraphs. It is labor intensive, consumes significant computing resources, and cannot guarantee data security during construction. The second approach uses a web crawler to collect data and extract information from related pages; processing a large number of web pages produces excessive fragmented information, and because most websites block crawlers, the resulting data is incomplete. For a multi-source knowledge graph, knowledge from industry text, open linked data sets, knowledge bases, and encyclopedias has different characteristics, and existing construction methods have difficulty extracting and fusing knowledge in a way that distinguishes between these sources.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides a multi-source construction method of an industry knowledge graph, which can overcome the defects in the prior art.
In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:
A multi-source construction method of an industry knowledge graph comprises the following steps:
S1: extract industry concepts and entities from four knowledge sources: open knowledge bases, online encyclopedias, industry text, and structured industry data;
S2: merge synonymous concepts and entities;
S3: extract the hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extract the non-hypernym-hyponym relations and attribute relations of concepts and entities.
Further, S1 includes the following steps:
S11: collect industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collect the category labels of the classification systems in Wikipedia, encyclopedia and interactive encyclopedia as concepts, take the titles of encyclopedia articles as entity candidates, and use the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, obtain a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and screen the keyword set with the help of industry experts to obtain a preliminary set of industry core concepts;
S14: for the structured industry data, use the D2R Server tool to map the relevant tables of the relational database and the columns of those tables to concepts or entities and to attributes of the entities, respectively;
S15: integrate the industry concepts and entities obtained from the four routes S11-S14.
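As a rough illustration of the TF-IDF part of step S13, the sketch below ranks words of a tokenized corpus by their summed TF-IDF score. It is a minimal sketch, not the patent's implementation: the function name `tfidf_keywords`, the toy corpus, and the choice to sum scores across documents are all assumptions, and the expert screening described in S13 would follow as a separate manual step.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank words by summed TF-IDF over a corpus of pre-tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = Counter()
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            scores[word] += tf * idf
    return [word for word, _ in scores.most_common(top_k)]

# Hypothetical, already-segmented industry documents.
docs = [
    ["turbine", "blade", "inspection", "report"],
    ["turbine", "blade", "coating", "report"],
    ["pump", "seal", "inspection", "report"],
]
candidates = tfidf_keywords(docs, top_k=3)
```

A word such as "report" that occurs in every document receives an IDF of zero and therefore never ranks as a candidate, which is the behavior that makes TF-IDF useful for filtering generic vocabulary before expert review.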
Further, S2 includes the following steps:
S21: identify the explicit synonymy relations in the open linked data: DBPedia uses "owl:sameAs" to mark synonymous entities, YAGO uses "means" to mark synonymous entities, and Zhishi.me uses "pageRedirects" to mark the redirect pages of synonymous entities;
S22: for the encyclopedias, merge synonymous concepts and entities learned within the same encyclopedia: traverse the entity pages of the encyclopedia, treat page titles that share the same redirect label as the same entity, and treat the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judge whether same-named entities in different online encyclopedias are synonymous: articles from different online encyclopedias that have the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts named by those titles are marked as synonyms;
S23: extract synonymy relations from industry text: first define sentence-pattern rules that describe synonymy, such as "X is also named Y", "X is also called Y" and "X is Y", and match these patterns in the industry text to extract synonymy relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted synonymy relations, train a BiLSTM-CRF model, and use it to extract synonymy relations;
S24: merge the synonymy relations obtained from the three routes S21-S23; if synonymy relations obtained from different routes share a concept or entity, merge them into a single synonymy relation.
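To make step S21 concrete, the following sketch pulls "owl:sameAs" pairs out of N-Triples lines. It is only an illustration under assumptions: the patent does not say in which serialization the open linked data is read, the `example.org` resources are made up, and the simple regular expression handles only triples whose subject and object are both IRIs.

```python
import re

# IRI of the owl:sameAs property in the OWL vocabulary.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Matches an N-Triples line of the form <s> <p> <o> . (IRI object only).
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")

def same_as_pairs(lines):
    """Return (subject, object) pairs linked by owl:sameAs."""
    pairs = []
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == SAME_AS:
            pairs.append((match.group(1), match.group(3)))
    return pairs

# Hypothetical sample data; real dumps would be read from a file.
sample = [
    "<http://example.org/a/Lathe> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/b/Turning_machine> .",
    "<http://example.org/a/Lathe> <http://www.w3.org/2000/01/rdf-schema#label> <http://example.org/ignored> .",
]
pairs = same_as_pairs(sample)
```

The same loop could be reused for the "means" and "pageRedirects" predicates named in S21 by swapping the predicate IRI, whose exact values would need to be taken from the respective data sets.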
Furthermore, the article content similarity in S22 is computed by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is used as the article similarity.
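The similarity computation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the two-dimensional `word_vectors` and the `idf` table are hand-made stand-ins for real word2vec output and corpus statistics, and the function names are hypothetical. Only the 80% threshold comes from the text.

```python
import math

def article_vector(tokens, word_vectors, idf):
    """TF-IDF-weighted average of the word vectors of one tokenized article."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in set(tokens):
        if word not in word_vectors:
            continue  # skip out-of-vocabulary words
        weight = (tokens.count(word) / len(tokens)) * idf.get(word, 0.0)
        weight_sum += weight
        for i, value in enumerate(word_vectors[word]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total
    return [value / weight_sum for value in total]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy stand-ins for word2vec vectors and IDF values (assumptions).
word_vectors = {"computer": [1.0, 0.0], "processor": [0.8, 0.6], "garden": [0.0, 1.0]}
idf = {"computer": 1.0, "processor": 1.0, "garden": 1.0}

a = article_vector(["computer", "processor"], word_vectors, idf)
b = article_vector(["computer", "computer", "processor"], word_vectors, idf)
c = article_vector(["garden"], word_vectors, idf)

same_entity_ab = cosine(a, b) > 0.8  # threshold from S22
same_entity_ac = cosine(a, c) > 0.8
```

In this toy setup the two computer-related articles exceed the threshold while the unrelated one does not; with real embeddings the threshold would need to be validated against labeled page pairs.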
Further, S3 includes the following steps:
S31: extract the hypernym-hyponym relations between industry core concepts from the open linked data sets and open knowledge bases according to the corresponding rules;
S32: obtain the hypernym-hyponym relations between core concepts directly from the encyclopedia classification system;
S33: extract hypernym-hyponym relations from industry text: first define sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)"; match these patterns in the industry text to extract hypernym-hyponym relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted relations, train a BiLSTM-CRF model, and use it to extract hypernym-hyponym triples;
S34: integrate the hypernym-hyponym relations obtained from the three routes S31-S33 to build a classification tree.
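The pattern-matching half of step S33 can be sketched with a regular expression. The example below is an assumption-laden illustration: it uses an English stand-in for just one of the listed patterns ("X includes Y, Z, etc."), whereas the patent's rules target Chinese text, and the names `INCLUDES` and `extract_hypernym_pairs` are hypothetical.

```python
import re

# English stand-in for the pattern "X includes Y, Z, etc." (assumption).
INCLUDES = re.compile(
    r"^(?P<hyper>[\w ]+?) (?:includes|comprises) (?P<hypos>.+?)(?:,? etc)?\.?$"
)

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the single pattern above."""
    match = INCLUDES.match(sentence.strip())
    if not match:
        return []
    # Split the enumeration on commas and on the word "and".
    hyponyms = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group("hypos"))
    hypernym = match.group("hyper").strip()
    return [(h.strip(), hypernym) for h in hyponyms if h.strip()]

pairs = extract_hypernym_pairs(
    "Machine tool includes lathe, milling machine, and grinder, etc."
)
```

Pairs extracted this way would serve as the seed relations from which, according to S33, training data for the BiLSTM-CRF model is built.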
Further, S4 includes the following steps:
S41: extract the attribute relations of concepts directly from the information module of the open linked data;
S42: write an adapter that extracts the attribute relations of entities from the information module of the online encyclopedia through page parsing, and count the attributes of the entities belonging to each concept; if the proportion of a concept's entities that have a given attribute exceeds 30%, the attribute is considered common and becomes an attribute of the concept;
S43: extract non-hypernym-hyponym relations from industry text: first, with the help of industry experts, define common sentence-pattern rules that describe non-hypernym-hyponym relations, and match these rules in the industry text to extract non-hypernym-hyponym relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted relations, train a BiLSTM-CRF model, and use it to extract non-hypernym-hyponym relations;
S44: finally, merge the non-hypernym-hyponym relations obtained from the three routes S41-S43.
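The 30% rule in step S42 amounts to a frequency count over the entities of a concept. A minimal sketch, assuming the entity attributes have already been parsed from the encyclopedia pages into a dictionary (the function name and the sample entities are hypothetical):

```python
from collections import Counter

def concept_attributes(entity_attrs, threshold=0.30):
    """Return attributes held by more than `threshold` of a concept's entities.

    entity_attrs maps each entity name to the collection of its attribute names.
    """
    if not entity_attrs:
        return set()
    counts = Counter(
        attr for attrs in entity_attrs.values() for attr in set(attrs)
    )
    n_entities = len(entity_attrs)
    return {attr for attr, count in counts.items() if count / n_entities > threshold}

# Hypothetical entities belonging to the concept "machine tool".
machine_tools = {
    "lathe A": ["manufacturer", "spindle speed", "weight"],
    "lathe B": ["manufacturer", "spindle speed"],
    "mill C": ["manufacturer", "table size"],
    "grinder D": ["manufacturer"],
}
promoted = concept_attributes(machine_tools)
```

Here "manufacturer" (4 of 4 entities) and "spindle speed" (2 of 4) are promoted to concept attributes, while "weight" and "table size" (1 of 4 each, 25%) stay entity-specific.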
The beneficial effects of the invention are as follows: the multi-source construction method addresses the problems of existing construction methods, namely heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources in a source-specific way. The method builds a target ontology, extracts entities and attributes with a strategy tailored to each data source, takes the characteristics of knowledge from different sources into account, and constructs the knowledge graph semi-automatically with machine learning, which greatly reduces the manual effort of large-scale knowledge graph construction while maintaining accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the multi-source construction method of an industry knowledge graph according to an embodiment of the invention;
FIG. 2 is a flow chart of determining whether entities from different online encyclopedias are synonymous in the multi-source construction method according to an embodiment of the invention;
FIG. 3 is a flow chart of extracting synonymy, hypernym-hyponym, and non-hypernym-hyponym relations from industry text in the multi-source construction method according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
As shown in FIGS. 1 to 3, the multi-source construction method of an industry knowledge graph according to the embodiment of the present invention includes the following steps:
S1: extract industry concepts and entities from four knowledge sources: open knowledge bases, online encyclopedias, industry text, and structured industry data;
S2: merge synonymous concepts and entities;
S3: extract the hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extract the non-hypernym-hyponym relations and attribute relations of concepts and entities.
S1 includes the following steps:
S11: collect industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collect the category labels of the classification systems in Wikipedia, encyclopedia and interactive encyclopedia as concepts, take the titles of encyclopedia articles as entity candidates, and use the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, obtain a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and screen the keyword set with the help of industry experts to obtain a preliminary set of industry core concepts;
S14: for the structured industry data, use the D2R Server tool to map the relevant tables of the relational database and the columns of those tables to concepts or entities and to attributes of the entities, respectively;
S15: integrate the industry concepts and entities obtained from the four routes S11-S14.
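The table-to-entity mapping idea behind step S14 can be illustrated without the D2R Server tool itself. The sketch below is an assumption, not D2R Server's actual mechanism: it uses Python's built-in `sqlite3` module, treats the table name as the concept, each row as an entity, and each non-key column as an attribute. The `instanceOf` predicate, the `equipment` table, and the function name are hypothetical.

```python
import sqlite3

def table_to_triples(conn, table, key_column):
    """Map each row to an entity of concept `table` and each other column to an attribute."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        entity = f"{table}/{record[key_column]}"
        triples.append((entity, "instanceOf", table))
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((entity, column, value))
    return triples

# Hypothetical structured industry data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT, vendor TEXT)")
conn.execute("INSERT INTO equipment VALUES (1, 'CNC lathe', 'Acme')")
triples = table_to_triples(conn, "equipment", "id")
```

In a real deployment the mapping would be declared in the tool's own configuration rather than hard-coded, but the resulting triples have the same shape: one typing statement per row plus one statement per populated column.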
S2 includes the following steps:
S21: identify the explicit synonymy relations in the open linked data: DBPedia uses "owl:sameAs" to mark synonymous entities, YAGO uses "means" to mark synonymous entities, and Zhishi.me uses "pageRedirects" to mark the redirect pages of synonymous entities;
S22: for the encyclopedias, merge synonymous concepts and entities learned within the same encyclopedia: traverse the entity pages of the encyclopedia, treat page titles that share the same redirect label as the same entity, and treat the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judge whether same-named entities in different online encyclopedias are synonymous: articles from different online encyclopedias that have the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts named by those titles are marked as synonyms;
S23: extract synonymy relations from industry text: first define sentence-pattern rules that describe synonymy, such as "X is also named Y", "X is also called Y" and "X is Y", and match these patterns in the industry text to extract synonymy relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted synonymy relations, train a BiLSTM-CRF model, and use it to extract synonymy relations;
S24: merge the synonymy relations obtained from the three routes S21-S23; if synonymy relations obtained from different routes share a concept or entity, merge them into a single synonymy relation. For example, if the encyclopedia yields the synonym set "computer, electronic computer" and the industry text yields a synonym set that also contains "computer", the two sets are merged into one set containing all of their terms.
The article content similarity in S22 is computed by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is then used as the article similarity.
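Returning to step S24, merging synonym sets that share a member is a connected-components problem, and one common way to solve it is a union-find structure. The patent does not name a data structure, so the following is a sketch under that assumption; the terms "PC" and "turning machine" are hypothetical examples.

```python
def merge_synonym_sets(groups):
    """Merge synonym groups that share at least one term, using union-find."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for group in groups:
        members = list(group)
        for term in members:
            find(term)  # register every term
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    merged = {}
    for term in list(parent):
        merged.setdefault(find(term), set()).add(term)
    return list(merged.values())

# Synonym sets from different routes (encyclopedia, industry text, ...).
result = merge_synonym_sets([
    {"computer", "electronic computer"},
    {"computer", "PC"},
    {"lathe", "turning machine"},
])
```

The first two sets share "computer" and collapse into one group of three terms, while the unrelated third set is left untouched.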
S3 includes the following steps:
S31: extract the hypernym-hyponym relations between industry core concepts from the open linked data sets and open knowledge bases according to the corresponding rules;
S32: obtain the hypernym-hyponym relations between core concepts directly from the encyclopedia classification system;
S33: extract hypernym-hyponym relations from industry text: first define sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)"; match these patterns in the industry text to extract hypernym-hyponym relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted relations, train a BiLSTM-CRF model, and use it to extract hypernym-hyponym triples;
S34: integrate the hypernym-hyponym relations obtained from the three routes S31-S33 to build a classification tree.
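Step S34 can be pictured as turning a pooled list of (hyponym, hypernym) pairs into a parent-to-children map. The sketch below assumes the pooled relations are acyclic and does not resolve conflicts such as a concept with several parents; the function names and sample pairs are hypothetical.

```python
from collections import defaultdict

def build_tree(pairs):
    """Build a hypernym -> hyponyms map from (hyponym, hypernym) pairs."""
    children = defaultdict(set)
    has_parent = set()
    nodes = set()
    for hyponym, hypernym in pairs:
        children[hypernym].add(hyponym)  # duplicates from several routes collapse
        has_parent.add(hyponym)
        nodes.update((hyponym, hypernym))
    roots = sorted(nodes - has_parent)
    return roots, children

def render(node, children, depth=0):
    """Return the subtree under `node` as indented lines (assumes no cycles)."""
    lines = ["  " * depth + node]
    for child in sorted(children.get(node, ())):
        lines.extend(render(child, children, depth + 1))
    return lines

# Pairs pooled from the three routes; one pair appears twice on purpose.
pairs = [
    ("lathe", "machine tool"),
    ("CNC lathe", "lathe"),
    ("grinder", "machine tool"),
    ("lathe", "machine tool"),
]
roots, children = build_tree(pairs)
tree_lines = render(roots[0], children)
```

Storing children in sets makes the integration idempotent, so the same relation reported by the knowledge base and by the text extractor is counted once.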
S4 includes the following steps:
S41: extract the attribute relations of concepts directly from the information module of the open linked data;
S42: write an adapter that extracts the attribute relations of entities from the information module of the online encyclopedia through page parsing, and count the attributes of the entities belonging to each concept; if the proportion of a concept's entities that have a given attribute exceeds 30%, the attribute is considered common and becomes an attribute of the concept;
S43: extract non-hypernym-hyponym relations from industry text: first, with the help of industry experts, define common sentence-pattern rules that describe non-hypernym-hyponym relations, and match these rules in the industry text to extract non-hypernym-hyponym relations between entities or concepts; then perform word segmentation and part-of-speech tagging on the text with an NLP tool, build training data from the extracted relations, train a BiLSTM-CRF model, and use it to extract non-hypernym-hyponym relations;
S44: finally, merge the non-hypernym-hyponym relations obtained from the three routes S41-S43.
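Steps S23, S33 and S43 all build training data for a BiLSTM-CRF model from relations found by pattern matching, but the text does not specify the labeling scheme. One common choice for sequence-labeling models is BIO tagging; the sketch below assumes that scheme, and the labels "HEAD" and "TAIL", the function name, and the sample sentence are hypothetical.

```python
def bio_tags(tokens, spans):
    """Label tokens with BIO tags from (start, end, label) token spans (end exclusive)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags

# A segmented sentence and the argument spans located by a pattern match.
tokens = ["The", "spindle", "motor", "drives", "the", "cutting", "tool"]
spans = [(1, 3, "HEAD"), (5, 7, "TAIL")]
tags = bio_tags(tokens, spans)
```

Each (tokens, tags) pair would then be one training example; the part-of-speech tags mentioned in the steps could be added as an extra per-token feature alongside the words.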
To make the above technical solution easier to understand, it is described below in terms of its specific use.
In use, industry concepts and entities are extracted from four knowledge sources (open knowledge bases, online encyclopedias, industry text, and structured industry data); synonymous concepts and entities are merged; the hypernym-hyponym relations between concepts are extracted; and the non-hypernym-hyponym relations and attribute relations of concepts and entities are extracted.
In summary, this technical solution addresses the problems of existing construction methods, namely heavy manual workload, high consumption of computing resources, excessive fragmented information, incomplete data, and difficulty in extracting and fusing knowledge from different sources. It builds a target ontology, extracts entities and attributes with a strategy tailored to each data source, takes the characteristics of knowledge from different sources into account, and constructs the knowledge graph semi-automatically with machine learning, which greatly reduces the manual effort of large-scale knowledge graph construction while maintaining accuracy.
The above describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (6)

1. A multi-source construction method of an industry knowledge graph, characterized by comprising the following steps:
S1: extracting industry concepts and entities from four knowledge sources: open knowledge bases, online encyclopedias, industry text, and structured industry data;
S2: merging synonymous concepts and entities;
S3: extracting the hypernym-hyponym (superordinate-subordinate) relations between concepts;
S4: extracting the non-hypernym-hyponym relations and attribute relations of concepts and entities.
2. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S1 comprises the following steps:
S11: collecting industry core concepts and entities from existing open linked data sets and open knowledge bases, including DBPedia, YAGO and Zhishi.me;
S12: collecting the category labels of the classification systems in Wikipedia, encyclopedia and interactive encyclopedia as concepts, taking the titles of encyclopedia articles as entity candidates, and using the corresponding introductory text in the online encyclopedia as the abstract of the concept or entity;
S13: for the industry text corpus, obtaining a keyword set using word-frequency statistics, RAKE, TextRank and TF-IDF, and screening the keyword set with the help of industry experts to obtain a preliminary set of industry core concepts;
S14: for the structured industry data, using the D2R Server tool to map the relevant tables of the relational database and the columns of those tables to concepts or entities and to attributes of the entities, respectively;
S15: integrating the industry concepts and entities obtained from the four routes S11-S14.
3. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S2 comprises the following steps:
S21: identifying the explicit synonymy relations in the open linked data: DBPedia uses "owl:sameAs" to mark synonymous entities, YAGO uses "means" to mark synonymous entities, and Zhishi.me uses "pageRedirects" to mark the redirect pages of synonymous entities;
S22: for the encyclopedias, merging synonymous concepts and entities learned within the same encyclopedia: traversing the entity pages of the encyclopedia, treating page titles that share the same redirect label as the same entity, and treating the values of the "alias" and "Chinese alias" fields in the entity page information as names of the same entity;
then judging whether same-named entities in different online encyclopedias are synonymous: articles from different online encyclopedias that have the same title and an article content similarity above 80% are marked as pages of the same entity or concept, and the entities or concepts named by those titles are marked as synonyms;
S23: extracting synonymy relations from industry text: first defining sentence-pattern rules that describe synonymy, such as "X is also named Y", "X is also called Y" and "X is Y", and matching these patterns in the industry text to extract synonymy relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, building training data from the extracted synonymy relations, training a BiLSTM-CRF model, and using it to extract synonymy relations;
S24: merging the synonymy relations obtained from the three routes S21-S23; if synonymy relations obtained from different routes share a concept or entity, merging them into a single synonymy relation.
4. The multi-source construction method of an industry knowledge graph according to claim 3, wherein the article content similarity in S22 is computed by an unsupervised learning method: vector representations of all words are obtained with the word2vec algorithm; for any article, the TF-IDF value of each word in the text is used as its weight, and the weighted average of the word vectors of all words in the article is taken as the article vector; the cosine similarity between article vectors is used as the article similarity.
5. The multi-source construction method of an industry knowledge graph according to claim 1, wherein S3 comprises the following steps:
S31: extracting the hypernym-hyponym relations between industry core concepts from the open linked data sets and open knowledge bases according to the corresponding rules;
S32: obtaining the hypernym-hyponym relations between core concepts directly from the encyclopedia classification system;
S33: extracting hypernym-hyponym relations from industry text: first defining sentence-pattern rules that describe hypernym-hyponym relations, such as "X is Y", "X is Y, Z, etc.", "X includes Y, Z, etc.", "X has Y, Z, etc.", "X means Y, Z, etc." and "X (Y, Z)"; matching these patterns in the industry text to extract hypernym-hyponym relations between entities or concepts; then performing word segmentation and part-of-speech tagging on the text with an NLP tool, building training data from the extracted relations, training a BiLSTM-CRF model, and using it to extract hypernym-hyponym triples;
S34: integrating the hypernym-hyponym relations obtained from the three routes S31-S33 to build a classification tree.
6. The multi-source construction method of the industry knowledge graph according to claim 1, wherein S4 comprises the following steps:
S41, directly extracting the attribute relations of concepts from the information module of the open linked data;
S42, writing an adapter, extracting the attribute relations of entities from the information module of the online encyclopedia through page parsing, and counting the attributes of the entities belonging to each concept, wherein if the proportion of the entities corresponding to a concept that possess a certain attribute exceeds 30%, the attribute is regarded as common and becomes an attribute of the concept;
S43, extracting the non-superior-inferior relations from industry texts: for industry texts, first, common sentence pattern rules that describe non-superior-inferior relations are defined with the assistance of industry experts; matching is performed in the industry texts according to these rules, and the non-superior-inferior relations between entities or concepts are extracted; then word segmentation and part-of-speech tagging are performed on the texts with an NLP tool, training data is obtained from the extracted non-superior-inferior relations, a BiLSTM-CRF algorithm is used for modeling, and the non-superior-inferior relations are extracted;
S44, finally, merging the non-superior-inferior relations obtained through the three routes S41-S43.
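The 30% rule in S42 can be sketched as follows. This is a minimal illustration that assumes the adapter has already produced, for one concept, a mapping from each entity to the attribute names found in its information module; the function name and the sample data are hypothetical.

```python
from collections import Counter


def concept_attributes(entity_attrs, threshold=0.30):
    """Promote attributes shared by more than `threshold` of a concept's entities.

    entity_attrs: {entity name: iterable of attribute names} for the
    entities that belong to a single concept.
    """
    n_entities = len(entity_attrs)
    if n_entities == 0:
        return set()
    # Count each attribute at most once per entity.
    counts = Counter(
        attr for attrs in entity_attrs.values() for attr in set(attrs)
    )
    # "Exceeds 30%" is read here as a strict inequality.
    return {attr for attr, count in counts.items() if count / n_entities > threshold}


# Hypothetical entities of the concept "motor".
motors = {
    "motor A": {"manufacturer", "rated power"},
    "motor B": {"manufacturer", "rated power"},
    "motor C": {"manufacturer", "weight"},
    "motor D": {"manufacturer"},
}
promoted = concept_attributes(motors)
```

Here "manufacturer" (4 of 4 entities) and "rated power" (2 of 4) pass the threshold, while "weight" (1 of 4, or 25%) does not.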
CN202111353417.2A 2021-11-16 2021-11-16 Multi-source construction method of industry knowledge graph Pending CN114153983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111353417.2A CN114153983A (en) 2021-11-16 2021-11-16 Multi-source construction method of industry knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111353417.2A CN114153983A (en) 2021-11-16 2021-11-16 Multi-source construction method of industry knowledge graph

Publications (1)

Publication Number Publication Date
CN114153983A true CN114153983A (en) 2022-03-08

Family

ID=80456466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111353417.2A Pending CN114153983A (en) 2021-11-16 2021-11-16 Multi-source construction method of industry knowledge graph

Country Status (1)

Country Link
CN (1) CN114153983A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450856A (en) * 2023-06-19 2023-07-18 航天宏图信息技术股份有限公司 Meteorological ocean unstructured text knowledge construction method and device and electronic equipment
CN116450856B (en) * 2023-06-19 2023-09-12 航天宏图信息技术股份有限公司 Meteorological ocean unstructured text knowledge construction method and device and electronic equipment
CN117852637A (en) * 2024-03-07 2024-04-09 南京师范大学 Definition-based subject concept knowledge system automatic construction method and system
CN117852637B (en) * 2024-03-07 2024-05-24 南京师范大学 Definition-based subject concept knowledge system automatic construction method and system

Similar Documents

Publication Publication Date Title
CN116628172B (en) Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN108573411B (en) Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments
Tang et al. Using Bayesian decision for ontology mapping
Gaeta et al. Ontology extraction for knowledge reuse: The e-learning perspective
CN110569369A (en) Generation method and device, application method and device of knowledge graph of bank financial system
Xie et al. A novel text mining approach for scholar information extraction from web content in Chinese
CN103500208A (en) Deep layer data processing method and system combined with knowledge base
CN113962293B (en) LightGBM classification and representation learning-based name disambiguation method and system
Yuan-jie et al. Web service classification based on automatic semantic annotation and ensemble learning
CN114153983A (en) Multi-source construction method of industry knowledge graph
CN110888991A (en) Sectional semantic annotation method in weak annotation environment
CN114443855A (en) Knowledge graph cross-language alignment method based on graph representation learning
Ramar et al. Technical review on ontology mapping techniques
Qin et al. Agriculture knowledge graph construction and application
CN116244446A (en) Social media cognitive threat detection method and system
Hu et al. EGC: A novel event-oriented graph clustering framework for social media text
Konys et al. Ontology learning approaches to provide domain-specific knowledge base
CN113610626A (en) Bank credit risk identification knowledge graph construction method and device, computer equipment and computer readable storage medium
CN113377739A (en) Knowledge graph application method, knowledge graph application platform, electronic equipment and storage medium
CN117094390A (en) Knowledge graph construction and intelligent search method oriented to ocean engineering field
Ciravegna et al. LODIE: Linked Open Data for Web-scale Information Extraction.
Zhu et al. Construction of transformer substation fault knowledge graph based on a depth learning algorithm
Maynard et al. Change management for metadata evolution
Li et al. Research on optimization of knowledge graph construction flow chart
Tang et al. Toward detecting mapping strategies for ontology interoperability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination