CN110750698A - Knowledge graph construction method and device, computer equipment and storage medium - Google Patents

Publication number
CN110750698A
Authority
CN
China
Prior art keywords
vocabulary
seed
extended
word
vocabularies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910848696.6A
Other languages
Chinese (zh)
Inventor
董润华
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910848696.6A
Publication of CN110750698A
Priority to PCT/CN2020/087714 (WO2021047188A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of data analysis and knowledge graph construction, and in particular to a knowledge graph construction method and device, computer equipment and a storage medium. The method comprises: crawling knowledge information of a plurality of candidate web pages according to seed vocabularies, the candidate web pages containing jump links related to the seed vocabularies; acquiring extended vocabularies according to the jump links; acquiring the vocabulary tags of the seed vocabularies on the candidate web pages and constructing a hypernym set (upper word set); performing word filtering on the extended vocabularies according to the hypernym set, and acquiring a target word set from the filtered extended vocabularies and the seed vocabularies; and constructing a knowledge graph according to the hypernym set and the seed vocabularies, extended vocabularies and jump link relations in the target word set. In this method, seed vocabulary data is obtained, jumps are made from the seed vocabularies to obtain extended (jump) vocabularies, the jump vocabularies are classified against the hypernym set to obtain the target word set, and the knowledge graph is then built from the target word set, which makes construction efficient in a specialized (vertical) field.

Description

Knowledge graph construction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for constructing a knowledge graph, a computer device, and a storage medium.
Background
A knowledge graph, also called a scientific knowledge map and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a family of graphs that display the development process and structural relations of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelations among these resources and carriers. It takes entities or concepts as nodes and connects them through semantic relations. By discovering associations between entities and integrating semi-structured and unstructured data, a knowledge graph can help a machine understand data, explain phenomena and perform knowledge reasoning, so that deep relationships are discovered and intelligent search and intelligent interaction are realized.
In the traditional method for constructing a knowledge graph in a vertical field, repeated identification and screening of the knowledge field of each vocabulary is needed in the knowledge acquisition stage to ensure that the acquired knowledge belongs to the current field, so the construction efficiency of the knowledge graph is low.
Disclosure of Invention
In view of this, and because the existing knowledge graph construction process must ensure that the acquired knowledge belongs to the current field, which reduces construction efficiency, it is necessary to provide an efficient knowledge graph construction method, device, computer equipment and storage medium.
A method of knowledge-graph construction, the method comprising:
crawling knowledge information of a plurality of to-be-selected webpages according to seed vocabularies, wherein the to-be-selected webpages comprise jump links related to the seed vocabularies, and the seed vocabularies are the knowledge vocabularies in the field to which the to-be-constructed knowledge graph belongs;
acquiring an extended vocabulary according to the jump link;
acquiring a vocabulary label of the seed vocabulary on the webpage to be selected, and constructing an upper vocabulary set;
performing word filtering on the extended vocabulary according to the superior vocabulary set, and acquiring a target vocabulary set according to the filtered extended vocabulary and the seed vocabulary;
and constructing a knowledge graph according to the upper word set, the seed words in the target word set, the extended words and the jump link relation.
In one embodiment, before the constructing a knowledge graph according to the upper word set, the seed words in the target word set, the extended words and the jump link relationship, the method further includes:
crawling knowledge information of a plurality of webpages to be selected according to the extended vocabulary after the word filtration, and acquiring an iterative jump link related to the extended vocabulary after the word filtration;
acquiring an iterative extended vocabulary according to the iterative skip link;
performing word filtering on the iterative expansion vocabulary according to the superior word set, and updating the target word set according to the filtered iterative expansion vocabulary;
taking the iterative expanded vocabulary as a new expanded vocabulary after word filtration, returning to the step of crawling knowledge information of a plurality of webpages to be selected according to the expanded vocabulary after word filtration and acquiring iterative jump links related to the expanded vocabulary after word filtration until all latest iterative expanded vocabularies can be filtered through the upper word set;
constructing a knowledge graph according to the seed vocabulary, the extended vocabulary and the jump link relation in the upper word set and the target word set comprises the following steps:
and constructing a knowledge graph according to the upper word set, the seed words in the updated target word set, the extended words, the jump links, the iterative extended words and the iterative jump links.
In one embodiment, before performing word filtering on the iterative extended vocabulary according to the hypernym set, updating the target word set according to the filtered iterative extended vocabulary, the method further includes:
and reconstructing the upper word set according to the seed words, the expanded words and the iterative expanded words in the target word set and the word tags of the webpage to be selected.
In one embodiment, performing word filtering on the extended vocabulary according to the hypernym set, and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary, includes:
acquiring each vocabulary tag corresponding to the extended vocabulary;
filtering out any extended vocabulary for which the proportion of its vocabulary tags that belong to the hypernym set, among all vocabulary tags corresponding to that extended vocabulary, is less than or equal to a preset classification threshold;
and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary.
In one embodiment, before crawling the knowledge information of a plurality of candidate web pages according to the seed vocabulary, the candidate web pages including jump links related to the seed vocabulary, the method further comprises:
acquiring domain information corresponding to a knowledge graph to be constructed;
and according to the domain information, crawling seed vocabularies of the domain to which the knowledge graph to be constructed belongs from the domain classification tree of the third-party platform, based on the Scrapy crawler framework and an XPath parsing library.
In one embodiment, before crawling knowledge information of a plurality of web pages to be selected according to the seed vocabulary, the method further includes:
searching for identical vocabularies in the seed vocabulary;
searching for synonymous vocabularies in the seed vocabulary through semantic dependency analysis;
and deduplicating the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
In one embodiment, the obtaining of the vocabulary tags of the seed vocabularies on the to-be-selected web pages and the constructing of the upper vocabulary set include:
acquiring a vocabulary label corresponding to the seed vocabulary;
and when the vocabulary tag has an association relation with a preset core vocabulary, classifying the vocabulary tag into the hypernym set.
A knowledge-graph building apparatus, the apparatus comprising:
the vocabulary information acquisition module is used for crawling knowledge information of a plurality of to-be-selected webpages according to seed vocabularies, wherein the to-be-selected webpages comprise jump links related to the seed vocabularies, and the seed vocabularies are the knowledge vocabularies in the field to which the to-be-constructed knowledge maps belong;
the extended vocabulary identification module is used for acquiring extended vocabularies according to the jump links;
the upper word set building module is used for acquiring the word labels of the seed words on the webpage to be selected and building an upper word set;
the word set filtering module is used for carrying out word filtering on the extended vocabulary according to the superior word set and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary;
and the map construction module is used for constructing a knowledge map according to the upper word set, the seed words in the target word set, the extended words and the jump link relation.
In one embodiment, the apparatus further includes a seed vocabulary acquisition module, configured to:
acquire domain information corresponding to a knowledge graph to be constructed;
and according to the domain information, crawl seed vocabularies of the domain to which the knowledge graph to be constructed belongs from the domain classification tree of the third-party platform, based on the Scrapy crawler framework and an XPath parsing library.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
crawling knowledge information of a plurality of to-be-selected webpages according to seed vocabularies, wherein the to-be-selected webpages comprise jump links related to the seed vocabularies, and the seed vocabularies are the knowledge vocabularies in the field to which the to-be-constructed knowledge graph belongs;
acquiring an extended vocabulary according to the jump link;
acquiring a vocabulary label of the seed vocabulary on the webpage to be selected, and constructing an upper vocabulary set;
performing word filtering on the extended vocabulary according to the superior vocabulary set, and acquiring a target vocabulary set according to the filtered extended vocabulary and the seed vocabulary;
and constructing a knowledge graph according to the upper word set, the seed words in the target word set, the extended words and the jump link relation.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
crawling knowledge information of a plurality of to-be-selected webpages according to seed vocabularies, wherein the to-be-selected webpages comprise jump links related to the seed vocabularies, and the seed vocabularies are the knowledge vocabularies in the field to which the to-be-constructed knowledge graph belongs;
acquiring an extended vocabulary according to the jump link;
acquiring a vocabulary label of the seed vocabulary on the webpage to be selected, and constructing an upper vocabulary set;
performing word filtering on the extended vocabulary according to the superior vocabulary set, and acquiring a target vocabulary set according to the filtered extended vocabulary and the seed vocabulary;
and constructing a knowledge graph according to the upper word set, the seed words in the target word set, the extended words and the jump link relation.
According to the knowledge graph construction method, device, computer equipment and storage medium described above, knowledge information of a plurality of candidate web pages is first crawled according to seed vocabularies; extended vocabularies are acquired according to the jump links; the vocabulary tags of the seed vocabularies on the candidate web pages are acquired and a hypernym set is constructed; word filtering is performed on the extended vocabularies according to the hypernym set, and a target word set is acquired from the filtered extended vocabularies and the seed vocabularies; and a knowledge graph is constructed according to the hypernym set and the seed vocabularies, extended vocabularies and jump link relations in the target word set. In this method, seed vocabulary data is obtained, jumps are made from the seed vocabularies to obtain extended (jump) vocabularies, the jump vocabularies are classified against the hypernym set to obtain the target word set, and the knowledge graph is then built from the target word set, which makes construction efficient in a specialized (vertical) field.
Drawings
FIG. 1 is a diagram of an application environment of a method for knowledge graph construction in one embodiment;
FIG. 2 is a schematic flow diagram of a method for knowledge graph construction in one embodiment;
FIG. 3 is a schematic sub-flow chart of step S100 of FIG. 2 in one embodiment;
FIG. 4 is a schematic flow chart diagram of a method of knowledge graph construction in another embodiment;
FIG. 5 is a block diagram showing the structure of a knowledge-graph constructing apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The knowledge graph construction method provided by the application can be applied in the application environment shown in FIG. 1, in which the graph construction server 102 communicates with the third-party platform server 104 over a network and looks up the corresponding data through the third-party platform server 104. The graph construction server 102 first crawls knowledge information of a plurality of candidate web pages from the third-party platform server 104 according to the seed vocabularies, where the candidate web pages include jump links related to the seed vocabularies and the seed vocabularies are knowledge vocabularies of the field to which the knowledge graph to be constructed belongs. The graph construction server 102 then acquires extended vocabularies according to the jump links; acquires the vocabulary tags of the seed vocabularies on the candidate web pages and constructs a hypernym set; performs word filtering on the extended vocabularies according to the hypernym set, and acquires a target word set from the filtered extended vocabularies and the seed vocabularies; and constructs a knowledge graph according to the hypernym set and the seed vocabularies, extended vocabularies and jump link relations in the target word set.
As shown in fig. 2, in one embodiment, the method for constructing a knowledge graph of the present application is implemented by a graph construction server, and specifically includes the following steps:
S100, crawling knowledge information of a plurality of candidate web pages according to the seed vocabularies, wherein the candidate web pages comprise jump links related to the seed vocabularies, and the seed vocabularies are the knowledge vocabularies of the field to which the knowledge graph to be constructed belongs.
A knowledge graph, also called a scientific knowledge graph, is a large-scale semantic network that takes entities or concepts as nodes and connects them through semantic relations. By discovering associations between entities and integrating semi-structured and unstructured data, a knowledge graph can help a machine understand data, explain phenomena and perform knowledge reasoning, so that deep relationships are discovered and intelligent search and intelligent interaction are realized. The domain is the field that the knowledge graph serves; the knowledge graph to be constructed is specialized and targets a particular vertical field. The seed vocabulary refers to concept vocabularies that are relatively common or important in the vertical field.
Regarding the source of the seed vocabulary, in one embodiment the seed vocabulary may be crawled from the classification tree of the current domain on an encyclopedia website platform. In another embodiment, technical terms may be obtained from a knowledge base of the current domain as the seed vocabulary. In another embodiment, related concept vocabularies can be obtained from the literature of the current domain as the seed vocabulary.
The candidate web pages are web pages provided by a third-party platform; in one embodiment, a plurality of encyclopedia web pages corresponding to the seed vocabulary can be searched for and used as the candidate web pages of the seed vocabulary. The knowledge information is the information that explains and describes the seed vocabulary, including its definition, extended explanation and the like. For example, for the term "blockchain", the explanatory passage at the beginning of its encyclopedia entry ("blockchain is distributed ... executed from February 15, 2019") is the knowledge information of the seed vocabulary "blockchain".
The knowledge information contains a number of vocabularies that help explain the seed vocabulary, among them jump vocabularies, from which the page can jump to the entry of another vocabulary. A jump link is the link corresponding to a jump vocabulary. In one embodiment, the jump link corresponding to a jump vocabulary can be located by reading the web page code of the candidate web page. In another embodiment, the jump vocabularies in the knowledge information can be identified through a feature recognition technique, and the jump operation is then performed using the position information of the jump vocabulary to obtain the jump link.
S300, acquiring the extended vocabulary according to the jump link.
The server can search the encyclopedia website platform for the concept information corresponding to each seed vocabulary, identify the jump links in that concept information, and acquire extended vocabularies according to the jump links, thereby enriching the knowledge graph with the extended vocabularies.
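Locating jump links by reading the web page code can be sketched roughly as follows. This is only an illustration using Python's standard `html.parser`; the sample markup, the `/item/` link prefix and the names `JumpLinkParser` and `extract_jump_links` are assumptions made for the example and do not come from the application.

```python
from html.parser import HTMLParser


class JumpLinkParser(HTMLParser):
    """Collect (anchor text, href) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text pieces seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_jump_links(html, prefix="/item/"):
    """Return (extended vocabulary, jump link) pairs.

    `prefix` is an assumed URL pattern for links that lead to other entries.
    """
    parser = JumpLinkParser()
    parser.feed(html)
    return [(text, href) for text, href in parser.links if href.startswith(prefix)]


# Hypothetical knowledge information for one seed vocabulary.
sample = (
    '<p>A <a href="/item/distributed-ledger">distributed ledger</a> secured by '
    '<a href="/item/cryptography">cryptography</a>. '
    '<a href="https://example.com/ad">Ad</a></p>'
)
jump_links = extract_jump_links(sample)
print(jump_links)
```

The anchor text of each retained link plays the role of an extended vocabulary, and the `href` plays the role of its jump link; links that do not match the assumed prefix are ignored.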
S500, acquiring the vocabulary labels of the seed vocabularies on the webpage to be selected, and constructing an upper vocabulary set.
The vocabulary tags are highly generalized vocabularies for the seed vocabularies; they are hypernyms (superordinate words) of the seed vocabularies, and the hypernym set corresponding to the field can be constructed from the vocabulary tags of the individual seed vocabularies. A vocabulary tag is a content tag that an online encyclopedia platform attaches to the current seed vocabulary, such as the tags shown at the bottom of an encyclopedia entry. For the entry "blockchain", for example, the tags at the bottom of the page, "scientific encyclopedia vocabulary scientific classification", "finance" and "internet", can be recognized as the vocabulary tags of the seed vocabulary "blockchain". After acquiring the vocabulary tags corresponding to the seed vocabularies, the server can construct the corresponding hypernym set from them.
S700, performing word filtering on the extended vocabulary according to the hypernym set, and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary.
The extended vocabularies obtained by jumping are screened against the hypernym set, and those that do not belong to the current vertical field are removed. Because a jump is involved, an extended vocabulary may no longer belong to the current vertical field, so the field of each extended vocabulary is checked against the hypernym set; vocabularies that belong to the current vertical field are selected as iterative seed vocabularies, and their concept information is acquired to build the knowledge graph.
S900, constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary and jump link relations in the target word set.
The server can acquire the jump relations between the seed vocabulary and the extended vocabulary and build the current knowledge graph from the information determined so far. For example, the hypernym set forms the highest-level knowledge nodes, the seed vocabularies form the nodes of the next level down, and the extended vocabularies form the nodes below those. The connection network of the knowledge graph is established from the tag relations between the hypernym set and the seed vocabularies and from the jump link relations between each seed vocabulary and its extended vocabularies, and the knowledge information corresponding to each node is stored at that node.
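The layered structure described above can be illustrated with a small sketch. It assumes a three-level reading (hypernyms at level 0, seed vocabularies at level 1, extended vocabularies at level 2) and uses plain Python containers; the function name `build_graph`, the edge labels `"tag"` and `"jump"`, and the sample data are hypothetical, and storage of knowledge information at each node is omitted for brevity.

```python
def build_graph(hypernyms, seed_tags, jump_links, target_words):
    """Return (levels, edges) for a three-level knowledge graph.

    hypernyms:    set of hypernyms (level 0 nodes)
    seed_tags:    seed vocabulary -> vocabulary tags found on its page
    jump_links:   source vocabulary -> vocabularies reached by its jump links
    target_words: vocabularies that survived word filtering
    """
    levels = {h: 0 for h in hypernyms}
    edges = set()

    # Tag relations connect a hypernym to each seed vocabulary that carries it.
    for seed, tags in seed_tags.items():
        levels[seed] = 1
        for tag in tags:
            if tag in hypernyms:
                edges.add((tag, seed, "tag"))

    # Jump link relations connect vocabularies inside the target word set.
    for source, linked_words in jump_links.items():
        if source not in target_words:
            continue
        for word in linked_words:
            if word in target_words:
                levels.setdefault(word, 2)
                edges.add((source, word, "jump"))

    return levels, edges


# Hypothetical inputs for illustration only.
hypernyms = {"finance", "internet"}
seed_tags = {"blockchain": ["finance", "internet", "history"]}
jump_links = {"blockchain": ["bitcoin", "consensus mechanism", "weather"]}
target_words = {"blockchain", "bitcoin", "consensus mechanism"}

levels, edges = build_graph(hypernyms, seed_tags, jump_links, target_words)
print(levels)
print(sorted(edges))
```

Here "weather" is dropped because it is not in the target word set, and "history" creates no edge because it is not in the hypernym set.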
In the knowledge graph construction method, knowledge information of a plurality of candidate web pages is crawled according to seed vocabularies; extended vocabularies are acquired according to the jump links; the vocabulary tags of the seed vocabularies on the candidate web pages are acquired and a hypernym set is constructed; word filtering is performed on the extended vocabularies according to the hypernym set, and a target word set is acquired from the filtered extended vocabularies and the seed vocabularies; and a knowledge graph is constructed according to the seed vocabularies, extended vocabularies and jump links in the target word set. In this method, seed vocabulary data is obtained, jumps are made from the seed vocabularies to obtain extended (jump) vocabularies, the jump vocabularies are classified against the hypernym set to obtain the target word set, and the knowledge graph is then built from the target word set, which makes construction efficient in a specialized (vertical) field.
In one embodiment, step S900 is preceded by:
and crawling knowledge information of a plurality of webpages to be selected according to the expanded vocabulary after the word filtration, and acquiring iterative jump links related to the expanded vocabulary after the word filtration.
And acquiring an iterative expansion vocabulary according to the iterative jump link.
And performing word filtering on the iterative expansion vocabulary according to the hypernym set, and updating the target word set according to the filtered iterative expansion vocabulary.
And taking the filtered iterative extended vocabulary as the new filtered extended vocabulary, and returning to the step of crawling knowledge information of a plurality of candidate web pages according to the filtered extended vocabulary and acquiring iterative jump links related to the filtered extended vocabulary, until all of the latest iterative extended vocabularies are filtered out by the hypernym set.
The extended vocabulary obtained for the seed vocabulary can be used as new seed vocabulary for a further round of jumping; iterative extended vocabulary is obtained through continued jumping, and the knowledge information corresponding to the iterative extended vocabulary also needs to be identified. The vocabularies reached by iterative jumping are screened against the hypernym set, and those that do not belong to the current vertical field are filtered out; repeating the iteration can greatly enrich the vertical-field knowledge graph. The vocabulary iteration ends when there is no new iterative extended vocabulary or when all of the latest iterative extended vocabularies are filtered out by the hypernym set.
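The iteration described above resembles a breadth-first expansion that stops when a round yields no surviving vocabulary. The following is a minimal sketch under that reading; `fetch_links` and `keep` stand in for the crawling step and the hypernym-set filter, and `max_rounds` is a safety limit added for the example. None of these names appear in the application.

```python
def expand_iteratively(seeds, fetch_links, keep, max_rounds=10):
    """Expand the target word set round by round.

    fetch_links(word) -> vocabularies reached through the word's jump links
    keep(word)        -> True if the word passes the hypernym-set filter
    """
    target = set(seeds)
    frontier = set(seeds)
    edges = set()
    rounds = 0
    while frontier and rounds < max_rounds:
        next_frontier = set()
        for word in sorted(frontier):
            for linked in fetch_links(word):
                if linked in target:
                    edges.add((word, linked))   # link to an already accepted word
                elif keep(linked):
                    target.add(linked)
                    next_frontier.add(linked)
                    edges.add((word, linked))
        frontier = next_frontier                # empty when everything was filtered
        rounds += 1
    return target, edges


# Hypothetical jump links and filter decisions for illustration only.
LINKS = {
    "blockchain": ["bitcoin", "bank"],
    "bitcoin": ["mining", "blockchain"],
    "mining": ["coal"],
}
IN_DOMAIN = {"bitcoin", "mining"}

target, edges = expand_iteratively(
    ["blockchain"],
    fetch_links=lambda w: LINKS.get(w, []),
    keep=lambda w: w in IN_DOMAIN,
)
print(sorted(target))
print(sorted(edges))
```

In this toy run the third round produces only "coal", which is filtered out, so the frontier becomes empty and the loop ends.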
In one embodiment, performing word filtering on the iterative expansion vocabulary according to the hypernym set, and before updating the target word set according to the filtered iterative expansion vocabulary, the method further includes:
and reconstructing the hypernym set according to the vocabulary tags, on the candidate web pages, of the seed vocabulary, the extended vocabulary and the iterative extended vocabulary in the target word set.
The hypernym set can be reconstructed from the vocabulary tags corresponding to the filtered extended vocabulary or iterative extended vocabulary; by continually reconstructing the hypernym set, the coverage of the constructed vertical-field knowledge graph keeps improving.
As shown in fig. 3, in one embodiment, S700 includes:
S720, acquiring each vocabulary tag corresponding to the extended vocabulary;
S740, filtering out any extended vocabulary for which the proportion of its vocabulary tags that belong to the hypernym set, among all vocabulary tags corresponding to that extended vocabulary, is less than or equal to the preset classification threshold;
S760, acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary.
Whether an extended vocabulary belongs to the current vertical field can be judged by whether the proportion of its vocabulary tags that belong to the hypernym set exceeds the preset classification threshold. For example, with a preset threshold of 20%, suppose the encyclopedia page of extended vocabulary A has 3 tags, 1 of which is in the hypernym set; since 1/3 is greater than 20%, A is judged to be a domain vocabulary and is not filtered out. With a threshold of 20%, the extended vocabulary may reach saturation after roughly three rounds of jumping. The preset classification threshold can be set according to the specific vertical field and the required breadth and verticality of the knowledge graph.
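The threshold test in S740 can be written out as a short sketch. The function names and the sample tags are hypothetical; the 20% threshold and the 1-of-3 example follow the paragraph above, and a vocabulary whose ratio is exactly equal to the threshold is filtered out, as S740 states.

```python
def hypernym_ratio(tags, hypernyms):
    """Proportion of a vocabulary's tags that belong to the hypernym set."""
    if not tags:
        return 0.0
    return sum(1 for tag in tags if tag in hypernyms) / len(tags)


def filter_extended(extended_tags, hypernyms, threshold=0.2):
    """Keep only vocabularies whose ratio is strictly above the threshold."""
    return {
        word
        for word, tags in extended_tags.items()
        if hypernym_ratio(tags, hypernyms) > threshold
    }


# Hypothetical tags for three extended vocabularies.
hypernyms = {"finance", "internet"}
extended_tags = {
    "A": ["finance", "history", "person"],                  # 1/3 > 20%, kept
    "B": ["sports", "music", "film"],                       # 0/3, filtered
    "C": ["internet", "sports", "music", "film", "food"],   # 1/5 = 20%, filtered
}

kept = filter_extended(extended_tags, hypernyms)
print(kept)
```

Raising the threshold makes the graph more strictly vertical at the cost of breadth; lowering it has the opposite effect.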
In one embodiment, S100 is preceded by:
Acquiring the domain information corresponding to the knowledge graph to be constructed.
According to the domain information, crawling seed vocabularies of the domain to which the knowledge graph to be constructed belongs from the domain classification tree of the encyclopedia website platform, based on the Scrapy crawler framework and an XPath parsing library.
The structured seed vocabulary refers to seed vocabulary that exists in the form of structured data. Structured data can be represented and stored in a relational database, such as MySQL, Oracle or SQL Server, and is data expressed in two-dimensional (tabular) form, in which the corresponding information can be retrieved through its key. Its general characteristics are that data is organized in rows, each row represents the information of one entity, and every row has the same attributes. Structured data is stored and arranged very regularly, which facilitates operations such as query and modification. The server obtains the seed vocabulary through web crawler technology; for example, the seed vocabulary of the vertical field to which the knowledge graph to be constructed belongs can be obtained from the classification tree of an encyclopedia website based on the Scrapy crawler framework and an XPath parsing library. In another embodiment, the seed vocabulary of the vertical field to which the knowledge graph to be constructed belongs can also be obtained from the classification knowledge base of the encyclopedia website.
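As a rough illustration of XPath-based extraction from a classification tree, the sketch below uses the limited XPath subset of Python's standard `xml.etree.ElementTree` on an inline fragment, in place of a real Scrapy spider, so that it stays self-contained. The markup, the `data-domain` attribute and the function name `crawl_seed_words` are assumptions for the example and do not describe any actual encyclopedia site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a domain classification tree.
TREE = """
<ul class="category-tree">
  <li data-domain="finance">
    <a href="/item/blockchain">blockchain</a>
    <a href="/item/insurance">insurance</a>
  </li>
  <li data-domain="sports">
    <a href="/item/football">football</a>
  </li>
</ul>
"""


def crawl_seed_words(tree_markup, domain):
    """Return the entry names listed under the given domain node."""
    root = ET.fromstring(tree_markup.strip())
    # XPath: every <a> directly under the <li> whose data-domain matches.
    nodes = root.findall(f".//li[@data-domain='{domain}']/a")
    return [node.text.strip() for node in nodes if node.text]


seed_words = crawl_seed_words(TREE, "finance")
print(seed_words)
```

In a real crawler the same kind of XPath expression would be applied to the downloaded page of the classification tree rather than to an inline string.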
In one embodiment, S100 is preceded by:
Searching for identical vocabularies in the seed vocabulary.
Searching for synonymous vocabularies in the seed vocabulary through semantic dependency analysis.
Deduplicating the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
Because data crawled from different websites may be repeated, the crawled seed vocabulary needs to be filtered. Since the vocabulary belongs to a vertical field and is highly specialized, synonymous vocabularies in the seed vocabulary can be identified through semantic analysis, grouped together and deduplicated. Identical seed vocabularies are likewise removed by deduplication, which can be performed on the vocabulary names of the seed vocabulary with a Python set operation.
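A minimal sketch of the deduplication step follows. Exact duplicates are removed with a Python set, as the paragraph above suggests; the semantic dependency analysis is not implemented here and is replaced by a hand-written synonym table, which is purely an assumption for the example, as are the function and variable names.

```python
def deduplicate_seeds(words, synonyms):
    """Remove identical and synonymous seed vocabularies, keeping first-seen order.

    synonyms maps a vocabulary to its canonical form; in practice this mapping
    would come from semantic analysis rather than a fixed table.
    """
    seen = set()
    result = []
    for word in words:
        canonical = synonyms.get(word, word)
        if canonical not in seen:   # set membership handles identical names
            seen.add(canonical)
            result.append(canonical)
    return result


# Hypothetical crawled seed vocabulary and synonym table.
crawled = ["blockchain", "block chain", "smart contract", "blockchain", "distributed ledger"]
synonyms = {"block chain": "blockchain"}

seeds = deduplicate_seeds(crawled, synonyms)
print(seeds)
```

If order does not matter, `set(crawled)` alone removes the identical names; the loop is used here only so that synonyms collapse to one canonical entry and the original order is preserved.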
As shown in fig. 4, in one embodiment, S500 includes:
S520, acquiring the vocabulary tags of the seed vocabulary on the candidate webpages.
S540, when a vocabulary tag has an association relation with a preset core vocabulary, classifying the vocabulary tag into the hypernym set.
Some core vocabularies of the domain can be constructed in advance. Based on these preset core vocabularies, it is determined which of the tags of the seed vocabulary are hypernyms belonging to the current vertical domain: a vocabulary that has some connection with a core vocabulary can be regarded as a hypernym of the domain, and irrelevant hypernyms are filtered out. The core vocabulary thus bounds the hypernyms so that they do not go beyond the vertical domain. A specific implementation is to look up each vocabulary tag and judge whether the concept explanation corresponding to the tag contains a core vocabulary. In addition, the hypernyms can be reviewed manually, or manual review can be combined with machine review to make hypernym review more efficient.
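The core-vocabulary check above can be sketched as a simple containment test. The function `build_hypernym_set`, the sample tags, and their explanations are all hypothetical; the sketch assumes that the concept explanation for each tag has already been fetched, and it treats a case-insensitive substring match as the "association relation".

```python
def build_hypernym_set(tag_explanations, core_words):
    """Keep tags whose concept explanation mentions at least one core word."""
    hypernyms = set()
    for tag, explanation in tag_explanations.items():
        text = explanation.lower()
        if any(core.lower() in text for core in core_words):
            hypernyms.add(tag)
    return hypernyms


# Hypothetical core vocabulary for an insurance/finance vertical domain.
core_words = {"insurance", "finance"}
tag_explanations = {
    "insurance term": "A word or phrase used in the insurance industry.",
    "financial product": "A contract traded or sold in finance markets.",
    "film": "A series of moving images shown on a screen.",
}
hypernym_set = build_hypernym_set(tag_explanations, core_words)
print(sorted(hypernym_set))
```

A substring test is the crudest possible association check; the text leaves room for stricter rules or for manual review of the resulting set.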
In one embodiment, the knowledge graph construction method includes the following steps. Domain information corresponding to the knowledge graph to be constructed is acquired and, according to the domain information, seed vocabulary of the domain to which the knowledge graph belongs is crawled from the domain classification tree of an encyclopedia website platform based on the Scrapy crawler framework and an XPath parsing library. Identical vocabularies in the seed vocabulary are searched for, synonymous vocabularies are searched for through semantic dependency analysis, and the seed vocabulary is deduplicated according to the identical and synonymous vocabularies. Knowledge information of a plurality of candidate webpages is crawled according to the seed vocabulary, where the candidate webpages contain jump links related to the seed vocabulary and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs. Extended vocabulary is acquired according to the jump links. The vocabulary tags corresponding to the seed vocabulary are acquired, and when a vocabulary tag has an association relation with a preset core vocabulary, the tag is classified into the hypernym set.
Word filtering is then performed on the extended vocabulary according to the hypernym set: the vocabulary tags corresponding to each extended vocabulary are acquired, every extended vocabulary for which the proportion of tags belonging to the hypernym set, among all of its tags, is less than or equal to a preset classification threshold is filtered out, and a target word set is acquired from the filtered extended vocabulary and the seed vocabulary. Knowledge information of a plurality of candidate webpages is crawled according to the filtered extended vocabulary, iterative jump links related to the filtered extended vocabulary are acquired, and iterative extended vocabulary is acquired according to the iterative jump links. The iterative extended vocabulary is taken as the new filtered extended vocabulary, and the method returns to the step of crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquiring the related iterative jump links, until all of the latest iterative extended vocabulary can be filtered out by the hypernym set. Finally, a knowledge graph is constructed according to the hypernym set and the seed vocabulary, extended vocabulary, jump links, iterative extended vocabulary, and iterative jump links in the updated target word set.
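The iterative expansion in this embodiment can be sketched as a loop over a "frontier" of newly accepted vocabulary. Everything here is a simplified assumption: `links_of` is an in-memory dictionary standing in for crawled jump links, `passes_filter` stands in for the hypernym-based word filter, and the loop also stops revisiting words already in the target set so that cyclic links cannot cause endless iteration, a safeguard the text does not spell out.

```python
def expand_vocabulary(seeds, links_of, passes_filter):
    """Follow jump links round by round until no new word survives the filter.

    Returns the target word set and the accepted (source, destination) links.
    """
    target = set(seeds)
    frontier = set(seeds)
    accepted_links = []
    while frontier:
        next_frontier = set()
        for word in sorted(frontier):
            for linked in links_of.get(word, []):
                if not passes_filter(linked):
                    continue  # filtered out by the hypernym-based check
                accepted_links.append((word, linked))
                if linked not in target:
                    target.add(linked)
                    next_frontier.add(linked)
        # The surviving new words become the vocabulary for the next round.
        frontier = next_frontier
    return target, accepted_links


# Hypothetical jump links and a hypothetical in-domain predicate.
links_of = {
    "premium": ["deductible", "actor"],
    "deductible": ["claim", "premium"],
    "claim": ["film"],
}
in_domain = {"premium", "deductible", "claim"}
target_set, links = expand_vocabulary(["premium"], links_of, in_domain.__contains__)
print(sorted(target_set), links)
```

In this toy run the third round yields only "film", which the filter rejects, so the frontier becomes empty and the loop ends, mirroring the stop condition in which all of the latest iterative extended vocabulary is filtered out.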
It should be understood that although the steps in the flowcharts of figs. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in fig. 5, a knowledge graph construction apparatus includes:
a vocabulary information acquisition module 100, configured to crawl knowledge information of a plurality of candidate webpages according to seed vocabulary, where the candidate webpages contain jump links related to the seed vocabulary, and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs;
an extended vocabulary recognition module 300, configured to acquire extended vocabulary according to the jump links;
a hypernym set construction module 500, configured to acquire the vocabulary tags of the seed vocabulary on the candidate webpages and construct a hypernym set;
a word set filtering module 700, configured to perform word filtering on the extended vocabulary according to the hypernym set, and acquire a target word set according to the filtered extended vocabulary and the seed vocabulary; and
a graph construction module 900, configured to construct a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set.
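One way to picture the output of the graph construction module is as a list of triples. The sketch below is only an assumption about the data layout: the function `build_graph` and the relation names `is_a` and `links_to` are hypothetical, and the text does not prescribe how nodes and edges are stored.

```python
def build_graph(hypernyms, target_words, tags_of, jump_links):
    """Assemble (subject, relation, object) triples for the knowledge graph."""
    triples = []
    # Connect each target word to the hypernyms found among its tags.
    for word in sorted(target_words):
        for tag in tags_of.get(word, []):
            if tag in hypernyms:
                triples.append((word, "is_a", tag))
    # Keep only jump links whose two ends both survived word filtering.
    for source, destination in jump_links:
        if source in target_words and destination in target_words:
            triples.append((source, "links_to", destination))
    return triples


hypernyms = {"insurance term"}
target_words = {"premium", "deductible"}
tags_of = {"premium": ["insurance term", "word"], "deductible": ["insurance term"]}
jump_links = [("premium", "deductible"), ("premium", "actor")]
graph = build_graph(hypernyms, target_words, tags_of, jump_links)
for triple in graph:
    print(triple)
```

Triples of this kind could be loaded into a relational table or a graph database; the link to "actor" is dropped because that word is not in the target word set.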
In one embodiment, the apparatus further includes an iterative expansion module, configured to: crawl knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquire iterative jump links related to the filtered extended vocabulary; acquire iterative extended vocabulary according to the iterative jump links; perform word filtering on the iterative extended vocabulary according to the hypernym set and update the target word set according to the filtered iterative extended vocabulary; and take the iterative extended vocabulary as the new filtered extended vocabulary and return to the step of crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquiring the related iterative jump links, until all of the latest iterative extended vocabulary can be filtered out by the hypernym set. The graph construction module 900 is further configured to construct a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, jump links, iterative extended vocabulary, and iterative jump links in the updated target word set.
In one embodiment, the iterative expansion module is further configured to reconstruct the hypernym set according to the vocabulary tags, on the candidate webpages, of the seed vocabulary, extended vocabulary, and iterative extended vocabulary in the target word set.
In one embodiment, the word set filtering module is configured to: acquire the vocabulary tags corresponding to each extended vocabulary; filter out every extended vocabulary for which the proportion of tags belonging to the hypernym set, among all of its tags, is less than or equal to a preset classification threshold; and acquire a target word set according to the filtered extended vocabulary and the seed vocabulary.
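The proportion test used by the word set filtering module can be sketched as below. The function names, the sample tags, and the threshold value 0.5 are illustrative assumptions; the only rule taken from the text is that a word is filtered out when its proportion of hypernym tags is less than or equal to the preset classification threshold.

```python
def hypernym_tag_ratio(tags, hypernyms):
    """Share of a word's tags that belong to the hypernym set."""
    if not tags:
        return 0.0  # a word with no tags gives no evidence of domain membership
    return sum(1 for tag in tags if tag in hypernyms) / len(tags)


def filter_extended(extended_tags, hypernyms, threshold):
    """Keep only words whose hypernym-tag ratio is strictly above the threshold."""
    return [
        word
        for word, tags in extended_tags.items()
        if hypernym_tag_ratio(tags, hypernyms) > threshold
    ]


hypernyms = {"insurance term", "financial product"}
extended_tags = {
    "deductible": ["insurance term", "financial product", "word"],  # ratio 2/3
    "actor": ["occupation", "film"],                                # ratio 0
    "broker": ["occupation", "financial product"],                  # ratio 1/2
}
kept = filter_extended(extended_tags, hypernyms, threshold=0.5)
print(kept)
```

Note the boundary case: "broker" sits exactly at the threshold and is therefore filtered out, because the rule removes words at or below it.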
In one embodiment, the apparatus further includes a seed vocabulary acquisition module, configured to acquire domain information corresponding to the knowledge graph to be constructed and, according to the domain information, crawl seed vocabulary of the domain to which the knowledge graph belongs from the domain classification tree of a third-party platform based on the Scrapy crawler framework and an XPath parsing library.
In one embodiment, the apparatus further includes a seed vocabulary deduplication module, configured to search for identical vocabularies in the seed vocabulary, search for synonymous vocabularies in the seed vocabulary through semantic dependency analysis, and deduplicate the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
In one embodiment, the hypernym set construction module is configured to acquire the vocabulary tags corresponding to the seed vocabulary and, when a vocabulary tag has an association relation with a preset core vocabulary, classify the vocabulary tag into the hypernym set.
For specific limitations of the knowledge graph construction apparatus, reference may be made to the limitations of the knowledge graph construction method above, which are not repeated here. The modules in the knowledge graph construction apparatus can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The database of the computer device is used for storing data related to the knowledge graph. The computer program, when executed by the processor, implements a knowledge graph construction method.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:
crawling knowledge information of a plurality of candidate webpages according to seed vocabulary, where the candidate webpages contain jump links related to the seed vocabulary, and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs;
acquiring extended vocabulary according to the jump links;
acquiring the vocabulary tags of the seed vocabulary on the candidate webpages, and constructing a hypernym set;
performing word filtering on the extended vocabulary according to the hypernym set, and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary; and
constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set.
In one embodiment, the processor, when executing the computer program, further performs the following steps: crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary, and acquiring iterative jump links related to the filtered extended vocabulary; acquiring iterative extended vocabulary according to the iterative jump links; performing word filtering on the iterative extended vocabulary according to the hypernym set, and updating the target word set according to the filtered iterative extended vocabulary; taking the iterative extended vocabulary as the new filtered extended vocabulary and returning to the step of crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquiring the related iterative jump links, until all of the latest iterative extended vocabulary can be filtered out by the hypernym set; and constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, jump links, iterative extended vocabulary, and iterative jump links in the updated target word set.
In one embodiment, the processor, when executing the computer program, further performs the following step: reconstructing the hypernym set according to the vocabulary tags, on the candidate webpages, of the seed vocabulary, extended vocabulary, and iterative extended vocabulary in the target word set.
In one embodiment, the processor, when executing the computer program, further performs the following steps: acquiring the vocabulary tags corresponding to each extended vocabulary; filtering out every extended vocabulary for which the proportion of tags belonging to the hypernym set, among all of its tags, is less than or equal to a preset classification threshold; and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary.
In one embodiment, the processor, when executing the computer program, further performs the following steps: acquiring domain information corresponding to the knowledge graph to be constructed; and, according to the domain information, crawling seed vocabulary of the domain to which the knowledge graph belongs from the domain classification tree of a third-party platform based on the Scrapy crawler framework and an XPath parsing library.
In one embodiment, the processor, when executing the computer program, further performs the following steps: searching for identical vocabularies in the seed vocabulary; searching for synonymous vocabularies in the seed vocabulary through semantic dependency analysis; and deduplicating the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
In one embodiment, the processor, when executing the computer program, further performs the following steps: acquiring the vocabulary tags corresponding to the seed vocabulary; and, when a vocabulary tag has an association relation with a preset core vocabulary, classifying the vocabulary tag into the hypernym set.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
crawling knowledge information of a plurality of candidate webpages according to seed vocabulary, where the candidate webpages contain jump links related to the seed vocabulary, and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs;
acquiring extended vocabulary according to the jump links;
acquiring the vocabulary tags of the seed vocabulary on the candidate webpages, and constructing a hypernym set;
performing word filtering on the extended vocabulary according to the hypernym set, and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary; and
constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary, and acquiring iterative jump links related to the filtered extended vocabulary; acquiring iterative extended vocabulary according to the iterative jump links; performing word filtering on the iterative extended vocabulary according to the hypernym set, and updating the target word set according to the filtered iterative extended vocabulary; taking the iterative extended vocabulary as the new filtered extended vocabulary and returning to the step of crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquiring the related iterative jump links, until all of the latest iterative extended vocabulary can be filtered out by the hypernym set; and constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, jump links, iterative extended vocabulary, and iterative jump links in the updated target word set.
In one embodiment, the computer program, when executed by the processor, further performs the following step: reconstructing the hypernym set according to the vocabulary tags, on the candidate webpages, of the seed vocabulary, extended vocabulary, and iterative extended vocabulary in the target word set.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring the vocabulary tags corresponding to each extended vocabulary; filtering out every extended vocabulary for which the proportion of tags belonging to the hypernym set, among all of its tags, is less than or equal to a preset classification threshold; and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring domain information corresponding to the knowledge graph to be constructed; and, according to the domain information, crawling seed vocabulary of the domain to which the knowledge graph belongs from the domain classification tree of a third-party platform based on the Scrapy crawler framework and an XPath parsing library.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: searching for identical vocabularies in the seed vocabulary; searching for synonymous vocabularies in the seed vocabulary through semantic dependency analysis; and deduplicating the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring the vocabulary tags corresponding to the seed vocabulary; and, when a vocabulary tag has an association relation with a preset core vocabulary, classifying the vocabulary tag into the hypernym set.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A knowledge graph construction method, the method comprising:
crawling knowledge information of a plurality of candidate webpages according to seed vocabulary, wherein the candidate webpages contain jump links related to the seed vocabulary, and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs;
acquiring extended vocabulary according to the jump links;
acquiring the vocabulary tags of the seed vocabulary on the candidate webpages, and constructing a hypernym set;
performing word filtering on the extended vocabulary according to the hypernym set, and acquiring a target word set according to the filtered extended vocabulary and the seed vocabulary; and
constructing a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set.
2. The method of claim 1, wherein before constructing the knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set, the method further comprises:
crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary, and acquiring iterative jump links related to the filtered extended vocabulary;
acquiring iterative extended vocabulary according to the iterative jump links;
performing word filtering on the iterative extended vocabulary according to the hypernym set, and updating the target word set according to the filtered iterative extended vocabulary; and
taking the iterative extended vocabulary as the new filtered extended vocabulary, and returning to the step of crawling knowledge information of a plurality of candidate webpages according to the filtered extended vocabulary and acquiring iterative jump links related to the filtered extended vocabulary, until all of the latest iterative extended vocabulary can be filtered out by the hypernym set;
and wherein constructing the knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set comprises:
constructing the knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, jump links, iterative extended vocabulary, and iterative jump links in the updated target word set.
3. The method of claim 2, wherein before performing word filtering on the iterative extended vocabulary according to the hypernym set and updating the target word set according to the filtered iterative extended vocabulary, the method further comprises:
reconstructing the hypernym set according to the vocabulary tags, on the candidate webpages, of the seed vocabulary, extended vocabulary, and iterative extended vocabulary in the target word set.
4. The method of claim 1, wherein performing word filtering on the extended vocabulary according to the hypernym set, and acquiring the target word set according to the filtered extended vocabulary and the seed vocabulary comprises:
acquiring the vocabulary tags corresponding to each extended vocabulary;
filtering out every extended vocabulary for which the proportion of tags belonging to the hypernym set, among all of the tags corresponding to that extended vocabulary, is less than or equal to a preset classification threshold; and
acquiring the target word set according to the filtered extended vocabulary and the seed vocabulary.
5. The method of claim 1, wherein before crawling knowledge information of a plurality of candidate webpages according to the seed vocabulary, the candidate webpages containing jump links related to the seed vocabulary, the method further comprises:
acquiring domain information corresponding to the knowledge graph to be constructed; and
according to the domain information, crawling seed vocabulary of the domain to which the knowledge graph to be constructed belongs from the domain classification tree of a third-party platform based on the Scrapy crawler framework and an XPath parsing library.
6. The method of claim 5, wherein before crawling the knowledge information of the plurality of candidate webpages according to the seed vocabulary, the method further comprises:
searching for identical vocabularies in the seed vocabulary;
searching for synonymous vocabularies in the seed vocabulary through semantic dependency analysis; and
deduplicating the seed vocabulary according to the identical vocabularies and the synonymous vocabularies.
7. The method of claim 1, wherein acquiring the vocabulary tags of the seed vocabulary on the candidate webpages and constructing the hypernym set comprises:
acquiring the vocabulary tags corresponding to the seed vocabulary; and
when a vocabulary tag has an association relation with a preset core vocabulary, classifying the vocabulary tag into the hypernym set.
8. A knowledge graph construction apparatus, the apparatus comprising:
a vocabulary information acquisition module, configured to crawl knowledge information of a plurality of candidate webpages according to seed vocabulary, wherein the candidate webpages contain jump links related to the seed vocabulary, and the seed vocabulary is knowledge vocabulary in the domain to which the knowledge graph to be constructed belongs;
an extended vocabulary recognition module, configured to acquire extended vocabulary according to the jump links;
a hypernym set construction module, configured to acquire the vocabulary tags of the seed vocabulary on the candidate webpages and construct a hypernym set;
a word set filtering module, configured to perform word filtering on the extended vocabulary according to the hypernym set, and acquire a target word set according to the filtered extended vocabulary and the seed vocabulary; and
a graph construction module, configured to construct a knowledge graph according to the hypernym set and the seed vocabulary, extended vocabulary, and jump link relations in the target word set.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910848696.6A CN110750698A (en) 2019-09-09 2019-09-09 Knowledge graph construction method and device, computer equipment and storage medium
PCT/CN2020/087714 WO2021047188A1 (en) 2019-09-09 2020-04-29 Knowledge graph construction method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910848696.6A CN110750698A (en) 2019-09-09 2019-09-09 Knowledge graph construction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110750698A true CN110750698A (en) 2020-02-04

Family

ID=69276115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910848696.6A Pending CN110750698A (en) 2019-09-09 2019-09-09 Knowledge graph construction method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110750698A (en)
WO (1) WO2021047188A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460168A (en) * 2020-03-27 2020-07-28 西安交通大学 Knowledge graph verification and updating method based on block chain distributed double consensus
CN111478881A (en) * 2020-03-04 2020-07-31 深圳壹账通智能科技有限公司 Bidirectional recommendation method, device, equipment and storage medium for organization and alliance
CN112487212A (en) * 2020-12-18 2021-03-12 清华大学 Method and device for constructing domain knowledge graph
CN112487213A (en) * 2020-12-18 2021-03-12 清华大学 Cross-language-domain knowledge graph construction method and device
WO2021047188A1 (en) * 2019-09-09 2021-03-18 深圳壹账通智能科技有限公司 Knowledge graph construction method and apparatus, and computer device and storage medium
WO2023246007A1 (en) * 2022-06-23 2023-12-28 广州大学 Value chain knowledge discovery method under personalized customization

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799677A (en) * 2012-07-20 2012-11-28 河海大学 Water conservation domain information retrieval system and method based on semanteme
CN103092956A (en) * 2013-01-17 2013-05-08 上海交通大学 Method and system for topic keyword self-adaptive expansion on social network platform
CN104679836A (en) * 2015-02-06 2015-06-03 中国农业大学 Automatic extension method of agricultural ontology
KR20160128645A (en) * 2015-04-29 2016-11-08 한국전자통신연구원 Method for integrating the different types of vocabulary network
US20170052988A1 (en) * 2015-08-20 2017-02-23 International Business Machines Corporation Normalizing values in data tables
CN108038200A (en) * 2017-12-12 2018-05-15 北京百度网讯科技有限公司 Method and apparatus for storing data
CN108920482A (en) * 2018-04-27 2018-11-30 浙江工业大学 Microblogging short text classification method based on Lexical Chains feature extension and LDA model

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9672251B1 (en) * 2014-09-29 2017-06-06 Google Inc. Extracting facts from documents
CN106875014B (en) * 2017-03-02 2021-06-15 上海交通大学 Automatic construction implementation method of software engineering knowledge base based on semi-supervised learning
CN109902185A (en) * 2019-03-05 2019-06-18 北京工业大学 A kind of water utilities field concept knowledge mapping construction method based on DBpedia
CN110750698A (en) * 2019-09-09 2020-02-04 深圳壹账通智能科技有限公司 Knowledge graph construction method and device, computer equipment and storage medium

Non-Patent Citations (2)

Title
MOHAMMAD: "Towards Understanding the Evolution of Vocabulary Terms in Knowledge Graphs", ARXIV, pages 1 - 10 *
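YANG FAN: "Research on the sentiment orientation and intensity of microblog comments in online public opinion events: taking the 'Yu Huan case' as an example", Media Observer (传媒观察), no. 11, pages 60 - 66 *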

Also Published As

Publication number Publication date
WO2021047188A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
CN110750698A (en) Knowledge graph construction method and device, computer equipment and storage medium
US10664696B2 (en) Systems and methods for classification of software defect reports
CN109885692B (en) Knowledge data storage method, apparatus, computer device and storage medium
US9588990B1 (en) Performing image similarity operations using semantic classification
US10114804B2 (en) Representation of an element in a page via an identifier
US20170109272A1 (en) Source code flow analysis using information retrieval
CN108270629A (en) Site visitor's behavior monitoring method and device
CN110515986B (en) Processing method and device of social network diagram and storage medium
CN111444352A (en) Knowledge graph construction method and device based on knowledge node membership
CN109145235B (en) Method and device for analyzing webpage and electronic equipment
EP3685243A1 (en) Content pattern based automatic document classification
US10380459B2 (en) System and method for image classification
US20200311061A1 (en) System and method for subset searching and associated search operators
US20220253486A1 (en) Machine learning applications to improve online job listings
CN104408180A (en) Stored data inquiring method and device
CN109344177B (en) Model combination method and device
Bobek et al. Introducing uncertainty into explainable ai methods
CN113704420A (en) Method and device for identifying role in text, electronic equipment and storage medium
CN115130601A (en) Two-stage academic data webpage classification method and system based on multi-dimensional feature fusion
CN110781310A (en) Target concept graph construction method and device, computer equipment and storage medium
CN114385808A (en) Text classification model construction method and text classification method
Acosta-Mendoza et al. Mining clique frequent approximate subgraphs from multi-graph collections
Liu et al. Research on adaptive wrapper in deep web data extraction
Sbodio et al. Tag clustering with self organizing maps
Huynh et al. A novel approach for mining closed clickstream patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination