CN110609906A - Knowledge graph construction method and device, storage medium and electronic terminal - Google Patents

Knowledge graph construction method and device, storage medium and electronic terminal

Info

Publication number
CN110609906A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910870536.1A
Other languages
Chinese (zh)
Other versions
CN110609906B (en)
Inventor
孙树春
陈阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Golden Panda Co Ltd
Original Assignee
Golden Panda Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Golden Panda Co Ltd
Priority to CN201910870536.1A
Publication of CN110609906A
Application granted
Publication of CN110609906B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for constructing a knowledge graph, a storage medium, and an electronic terminal. The method comprises the following steps: acquiring data to be processed, and converting the data to be processed to acquire source data; the source data comprises entity data and relation data and is respectively stored in a first storage unit and a second storage unit; constructing custom target configuration data by combining preset custom basic configuration data and the data to be processed; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category; screening the source data by the first storage unit and the second storage unit based on the user-defined target configuration data to obtain target data; processing the target data to generate a target knowledge graph. The method and the system can realize customized construction of the knowledge graph.

Description

Knowledge graph construction method and device, storage medium and electronic terminal
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for constructing a knowledge graph, a storage medium, and an electronic terminal.
Background
The Knowledge Graph was formally proposed by Google in 2012, with the original intention of improving the capability of search engines and the search quality and experience of users. A knowledge graph describes, in the form of a graph, the various entities and concepts existing in the real world and the relationships between them, thereby providing a knowledge base for information processing. Knowledge graphs have become one of the key technologies of artificial intelligence and are widely used in applications such as intelligent search, automatic question answering, personalized recommendation, and content distribution.
The prior art still has certain shortcomings in constructing knowledge graphs. For example, in some solutions the triple data is constructed before the knowledge graph is built, but such solutions do not take the overall structure of the knowledge graph into account. In addition, they cannot meet the need for customizing the knowledge graph.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a method, an apparatus, a storage medium, and an electronic terminal for building a knowledge graph, which enable customized construction of a knowledge graph, and thereby overcome one or more problems due to limitations and disadvantages of related art at least to some extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a knowledge-graph construction method, including:
acquiring data to be processed, and converting the data to be processed to acquire source data; the source data comprises entity data and relation data and is respectively stored in a first storage unit and a second storage unit;
constructing custom target configuration data by combining preset custom basic configuration data and the data to be processed; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
screening the source data in the first storage unit and the second storage unit based on the custom target configuration data to obtain target data;
processing the target data to generate a target knowledge graph.
In an exemplary embodiment of the present disclosure, the method further comprises:
pre-constructing user-defined basic configuration data based on RDFS; wherein the custom base configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
and carrying out format conversion on the custom basic configuration data based on the RDFS so as to obtain the custom basic configuration data in a preset format.
In an exemplary embodiment of the present disclosure, when generating the target knowledge-graph, the method further includes:
generating a query instruction, and reading the target knowledge graph according to the query instruction to acquire actual configuration parameters;
and comparing the consistency of the actual configuration parameters with the custom basic configuration data to generate a detection statistical result.
In an exemplary embodiment of the present disclosure, the detection statistical result includes: entity statistics and relationship statistics.
In an exemplary embodiment of the present disclosure, the method further comprises:
comparing the detection statistical result with a preset parameter standard;
when the detection statistical result does not meet the preset parameter standard, re-executing the knowledge graph construction method to obtain an updated detection statistical result;
and judging whether the updated detection statistical result meets the parameter standard.
In an exemplary embodiment of the present disclosure, the processing the target data to generate a target knowledge-graph includes:
inputting the target data into a graph processing tool to generate the target knowledge-graph.
In an exemplary embodiment of the present disclosure, the method further comprises:
creating a corresponding number of knowledge-graph creation tasks in response to at least one knowledge-graph creation request;
and concurrently executing the knowledge graph construction method for each knowledge-graph creation task to obtain a plurality of target knowledge graphs.
According to a second aspect of the present disclosure, there is provided a knowledge-graph construction apparatus, comprising:
a to-be-processed data conversion module, configured to acquire data to be processed and convert the data to be processed to acquire source data; the source data comprises entity data and relation data, which are respectively stored in a first storage unit and a second storage unit;
the target configuration data setting module is used for combining preset user-defined basic configuration data and the data to be processed to construct user-defined target configuration data; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
the target data screening module is used for screening the source data in the first storage unit and the second storage unit based on the user-defined target configuration data to obtain target data;
and the knowledge graph creating module is used for processing the target data to generate a target knowledge graph.
According to a third aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described knowledge-graph construction method.
According to a fourth aspect of the present disclosure, there is provided an electronic terminal comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above-described method of knowledge-graph construction via execution of the executable instructions.
In the knowledge graph construction method provided by the embodiment of the disclosure, the preset user-defined basic data and the data to be processed are combined, so that user-defined target configuration data can be constructed for a user, data can be screened according to the target configuration data, and the customized construction of the knowledge graph is realized. And by independently storing the entity data and the relationship data, the nodes can be constructed first and then the relationship among the nodes can be constructed when the knowledge graph is constructed, so that the requirement of customizing the specified entity and relationship can be met. In addition, the source data is screened by using the user-defined target configuration data, so that the interference of redundant data can be effectively reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 schematically illustrates a flow diagram of a method of knowledge-graph construction in an exemplary embodiment of the disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of detecting and accounting data for a knowledge-graph in an exemplary embodiment of the disclosure;
FIG. 3 schematically illustrates a flowchart of a method of concurrently performing a plurality of knowledge-graph creation tasks in an exemplary embodiment of the disclosure;
FIG. 4 schematically illustrates a composition diagram of a knowledge graph construction apparatus in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a composition diagram of an electronic device in an exemplary embodiment of the disclosure;
FIG. 6 schematically illustrates a schematic diagram of a program product in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The example embodiment first provides a method for constructing a knowledge graph, which can be applied to personalized customization and construction of knowledge graphs in different technical fields or different data. Referring to fig. 1, the above-mentioned knowledge graph construction method may include the steps of:
step S11, acquiring data to be processed, and converting the data to be processed to acquire source data; the source data comprises entity data and relation data and is respectively stored in a first storage unit and a second storage unit;
step S12, constructing user-defined target configuration data by combining preset user-defined basic configuration data and the data to be processed; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
step S13, screening the source data in the first storage unit and the second storage unit based on the user-defined target configuration data to obtain target data;
step S14, processing the target data to generate a target knowledge-graph.
In the knowledge-graph construction method provided by the present exemplary embodiment, on the one hand, the entity data and the relationship data are stored independently, so that when the knowledge graph is constructed the nodes can be established first and the relationships among the nodes established afterwards, which meets the requirement of customizing specified entities and relationships. On the other hand, by combining the preset user-defined basic configuration data with the data to be processed, user-defined target configuration data can be constructed for the user, so that data can be screened according to the target configuration data and customized construction of the knowledge graph is achieved. In addition, screening the source data with the user-defined target configuration data effectively reduces the interference of redundant data.
Hereinafter, each step in the knowledge graph constructing method according to the exemplary embodiment will be described in more detail with reference to the drawings and examples.
Step S101, pre-constructing user-defined basic configuration data based on RDFS; wherein the custom base configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
and step S102, carrying out format conversion on the custom basic configuration data based on the RDFS so as to obtain the custom basic configuration data in a preset format.
In this exemplary embodiment, for different technical fields or for data with different contents, a user may first customize a schema based on RDFS (Resource Description Framework Schema) and generate a configuration file defined by the RDF Schema. Specifically, the user may define configuration parameters such as the entity categories, the basic attributes and value ranges of the entities, and the relationship categories between entities according to the syntax of the RDF Schema. In addition, the data structure of the custom base configuration data constructed with the RDF Schema may use the CSV (comma-separated values) format, the TSV (tab-separated values) format, or another format that makes it easy to distinguish entities from relationships.
For example, the user may customize the customized basic configuration data of different contents for the medical data and the communication data by using the syntax of the RDF Schema.
In addition, the configuration file corresponding to the user-defined basic configuration data can be subjected to format conversion through natural language processing, for example into a format that is easy for a machine to process automatically, such as JSON, YAML, or XML. Specifically, the category to which an attribute belongs can be determined through rdfs:domain, the value range of an attribute or relationship can be determined through rdfs:range, and the whole Schema definition is converted into a structure such as JSON, YAML, or XML. The structure distinguishes the basic attributes of each entity from the relationships between entities, which facilitates subsequent automatic parsing by the server.
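The conversion described above can be illustrated with a minimal sketch. It assumes the schema is already available as a list of (subject, predicate, object) triples; the names `SCHEMA_TRIPLES` and `schema_to_config`, and the layout of the resulting JSON, are hypothetical and are not specified by the disclosure. A property whose range is a class is treated as a relationship, and any other property as an entity attribute.

```python
import json

# Hypothetical schema triples mirroring an RDF Schema definition.
SCHEMA_TRIPLES = [
    ("s:Gene", "rdf:type", "rdfs:Class"),
    ("s:Drug", "rdf:type", "rdfs:Class"),
    ("s:name", "rdf:type", "rdf:Property"),
    ("s:name", "rdfs:domain", "s:Gene"),
    ("s:name", "rdfs:domain", "s:Drug"),
    ("s:name", "rdfs:range", "xsd:string"),
    ("s:targetDrug", "rdf:type", "rdf:Property"),
    ("s:targetDrug", "rdfs:domain", "s:Gene"),
    ("s:targetDrug", "rdfs:range", "s:Drug"),
]


def schema_to_config(triples):
    """Turn schema triples into a dict separating attributes from relationships."""
    classes = sorted(s for s, p, o in triples if p == "rdf:type" and o == "rdfs:Class")
    props = sorted(s for s, p, o in triples if p == "rdf:type" and o == "rdf:Property")
    config = {
        "entities": {c: {"attributes": {}} for c in classes},
        "relationships": {},
    }
    for prop in props:
        domains = sorted(o for s, p, o in triples if s == prop and p == "rdfs:domain")
        ranges = sorted(o for s, p, o in triples if s == prop and p == "rdfs:range")
        if any(r in classes for r in ranges):
            # The range is another class, so the property links two entities.
            config["relationships"][prop] = {"domain": domains, "range": ranges}
        else:
            # The range is a literal type, so the property is an entity attribute.
            for d in domains:
                if d in config["entities"]:
                    config["entities"][d]["attributes"][prop] = ranges
    return config


config = schema_to_config(SCHEMA_TRIPLES)
print(json.dumps(config, indent=2))
```

Keeping attributes under each entity and relationships in a separate top-level key mirrors the distinction the paragraph asks the converted structure to preserve.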
Alternatively, in other exemplary embodiments of the present disclosure, custom base configuration data may also be constructed using OWL (Web Ontology Language) or OWL 2.
Step S11, acquiring data to be processed, and converting the data to be processed to acquire source data; the source data comprises entity data and relationship data and is respectively stored in the first storage unit and the second storage unit.
In this exemplary embodiment, the method described above may be performed on the server side. Specifically, the server may receive the structured data to be processed uploaded by the user, or obtain the data to be processed in a crawling manner. After the server side obtains the data to be processed, the server side can convert the data to be processed into source data, namely the RDF data of the triples.
In particular, RDF data can be divided into entity data and relationship data. The entity data can be used as node data, and the relationship data is used for describing the relationship between the entities. The converted entity data and the relationship data can be stored respectively. For example, the first storage unit and the second storage unit may be different databases. Or it may be a folder or file system with different paths and file names in a database.
Alternatively, in other exemplary embodiments, the entity data and the relationship data may be stored in the same database, distinguished by specific entity data fields and relationship data fields, or configured with different identification information.
By storing the entity data and the relationship data respectively, when the knowledge graph is constructed, the nodes in the knowledge graph can be constructed according to the entity data, and then the relationship between the nodes at different levels in the knowledge graph is established according to the relationship data, so that the data of the entities and the relationship are decoupled, and the requirement of customizing the specified entities and relationship is met.
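The decoupling described above can be sketched as follows. This is only an illustration under assumptions: the triples, the identifiers such as `gene:G1`, and the function names `split_triples` and `build_graph` are hypothetical, and two in-memory lists stand in for the first and second storage units.

```python
def split_triples(triples, entity_ids):
    """Separate triples whose object is another entity (relationship data)
    from triples that describe an entity itself (entity data)."""
    entity_data, relation_data = [], []
    for s, p, o in triples:
        if o in entity_ids:
            relation_data.append((s, p, o))
        else:
            entity_data.append((s, p, o))
    return entity_data, relation_data


def build_graph(entity_data, relation_data):
    """Build the nodes first, then add only edges whose endpoints both exist."""
    nodes = {}
    for s, p, o in entity_data:
        nodes.setdefault(s, {})[p] = o
    edges = [(s, p, o) for s, p, o in relation_data if s in nodes and o in nodes]
    return nodes, edges


# Hypothetical sample triples.
triples = [
    ("gene:G1", "s:name", "G1"),
    ("drug:D1", "s:name", "D1"),
    ("gene:G1", "s:targetDrug", "drug:D1"),
]
entity_data, relation_data = split_triples(triples, {"gene:G1", "drug:D1"})
nodes, edges = build_graph(entity_data, relation_data)
print(nodes)
print(edges)
```

Because edges are added only after all nodes exist, a relationship that points at an entity excluded from the build is dropped rather than creating a dangling node.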
Step S12, constructing user-defined target configuration data by combining preset user-defined basic configuration data and the data to be processed; the custom target configuration data comprises: any one or combination of any plurality of entity categories, entity attributes, entity value ranges and entity relationship categories.
In the exemplary embodiment, a user can customize the screening conditions according to the knowledge graph to be customized and constructed, and generate corresponding customized target configuration data. Specifically, the user may combine the data content of the data to be processed at the user terminal, select the entity type and the relationship type, and the entity attribute and the entity value range, or the parameters such as the relationship attribute and the relationship value range, which are required by the knowledge graph to be generated, from the predetermined customized basic configuration data, generate customized target configuration data according to the specifically selected parameters, and send the customized target configuration data to the server. For example, the custom target configuration data may be in the form of JSON, YAML, or XML.
In this way, the configuration parameters of the knowledge graph are customized through the user-defined target configuration data.
Step S13, screening the source data in the first storage unit and the second storage unit based on the user-defined target configuration data to obtain target data.
In the exemplary embodiment, after the user-defined target configuration parameters are determined, the RDF data corresponding to them may be screened from the first storage unit and the second storage unit according to the entity-related parameters and the relationship-related parameters; the entity data and the relationship data obtained in this way are used as the target data.
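A minimal sketch of this screening step is given below. The record layout (`class`, `props`, `type`, `source`, `target`), the configuration keys, and the function name `filter_source` are assumptions made for illustration; the disclosure does not fix a concrete data layout.

```python
def filter_source(entity_store, relation_store, target_config):
    """Keep only entities of the selected classes and relationships of the
    selected types whose two endpoints were both kept."""
    wanted_classes = set(target_config["entity_classes"])
    wanted_relations = set(target_config["relationship_classes"])
    entities = {
        eid: e for eid, e in entity_store.items() if e["class"] in wanted_classes
    }
    relations = [
        r
        for r in relation_store
        if r["type"] in wanted_relations
        and r["source"] in entities
        and r["target"] in entities
    ]
    return entities, relations


# Hypothetical contents of the first (entity) and second (relationship) storage units.
entity_store = {
    "g1": {"class": "Gene", "props": {"name": "G1"}},
    "d1": {"class": "Drug", "props": {"name": "D1"}},
    "x1": {"class": "Disease", "props": {"name": "X1"}},
}
relation_store = [
    {"type": "targetDrug", "source": "g1", "target": "d1"},
    {"type": "causeDisease", "source": "g1", "target": "x1"},
]
target_config = {
    "entity_classes": ["Gene", "Drug"],
    "relationship_classes": ["targetDrug"],
}

entities, relations = filter_source(entity_store, relation_store, target_config)
print(sorted(entities))
print(relations)
```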
Step S14, processing the target data to generate a target knowledge-graph.
In the exemplary embodiment, the target data obtained by screening may be input into a graph processing tool, and a target knowledge graph corresponding to the target data may be generated automatically. For example, the graph processing tool may be the Neo4j graph database, into which the RDF data is imported via the neosemantics plug-in. In the Neo4j graph database, the knowledge graph is constructed with each item of entity data as a node and each item of relationship data as an edge. Each node, that is, each entity, is identified by a globally unique ID; relationships (attributes) are used to connect two nodes.
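The node-then-edge construction can also be sketched without the plug-in. The example below does not use neosemantics; as a stated substitution, it generates plain Cypher `MERGE` statements with parameters, which a Neo4j client could then execute. The record layout and the function name `to_cypher` are hypothetical. Labels and relationship types are interpolated into the statement text because Cypher does not accept them as parameters, so they should come from a trusted configuration.

```python
def to_cypher(entities, relations):
    """Return (statement, parameters) pairs: all node statements first, then edges."""
    statements = []
    for eid, e in entities.items():
        statements.append(
            (
                f"MERGE (n:{e['class']} {{id: $id}}) SET n += $props",
                {"id": eid, "props": e["props"]},
            )
        )
    for r in relations:
        statements.append(
            (
                "MATCH (a {id: $src}), (b {id: $dst}) "
                f"MERGE (a)-[:{r['type']}]->(b)",
                {"src": r["source"], "dst": r["target"]},
            )
        )
    return statements


# Hypothetical target data.
entities = {
    "g1": {"class": "Gene", "props": {"name": "G1"}},
    "d1": {"class": "Drug", "props": {"name": "D1"}},
}
relations = [{"type": "targetDrug", "source": "g1", "target": "d1"}]

statements = to_cypher(entities, relations)
for text, params in statements:
    print(text, params)
```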
Based on the above, in the present exemplary embodiment, referring to fig. 2, when generating the target knowledge-graph, the method may further include:
step S21, generating a query instruction, and reading the target knowledge graph according to the query instruction to obtain actual configuration parameters;
step S22, comparing the actual configuration parameters with the custom basic configuration data to generate a detection statistical result.
After the server generates the target knowledge graph, a query instruction can be generated in response to the generation status of the target knowledge graph. According to the query instruction, the server can automatically generate a query statement in the query language of the graph database based on the structured RDF Schema, and query the relevant parameters, such as the entities and relationships contained in the target knowledge graph and the attribute value ranges of the entities or relationships, as the actual configuration parameters.
For example, when the Neo4j graph database is used, a Cypher query statement may be used; a Gremlin query statement, a GraphQL query statement, or the like may also be used. The present disclosure places no particular limitation on the specific query statement.
In addition, after the current actual configuration parameters of the target knowledge graph are obtained, they can be compared with the user-defined basic configuration parameters preset by the user to generate a detection statistical result. In the detection statistical report, the numbers of entities and relationships, and the entity categories and relationship categories corresponding to the actual data in the target knowledge graph, can be compared and counted; parameters such as the fill rate of the corresponding attributes in the entities and of the relationships between entities can also be counted.
By generating the detection statistical result from these parameters, the data in the target knowledge graph is counted along multiple dimensions, so that the construction scale of the entities and relationships can be checked as a whole and the entity data and relationship data missing from the target knowledge graph can be identified more accurately.
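As an illustration of generating query statements from the structured configuration, the sketch below builds one Cypher counting query per entity class and per relationship class. The configuration keys and the function name `count_queries` are hypothetical, and the labels actually present in the graph depend on how the data was imported, so the label names here are an assumption.

```python
def count_queries(config):
    """Build one counting query per configured entity class and relationship class."""
    queries = {}
    for cls in config["entity_classes"]:
        queries[cls] = f"MATCH (n:{cls}) RETURN count(n) AS total"
    for rel in config["relationship_classes"]:
        queries[rel] = f"MATCH ()-[r:{rel}]->() RETURN count(r) AS total"
    return queries


# Hypothetical configuration.
config = {"entity_classes": ["Gene", "Drug"], "relationship_classes": ["targetDrug"]}
queries = count_queries(config)
for name, text in queries.items():
    print(name, "->", text)
```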
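The consistency comparison and statistics can be sketched as below. The report fields, the configuration layout, and the function name `detect` are assumptions; the fill rate is computed here as the share of entities of a class that carry a non-empty value for a configured attribute, which is one plausible reading of the text rather than a definition given by the disclosure.

```python
def detect(base_config, nodes, edges):
    """Compare what the graph actually contains with the base configuration."""
    expected_classes = set(base_config["entities"])
    expected_relations = set(base_config["relationships"])
    actual_classes = {n["class"] for n in nodes.values()}
    actual_relations = {e["type"] for e in edges}

    fill_rate = {}
    for cls, attrs in base_config["entities"].items():
        members = [n for n in nodes.values() if n["class"] == cls]
        for attr in attrs:
            filled = sum(1 for n in members if n["props"].get(attr) not in (None, ""))
            fill_rate[f"{cls}.{attr}"] = filled / len(members) if members else 0.0

    return {
        "entity_count": len(nodes),
        "relationship_count": len(edges),
        "missing_classes": sorted(expected_classes - actual_classes),
        "unexpected_classes": sorted(actual_classes - expected_classes),
        "missing_relationships": sorted(expected_relations - actual_relations),
        "fill_rate": fill_rate,
    }


# Hypothetical configuration and graph contents.
base_config = {
    "entities": {"Gene": ["name"], "Drug": ["name"]},
    "relationships": ["targetDrug"],
}
nodes = {
    "g1": {"class": "Gene", "props": {"name": "G1"}},
    "g2": {"class": "Gene", "props": {}},
}
report = detect(base_config, nodes, edges=[])
print(report)
```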
Further, based on the above, in other exemplary embodiments of the present disclosure, the method described above may further include:
step S31, comparing the detection statistical result with a preset parameter standard;
step S32, when the detection statistical result does not meet the preset parameter standard, re-executing the knowledge graph construction method to obtain an updated detection statistical result;
step S33, determining whether the updated detection statistic result satisfies the parameter standard.
In the exemplary embodiment, after the detection statistical result is obtained, the specific data content in it may be compared with a preset parameter standard to determine whether the requirement of the user is met. For example, for the entity type, a user may preset a certain fault tolerance rate; if the detection statistical result shows that the error rate for the entity type is zero, there are no entity type errors in the target knowledge graph. Conversely, if entity types are missing and the proportion of erroneous data is higher than the preset threshold, a control instruction may be generated, steps S12 to S14 are executed again to obtain a new target knowledge graph, and the new target knowledge graph is counted again to obtain a new detection statistical result; this is repeated until the detection statistical result shows that the parameter standard set by the user is satisfied.
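The rebuild-and-recheck loop can be sketched as follows. The callables `build` and `check`, the `error_rate` field, and the `max_attempts` bound are all hypothetical; in particular, the upper bound on attempts is an added safeguard against looping forever and is not part of the description above.

```python
def build_until_ok(build, check, max_error_rate, max_attempts=3):
    """Rebuild the graph until its detection result meets the standard.

    `build` returns a graph, and `check` returns a report dict that
    contains an "error_rate" entry for that graph.
    """
    for attempt in range(1, max_attempts + 1):
        graph = build()
        report = check(graph)
        if report["error_rate"] <= max_error_rate:
            return graph, report, attempt
    raise RuntimeError(f"standard not met after {max_attempts} attempts")


# Hypothetical stand-ins: the first build has errors, the second is clean.
simulated_rates = iter([0.2, 0.0])
graph, report, attempt = build_until_ok(
    build=lambda: {"nodes": [], "edges": []},
    check=lambda g: {"error_rate": next(simulated_rates)},
    max_error_rate=0.05,
)
print(attempt, report)
```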
In other exemplary embodiments of the present disclosure, as shown with reference to fig. 3, the method described above may further include:
step S41, responding to at least one knowledge graph establishing request, establishing a corresponding number of knowledge graph establishing tasks;
step S42, concurrently executing the knowledge graph construction method for each of the knowledge graph creation tasks to obtain a plurality of the target knowledge graphs.
In particular, the method described above may be performed at the server. Different users can respectively submit knowledge graph establishing requests to the server side. After receiving the requests, the server side can create corresponding knowledge graph creation tasks and add the tasks to a preset task list. And each knowledge graph establishing task can be executed concurrently, and the knowledge graphs with different contents under a plurality of application scenes can be efficiently established.
Alternatively, in other exemplary embodiments of the present disclosure, the method may be executed in a user-side terminal device, and a user may submit a plurality of knowledge graph creation requests for different data contents, so that the terminal device can concurrently execute each knowledge graph creation task. The present disclosure does not specifically limit the execution end of the above method.
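Concurrent execution of the creation tasks can be sketched with a thread pool. The request layout and the function names `build_knowledge_graph` and `run_tasks` are hypothetical, and the build function is a placeholder standing in for steps S11 to S14; a thread pool is only one possible choice, since the disclosure does not prescribe a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def build_knowledge_graph(request):
    """Placeholder for steps S11 to S14; returns a summary of the built graph."""
    return {"user": request["user"], "classes": sorted(request["entity_classes"])}


def run_tasks(requests, max_workers=4):
    """Create one task per request and execute the tasks concurrently.

    Results are returned in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_knowledge_graph, requests))


# Hypothetical creation requests from different users.
requests = [
    {"user": "a", "entity_classes": ["Gene", "Drug"]},
    {"user": "b", "entity_classes": ["Disease"]},
]
results = run_tasks(requests)
print(results)
```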
For example, for the medical field, the basic configuration data is first customized by using the RDFS language, and specifically, the entity types can be customized, including:
1) gene: s:Gene a rdfs:Class; rdfs:label "Gene"; rdfs:comment "gene".
2) drug: s:Drug a rdfs:Class; rdfs:label "Drug"; rdfs:comment "medicine".
3) disease: s:Disease a rdfs:Class; rdfs:label "Disease"; rdfs:comment "disease".
Attributes/relationship definitions may also be customized, and specific data structures may include:
name/code: s:name a rdf:Property; rdfs:label "Name"; s:typeName "name"; rdfs:comment "name"; rdfs:domain s:Gene, s:Drug, s:Disease; rdfs:range xsd:string.
For example, for the custom entity types described above, the attribute/relationship definition may include:
1) causes disease: s:causeDisease a rdf:Property; rdfs:label "Disease"; s:typeName "causes disease"; rdfs:comment "causes disease"; rdfs:domain s:Gene; rdfs:range s:Disease.
2) targeting drug: s:targetDrug a rdf:Property; rdfs:label "Drug"; s:typeName "targeting drugs"; rdfs:comment "targeting medicine"; rdfs:domain s:Gene; rdfs:range s:Drug.
The custom structure is then converted into configuration data in JSON format.
After the designated medical data to be processed is acquired, the corresponding RDF data can be generated according to the definition of the RDF Schema and stored, following a preset file structure, in a file system or in separate databases. For example, the paths of the entity data may include: rdf/nodes/disease.ttl; rdf/nodes/drug.ttl; rdf/nodes/gene.ttl. The paths of the attribute/relationship data may include: rdf/relationships/disease.ttl; rdf/relationships/drug.ttl; rdf/relationships/gene.ttl; rdf/relationships/gene-drug.ttl; rdf/relationships/gene-disease.ttl.
For the above specified medical data, suppose the user specifies the entity classes so that a knowledge graph containing only "genes" and "drugs" is constructed. According to the generated user-defined basic configuration file in JSON format, the files needed for construction can be determined: the RDF data of the specified entities and of the entities' own attributes is acquired, and rdf/relationships/gene-drug.ttl is also acquired because an s:targetDrug relationship exists between genes and drugs.
Specifically, the screened entity data may include: rdf/nodes/drug.ttl; rdf/nodes/gene.ttl. The screened attribute/relationship data may include: rdf/relationships/drug.ttl; rdf/relationships/gene.ttl; rdf/relationships/gene-drug.ttl. The screened data is then automatically imported into the Neo4j tool through a script to generate the knowledge graph.
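The file selection in this example can be sketched as below. The path templates follow the file names listed above; the function name `select_files`, the shape of the `relationships` argument, and the relationship keys are hypothetical.

```python
def select_files(classes, relationships):
    """Pick the node and relationship files needed for the chosen entity classes.

    `relationships` maps a relationship name to its (domain, range) classes.
    A cross-entity file is selected only when both ends are among `classes`.
    """
    node_files = [f"rdf/nodes/{c.lower()}.ttl" for c in classes]
    relation_files = [f"rdf/relationships/{c.lower()}.ttl" for c in classes]
    for domain, range_ in relationships.values():
        if domain in classes and range_ in classes:
            relation_files.append(
                f"rdf/relationships/{domain.lower()}-{range_.lower()}.ttl"
            )
    return node_files, relation_files


# Illustrative relationship definitions: Gene -> Disease and Gene -> Drug.
relationships = {
    "causeDisease": ("Gene", "Disease"),
    "targetDrug": ("Gene", "Drug"),
}
node_files, relation_files = select_files(["Drug", "Gene"], relationships)
print(node_files)
print(relation_files)
```

With only the gene and drug classes selected, the gene-disease file is skipped because the disease class is not part of the requested graph.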
And generating a corresponding query statement according to the user-defined basic configuration data of the JSON format, acquiring an actual result, comparing the actual result with the configuration data definition of the JSON format, and detecting errors. And generating corresponding query statistical statements, counting specific data indexes and generating corresponding reports.
In the instruction map construction method provided by the disclosure, related configuration parameters of entity categories and relationship categories in a knowledge map are defined in advance by using an RDF Schema, and a user-defined basic configuration parameter is generated; and converts it into a data format that is easily processed automatically by a machine using natural language processing. And converting the data to be processed into entity data and relation data, and respectively storing the entity data and the relation data in different system files or databases. Determining a screening condition according to the knowledge graph to be customized and constructed, and generating custom target configuration data, wherein the method specifically comprises the following steps: specifying an entity class, specifying a relationship class, or specifying other attribute value range. And automatically screening the RDF data in the corresponding range from the entity data and the relation data according to the specified configuration parameters, and importing the RDF data into a database for constructing a knowledge graph to obtain the knowledge graph. Meanwhile, parameters such as entity types to be counted, relationship types and the like and related logics can be converted into query statements according to the user-defined target configuration parameters and the user-defined basic configuration parameters, and the current actual parameters of the knowledge graph are queried. And counting actual data in the knowledge graph from multiple dimensions, carrying out consistency detection on the actual data and the user-defined basic configuration parameters, and producing a detection statistical result report.
By using the custom configuration parameters to screen the data range of the triples, customized construction of the knowledge graph is achieved and interference from redundant data is effectively reduced. By defining the structured custom basic configuration parameters in advance using RDF Schema, and defining the target configuration parameters according to the content and characteristics of the data, the configuration can be tied more closely to the actual data, which facilitates subsequent statistics and checks on the actual data as well as statistics and analysis of the knowledge graph data as a whole. Moreover, the construction of multiple knowledge graphs can be executed concurrently, so that knowledge graphs with different contents for multiple application scenarios can be built efficiently at the same time.
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 4, in the present exemplary embodiment, a knowledge graph constructing apparatus 40 is further provided, including: a to-be-processed data conversion module 401, a target configuration data setting module 402, a target data screening module 403, and a knowledge graph creation module 404. Wherein:
The to-be-processed data conversion module 401 may be configured to acquire to-be-processed data and convert the to-be-processed data to obtain source data; the source data comprises entity data and relationship data, which are stored in the first storage unit and the second storage unit, respectively.
The target configuration data setting module 402 may be configured to construct custom target configuration data by combining preset custom basic configuration data and the to-be-processed data; the custom target configuration data comprises: any one or combination of any plurality of entity categories, entity attributes, entity value ranges and entity relationship categories.
The target data screening module 403 may be configured to screen the source data in the first storage unit and the second storage unit based on the custom target configuration data to obtain target data.
The knowledge-graph creation module 404 may be configured to process the target data to generate a target knowledge-graph.
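As an illustration of the conversion performed by module 401, the following sketch splits raw records into entity data and relationship data and writes them to two separate files standing in for the first and second storage units. The record layout, the CSV format, and the file names are assumptions made for the example; the disclosure only requires that the two kinds of data be stored separately.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical raw records; the field names are illustrative only.
RAW_RECORDS = [
    {"subject": "p1", "subject_type": "Person", "predicate": "WORKS_FOR",
     "object": "c1", "object_type": "Company"},
    {"subject": "p2", "subject_type": "Person", "predicate": "WORKS_FOR",
     "object": "c1", "object_type": "Company"},
]


def convert(records, entity_path, relation_path):
    """Write entity data and relationship data to two separate CSV files."""
    entities = {}
    relations = []
    for rec in records:
        # The dictionary deduplicates entities that appear in several records.
        entities[rec["subject"]] = rec["subject_type"]
        entities[rec["object"]] = rec["object_type"]
        relations.append((rec["subject"], rec["predicate"], rec["object"]))
    with open(entity_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "category"])
        writer.writerows(sorted(entities.items()))
    with open(relation_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "category", "target"])
        writer.writerows(relations)
    return len(entities), len(relations)


workdir = Path(tempfile.mkdtemp())
entity_file = workdir / "entities.csv"      # stands in for the first storage unit
relation_file = workdir / "relations.csv"   # stands in for the second storage unit
counts = convert(RAW_RECORDS, entity_file, relation_file)
print(counts)
```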
In the present exemplary embodiment, the apparatus 40 further includes a basic configuration data definition module and a format conversion module. Wherein:
The basic configuration data definition module may be configured to pre-construct the custom basic configuration data based on RDFS; wherein the custom basic configuration data comprises: any one or a combination of any plurality of entity categories, entity attributes, entity value ranges and entity relationship categories.
The format conversion module may be configured to perform format conversion on the RDFS-based custom base configuration data to obtain custom base configuration data in a preset format.
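The format conversion can be sketched as below, assuming JSON as the preset format. The schema is represented as plain (subject, predicate, object) tuples using standard RDF and RDFS terms (`rdf:type`, `rdfs:Class`, `rdf:Property`, `rdfs:domain`, `rdfs:range`); the `ex:` names, the output keys, and the rule that a property whose range is a declared class counts as a relationship category are assumptions for illustration only.

```python
import json

# Hypothetical RDFS definitions written as prefixed-name triples.
RDFS_TRIPLES = [
    ("ex:Person", "rdf:type", "rdfs:Class"),
    ("ex:Company", "rdf:type", "rdfs:Class"),
    ("ex:age", "rdf:type", "rdf:Property"),
    ("ex:age", "rdfs:domain", "ex:Person"),
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("ex:worksFor", "rdf:type", "rdf:Property"),
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:worksFor", "rdfs:range", "ex:Company"),
]


def rdfs_to_config(triples):
    """Convert RDFS class and property declarations into a JSON-ready dict."""
    classes = sorted(s for s, p, o in triples
                     if p == "rdf:type" and o == "rdfs:Class")
    props = sorted(s for s, p, o in triples
                   if p == "rdf:type" and o == "rdf:Property")
    domain = {s: o for s, p, o in triples if p == "rdfs:domain"}
    range_ = {s: o for s, p, o in triples if p == "rdfs:range"}
    config = {
        "entity_categories": classes,
        "entity_attributes": [],
        "relationship_categories": [],
    }
    for prop in props:
        entry = {"name": prop, "domain": domain.get(prop), "range": range_.get(prop)}
        # Assumption: a property whose range is a declared class links two entities.
        if entry["range"] in classes:
            config["relationship_categories"].append(entry)
        else:
            config["entity_attributes"].append(entry)
    return config


base_config = rdfs_to_config(RDFS_TRIPLES)
print(json.dumps(base_config, indent=2))
```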
In the present exemplary embodiment, the apparatus 40 further includes an actual data query module and a statistics module. Wherein:
The actual data query module may be configured to generate a query instruction and read the target knowledge graph according to the query instruction to obtain actual configuration parameters.
The statistics module may be configured to perform a consistency comparison between the actual configuration parameters and the custom basic configuration data to generate a detection statistical result.
In the present exemplary embodiment, the detection statistical result includes: entity statistics and relationship statistics.
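The consistency comparison can be illustrated with a short sketch that produces separate entity statistics and relationship statistics. The report layout (`counts`, `missing`, `unexpected`) and the sample values are hypothetical; the disclosure does not prescribe a report format.

```python
def detect(base_config, actual):
    """Compare the categories found in the graph with the configured ones."""
    report = {}
    for kind in ("entity_categories", "relationship_categories"):
        expected = set(base_config.get(kind, []))
        found = actual.get(kind, {})          # category -> count read from the graph
        report[kind] = {
            "counts": dict(found),
            "missing": sorted(expected - set(found)),      # configured but absent
            "unexpected": sorted(set(found) - expected),   # present but not configured
        }
    return report


BASE_CONFIG = {
    "entity_categories": ["Person", "Company"],
    "relationship_categories": ["WORKS_FOR"],
}
# Hypothetical counts, as they might be returned by the generated queries.
ACTUAL = {
    "entity_categories": {"Person": 2, "Device": 1},
    "relationship_categories": {"WORKS_FOR": 2},
}

report = detect(BASE_CONFIG, ACTUAL)
print(report)
```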
In the present exemplary embodiment, the apparatus 40 further includes a parameter comparison module, an updating module and a re-judging module. Wherein:
The parameter comparison module may be configured to compare the detection statistical result with a preset parameter standard.
The updating module may be configured to re-execute the knowledge graph construction method when the detection statistical result does not meet the preset parameter standard, so as to obtain an updated detection statistical result.
The re-judging module may be configured to judge whether the updated detection statistical result satisfies the parameter standard.
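The compare, re-execute and re-judge cycle can be sketched as a bounded loop. The attempt limit `max_attempts`, the stand-in build function, and the threshold used as the parameter standard are assumptions for the example; the disclosure does not state how many times construction may be re-executed.

```python
def build_until_valid(build, meets_standard, max_attempts=3):
    """Re-run construction until the statistics meet the standard or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        stats = build()                 # construct the graph and collect statistics
        if meets_standard(stats):       # compare with the preset parameter standard
            return attempt, stats
    raise RuntimeError(f"standard not met after {max_attempts} attempts")


# Stand-in for a real construction run: the first run yields no Person entities.
_results = iter([{"Person": 0}, {"Person": 5}])


def fake_build():
    return next(_results)


def meets_standard(stats):
    # Hypothetical standard: at least one Person entity must be present.
    return stats["Person"] >= 1


attempt, stats = build_until_valid(fake_build, meets_standard)
print(attempt, stats)
```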
In the exemplary embodiment, the knowledge-graph creation module 404 includes: a graphics tool processing unit.
The graphics tool processing unit may be configured to input the target data into a graphics processing tool to generate the target knowledge-graph.
In the present exemplary embodiment, the apparatus 40 further includes a task creation module and a concurrent execution control module. Wherein:
The task creation module may be configured to create a corresponding number of knowledge graph creation tasks in response to at least one knowledge graph creation request.
The concurrent execution control module may be configured to concurrently execute the knowledge graph construction method for each of the knowledge graph creation tasks to obtain a plurality of the target knowledge graphs.
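Concurrent execution of several creation tasks can be sketched with a thread pool. The request layout, the shared `SOURCE_RELATIONS` list, and the simplified `build_graph` function (which only screens relationships by category) are hypothetical; a real task would run the full construction method.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared relationship data and two creation requests.
SOURCE_RELATIONS = [
    ("p1", "WORKS_FOR", "c1"),
    ("p1", "OWNS", "d1"),
    ("p2", "WORKS_FOR", "c1"),
]
REQUESTS = [
    {"name": "hr", "relationship_categories": ["WORKS_FOR"]},
    {"name": "assets", "relationship_categories": ["OWNS"]},
]


def build_graph(request):
    """One creation task: keep only the relationship categories the request names."""
    allowed = set(request["relationship_categories"])
    edges = [r for r in SOURCE_RELATIONS if r[1] in allowed]
    return request["name"], edges


def build_all(requests):
    """Create one task per request and run the tasks concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        return dict(pool.map(build_graph, requests))


graphs = build_all(REQUESTS)
print(graphs)
```

Threads suit tasks dominated by database or file I/O; a process pool may be a better fit if the conversion work is CPU-bound.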
The specific details of each module in the aforementioned knowledge-graph constructing apparatus 40 have been described in detail in the corresponding knowledge-graph constructing method, and therefore are not described herein again.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, and a bus 530 that couples various system components including the storage unit 520 and the processing unit 510.
The storage unit 520 stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 510 may perform a method as shown in fig. 1.
The storage unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 6, a program product 60 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A knowledge graph construction method is characterized by comprising the following steps:
acquiring data to be processed, and converting the data to be processed to acquire source data; the source data comprises entity data and relation data and is respectively stored in a first storage unit and a second storage unit;
constructing custom target configuration data by combining preset custom basic configuration data and the data to be processed; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
screening the source data in the first storage unit and the second storage unit based on the custom target configuration data to obtain target data;
processing the target data to generate a target knowledge graph.
2. The method of claim 1, further comprising:
pre-constructing user-defined basic configuration data based on RDFS; wherein the custom base configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
and carrying out format conversion on the custom basic configuration data based on the RDFS so as to obtain the custom basic configuration data in a preset format.
3. The method of claim 1, wherein, when the target knowledge graph is generated, the method further comprises:
generating a query instruction, and reading the target knowledge graph according to the query instruction to acquire actual configuration parameters;
and comparing the consistency of the actual configuration parameters with the custom basic configuration data to generate a detection statistical result.
4. The method of claim 3, wherein the detection statistical result comprises:
entity statistics and relationship statistics.
5. The method according to claim 3 or 4, characterized in that the method further comprises:
comparing the detection statistical result with a preset parameter standard;
when the detection statistical result does not meet the preset parameter standard, re-executing the knowledge graph construction method to obtain an updated detection statistical result;
and judging whether the updated detection statistical result meets the parameter standard.
6. The method of claim 1, wherein the processing the target data to generate a target knowledge-graph comprises:
inputting the target data into a graphics processing tool to generate the target knowledge-graph.
7. The method of claim 1, further comprising:
creating a corresponding number of knowledge-graph creation tasks in response to at least one knowledge-graph creation request;
and concurrently executing the knowledge graph construction method for each of the knowledge graph creation tasks to obtain a plurality of target knowledge graphs.
8. A knowledge-graph building apparatus, comprising:
a to-be-processed data conversion module, configured to acquire to-be-processed data and convert the to-be-processed data to obtain source data; the source data comprises entity data and relationship data, which are stored in a first storage unit and a second storage unit, respectively;
the target configuration data setting module is used for combining preset user-defined basic configuration data and the data to be processed to construct user-defined target configuration data; the custom target configuration data comprises: any one item or combination of any multiple items in the entity category, the entity attribute, the entity value range and the entity relationship category;
the target data screening module is used for screening the source data in the first storage unit and the second storage unit based on the custom target configuration data to obtain target data;
and the knowledge graph creating module is used for processing the target data to generate a target knowledge graph.
9. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of knowledge-graph construction according to any one of claims 1 to 7.
10. An electronic terminal, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of knowledge-graph construction of any of claims 1 to 7 via execution of the executable instructions.
CN201910870536.1A 2019-09-16 2019-09-16 Knowledge graph construction method and device, storage medium and electronic terminal Active CN110609906B (en)

Publications (2)

Publication Number Publication Date
CN110609906A (en) 2019-12-24
CN110609906B (en) 2023-01-03

Family

ID=68891358

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant