CN115269877A - Method, system and equipment for constructing domain entity and event double-center knowledge graph - Google Patents

Info

Publication number
CN115269877A
CN115269877A CN202210957668.XA CN202210957668A CN115269877A CN 115269877 A CN115269877 A CN 115269877A CN 202210957668 A CN202210957668 A CN 202210957668A CN 115269877 A CN115269877 A CN 115269877A
Authority
CN
China
Prior art keywords
data
event
knowledge
entity
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210957668.XA
Other languages
Chinese (zh)
Inventor
李志鹏
石珺
刘汪洋
沈宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wanglian Anrui Network Technology Co ltd
Original Assignee
Shenzhen Wanglian Anrui Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wanglian Anrui Network Technology Co ltd filed Critical Shenzhen Wanglian Anrui Network Technology Co ltd
Priority to CN202210957668.XA priority Critical patent/CN115269877A/en
Publication of CN115269877A publication Critical patent/CN115269877A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data mining and recognition, and discloses a method, a system and equipment for constructing a domain entity and event double-center knowledge graph. A novel quadruple data structure is designed to realize source tracking of data in the knowledge graph and to support practical applications such as data access control, privacy protection and license management. A novel derivative graph calculation module is designed, which supports operations such as aggregation, statistics, association and transformation of knowledge graph data, supports intelligent calculations such as graph embedding and machine learning models, and stores the resulting data in a graph storage engine. The fine-grained access control, privacy protection and data management capabilities of the knowledge graph are thereby improved.

Description

Method, system and equipment for constructing domain entity and event double-center knowledge graph
Technical Field
The invention belongs to the technical field of data mining and recognition, and particularly relates to a method, a system and equipment for constructing a domain entity and event double-center knowledge graph.
Background
With the rapid improvement of data, model algorithms and computing power, computational intelligence and perceptual intelligence have developed rapidly, and machine perception in certain specific fields even surpasses that of humans; machine cognitive intelligence, however, still faces many challenges.
Knowledge graphs have recently received great attention from both academia and industry as an important means of knowledge representation, organization and management. The knowledge base constructed by knowledge graph technology is regarded as the brain for realizing machine cognitive intelligence, and also as a basic path toward it. Building on perceptual intelligence, a machine equipped with a knowledge graph can understand language and data to a certain extent.
In recent years, knowledge representation methods that use "events" as the basic unit have received attention from the academic community. "The world is the totality of facts, not of things"; that is, the real world is composed of a myriad of interrelated facts (i.e., events). Regarding "events" as the units of human knowledge therefore conforms to the laws of human cognition. A knowledge base with events as its knowledge units enables a computer to process information in a manner similar to the human brain, i.e., to have human-like cognitive ability. Event knowledge is supplemented into a traditional knowledge base to form an event knowledge graph; the supplemented event knowledge can accurately represent the relationships between an event and entities such as people, places and time, and the semantic relationships between different events can be used to express the evolution rules and patterns between events.
Knowledge representation methods, including knowledge graphs, have long been symbol-based and usually deal with structured data or text data.
In many new application scenarios, such as event prediction and personalized recommendation in the financial and consumption fields, users want to obtain the parent-child relationships between events through a knowledge graph, so as to provide theoretical support for decision making. An event is an action that a thing can take or a state that it can be in. Moreover, events occur in order, and each event has a certain logical relationship with the previous event, for example a sequential relationship or a causal relationship. Therefore, starting from one event, the events of interest to the user are of two types: the parent event of the event (the event that triggered it, i.e., the upstream event) and the child event (the event that it would trigger, i.e., the downstream event). The interest in the parent event lies in "finding the clue of the event's occurrence", and the interest in the child event lies in "finding the result of the event's occurrence".
To address the above problems, prior art 1, CN202111309154.5, discloses a method for constructing a multi-modal event knowledge graph, which belongs to the technical field of knowledge engineering. The method comprises the following steps: collecting data; constructing a domain event ontology library; extracting multi-modal event trigger words; extracting event elements; extracting multi-modal event relations; performing event coreference disambiguation on the basis of the event ontology library, and merging identical sub-events to form multi-modal event subgraphs; and merging the subgraphs to obtain a domain multi-modal event knowledge graph. The method can improve the quality and efficiency of data acquisition, reduce labor cost, reduce the complexity of constructing the knowledge graph and improve the quality of the constructed knowledge graph.
Prior art 2, CN201910236491.2, provides a method and a device for constructing an event knowledge graph, and an electronic device. The construction method comprises the following steps: acquiring a plurality of event information groups from a target information source; determining a plurality of target events of the graph to be constructed from a plurality of child events and parent events; for each target event, searching for the parent target events and child target events of that target event among the plurality of target events, based on the parent-child relationship between each pair of child event and parent event, to obtain a search result corresponding to the target event; and constructing an event knowledge graph about the target events based on the search result corresponding to each target event. Each node of the event knowledge graph represents a target event, and a single-arrow connecting line is arranged between any two nodes whose target events have a parent-child relationship. Compared with the earlier prior art, the construction method provided by that embodiment can construct an event knowledge graph that reflects the relationships between events.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Existing general knowledge graph and domain knowledge graph construction mainly focuses on static information such as entities and entity attributes (for example, a large city), and cannot reflect events occurring in real life or information such as the relationships between events and entities and between events.
(2) Existing knowledge graph construction methods do not consider data source management, and cannot carry out hierarchical classification and batch management of the knowledge in the knowledge graph, which are key requirements of data access control, privacy protection and license management.
(3) In actual use of a knowledge graph, many applications depend on calculations such as data statistics, aggregation, association and transformation performed over the knowledge graph. Existing knowledge graph construction methods lack support for these capabilities, so graph calculation must be performed on the constructed graph at the time of use, which makes the knowledge graph unsuitable for high-throughput applications.
Disclosure of Invention
In order to overcome the problems in the related art, the disclosed embodiments of the invention provide a method, a system, computer equipment and a medium for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis. The invention also relates to the fields of artificial intelligence, data mining, information retrieval, knowledge graphs and recommendation algorithms.
The technical scheme is as follows: a method for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis comprises the following steps:
S1, constructing a domain event knowledge graph ontology centered on both entities and events, which represents static information such as entities, entity attributes and entity relationships in real events, and represents dynamic information such as event attributes, event-entity relationships and event-event relationships;
S2, configuring massive multi-source heterogeneous data input: structured data input is configured by entering the database name, database type, database address, user name and password information, and unstructured and semi-structured data input is configured by specifying file addresses, API interfaces and the like;
S3, according to the data type and in combination with the domain event knowledge graph constructed in step S1, cleaning, extracting, converting and loading the data, taking the entity as the center and the event as the center respectively; the data comprises structured data, unstructured data and semi-structured data;
S4, generating the processed domain event knowledge graph data according to a knowledge representation format;
S5, storing the data structure of the unified knowledge representation into different types of databases such as a relational database, a graph database, a key-value database and a data warehouse;
S6, performing aggregation, statistics, association and transformation operations on the domain event knowledge graph data of step S4, and storing the fused result back to the knowledge storage engine.
In one embodiment, in step S1, each piece of knowledge is defined as a quadruple (subject, predicate, object, provenance), wherein (subject, predicate, object) is the traditional knowledge triple expression and provenance is an additional element for identifying the data source.
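The quadruple can be illustrated with a minimal Python sketch. The class name `Quad`, the identifier formats such as `person:alice` and `db:hr.employees`, and the sample facts are assumptions made for illustration; the source only fixes the four elements.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One piece of knowledge: a classic triple plus its data source."""
    subject: str
    predicate: str
    object: str
    provenance: str  # identifier of the data source (format is hypothetical)

    def triple(self) -> tuple[str, str, str]:
        """Drop the provenance to recover the traditional triple."""
        return (self.subject, self.predicate, self.object)


# An entity-centric fact and an event-centric fact (made-up data).
q1 = Quad("person:alice", "worksFor", "org:acme", "db:hr.employees")
q2 = Quad("event:merger_2021", "hasParticipant", "org:acme", "file:news_0412.txt")
```

Because the provenance travels with every tuple, two identical triples from different sources remain distinguishable as two quadruples.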
In one embodiment, in step S3, during the cleaning, extraction, conversion and loading of the data, data mapping is performed on the structured data: the relational database tables are mapped to the ontology definitions of the entities and events to obtain knowledge triples, and the source of each relational database table is used as the data source field to obtain the quadruple form.
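A minimal sketch of such a mapping follows, assuming rows have already been read into dictionaries. The table name, the column-to-predicate mapping and the `db:<table>` provenance format are hypothetical; the source does not specify how the mapping is expressed.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    provenance: str


# Hypothetical mapping from table columns to ontology predicates.
COLUMN_TO_PREDICATE = {"name": "hasName", "employer": "worksFor"}


def map_rows(table, key_column, rows, column_to_predicate):
    """Turn relational rows into quadruples whose provenance is the table."""
    source = f"db:{table}"
    quads = []
    for row in rows:
        subject = f"{table}/{row[key_column]}"
        for column, predicate in column_to_predicate.items():
            value = row.get(column)
            if value is not None:  # skip NULL cells
                quads.append(Quad(subject, predicate, str(value), source))
    return quads


rows = [
    {"id": 1, "name": "Alice", "employer": "Acme"},
    {"id": 2, "name": "Bob", "employer": None},
]
quads = map_rows("employees", "id", rows, COLUMN_TO_PREDICATE)
```

Each non-null cell becomes one quadruple, so the second row contributes only its name.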
In one embodiment, in step S3, during the cleaning, extraction, conversion and loading of the data, the unstructured data (text, picture, video and audio) and the semi-structured data are processed separately, and the processing includes: performing entity extraction and event extraction on text data by using natural language processing algorithms, so as to extract person, time, place and organization entity information, extract person-person and person-organization entity relationships and attribute information, and extract event participants and event association relationships, including event-event relationships and event-person relationships; performing OCR (optical character recognition) on picture data to convert it into text data for processing; extracting key frame pictures from video data to convert it into picture data for processing; and performing text transcription on audio data to convert it into text data for processing.
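The modality handling described above amounts to reducing every input toward text before extraction. The sketch below only plans the sequence of steps; the step names are placeholders rather than real library calls, and no OCR, key-frame or transcription engine is implied by the source.

```python
# Each non-text modality is reduced one step toward text.
# Values are (step name, resulting modality); names are hypothetical.
NEXT_STEP = {
    "video": ("extract_key_frames", "picture"),
    "picture": ("ocr", "text"),
    "audio": ("transcribe", "text"),
}


def processing_plan(modality: str) -> list[str]:
    """Return the ordered steps needed to process data of a given modality."""
    plan = []
    while modality != "text":
        if modality not in NEXT_STEP:
            raise ValueError(f"unsupported modality: {modality}")
        step, modality = NEXT_STEP[modality]
        plan.append(step)
    # Text is the terminal form on which entity and event extraction runs.
    plan.append("extract_entities_and_events")
    return plan
```

For example, `processing_plan("video")` chains key-frame extraction, OCR and then extraction, mirroring the video to picture to text path in the paragraph above.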
In one embodiment, in step S4, the quadruple representation (subject, predicate, object, provenance) is serialized to generate multiple knowledge serialization representations (JSON-LD, RDFa, MCF, Turtle, N-Triples).
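The source does not say how the fourth element is carried in these serializations. As one illustrative possibility, not taken from the source, the sketch below writes each quadruple as an N-Quads style line with the provenance in the graph-label position, and also as a plain JSON record (not JSON-LD). The `http://example.org/` namespace and the sample identifiers are placeholders, and all four terms are assumed to be IRI-safe strings.

```python
import json
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    provenance: str


BASE = "http://example.org/"  # placeholder namespace, not from the source


def to_nquads_line(q: Quad) -> str:
    """Serialize as an N-Quads style statement; provenance is the graph label."""
    return (
        f"<{BASE}{q.subject}> <{BASE}{q.predicate}> "
        f"<{BASE}{q.object}> <{BASE}{q.provenance}> ."
    )


def to_json(q: Quad) -> str:
    """Serialize as a plain JSON object with one key per element."""
    return json.dumps(q._asdict(), sort_keys=True)


q = Quad("alice", "worksFor", "acme", "source/hr_db")
line = to_nquads_line(q)
record = to_json(q)
```

Keeping serialization separate from the tuple itself lets the same quadruple be emitted in several formats.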
In one embodiment, in step S6, aggregation and statistical calculations are performed on time series data by week/month/year; semantic association and entity disambiguation calculations are performed, using machine learning and deep learning methods, on data that requires semantic understanding; the data is transformed and associated; and the fused result is stored back to the knowledge storage engine.
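The monthly case of the time-series aggregation can be sketched as follows. The predicate names `occurredOn` and `eventCount`, the `month:` subject prefix and the `derived:` provenance label are assumptions; the point shown is only that the aggregate is itself expressed as quadruples that can be stored back.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    provenance: str


def monthly_event_counts(quads):
    """Count events per calendar month and return the result as new quads."""
    counts = Counter()
    for q in quads:
        if q.predicate == "occurredOn":  # hypothetical date predicate
            d = date.fromisoformat(q.object)
            counts[f"{d.year:04d}-{d.month:02d}"] += 1
    return [
        Quad(f"month:{month}", "eventCount", str(n), "derived:monthly_event_counts")
        for month, n in sorted(counts.items())
    ]


events = [
    Quad("event:e1", "occurredOn", "2022-03-01", "file:a"),
    Quad("event:e2", "occurredOn", "2022-03-15", "db:b"),
    Quad("event:e3", "occurredOn", "2022-04-02", "file:a"),
]
derived = monthly_event_counts(events)
```

Giving derived tuples their own provenance value keeps them distinguishable from tuples that came directly from a data source.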
In the invention, the aggregation methods themselves may adopt the prior art; the technical innovation lies in using aggregation, statistics and machine learning in constructing the double-center knowledge graph and storing their results in the knowledge graph.
Another object of the present invention is to provide a system for constructing a domain entity and event double-center knowledge graph for aggregation analysis of massive multi-source heterogeneous data, the system including:
the ontology design module is used for constructing a conceptual layer model of the knowledge graph and realizing semi-automatic construction of a human-computer combined ontology in a mode of manual participation and machine assistance;
the data mapping module maps the structured data by combining ontology knowledge defined in the ontology design module;
the information extraction module is used for providing configurable and extensible information extraction algorithms for unstructured and semi-structured data, and comprises an entity-centered extraction algorithm and an event-centered extraction algorithm;
the knowledge fusion module is used for carrying out multi-source knowledge fusion on the data output by the data mapping module and the information extraction module under the unified representation of the domain event knowledge ontology;
the knowledge updating module is used for ensuring that the content of the domain event knowledge graph is continuously updated in an iterative manner, and new knowledge is continuously generated along with the time;
the graph storage engine is used for providing storage for the domain event body and the domain event knowledge graph and comprises a relational database, a graph database, a key value database, a data warehouse and a log database;
the derivative graph calculation module is used for carrying out aggregation, statistics, association and transformation operations on the knowledge graph data, including but not limited to the existing algorithm, and storing the operated data in a graph storage engine;
the graph query service module provides diversified query interfaces for the field application;
the graph visualization engine provides visualization functions for domain applications based on graph query services.
In an embodiment, the output of the ontology design module is a domain event ontology. The domain event ontology defines the concept layer of the domain event knowledge graph and, by combining the entity-centric ontology and the event-centric ontology, describes the entities, entity attributes, entity relationships, event attributes and event relationships of the domain event knowledge graph.
The knowledge fusion module also supports user-defined, personalized knowledge fusion methods, and is used for providing the results after ontology alignment and entity disambiguation.
Another object of the present invention is to provide a computer device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the method for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis.
Another object of the present invention is to provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the processor is caused to execute the method for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis.
By combining all the technical schemes, the invention has the advantages and positive effects that:
First, in view of the technical problems in the prior art and the difficulty of solving them, and in close combination with the technical scheme to be protected and the results and data obtained during research and development, the technical problems solved by the technical scheme of the invention, and the creative technical effects brought about by solving them, are described as follows:
The invention improves the knowledge expression capability of the knowledge graph and effectively expresses the static information and the dynamic information in real events.
The invention improves the fine-grained access control, privacy protection and data management capabilities of the knowledge graph.
The invention improves the high-throughput real-time service support capability and intelligent computing capability of the knowledge graph.
Secondly, considering the technical solution as a whole or from the perspective of products, the technical effects and advantages of the technical solution to be protected by the present invention are specifically described as follows:
The knowledge graph is constructed by combining the entity-centered knowledge graph and the event-centered knowledge graph, so that static information such as entities, entity attributes and entity relationships in real events can be described, and dynamic information such as event attributes and event relationships can be expressed.
According to the invention, a novel quadruple data structure is designed, so that source tracking of data in the knowledge graph is realized and practical applications such as data access control, privacy protection and license management can be supported. The data from which the knowledge graph is constructed can come from a plurality of different data sources. When a certain data source is updated, a knowledge graph with the quadruple data structure can quickly retrieve the knowledge tuples that need to be updated. When access control and privacy protection need to be applied to knowledge in the knowledge graph, a knowledge graph with the quadruple data structure can realize fine-grained control at the level of individual knowledge tuples. When the license of a knowledge graph data source changes, for example from open source to closed source, the corresponding knowledge tuples can be quickly found, avoiding the potential risk of knowledge infringement.
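The three uses of provenance described above (refreshing a source, restricting visibility, and withdrawing a source whose license changed) can be sketched with a small in-memory index keyed by source. The class `QuadStore` and its method names are hypothetical; the source does not describe the storage engine at this level.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    provenance: str


class QuadStore:
    """Toy in-memory store that indexes quadruples by their provenance."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, quad: Quad) -> None:
        self._by_source[quad.provenance].append(quad)

    def from_source(self, source: str) -> list:
        """Tuples to re-check when `source` is updated."""
        return list(self._by_source.get(source, []))

    def visible_to(self, allowed_sources) -> list:
        """Tuple-level access control: only tuples from permitted sources."""
        return [q for s in sorted(allowed_sources) for q in self._by_source.get(s, [])]

    def drop_source(self, source: str) -> int:
        """Withdraw every tuple of a source, e.g. after a license change."""
        return len(self._by_source.pop(source, []))


store = QuadStore()
store.add(Quad("person:alice", "worksFor", "org:acme", "db:hr"))
store.add(Quad("person:alice", "livesIn", "city:x", "db:crm"))
store.add(Quad("event:e1", "hasParticipant", "person:alice", "file:news"))
```

Every operation is a lookup on the provenance key, so none of them needs to scan the whole graph.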
The invention designs a novel derivative graph calculation module which supports the operations of aggregation, statistics, association, transformation and the like of knowledge graph data, supports intelligent calculations such as graph embedding, machine learning models and the like, and stores the calculated data in a graph storage engine to accelerate knowledge graph query and retrieval.
Thirdly, as auxiliary evidence of the inventiveness of the claims of the present invention, the technical scheme of the present invention solves the technical problem of fine-grained access control at the level of knowledge tuples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram of a domain entity and event double-center knowledge graph construction system for massive multi-source heterogeneous data aggregation analysis according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the working principle of the domain entity and event double-center knowledge graph construction system for massive multi-source heterogeneous data aggregation analysis according to an embodiment of the present invention;
FIG. 3 is a flowchart of a domain entity and event double-center knowledge graph construction method for massive multi-source heterogeneous data aggregation analysis according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a domain entity and event double-center knowledge graph construction method for massive multi-source heterogeneous data aggregation analysis according to an embodiment of the present invention;
In the figures: 1. an ontology design module; 2. a data mapping module; 3. an information extraction module; 4. a knowledge fusion module; 5. a knowledge updating module; 6. a graph storage engine; 7. a derivative graph calculation module; 8. a graph query service module; 9. a graph visualization engine.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as broadly as the present invention is capable of modification in various respects, all without departing from the spirit and scope of the present invention.
The method for constructing the dual-center knowledge graph of the field entity and the event for the mass multi-source heterogeneous data aggregation analysis, provided by the embodiment of the invention, can be applied to any electronic equipment capable of performing information analysis and graph construction, such as a notebook computer, a tablet computer, a desktop computer and the like.
1. Illustrative embodiments:
Example 1
As shown in FIG. 1, the system for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to the embodiment of the present invention includes: an ontology design module 1, a data mapping module 2, an information extraction module 3, a knowledge fusion module 4, a knowledge updating module 5, a graph storage engine 6, a derivative graph calculation module 7, a graph query service module 8 and a graph visualization engine 9.
The ontology design module 1 is used for constructing a conceptual layer model of the knowledge graph, and realizes semi-automatic, human-machine combined ontology construction through manual participation with machine assistance. The output of the ontology design module 1 is a domain event ontology, which defines the concept layer of the domain event knowledge graph. The innovation point is that, by combining the entity-centric ontology and the event-centric ontology, the domain event ontology describes the entities, entity attributes, entity relationships, event arguments, event relationships and the like of the domain event knowledge graph. The ontological formalization of the double-center knowledge graph can be defined as:
[Formula image BDA0003792012260000081 is not reproduced in this text.]
wherein the component symbols of the definition (formula images BDA0003792012260000082 to BDA0003792012260000088) represent, in order: entities; events; entity attributes; event arguments; relationships between entities; relationships between entities and events; and relationships between events.
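One possible way to write such a definition is given below, assuming it is a seven-component tuple in the order just listed. All symbols are chosen here purely for illustration, since the patent's own symbols survive only as formula images.

```latex
\mathcal{O} = \langle E,\; V,\; A_E,\; A_V,\; R_{EE},\; R_{EV},\; R_{VV} \rangle
```

In this illustrative notation, $E$ is the set of entities, $V$ the set of events, $A_E$ the entity attributes, $A_V$ the event arguments, $R_{EE}$ the relationships between entities, $R_{EV}$ the relationships between entities and events, and $R_{VV}$ the relationships between events.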
The system is designed for massive multi-source heterogeneous data in practical application, and data input comprises structured data, unstructured data and semi-structured data.
The data mapping module 2 performs the mapping of the structured data by combining ontology knowledge defined in the ontology design module 1. The mapping mainly comprises the steps of mapping between a relational database and a designed ontology, and directly converting structured data into a quadruple data set.
The information extraction module 3 provides configurable and extensible information extraction algorithms for unstructured and semi-structured data, including entity-centric extraction algorithms and event-centric extraction algorithms.
The knowledge fusion module 4 performs multi-source knowledge fusion on the data output by the data mapping module and the information extraction module under the unified representation of the domain event knowledge ontology. The module supports various ontology alignment and entity disambiguation methods and can complete knowledge fusion of large-scale data. It also supports user-defined, personalized knowledge fusion methods, so as to guarantee the quality of the results after ontology alignment and entity disambiguation. The output of the knowledge fusion module is the individual pieces of knowledge in the domain event knowledge graph.
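A minimal sketch of the fusion step follows, using a hand-written alias table as a stand-in for whatever disambiguation method is configured. The alias table, the canonical identifier `org:acme` and the sample tuples are assumptions made for illustration; real entity disambiguation would typically be model-based.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    provenance: str


# Hypothetical alias table mapping surface forms to one canonical entity.
ALIASES = {"Acme Corp.": "org:acme", "ACME": "org:acme"}


def canonical(term: str, aliases: dict) -> str:
    """Return the canonical identifier for a term, or the term itself."""
    return aliases.get(term, term)


def fuse(quads, aliases):
    """Rewrite entity mentions to canonical IDs and drop exact duplicates."""
    seen = set()
    fused = []
    for q in quads:
        merged = Quad(canonical(q.subject, aliases), q.predicate,
                      canonical(q.object, aliases), q.provenance)
        if merged not in seen:
            seen.add(merged)
            fused.append(merged)
    return fused


mapped = [Quad("ACME", "locatedIn", "Shenzhen", "db:companies")]
extracted = [
    Quad("Acme Corp.", "locatedIn", "Shenzhen", "file:news.txt"),
    Quad("ACME", "locatedIn", "Shenzhen", "db:companies"),  # exact duplicate
]
fused = fuse(mapped + extracted, ALIASES)
```

The same fact reported by two sources is kept as two quadruples with a shared canonical subject, so provenance is not lost during fusion.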
The knowledge updating module 5 ensures that the content of the domain event knowledge graph is continuously and iteratively updated. As time passes, new knowledge is continuously generated and old knowledge may evolve; the knowledge updating module guarantees the timeliness of the knowledge by updating both the schema layer and the data layer.
The graph storage engine 6 provides a complete set of storage solutions for the domain event ontology and the domain event knowledge graph, including a relational database, a graph database, a key value database, a data warehouse, a log database, and the like.
The derivative graph calculation module 7 is one of the important innovation points of the invention: it enables the domain event knowledge graph not only to store static ontology information and dynamic event information, but also to perform calculations such as statistical operations, machine learning and deep learning on these data.
The derivative graph calculation module 7 can reduce the implementation cost of large-scale graph calculation and accelerate the query and retrieval of the knowledge graph by supporting the operations of aggregation, statistics, association, transformation and the like of the knowledge graph data and storing the data after the operations in the graph storage engine. The module also supports intelligent calculation such as graph embedding and machine learning models, and provides functions which cannot be provided by traditional knowledge graph query.
The graph query service module 8 provides diversified query interfaces for the domain applications, and the graph visualization engine 9 provides visualization functions for the domain applications based on the graph query service.
Example 2
Based on the field entity and event double-center knowledge graph construction system for massive multi-source heterogeneous data aggregation analysis provided in embodiment 1, further, the specific working principle as shown in fig. 2 includes:
the first stage is as follows:
after the structured data are mapped by the data mapping module 2, the data are sent to the knowledge fusion module 4 for entity disambiguation and body alignment;
the unstructured data is subjected to entity extraction and attribute extraction by taking an entity as a center through an information extraction module 3; the sending knowledge fusion module 4 carries out entity disambiguation and body alignment;
the semi-structured data is subjected to argument extraction and relation extraction by taking an event as a center through an information extraction module 3; the sending knowledge fusion module 4 carries out entity disambiguation and body alignment;
and a second stage:
the ontology design module 1 relates to a data transmission knowledge fusion module 4 for the ontology of the field event; meanwhile, the data are stored in a graph storage engine 6, a derivative graph calculation module 7 carries out operations such as aggregation, statistics, association and transformation on the knowledge graph data, and the data after the operations are stored in the graph storage engine;
the knowledge fusion module 4 performs multi-source knowledge fusion on the data output by the data mapping module 2 and the information extraction module 3 under the unified representation of the domain event knowledge ontology. And (4) carrying out various ontology alignment and entity disambiguation methods to complete knowledge fusion of large-scale data. And outputs the domain event knowledge map to be stored in the map storage engine 6.
The knowledge updating module 5 ensures that the content of the domain event knowledge graph is continuously and iteratively updated. Because new knowledge is continuously generated and old knowledge may evolve over time, the timeliness of the knowledge is maintained by updating both the schema layer and the data layer, and the updated domain event knowledge graph is stored in the graph storage engine 6.
The third stage:
the knowledge graph data processed by the derivative graph calculation module 7 is provided to domain applications by the graph query service module 8 through its query interfaces, and the information returned by the graph query service is visualized by the graph visualization engine 9.
Example 3
As shown in fig. 3, the method for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to the embodiment of the present invention includes the following steps:
S101, constructing the ontology of the domain event knowledge graph with the entity and the event as dual centers, representing static information such as entities, entity attributes and entity relations in real events, and representing dynamic information such as event attributes, event-entity relations and event-event relations;
S102, configuring massive multi-source heterogeneous data input: structured data input is configured by entering the database name, database type, database address, user name and password, and unstructured and semi-structured data input is configured by specifying a file address, an API or other means;
S103, according to the data type and in combination with the domain event knowledge graph ontology constructed in step S101, cleaning, extracting, converting and loading the data with the entity as the center and with the event as the center, respectively; the data comprises structured data, unstructured data and semi-structured data;
S104, generating the processed domain event knowledge graph data according to a knowledge representation format;
S105, storing the uniformly represented knowledge data in different types of databases, including a relational database, a graph database, a key-value database and a data warehouse;
S106, performing aggregation, statistics, association and transformation operations on the domain event knowledge graph data of step S104, and storing the fused result back to the knowledge storage engine;
S107, providing services for diversified business applications.
In the embodiment of the present invention, step S101 specifically includes constructing the ontology of the domain event knowledge graph with the entity and the event as dual centers, representing static information such as entities, entity attributes and entity relations in real events, and representing dynamic information such as event attributes, event-entity relations and event-event relations.
Ontology design with the entity and the event as dual centers is performed for the domain event knowledge graph, and the entities, entity attributes, entity relations, event arguments, event relations and so on that it describes are conceptually normalized, which improves the knowledge expression capability of the knowledge graph and allows static and dynamic information to be expressed effectively. Each piece of knowledge is abstracted into a quadruple (subject, predicate, object, provenance), where (subject, predicate, object) is the traditional knowledge triple; the invention adds the new element provenance to identify the data source and to improve the fine-grained access control, privacy protection and data management capabilities of the knowledge graph.
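A minimal sketch of the quadruple representation follows, assuming plain string identifiers. The `Quad` type, the `visible_to` filter and the sample identifiers are illustrative names, not taken from the patent. The filter shows one way the provenance element could support source-level access control.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One piece of knowledge: a traditional triple plus its data source."""
    subject: str
    predicate: str
    object: str
    provenance: str

    def triple(self):
        """Return the traditional (subject, predicate, object) triple."""
        return (self.subject, self.predicate, self.object)


def visible_to(quads, allowed_sources):
    """Keep only the quads whose data source the caller may read."""
    return [q for q in quads if q.provenance in allowed_sources]


# Hypothetical sample data: one entity-centered and one event-centered fact.
quads = [
    Quad("person:alice", "memberOf", "org:acme", "source:hr_db"),
    Quad("event:e1", "hasParticipant", "person:alice", "source:news_feed"),
]
```

Both entity-centered facts (static information) and event-centered facts (dynamic information) fit the same four-field shape, so downstream storage and serialization can treat them uniformly.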
In the embodiment of the present invention, step S102 specifically includes configuring massive multi-source heterogeneous data input by entering information such as the database name, database type, database address, user name and password, and supports the input of multiple types of data sources such as relational databases and NoSQL databases.
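The input configuration could be modeled as below. This is a sketch under assumptions: the class names, field names and the `route` helper are hypothetical, and only the configuration items themselves (database name, type, address, user name, password, file address or API) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredSource:
    """Connection settings for a structured (database) input."""
    db_name: str
    db_type: str      # for example "mysql" or "mongodb" (illustrative values)
    db_address: str
    username: str
    password: str


@dataclass(frozen=True)
class FileOrApiSource:
    """Location of an unstructured or semi-structured input."""
    location: str     # a file address or an API endpoint
    kind: str         # "unstructured" or "semi-structured"


def route(source):
    """Pick the processing path: mapping for databases, extraction otherwise."""
    if isinstance(source, StructuredSource):
        return "data_mapping"
    return "information_extraction"


# Hypothetical configuration entries.
sources = [
    StructuredSource("hr", "mysql", "db.example.internal:3306", "reader", "secret"),
    FileOrApiSource("/data/news/2022-08.jsonl", "semi-structured"),
]
paths = [route(s) for s in sources]
```

In practice the password would come from a secret store rather than a literal in the configuration.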
In the embodiment of the present invention, step S103 specifically includes cleaning, extracting, converting and loading the data according to the data type and in combination with the designed ontology, with the entity as the center and with the event as the center, respectively. The cleaning, extraction, conversion and loading are implemented differently for structured data and for unstructured data.
The structured data is mapped in combination with the defined ontology concept knowledge: relational database tables are mapped to the ontology definitions of entities and events to obtain (subject, predicate, object) knowledge triples, and the source of the relational database table is used as the data source field to obtain the quadruple form (subject, predicate, object, provenance).
Unstructured and semi-structured data such as text, pictures, video and audio are classified and processed separately. For text data, entity extraction and event extraction are performed using natural language processing algorithms: entity information such as persons, times, places and organizations is extracted; entity relations and attribute information such as person-person and person-organization relations are extracted; and event participants and event association relations, including event-event and event-person relations, are extracted. Picture data is converted into text data by OCR (optical character recognition). Key frame pictures are extracted from video data to convert it into picture data. Audio data is transcribed into text data. The converted data is then processed by the text steps above, which are not repeated here.
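The structured-data mapping described in this embodiment can be sketched as follows, assuming each table row is available as a dictionary and the ontology mapping is a simple column-to-predicate table. The function name, the identifier scheme `table:key` and the sample predicates are assumptions made for illustration.

```python
def rows_to_quads(table, rows, key_column, column_to_predicate, source):
    """Map relational rows to (subject, predicate, object, provenance) quads."""
    quads = []
    for row in rows:
        # The primary key identifies the entity or event the row describes.
        subject = f"{table}:{row[key_column]}"
        for column, predicate in column_to_predicate.items():
            value = row.get(column)
            if value is not None:  # skip NULL columns, they carry no fact
                quads.append((subject, predicate, str(value), source))
    return quads


# Hypothetical table content and ontology mapping.
rows = [{"id": 7, "name": "Alice", "employer": "Acme", "nickname": None}]
mapping = {"name": "hasName", "employer": "memberOf", "nickname": "hasNickname"}

person_quads = rows_to_quads("person", rows, "id", mapping, "source:hr_db.person")
```

The table's origin is passed through unchanged as the provenance field, which is what turns each mapped triple into a quadruple.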
In the embodiment of the present invention, step S104 specifically includes generating the processed data according to a knowledge representation format: the quadruple representation (subject, predicate, object, provenance) is serialized to generate knowledge serialization files in multiple formats such as JSON-LD/RDFa/MCF/Turtle/N-Triples.
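One possible serialization is sketched below. It assumes the provenance is written as the graph label of an N-Quads-style line, which is a line-based relative of N-Triples; the patent lists the target formats but does not say how the fourth element is encoded, so this choice and the example IRIs are assumptions.

```python
def escape_literal(text):
    """Escape the characters that may not appear raw in a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_nquad(subject, predicate, obj, provenance, obj_is_iri=False):
    """Render one quadruple as a single N-Quads-style line."""
    if obj_is_iri:
        rendered = f"<{obj}>"
    else:
        rendered = f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {rendered} <{provenance}> ."


# Hypothetical IRIs under example.org.
line = to_nquad(
    "http://example.org/person/7",
    "http://example.org/ontology/hasName",
    'Alice "Al" Li',
    "http://example.org/source/hr_db",
)
```

A production pipeline would more likely rely on an RDF library for serialization, which also covers the other listed formats.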
In the embodiment of the present invention, step S105 specifically includes storing the knowledge representation file in different types of databases, such as a relational database, a graph database, a key-value database, and a data warehouse.
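For the relational option among these stores, a minimal sketch using Python's built-in `sqlite3` module is shown below. The in-memory database, the table layout and the sample rows are illustrative assumptions; a graph database or key-value store would need a different layout.

```python
import sqlite3

# Hypothetical quadruples produced by the earlier steps.
quads = [
    ("person:7", "memberOf", "org:acme", "source:hr_db"),
    ("event:e1", "hasParticipant", "person:7", "source:news_feed"),
    ("event:e1", "occurredAt", "2022-08-10", "source:news_feed"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quads ("
    "subject TEXT, predicate TEXT, object TEXT, provenance TEXT)"
)
# An index on provenance supports source-level filtering.
conn.execute("CREATE INDEX idx_quads_provenance ON quads (provenance)")
conn.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", quads)
conn.commit()

from_news = conn.execute(
    "SELECT subject, predicate, object FROM quads "
    "WHERE provenance = ? ORDER BY predicate",
    ("source:news_feed",),
).fetchall()
```

Keeping provenance as its own column means access control or data lineage queries reduce to a plain `WHERE` clause.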
In the embodiment of the present invention, step S106 specifically includes performing operations such as aggregation, statistics, association and transformation on the knowledge graph data, together with intelligent computation such as machine learning, according to business application requirements: time series data is aggregated and counted by week/month/year; data that requires semantic understanding undergoes semantic association, entity disambiguation and similar computations using machine learning and deep learning methods; the data is transformed and associated; and the fused result is stored back to the knowledge storage engine.
Through the derivative graph calculation module, the computations required by knowledge graph services are performed and stored in advance, which improves the high-throughput real-time service support capability and the intelligent computing capability of the knowledge graph.
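The week/month/year aggregation can be sketched as a simple bucketing of event dates. The bucket label formats (ISO week numbers, `YYYY-MM`) and the function names are assumptions; the patent only states the three granularities.

```python
from collections import Counter
from datetime import date


def bucket(day, granularity):
    """Return the label of the time bucket that a date falls into."""
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unsupported granularity: {granularity}")


def count_events(event_dates, granularity):
    """Count events per bucket so the result can be stored ahead of queries."""
    return Counter(bucket(d, granularity) for d in event_dates)


# Hypothetical event dates taken from event-centered quadruples.
event_dates = [date(2022, 8, 10), date(2022, 8, 12), date(2022, 11, 1)]
weekly = count_events(event_dates, "week")
monthly = count_events(event_dates, "month")
yearly = count_events(event_dates, "year")
```

Storing such counts back into the graph storage engine lets the query service answer statistical requests without scanning the raw graph each time.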
In the embodiment of the present invention, step S107 specifically includes providing a RESTful API-based WebService for diversified business applications.
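The sketch below shows what a read-only query endpoint of such a service might look like, written as a framework-independent handler function so it runs with the standard library alone. The `/quads` path, the `subject` query parameter and the JSON response shape are hypothetical; the patent does not define the API.

```python
import json
from urllib.parse import parse_qs, urlparse

QUAD_FIELDS = ("subject", "predicate", "object", "provenance")

# Hypothetical in-memory data standing in for the graph storage engine.
QUADS = [
    ("person:7", "memberOf", "org:acme", "source:hr_db"),
    ("event:e1", "hasParticipant", "person:7", "source:news_feed"),
]


def handle_get(url, quads):
    """Handle GET /quads[?subject=...] and return (status, JSON body)."""
    parsed = urlparse(url)
    if parsed.path != "/quads":
        return 404, json.dumps({"error": "not found"})
    params = parse_qs(parsed.query)
    subject = params.get("subject", [None])[0]
    matches = [q for q in quads if subject is None or q[0] == subject]
    body = [dict(zip(QUAD_FIELDS, q)) for q in matches]
    return 200, json.dumps(body)


status, body = handle_get("/quads?subject=event:e1", QUADS)
```

The same function could be wrapped by any HTTP server or web framework, which would supply routing, authentication and error handling.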
Example 4
As shown in fig. 4, the method for constructing a domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to the embodiment of the present invention includes the following steps:
(1) Performing ontology design for the domain event knowledge graph with the entity and the event as dual centers.
(2) Configuring massive multi-source heterogeneous data input.
(3) According to the data type and in combination with the designed ontology, cleaning, extracting, converting and loading the data with the entity as the center and with the event as the center, respectively.
(4) Generating the processed data according to a knowledge representation format.
(5) Performing knowledge fusion: performing operations such as aggregation, statistics, association and transformation on the knowledge graph data, performing intelligent computation such as machine learning according to business application requirements, and aggregating and counting time series data by week, month and year.
(6) Storing the knowledge representation files in different types of databases such as a relational database, a graph database, a key-value database and a data warehouse.
(7) Providing a RESTful API-based WebService for diversified business applications.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
For the information interaction, execution process and other contents between the above-mentioned devices/units, because the embodiments of the method of the present invention are based on the same concept, the specific functions and technical effects thereof can be referred to the method embodiments specifically, and are not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
2. Application examples:
Application example 1
An application embodiment of the present invention provides a computer device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
Application example 2
The application embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the above method embodiments may be implemented.
Application example 3
The application embodiment of the present invention further provides an information data processing terminal, which is configured to provide a user input interface for implementing the steps in the above method embodiments when run on an electronic device; the information data processing terminal may be, but is not limited to, a mobile phone, a computer or a switch.
Application example 4
The application embodiment of the present invention further provides a server, where the server is configured to provide a user input interface to implement the steps in the above method embodiments when implemented on an electronic device.
Application example 5
The application embodiment of the present invention provides a computer program product which, when run on an electronic device, enables the electronic device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may be implemented by a computer program that instructs the related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal and a software distribution medium. Examples include a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications, equivalents and improvements made by those skilled in the art within the technical scope of the present invention as disclosed in the present invention should be covered thereby.

Claims (10)

1. A method for constructing a dual-center domain entity and event knowledge graph for massive multi-source heterogeneous data aggregation analysis is characterized by comprising the following steps:
S1, constructing the ontology of the domain event knowledge graph with the entity and the event as dual centers, representing static information of entities, entity attributes and entity relations in real events, and representing dynamic information of event attributes, event-entity relations and event-event relations;
S2, configuring massive multi-source heterogeneous data input, configuring structured data input by entering the database name, database type, database address, user name and password, and configuring unstructured and semi-structured data input by specifying a file address or an API;
S3, according to the data type and in combination with the domain event knowledge graph ontology constructed in step S1, cleaning, extracting, converting and loading the data with the entity as the center and with the event as the center, respectively; the data comprises structured data, unstructured data and semi-structured data;
S4, generating the processed domain event knowledge graph data according to a knowledge representation format;
S5, storing the uniformly represented knowledge data in different types of databases such as a relational database, a graph database, a key-value database and a data warehouse;
S6, performing aggregation, statistics, association and transformation operations on the domain event knowledge graph data of step S4, and storing the fused result back to a knowledge storage engine;
S7, providing services for diversified business applications.
2. The method for constructing the domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to claim 1, wherein in step S1, each piece of knowledge is defined in quadruple form, comprising: subject, predicate, object and provenance, where (subject, predicate, object) is the traditional knowledge triple expression and provenance is an added element for identifying the data source.
3. The method for constructing the domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to claim 1, wherein in step S3, in cleaning, extracting, converting and loading the data, data mapping is performed on structured data: a relational database table is mapped to the ontology definitions of entities and events to obtain knowledge triples, and the source of the relational database table is used as the data source field to obtain the quadruple form.
4. The method for constructing the domain entity and event double-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to claim 1, wherein in step S3, in cleaning, extracting, converting and loading the data, unstructured and semi-structured data such as text, pictures, video and audio are classified and processed separately, and the processing comprises: performing entity extraction and event extraction on text data using a natural language processing algorithm, extracting entity information on persons, times, places and organizations, extracting entity relationship and attribute information such as person-person and person-organization relationships, and extracting event participants and event association relationships, including event-event and event-person relationships; performing OCR (optical character recognition) on picture data to convert it into text data for processing; extracting key frame pictures from video data to convert it into picture data for processing; and transcribing audio data to convert it into text data for processing.
5. The method for constructing the field entity and event double-center knowledge graph oriented to the mass multi-source heterogeneous data aggregation analysis, according to claim 2, is characterized in that subject, predicate, object and source provenance represented in a four-tuple form are serialized to generate multiple knowledge serialization representations of JSON-LD/RDFa/MCF/Turtle/N-Triples.
6. The method for constructing the domain entity and event double-center knowledge graph for the mass multi-source heterogeneous data aggregation analysis according to claim 4, is characterized in that aggregation and statistical calculation are performed on time series data according to the week/month/year, semantic association and entity disambiguation calculation are performed on data needing semantic understanding by using machine learning and deep learning methods, data are transformed and associated, and the fused result is stored back to a knowledge storage engine.
7. A construction system for realizing the method for constructing the domain entity and event double-center knowledge graph for the mass multi-source heterogeneous data aggregation analysis according to any one of claims 1 to 6, wherein the system for constructing the domain entity and event double-center knowledge graph for the mass multi-source heterogeneous data aggregation analysis comprises:
the ontology design module (1) is used for constructing the concept layer model of the knowledge graph and realizing semi-automatic, human-machine combined ontology construction through manual participation with machine assistance;
the data mapping module (2) is used for mapping the structured data by combining the ontology knowledge defined in the ontology design module (1);
the information extraction module (3) is used for providing configurable and extensible information extraction algorithms for unstructured and semi-structured data, including an entity-centered extraction algorithm and an event-centered extraction algorithm;
the knowledge fusion module (4) is used for carrying out multi-source knowledge fusion on the data output by the data mapping module and the information extraction module under the condition of uniform representation of a domain event knowledge body;
the knowledge updating module (5) is used for ensuring that the content of the domain event knowledge graph is continuously updated in an iterative manner, and new knowledge is continuously generated along with the time;
the graph storage engine (6) is used for providing storage for the domain event ontology and the domain event knowledge graph and comprises a relational database, a graph database, a key-value database, a data warehouse and a log database;
the derivative graph calculation module (7) is used for carrying out aggregation, statistics, association and transformation operations on the knowledge graph data and storing the operated data in a graph storage engine;
the graph query service module (8) provides diversified query interfaces for the domain application;
the graph visualization engine (9) provides visualization functions for domain applications based on the graph query service.
8. The system for constructing the domain entity and event double-center knowledge graph for the mass multi-source heterogeneous data aggregation analysis according to claim 7,
the output of the ontology design module (1) is a domain event ontology, which defines the concept layer of the domain event knowledge graph and, by combining the entity-centered ontology and the event-centered ontology, normalizes the entities, entity attributes, entity relationships, event attributes and event relationships described by the domain event knowledge graph;
the knowledge fusion module (4) simultaneously supports a customized and personalized knowledge fusion method, and is used for providing the results after ontology alignment and entity disambiguation.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the domain entity and event bi-center knowledge graph construction method for massive multi-source heterogeneous data aggregation analysis according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, which when executed by a processor, causes the processor to execute the method for constructing a domain entity and event dual-center knowledge graph for massive multi-source heterogeneous data aggregation analysis according to any one of claims 1 to 6.
CN202210957668.XA 2022-08-10 2022-08-10 Method, system and equipment for constructing domain entity and event double-center knowledge graph Pending CN115269877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210957668.XA CN115269877A (en) 2022-08-10 2022-08-10 Method, system and equipment for constructing domain entity and event double-center knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210957668.XA CN115269877A (en) 2022-08-10 2022-08-10 Method, system and equipment for constructing domain entity and event double-center knowledge graph

Publications (1)

Publication Number Publication Date
CN115269877A true CN115269877A (en) 2022-11-01

Family

ID=83751168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210957668.XA Pending CN115269877A (en) 2022-08-10 2022-08-10 Method, system and equipment for constructing domain entity and event double-center knowledge graph

Country Status (1)

Country Link
CN (1) CN115269877A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438197A (en) * 2022-11-07 2022-12-06 巢湖学院 Method and system for complementing relationship of matter knowledge map based on double-layer heterogeneous graph
CN115840826A (en) * 2023-02-15 2023-03-24 海乂知信息科技(南京)有限公司 Automatic reduction system, method, electronic device and medium of knowledge graph concept
CN116541472A (en) * 2023-03-22 2023-08-04 麦博(上海)健康科技有限公司 Knowledge graph construction method in medical field
CN116955639A (en) * 2023-04-24 2023-10-27 浙商期货有限公司 Method and device for constructing future industry chain knowledge graph and computer equipment
CN116701663A (en) * 2023-08-07 2023-09-05 鹏城实验室 Method for constructing knowledge graph based on digital retina system
CN116701663B (en) * 2023-08-07 2024-01-09 鹏城实验室 Method for constructing knowledge graph based on digital retina system
CN117131935A (en) * 2023-10-25 2023-11-28 浙商期货有限公司 Knowledge graph construction method oriented to futures field

Similar Documents

Publication Publication Date Title
US12026455B1 (en) Systems and methods for construction, maintenance, and improvement of knowledge representations
US11829391B2 (en) Systems, methods, and apparatuses for executing a graph query against a graph representing a plurality of data stores
CN115269877A (en) Method, system and equipment for constructing domain entity and event double-center knowledge graph
JP7201730B2 (en) Intention recommendation method, device, equipment and storage medium
US11645471B1 (en) Determining a relationship recommendation for a natural language request
US9535902B1 (en) Systems and methods for entity resolution using attributes from structured and unstructured data
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
US9110970B2 (en) Destructuring and restructuring relational data
CN110275962B (en) Method and apparatus for outputting information
WO2023160137A1 (en) Graph data storage method and system, and computer device
Bertone et al. A survey on visual analytics for the spatio-temporal exploration of microblogging content
CN111708805A (en) Data query method and device, electronic equipment and storage medium
WO2019110654A1 (en) Systems and methods for querying databases using interactive search paths
Varga et al. Analytical metadata modeling for next generation BI systems
CN114254014A (en) Business data display method, device, equipment and storage medium
CN116150436B (en) Data display method and system based on node tree
CN112214615A (en) Policy document processing method and device based on knowledge graph and storage medium
US11314793B2 (en) Query processing
CN116467291A (en) Knowledge graph storage and search method and system
CN113901077A (en) Method and system for producing entity object label, storage medium and electronic equipment
Tsvetovat et al. NetIntel: A database for manipulation of rich social network data
Lei et al. Constructing movie domain knowledge graph based on lod
Cao E-Commerce Big Data Mining and Analytics
Riemer et al. Using complex event processing for modeling semantic requests in real-time social media monitoring
US20230196181A1 (en) Intelligent machine-learning model catalog

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination