CN115640408A - Data-based relation extraction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115640408A
Authority
CN
China
Prior art keywords
entity
metadata
relationship
network graph
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211335834.9A
Other languages
Chinese (zh)
Inventor
刘识
王耀影
李开阳
朱天佑
陈振宇
李继伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN202211335834.9A
Publication of CN115640408A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data-based relation extraction method, device, equipment and storage medium. The method comprises the following steps: acquiring a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set; completing a metadata-entity mapping task based on the metadata set to obtain an entity set; constructing a network graph based on the entity set; performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph; and performing entity relationship identification based on the target network graph to obtain an entity relationship type. By developing a corresponding entity mapping algorithm based on the table metadata to define entity relationship types, the method can mine more comprehensive entity relationships and improves the efficiency of entity relationship identification.

Description

Data-based relation extraction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a method, a device, equipment and a storage medium for extracting a relationship based on data.
Background
Data has penetrated every industry and business function and has become an important factor of production. The mining and use of massive data heralds a new wave of productivity growth and consumer surplus, and "big data" already exists in various business domains.
In the prior art, large volumes of business data are classified and extracted mainly through manual labeling and manual sorting. This entails a heavy workload, is difficult to scale, cannot build up comprehensive business-domain knowledge, and cannot provide comprehensive and effective support at the knowledge application level.
Disclosure of Invention
The invention provides a data-based relation extraction method, device, equipment and storage medium, which aim to solve the problem of heavy workload in data classification and extraction in the prior art.
According to an aspect of the present invention, there is provided a relationship extraction method based on data, including:
acquiring a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set;
completing a mapping task of metadata-entity based on the metadata set to obtain an entity set;
constructing a network graph based on the entity set;
performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph;
and finishing entity relationship identification based on the target network graph to obtain an entity relationship type.
According to another aspect of the present invention, there is provided a data-based relationship extraction apparatus, including:
the processing module is used for acquiring a plurality of table metadata and preprocessing the table metadata to obtain a metadata set;
a completion module, configured to complete a metadata-entity mapping task based on the metadata set to obtain an entity set;
a construction module for constructing a network graph based on the entity set;
the completion module is used for performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph;
and the identification module is used for finishing entity relationship identification based on the target network graph to obtain the entity relationship type.
According to another aspect of the present invention, there is provided an electronic apparatus including: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a data-based relationship extraction method according to any of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the method for data-based relationship extraction according to any one of the embodiments of the present invention when executed.
The embodiment of the invention provides a data-based relation extraction method, which comprises the following steps: acquiring a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set; completing a metadata-entity mapping task based on the metadata set to obtain an entity set; constructing a network graph based on the entity set; performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph; and performing entity relationship identification based on the target network graph to obtain an entity relationship type. By developing a corresponding entity mapping algorithm based on the table metadata to define entity relationship types, the method solves the problem of heavy workload in data classification and extraction in the prior art, can mine more comprehensive entity relationships, and improves the efficiency of entity relationship identification.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data-based relationship extraction method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data-based relationship extraction method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data-based relationship extraction method according to a third embodiment of the present invention;
Fig. 4 is a schematic flowchart of a data-based relationship extraction method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data-based relationship extraction device according to a fourth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing a data-based relationship extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It is noted that the modifiers "a", "an", and "the" used in the present invention are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present invention are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
In the prior art, entity relationship extraction faces three main challenges. The first is the diversity of natural language expression. The core of relationship extraction is mapping relational knowledge expressed in natural language to relation triples, and the same relationship can be expressed in many different ways. The second is the implicitness of relational expression: sometimes no explicit marker of the relationship can be found in the text, and the relationship is only implied. Together, the diversity and implicitness of natural language make the task of relationship extraction extremely challenging. The third is the complexity of entity relationships. The goal of relationship extraction is to extract semantic relationships between entities, but in the real world the same pair of entities may have multiple relationships, some of which exist simultaneously while others hold only at particular times.
The main methods of relationship extraction are as follows:
1) Rule-based relationship extraction. This method depends heavily on domain experts. Because electric power data is large in volume, spans many fields, and is largely unstructured, it is difficult for experts to formulate rules that generalize well, so they must formulate a large number of rules for the various kinds of data. The process is tedious, labor costs are high, efficiency is low, and portability is poor, so the method is suitable only for cases with small amounts of data.
2) Relationship extraction based on traditional machine learning. The main work in this method is data preprocessing and feature engineering. Electric power data contains a large amount of unstructured data of many types, so data preprocessing and feature engineering involve a very heavy workload and consume a great deal of time, resulting in substantial manual effort and high labor costs. In addition, engineers must be proficient in both the industry domain and the technical domain.
3) Relationship extraction based on a deep learning model. This method mainly uses LSTM + CRF for entity recognition and then classifies the relationships of entity pairs. The Long Short-Term Memory network (LSTM) is a recurrent neural network that can extract Chinese character information and retain the information of a text sequence through a gated memory mechanism. However, although LSTM alleviates the long-term dependency problem of the Recurrent Neural Network (RNN) to some extent, it still struggles with longer sequences. Moreover, LSTM cannot be computed in parallel, so when the time span is large and the network is deep, computation is very expensive and time-consuming.
4) Relationship extraction based on fine-tuning a pre-trained model. This method mainly uses BERT + CRF for entity recognition and then classifies the relationships of entity pairs. Like a CNN, the BERT model can exchange global information regardless of distance, and it breaks through the limitation that an RNN cannot be computed in parallel, but the model file is too large and the training time is too long. The pre-trained model is also limited in its ability to memorize and store linguistic knowledge and in its ability to understand linguistic logic. Fine-tuning on a small dataset may lead to overfitting or underfitting. In addition, the pre-trained model places high demands on hardware, computing power, and GPU memory, so its application cost is high.
To address these problems, the present invention provides a data-based relationship extraction method.
It should be noted that the relationship extraction method based on data provided by the present invention is not only applicable to the relationship extraction task, but also applicable to the event extraction task.
Example one
Fig. 1 is a flowchart of a data-based relationship extraction method according to an embodiment of the present invention, where the method is applicable to a situation of performing relationship extraction on a large amount of business data, and the method may be executed by a data-based relationship extraction apparatus, where the apparatus may be implemented by software and/or hardware and is generally integrated on an electronic device, and the electronic device may be a computer device.
As shown in fig. 1, a method for extracting a relationship based on data according to an embodiment of the present invention includes the following steps:
S110, obtaining a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set.
The table metadata may be metadata in a data table, and one table may include a plurality of table metadata. The table metadata may be obtained from one or more databases stored on a server, for example MySQL, Oracle, or Hive. Table metadata is data that describes other data, or structured data that provides information about a resource. The metadata set is the collection of preprocessed table metadata.
The specific preprocessing procedure can be set according to actual conditions. For example, the preprocessing may include unifying names, deleting identical data, and the like.
In this embodiment, table metadata in different databases or different tables may contain duplicate names, identical data under different names, and so on. Therefore, after the table metadata is obtained from the database, it may be preprocessed to obtain a metadata set, which may be denoted as X.
S120, completing a metadata-entity mapping task based on the metadata set to obtain an entity set.
The metadata-entity mapping task is the task of converting metadata into entities. Completing it can be understood as converting the metadata in the metadata set into entities, where one metadata corresponds to one entity. An entity is an object that objectively exists and can be distinguished from other objects. In database terms, an entity often refers to a collection of objects of a certain type, and each individual of a type of data object is called an entity. An entity can be a concrete person or thing, or an abstract concept or relationship. The entity set may include multiple entities.
The metadata-entity mapping scheme may take either of two forms: in one, a single table serves as one entity and its table metadata serve as entity attributes; in the other, the metadata in the table serve as entities, so that one table's metadata corresponds to a plurality of entities.
In this embodiment, the metadata-entity mapping task can be completed in the following two ways:
Mode one: completing the metadata-entity mapping task through a mapping algorithm;
Mode two: completing the metadata-entity mapping task through an entity mapping relation table.
In mode one, the metadata-entity mapping task is completed through a mapping algorithm that converts the table metadata in the metadata set into corresponding entities, from which an entity set S can be constructed. In mode two, the table metadata in the metadata set is mapped to corresponding entities through the mapping relations recorded in the entity mapping relation table, and the resulting entities form the entity set S.
The mapping algorithm may be a pre-designed association rule between metadata and entities. The mapping function can be denoted F(X), and the mapping process can be expressed as S = F(X).
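As an illustration only, the mapping S = F(X) under the first scheme (one table as one entity, its table metadata as entity attributes) might be sketched as follows. The function name map_to_entities, the (table, column) pair layout, and the sample table names are assumptions made for this sketch and are not specified by the invention.

```python
from collections import defaultdict

def map_to_entities(metadata_set):
    """F(X): group (table, column) pairs so that each table becomes one entity.

    The returned dict maps an entity name to its sorted attribute list.
    """
    grouped = defaultdict(list)
    for table, column in metadata_set:
        grouped[table].append(column)
    return {table: sorted(columns) for table, columns in grouped.items()}

# Hypothetical metadata set X: one (table, column) pair per table metadata item.
X = [
    ("teacher", "teacher_id"),
    ("teacher", "name"),
    ("course", "course_id"),
    ("course", "teacher_id"),
]
S = map_to_entities(X)
```

A rule of this kind is only one possible F; the second scheme, in which each metadata item becomes its own entity, would return one entity per pair instead of grouping by table.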
S130, constructing a network graph based on the entity set.
The network graph is a graphical model shaped like a network and consists of three elements: activities (arrow lines), events (also called nodes), and routes. An activity is a task or procedure, a concrete process that consumes manpower, material resources, and time. In the network graph, an activity is represented by an arrow line, whose tail indicates the start of the activity and whose head indicates its end. An event is the start or end of an activity; it consumes no resources or time and is represented in the network graph by "o", the junction of two or more arrow lines, also called a node. A route is a path from the start of the network to its end that follows the direction of the arrow lines through a series of consecutive activities and events.
In this embodiment, after the entity set is obtained, the entities may be connected according to the entities in the entity set and the relationships between them, so as to construct a network graph G1. Specifically, the information of a single table may be used as an entity and the table metadata as entity attributes, so that each single-table entity corresponds to a node in the network graph. When attributes of two tables are associated, a connecting edge may exist between the corresponding nodes. The edge-determination step is repeated until all tables have corresponding entities, which completes the construction of the network graph of the entities corresponding to all of the table metadata.
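The construction of G1 described above can be sketched as follows. This is a minimal example that assumes two tables are "associated" when they share an attribute name; the invention does not fix the association test, and the function name build_graph and the sample entities are hypothetical.

```python
from itertools import combinations

def build_graph(entities):
    """Return (nodes, edges) for a network graph over single-table entities.

    Assumption for this sketch: an edge links two tables that share at
    least one attribute name, and the edge stores the shared names.
    """
    nodes = sorted(entities)
    edges = {}
    for a, b in combinations(nodes, 2):
        shared = set(entities[a]) & set(entities[b])
        if shared:
            edges[(a, b)] = sorted(shared)
    return nodes, edges

# Hypothetical entity set S: table name -> attribute names.
S = {
    "teacher": ["teacher_id", "name"],
    "course": ["course_id", "teacher_id"],
    "student": ["student_id", "name_cn"],
}
nodes, edges = build_graph(S)
```

Here only "course" and "teacher" are connected, through the shared attribute "teacher_id"; "student" remains an isolated node that a later completion step could link.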
S140, performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph.
Graph Neural Network (GNN) is a general term for algorithms that use neural networks to learn from graph-structured data, extracting and exploring the features and patterns in such data to meet the needs of graph learning tasks such as clustering, classification, prediction, segmentation, and generation. Graph neural network algorithms can be regarded as an extension of deep learning to non-Euclidean space, and they can be used to complete the network graph. Graph completion can be understood as filling in the missing parts of the network graph. The target network graph is the completed network graph obtained after graph completion.
In this embodiment, after the network graph is constructed, if the network graph has an incomplete part, the graph neural network algorithm may be used to perform graph completion on the network graph, and the network graph after the graph completion may be used as a target network graph, so that the entity relationship may be identified more completely and accurately according to the target network graph.
Wherein the graph neural network algorithm may include one or more of a link prediction algorithm, a node classification algorithm, and an edge classification algorithm. For example, the graph completion may be performed directly by a link prediction algorithm, may be performed by a link prediction algorithm and a node classification algorithm, or may be performed by a link prediction algorithm, a node classification algorithm, and an edge classification algorithm. And is not particularly limited herein.
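To make the idea of completion by link prediction concrete, the sketch below scores every non-adjacent node pair and adds an edge when the score is high enough. Note that it deliberately substitutes a simple, non-learned Jaccard neighbourhood-overlap heuristic for a trained graph neural network link predictor; the function name complete_graph, the threshold value, and the sample graph are assumptions for illustration.

```python
from itertools import combinations

def complete_graph(adj, threshold=0.5):
    """Return a completed copy of an undirected graph.

    adj maps each node to the set of its neighbours. A missing edge (u, v)
    is added when the Jaccard overlap of the two neighbourhoods reaches the
    threshold. This heuristic stands in for a trained GNN link predictor.
    """
    completed = {node: set(neighbours) for node, neighbours in adj.items()}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # edge already present
        union = adj[u] | adj[v]
        score = len(adj[u] & adj[v]) / len(union) if union else 0.0
        if score >= threshold:
            completed[u].add(v)
            completed[v].add(u)
    return completed

# Hypothetical network graph G1 with two missing links (A-B and C-D).
G1 = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}
G2 = complete_graph(G1)
```

Scores are computed on the original graph G1 only, so edges added during completion do not influence later predictions within the same pass.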
S150, completing entity relationship identification based on the target network graph to obtain an entity relationship type.
The entity relationship identification refers to a task of extracting the implicit relationship between entities in the text in the natural language processing process, that is, the entity relationship identification can be understood as a task of finding the relationship between the entities. The entity relationship type can be understood as the type of the entity relationship, and the entity relationship types between different entities can be the same or different. The entity relationship type may be classified according to actual situations, which is not limited in this embodiment. An entity relationship may be some sort of relationship between two or more entities.
In this embodiment, after the target network graph is obtained, the entity relationship between the entities may be determined according to the target network graph, so as to obtain the entity relationship types of all the entities. For example, when the entities include a teacher and a lesson, the type of entity relationship between the teacher and the lesson can be one-to-one or one-to-many, i.e., a teacher can teach one or more lessons. When the entities include students and courses, the types of entity relationships between the students and the courses may be one-to-one, one-to-many, many-to-one, and many-to-many, i.e., one student may learn one or more courses, and one course may also be learned by one or more students.
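The teacher/course and student/course examples can be made concrete with a small sketch that infers the relationship type from observed links between two entities. This is only one plausible way to derive the type and is not prescribed by the invention; the function name relationship_type and the sample identifiers are hypothetical.

```python
from collections import defaultdict

def relationship_type(pairs):
    """Classify (left, right) links by cardinality.

    Returns one of "one-to-one", "one-to-many", "many-to-one",
    or "many-to-many".
    """
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical links: teacher t1 teaches two courses; course c1 has two students.
teaches = [("t1", "c1"), ("t1", "c2"), ("t2", "c3")]
enrolls = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
teacher_course = relationship_type(teaches)
student_course = relationship_type(enrolls)
```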
In this embodiment, the entity relationship identification may be completed in the following three ways:
Method one: extracting triple relations from the target network graph G2 to represent entity relationships, thereby completing the identification of relationships between entity nodes;
Method two: setting a relationship template in advance, so that once the relationship between two entity categories is determined, the relationship between entities can be determined using the template; slots are filled according to the different entity types and the preset entity relationship template to obtain the entity relationships;
Method three: adopting a graph multi-hop inference technique to identify the relationship between two non-adjacent nodes.
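Methods one and three can be sketched together as follows. The example assumes that edges of the target network graph already carry relation labels, and it uses a plain breadth-first search as a simple stand-in for a multi-hop inference technique; the function names, edge labels, and sample graph are hypothetical.

```python
from collections import deque

def extract_triples(edges):
    """Method one: turn labelled edges into (head, relation, tail) triples."""
    return [(head, relation, tail)
            for (head, tail), relation in sorted(edges.items())]

def multi_hop_path(triples, start, goal):
    """Method three (simplified): breadth-first search for the chain of
    relations linking two nodes, or None when no chain exists."""
    neighbours = {}
    for head, relation, tail in triples:
        neighbours.setdefault(head, []).append((relation, tail))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [relation]))
    return None

# Hypothetical labelled edges of the target network graph G2.
G2_edges = {
    ("student", "course"): "enrolls_in",
    ("course", "teacher"): "taught_by",
}
triples = extract_triples(G2_edges)
path = multi_hop_path(triples, "student", "teacher")
```

In this example "student" and "teacher" are not adjacent, and the two-hop chain of relations found between them is what a multi-hop method would use to infer their relationship.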
The embodiment of the invention provides a data-based relation extraction method, which comprises the following steps: acquiring a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set; completing a metadata-entity mapping task based on the metadata set to obtain an entity set; constructing a network graph based on the entity set; performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph; and performing entity relationship identification based on the target network graph to obtain an entity relationship type. By developing a corresponding entity mapping algorithm based on the table metadata to define entity relationship types, the method solves the problem of heavy workload in data classification and extraction in the prior art, can mine more comprehensive entity relationships, and improves the efficiency of entity relationship identification.
In one embodiment, the preprocessing comprises: unifying the names of table metadata that are the same but named differently among the plurality of table metadata; and deleting duplicate table metadata among table metadata having the same name.
Here, the naming is the name of the table metadata, and different table metadata may have different names. Duplicate table metadata may be understood as table metadata in which the data is identical but the naming of the data is different.
In this embodiment, after the table metadata is obtained, it may be preprocessed as follows: metadata with the same data but different names is given a uniform name; and, among table metadata with the same name, duplicate table metadata is deleted.
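A minimal sketch of this preprocessing is given below. It assumes that name variants are resolved through a hand-maintained synonym dictionary, which is only one possible way to unify names; the ColumnMeta record, the ALIASES dictionary, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMeta:
    table: str  # source table
    name: str   # column name as stored in the database
    dtype: str  # declared data type

# Hypothetical synonym dictionary: lower-cased variant name -> unified name.
ALIASES = {"cust_id": "customer_id", "custid": "customer_id"}

def preprocess(records):
    """Unify names, then delete exact duplicates while preserving order."""
    seen = set()
    metadata_set = []
    for rec in records:
        lowered = rec.name.lower()
        unified = ColumnMeta(rec.table, ALIASES.get(lowered, lowered),
                             rec.dtype.lower())
        if unified not in seen:
            seen.add(unified)
            metadata_set.append(unified)
    return metadata_set

raw = [
    ColumnMeta("orders", "cust_id", "INT"),
    ColumnMeta("orders", "CustID", "int"),
    ColumnMeta("customers", "customer_id", "INT"),
]
X = preprocess(raw)
```

After unification the two "orders" records become identical and one is dropped, leaving a metadata set X with a single name, "customer_id", for the shared column.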
Example two
Fig. 2 is a schematic flowchart of a data-based relationship extraction method according to a second embodiment of the present invention. The second embodiment is optimized on the basis of the foregoing embodiments; in it, step S120 is further refined. As shown in fig. 2, the data-based relationship extraction method provided in the second embodiment of the present invention includes the following steps:
S210, obtaining a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set.
S220, mapping the table metadata in the metadata set according to a preset mapping rule to obtain an entity set.
The preset mapping rule may be a pre-designed mapping rule, and the preset mapping rule may be freely set according to an actual situation.
In this embodiment, after the metadata set is obtained, a preset mapping rule may be set according to a requirement, and the metadata in the metadata set may be mapped to an entity according to the preset mapping rule, so that the table metadata may be mapped to the entity according to the requirement. For example, the preset mapping rule may be designed as: taking single table information as an entity and table metadata as entity attributes; or the metadata information in the table is regarded as a single entity, that is, one table metadata corresponds to a plurality of entities.
S230, training and predicting an entity mapping relation table through a machine learning algorithm and a deep learning algorithm; and mapping the table metadata in the metadata set according to the mapping relation in the entity mapping relation table to obtain an entity set.
It should be noted that S220 and S230 are two parallel execution schemes, and one execution scheme may be selected.
Machine learning is a branch of artificial intelligence that studies, in particular, how to improve the performance of specific algorithms through learning from experience. Machine learning algorithms may include supervised learning algorithms, unsupervised learning algorithms, reinforcement learning algorithms, linear regression algorithms, and the like. Deep learning is developed on the basis of machine learning and is machine learning based on artificial neural networks. Deep learning algorithms may include the back propagation algorithm, the stochastic gradient descent algorithm, learning rate decay algorithms, and the like. This embodiment does not limit the types of the machine learning algorithm and the deep learning algorithm.
The entity mapping relation table can be a table for recording mapping relations between table metadata and entities, and the entity mapping relation table is obtained based on training and prediction of a machine learning algorithm and a deep learning algorithm.
In this embodiment, after the metadata set is obtained, in addition to mapping the metadata in the metadata set to the entity according to the preset mapping rule, a suitable machine learning algorithm and a deep learning algorithm may be selected to train and predict the metadata set, so as to obtain the entity mapping relationship table T.
The mapping relationship in the entity mapping relationship table may be a mapping relationship between table metadata and an entity, and the mapping relationships between different table metadata and entities may be the same or different.
In this embodiment, after the entity mapping relationship table T is obtained, each table metadata in the metadata set may be mapped to a different entity according to the mapping relationship in the entity mapping relationship table T, so as to obtain an entity set.
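Applying the entity mapping relation table T can be sketched as a simple lookup. In the sketch the table is written by hand and merely stands in for one produced by training and prediction; the function name map_with_table, the "unknown" fallback entity, and the sample names are assumptions.

```python
def map_with_table(metadata_set, mapping_table, default="unknown"):
    """Map each table metadata item to an entity using a lookup table.

    Items missing from the table are collected under a fallback entity
    so that they can be reviewed later.
    """
    entity_set = {}
    for item in metadata_set:
        entity = mapping_table.get(item, default)
        entity_set.setdefault(entity, []).append(item)
    return entity_set

# Hypothetical mapping relation table T: table metadata -> entity.
T = {
    "teacher_id": "Teacher",
    "teacher_name": "Teacher",
    "course_id": "Course",
}
X = ["teacher_id", "teacher_name", "course_id", "room_no"]
S = map_with_table(X, T)
```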
S240, constructing a network graph based on the entity set.
In one embodiment, the constructing the network graph based on the set of entities includes: corresponding the entities in the entity set to the nodes in the network graph one by one; calculating a feature vector of the node; aiming at each entity, calculating a coding vector of each entity attribute in the entity, and splicing a plurality of coding vectors to obtain a fixed-length vector; determining the relevance among the nodes through the feature vectors of the nodes; and constructing a connecting edge between the two nodes with the association so as to connect the two nodes with the association.
Wherein, the node can be understood as the representation form of the entity in the network graph, and calculating the feature vector of the node can be understood as calculating the feature vector of the entity. The entity attribute may be attribute information of the entity. The fixed-length vector may be a vector resulting from concatenation of the encoded vectors. The connecting edge may be a connecting line between nodes, and nodes having an association may be connected by the connecting edge.
In this embodiment, after the entity set is obtained, each entity in the entity set may correspond to a node in the network graph one by one, a feature vector of each node and a code vector of all attributes of each entity are calculated, multiple code vectors may be spliced into one fixed-length vector, the association between nodes may be determined by the feature vectors of the nodes, and for two nodes having the association, the two nodes may be connected by a connecting edge, thereby forming the network graph G1.
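The construction of the network graph G1 may be sketched as follows. The embodiment does not specify the attribute encoder or the association measure, so the character-bucket encoding, the cosine similarity and the threshold 0.8 below are assumptions made only for illustration, and the concatenated fixed-length vector is used directly as the node feature vector.

```python
import math

def encode_attribute(value, dim=4):
    """Toy fixed-size encoding of one attribute (character-code buckets)."""
    vec = [0.0] * dim
    for ch in str(value):
        vec[ord(ch) % dim] += 1.0
    return vec

def entity_vector(attributes, max_attrs=3, dim=4):
    """Concatenate per-attribute encodings, padded or truncated to a fixed length."""
    vec = []
    for value in attributes[:max_attrs]:
        vec.extend(encode_attribute(value, dim))
    vec.extend([0.0] * (max_attrs * dim - len(vec)))
    return vec

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(entities, threshold=0.8):
    """Return (node vectors, edges); an edge joins nodes with similarity >= threshold."""
    nodes = {name: entity_vector(attrs) for name, attrs in entities.items()}
    names = sorted(nodes)
    edges = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(nodes[a], nodes[b]) >= threshold
    ]
    return nodes, edges

nodes, edges = build_graph({"A": ["id", "name"], "B": ["id", "name"], "C": ["zzzz"]})
```

In practice the toy encoder would be replaced by a learned or domain-specific encoding; only the overall flow (one node per entity, concatenated attribute encodings, edges between associated nodes) follows the text above.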
S250, performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph.
S260, completing entity relationship identification based on the target network graph to obtain an entity relationship type.
In the method for extracting a relationship based on data provided in the second embodiment of the present invention, the entity set obtained by completing the metadata-entity mapping task based on the metadata set is further refined as follows: mapping table metadata in the metadata set according to a preset mapping rule to obtain an entity set; or training and predicting an entity mapping relation table through a machine learning algorithm and a deep learning algorithm; and mapping the table metadata in the metadata set according to the mapping relation in the entity mapping relation table to obtain an entity set. Therefore, the entity set can be more accurately determined according to different methods to define the entity relationship type, the problem of large traffic during data classification and extraction in the prior art is solved, more comprehensive entity relationships can be mined, and the identification efficiency of the entity relationships is improved.
In one embodiment, the completing entity relationship identification based on the target network graph to obtain an entity relationship type includes: constructing an entity relationship template, wherein the entity relationship template is provided with a plurality of entity slot filling structures, and one slot corresponds to one entity category; and filling the entities in the target network graph into corresponding slots in the entity relationship template according to the entity types to obtain the entity relationship types.
The entity relationship template may be understood as an entity relationship diagram, that is, an ER diagram, of relationships between entities, the entity slot filling structure may be understood as a frame in the ER diagram, a slot may refer to a position where an entity in the ER diagram is located, and an entity category may be a category of the entity.
In this embodiment, after the target network graph is obtained, the relationship type between entities needs to be identified according to the target network graph. For example, an entity relationship template may be constructed, and each entity may be filled into the corresponding slot in the entity relationship template according to its entity category, so as to obtain the entity relationship type.
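A minimal sketch of template-based slot filling is given below. The template contents, the relation names and the entity categories are hypothetical; the sketch only assumes that each template has two slots and that each slot accepts one entity category.

```python
# Hypothetical relationship templates: two slots each, one entity category per slot.
TEMPLATES = [
    {"relation": "owns", "slots": ("Customer", "Meter")},
    {"relation": "located_in", "slots": ("Meter", "Substation")},
]

def fill_templates(edges, category_of, templates=TEMPLATES):
    """For each connected entity pair, fill the template whose slot categories match."""
    results = []
    for a, b in edges:
        for tpl in templates:
            head_cat, tail_cat = tpl["slots"]
            # Try both orientations, since edges in the graph are undirected here.
            for head, tail in ((a, b), (b, a)):
                if category_of.get(head) == head_cat and category_of.get(tail) == tail_cat:
                    results.append((head, tpl["relation"], tail))
    return results

category_of = {"c1": "Customer", "m1": "Meter", "s1": "Substation"}
relations = fill_templates([("m1", "c1"), ("m1", "s1")], category_of)
```

Each result is a (head entity, relationship type, tail entity) tuple; pairs whose categories match no template are simply left without a relationship type.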
EXAMPLE III
Fig. 3 is a schematic flow chart of a data-based relationship extraction method according to a third embodiment of the present invention, and the third embodiment performs optimization based on the foregoing embodiments. In the present embodiment, step S140 is further embodied. As shown in fig. 3, a relationship extraction method based on data provided in the third embodiment of the present invention includes the following steps:
S310, obtaining a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set.
S320, completing a mapping task of metadata-entity based on the metadata set to obtain an entity set.
S330, constructing a network graph based on the entity set.
S340, predicting the possibility of the link between two nodes which do not generate the connecting edge in the network graph through a link prediction algorithm, and if the possibility of the link is greater than a preset value, connecting the two nodes.
The link prediction algorithm can predict missing edges in the graph or edges that may appear in the future, and can be used to judge the degree of closeness between two nodes. A link may be understood as a connection between two nodes. The preset value is a predetermined threshold, which can be set according to actual conditions.
In this embodiment, after the network graph is obtained, whether a lost edge exists in the network graph or the possibility of a link between two nodes that have not yet generated a link edge may be predicted through a link prediction algorithm, and if the possibility of a link is greater than a preset value, it indicates that the two nodes may have an edge in the future, and the two nodes may be connected.
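The thresholding step can be sketched as below. It assumes that node embeddings are already available and scores a candidate link with the sigmoid of the embedding dot product, which is a common choice but not one fixed by this embodiment; the preset value 0.9 and all names are illustrative.

```python
import math
from itertools import combinations

def link_probability(u, v):
    """Sigmoid of the embedding dot product, used here as the link likelihood."""
    score = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-score))

def complete_links(embeddings, edges, preset_value=0.9):
    """Connect every unconnected pair whose predicted likelihood exceeds preset_value."""
    existing = {frozenset(e) for e in edges}
    completed = list(edges)
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in existing:
            continue  # only pairs without a connecting edge are considered
        if link_probability(embeddings[a], embeddings[b]) > preset_value:
            completed.append((a, b))
    return completed

embeddings = {"a": [2.0, 1.0], "b": [1.5, 1.0], "c": [-1.0, 0.5]}
completed = complete_links(embeddings, [("a", "c")])
```

Here the pair (a, b) scores about 0.98 and is connected, while (b, c) scores about 0.27 and is left unconnected.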
S350, completing the information of the nodes in the network graph through a node classification algorithm.
The node classification algorithm is an algorithm for classifying nodes, and the node classification algorithm is a technique commonly used in the prior art, which is not described in detail in this embodiment. Information completion refers to filling in the missing information of a node or an edge.
In this embodiment, after the network graph is obtained, the nodes in the network graph may be detected through a node classification algorithm to determine whether there is a node with incomplete information, and information completion is performed on the node with incomplete information, so that the completeness of the node may be ensured.
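As an illustration of node information completion, the following sketch fills a missing node category with the majority category of the node's labelled neighbours. This neighbour vote is only a simple stand-in for the node classification algorithm, which the embodiment leaves unspecified; the node names and categories are hypothetical.

```python
from collections import Counter

def complete_node_labels(labels, edges):
    """Fill missing categories (None) with the majority category of labelled neighbours."""
    neighbours = {node: set() for node in labels}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    completed = dict(labels)
    for node, label in labels.items():
        if label is not None:
            continue
        votes = Counter(labels[n] for n in neighbours[node] if labels[n] is not None)
        if votes:
            completed[node] = votes.most_common(1)[0][0]
    return completed

labels = {"n1": "Customer", "n2": "Customer", "n3": None, "n4": "Meter", "n5": None}
completed_labels = complete_node_labels(labels, [("n1", "n3"), ("n2", "n3"), ("n4", "n3")])
```

A node with no labelled neighbour, such as n5 above, keeps its missing category in this sketch.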
S360, completing the information of the edges in the network graph through an edge classification algorithm to obtain a target network graph.
The edge classification algorithm is an algorithm for classifying edges, and the edge classification algorithm is a technique commonly used in the prior art, which is not described in detail in this embodiment.
In this embodiment, after the network graph is obtained, edges in the network graph may be detected through an edge classification algorithm to determine whether there is an incomplete edge, and the incomplete edge is completed, so that the integrity of the edge may be ensured.
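Edge information completion can be sketched in a similar way. The example below assigns a missing edge type the most common type observed between the same pair of endpoint categories; this voting rule, the edge types and the category names are assumptions for illustration and stand in for the unspecified edge classification algorithm.

```python
from collections import Counter, defaultdict

def complete_edge_types(edges, category_of):
    """edges: list of (a, b, type or None). Fill None with the most common type
    seen between the same pair of endpoint categories."""
    by_pair = defaultdict(Counter)
    for a, b, etype in edges:
        if etype is not None:
            by_pair[frozenset((category_of[a], category_of[b]))][etype] += 1
    completed = []
    for a, b, etype in edges:
        if etype is None:
            votes = by_pair.get(frozenset((category_of[a], category_of[b])))
            if votes:
                etype = votes.most_common(1)[0][0]
        completed.append((a, b, etype))
    return completed

category_of = {"c1": "Customer", "c2": "Customer", "m1": "Meter", "m2": "Meter"}
typed_edges = complete_edge_types(
    [("c1", "m1", "owns"), ("c2", "m2", None), ("c1", "c2", None)], category_of
)
```

An edge between categories for which no typed example exists remains incomplete in this sketch.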
It should be noted that the execution order of S340, S350, and S360 is not limited, and may be executed simultaneously.
S370, completing entity relationship identification based on the target network graph to obtain an entity relationship type.
The third embodiment of the invention provides a relation extraction method based on data, which comprises the steps of obtaining a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set; completing a mapping task of metadata-entity based on the metadata set to obtain an entity set; constructing a network graph based on the entity set; predicting the possibility of the link between two nodes which do not generate the connecting edges in the network graph through a link prediction algorithm, and if the possibility of the link is greater than a preset value, connecting the two nodes; completing information of the nodes in the network graph through a node classification algorithm; completing information of edges in the network graph through an edge classification algorithm to obtain a target network graph; and finishing entity relationship identification based on the target network graph to obtain an entity relationship type. The method researches a corresponding entity mapping algorithm based on the table metadata to define the entity relationship type, solves the problem of large traffic during data classification and extraction in the prior art, can mine more comprehensive entity relationships, and improves the identification efficiency of the entity relationships.
The embodiment of the invention provides a specific implementation mode on the basis of the technical schemes of the above embodiments.
Fig. 4 is a schematic flow chart of a data-based relationship extraction method according to an embodiment of the present invention, and as a specific implementation manner of the present invention, as shown in fig. 4, the method includes the following steps:
S1, obtaining table metadata and preprocessing the table metadata to obtain a metadata set X;
S2, obtaining an entity set S by adopting a mapping algorithm F(X) for the metadata set;
S3, constructing a graph network G1 according to the entity set S of the mapping relation table T;
S4, completing the network by adopting a graph neural network link prediction algorithm, a node classification algorithm and an edge classification algorithm to obtain a graph network G2 after completion;
Link prediction refers to predicting, from known information such as the network nodes and the network structure, the possibility of a link between two nodes in the network that have not yet generated a connecting edge. It includes both the prediction of unknown links and the prediction of future links. If a link exists between two nodes in the dataset, the node pair is a positive sample; if no link exists between them in the known dataset, the node pair is a negative sample. Usually, the number of node pairs with edges in the graph is smaller than the number of node pairs without edges; for balance, each training round uses only as many negative samples as positive samples.
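The balanced sampling of positive and negative node pairs may be sketched as follows. The uniform random sampling of non-edges and the fixed seed are assumptions of this sketch; the embodiment only requires that the number of negative samples equal the number of positive samples in each round.

```python
import random
from itertools import combinations

def sample_training_pairs(nodes, edges, seed=0):
    """Return (positives, negatives) with as many sampled non-edges as edges."""
    positives = {frozenset(e) for e in edges}
    # Every node pair that has no connecting edge is a candidate negative sample.
    candidates = [
        pair for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in positives
    ]
    rng = random.Random(seed)
    k = min(len(positives), len(candidates))
    negatives = rng.sample(candidates, k)
    return sorted(tuple(sorted(p)) for p in positives), negatives

pos, neg = sample_training_pairs(["a", "b", "c", "d", "e"], [("a", "b"), ("c", "d")])
```

Drawing a fresh set of negatives with a different seed in each training round exposes the model to more of the non-edge pairs over time while keeping every round balanced.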
In the algorithm process, GraphSAGE, GNN, GAE or RGCN is used to extract features from the positive and negative samples in the graph; the obtained features are then scored by a dot product, or classified by a binary classification model built with an MLP (multilayer perceptron). The whole model is trained to obtain a trained model, for example a GraphSAGE model, which can perform link prediction on unknown data and complete the graph to obtain the graph network G2.
GraphSAGE is an inductive framework that can use vertex feature information (such as text attributes) to efficiently generate embeddings for unseen vertices. Rather than training a separate embedding for each vertex, GraphSAGE learns a function that generates a vertex representation by sampling and aggregating features from the local neighbors of the vertex. The GCN (graph convolutional network) is a neural network architecture that operates on graph data and is very powerful: even a randomly initialized two-layer GCN can generate useful feature representations of the nodes in the graph network.
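The sample-and-aggregate idea of GraphSAGE can be illustrated with a single layer using a mean aggregator. This is a simplified sketch in plain Python: the weight matrix is passed in rather than learned, and the function name, the sample size and the example values are assumptions for illustration only.

```python
import random

def sage_mean_layer(features, neighbours, weight, num_samples=2, seed=0):
    """One GraphSAGE-style layer: sample neighbours, average their features,
    concatenate with the node's own features, apply a linear map and ReLU."""
    rng = random.Random(seed)
    out = {}
    for node, own in features.items():
        nbrs = sorted(neighbours.get(node, []))
        sampled = rng.sample(nbrs, min(num_samples, len(nbrs)))
        if sampled:
            agg = [
                sum(features[n][i] for n in sampled) / len(sampled)
                for i in range(len(own))
            ]
        else:
            agg = [0.0] * len(own)
        combined = own + agg  # concatenation, length 2 * d
        out[node] = [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weight]
    return out

features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbours = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # 2 x 4, fixed for the example
hidden = sage_mean_layer(features, neighbours, weight)
```

Because the layer depends only on a vertex's features and those of its sampled neighbors, the same weights can be applied to a vertex that was not present during training, which is what makes the approach inductive.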
In the present embodiment, one or more of the link prediction algorithm, the node classification algorithm, and the edge classification algorithm may be selected for use.
S5, obtaining the entity relationship type by applying a graph reasoning algorithm to the graph network G2.
In this embodiment, in addition to obtaining the entity relationship type through the graph reasoning algorithm, triples may be extracted from G2 to represent the entity relationships, thereby completing the identification of relationships between entity nodes; alternatively, a relationship template may be set in advance, and, where the relationship between two categories is already determined, the slots of the preset entity relationship template are filled according to the entity categories to obtain the entity relationship.
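Extraction of triples from the completed graph G2 can be sketched as follows, assuming that each edge of G2 is stored as a (head, tail, relation) tuple; this storage format and the example values are hypothetical.

```python
def extract_triples(typed_edges):
    """Turn typed edges of the completed graph into (head, relation, tail) triples."""
    triples = set()
    for head, tail, relation in typed_edges:
        if relation is not None:  # edges whose type is still unknown are skipped
            triples.add((head, relation, tail))
    return sorted(triples)

g2_edges = [("c1", "m1", "owns"), ("m1", "s1", "located_in"), ("c1", "s1", None)]
triples = extract_triples(g2_edges)
```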
EXAMPLE IV
Fig. 5 is a schematic structural diagram of a data-based relationship extraction apparatus according to a fourth embodiment of the present invention, which is applicable to a case of performing relationship extraction on a large amount of business data, where the apparatus may be implemented by software and/or hardware and is generally integrated on an electronic device. As shown in fig. 5, the apparatus includes:
the processing module 410 is configured to obtain a plurality of table metadata, and perform preprocessing on the plurality of table metadata to obtain a metadata set;
a completion module 420, configured to complete a metadata-entity mapping task based on the metadata set to obtain an entity set;
a construction module 430, configured to construct a network graph based on the entity set;
a completion module 440, configured to perform graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph;
the identifying module 450 is configured to complete entity relationship identification based on the target network graph to obtain an entity relationship type.
The embodiment provides a data-based relationship extraction device, which includes: a processing module, configured to acquire a plurality of table metadata and preprocess the plurality of table metadata to obtain a metadata set; a completion module, configured to complete a metadata-entity mapping task based on the metadata set to obtain an entity set; a construction module, configured to construct a network graph based on the entity set; a completion module, configured to perform graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph; and an identification module, configured to complete entity relationship identification based on the target network graph to obtain the entity relationship type. The device thereby solves the problem of large traffic during data classification and extraction in the prior art, can mine more comprehensive entity relationships, and improves the identification efficiency of the entity relationships.
In one embodiment, the pre-processing comprises:
uniformly naming the same table metadata with different names in the plurality of table metadata;
deleting duplicate table metadata among the table metadata having the same name.
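The two preprocessing operations above can be sketched as follows. The alias table, the field names and the normalization by trimming and lower-casing are hypothetical and serve only to illustrate unified naming followed by deduplication.

```python
# Hypothetical alias table: variant field names -> one canonical name.
ALIASES = {"customer_id": "cust_id", "custid": "cust_id", "meter_number": "meter_no"}

def preprocess(table_metadata, aliases=ALIASES):
    """Unify the names of equivalent metadata, then drop duplicates in order."""
    seen = set()
    metadata_set = []
    for name in table_metadata:
        key = name.strip().lower()
        canonical = aliases.get(key, key)  # unified naming
        if canonical not in seen:          # deduplication
            seen.add(canonical)
            metadata_set.append(canonical)
    return metadata_set

metadata_set = preprocess(["Customer_ID", "custid", "meter_no", "meter_number", "cust_id "])
```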
In one embodiment, the completion module 420 further comprises:
and the first mapping unit is used for mapping the table metadata in the metadata set according to a preset mapping rule to obtain an entity set.
In one embodiment, the completion module 420 further comprises:
the training unit is used for training and predicting the obtained entity mapping relation table through a machine learning algorithm and a deep learning algorithm;
and the second mapping unit is used for mapping the table metadata in the metadata set according to the mapping relation in the entity mapping relation table to obtain an entity set.
In one embodiment, the construction module 430 further includes:
a corresponding unit, configured to correspond the entities in the entity set to nodes in a network graph one to one;
the first calculation unit is used for calculating a feature vector of the node;
the second calculation unit is used for calculating a coding vector of each entity attribute in each entity aiming at each entity, and splicing the multiple coding vectors to obtain a fixed-length vector;
the relevance determining unit is used for determining the relevance among the nodes through the feature vectors of the nodes;
and the edge constructing unit is used for constructing a connecting edge between the two nodes with the correlation so as to connect the two nodes with the correlation.
In one embodiment, the completion module 440 further comprises:
the prediction unit is used for predicting the possibility of the link between two nodes which do not generate the connecting edges in the network graph through a link prediction algorithm, and if the possibility of the link is greater than a preset value, the two nodes are connected;
the node completion unit is used for completing the information of the nodes in the network graph through a node classification algorithm;
and the edge completion unit is used for performing information completion on the edges in the network graph through an edge classification algorithm to obtain a target network graph.
In one embodiment, the identification module 450 further comprises:
a relation construction unit, configured to construct an entity relationship template, wherein the entity relationship template is provided with a plurality of entity slot filling structures, and one slot corresponds to one entity category;
and the entity classification unit is used for filling the entities in the target network graph into corresponding slots in the entity relationship template according to the entity types to obtain the entity relationship types.
The data-based relation extraction device can execute the data-based relation extraction method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE V
FIG. 6 illustrates a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a data-based relational extraction method.
In some embodiments, the data-based relationship extraction method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data-based relationship extraction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data-based relational extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for data-based relationship extraction, the method comprising:
acquiring a plurality of table metadata, and preprocessing the table metadata to obtain a metadata set;
completing a mapping task of metadata-entity based on the metadata set to obtain an entity set;
constructing a network graph based on the entity set;
performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph;
and finishing entity relationship identification based on the target network graph to obtain an entity relationship type.
2. The method of claim 1, wherein the pre-processing comprises:
uniformly naming the same table metadata with different names in the plurality of table metadata;
deleting duplicate table metadata among the table metadata having the same name.
3. The method of claim 1, wherein completing a metadata-entity mapping task based on the set of metadata results in a set of entities, comprising:
and mapping the table metadata in the metadata set according to a preset mapping rule to obtain an entity set.
4. The method of claim 1, wherein completing a metadata-entity mapping task based on the set of metadata results in a set of entities, comprising:
training and predicting an entity mapping relation table through a machine learning algorithm and a deep learning algorithm;
and mapping the table metadata in the metadata set according to the mapping relation in the entity mapping relation table to obtain an entity set.
5. The method of claim 1, wherein the constructing the network graph based on the set of entities comprises:
corresponding the entities in the entity set to the nodes in the network graph one by one;
calculating a feature vector of the node;
aiming at each entity, calculating a coding vector of each entity attribute in the entity, and splicing a plurality of coding vectors to obtain a fixed-length vector;
determining the relevance among the nodes through the feature vectors of the nodes;
and constructing a connecting edge between the two nodes with the association so as to connect the two nodes with the association.
6. The method of claim 1, wherein performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph comprises:
predicting the possibility of the link between two nodes which do not generate the connecting edge in the network graph through a link prediction algorithm, and if the possibility of the link is greater than a preset value, connecting the two nodes;
completing information of the nodes in the network graph through a node classification algorithm;
and completing the information of the edges in the network graph through an edge classification algorithm to obtain a target network graph.
7. The method of claim 1, wherein the performing entity relationship identification based on the target network graph to obtain an entity relationship type comprises:
constructing an entity relationship template, wherein the entity relationship template is provided with a plurality of entity slot filling structures, and one slot corresponds to one entity category;
and filling the entities in the target network graph into corresponding slots in the entity relationship template according to the entity types to obtain the entity relationship types.
8. A data-based relationship extraction apparatus, the apparatus comprising:
the processing module is used for acquiring a plurality of table metadata and preprocessing the table metadata to obtain a metadata set;
a completion module, configured to complete a metadata-entity mapping task based on the metadata set to obtain an entity set;
a construction module for constructing a network graph based on the entity set;
the completion module is used for performing graph completion on the network graph based on a graph neural network algorithm to obtain a target network graph;
and the identification module is used for finishing entity relationship identification based on the target network graph to obtain the entity relationship type.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the data-based relationship extraction method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the data-based relationship extraction method of any one of claims 1-7 when executed.
CN202211335834.9A 2022-10-28 2022-10-28 Data-based relation extraction method, device, equipment and storage medium Pending CN115640408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211335834.9A CN115640408A (en) 2022-10-28 2022-10-28 Data-based relation extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211335834.9A CN115640408A (en) 2022-10-28 2022-10-28 Data-based relation extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115640408A true CN115640408A (en) 2023-01-24

Family

ID=84946641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211335834.9A Pending CN115640408A (en) 2022-10-28 2022-10-28 Data-based relation extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115640408A (en)

Similar Documents

Publication Publication Date Title
WO2023065545A1 (en) Risk prediction method and apparatus, and device and storage medium
CN112732919B (en) Intelligent classification label method and system for network security threat information
CN113722493B (en) Text classification data processing method, apparatus and storage medium
CN114265979A (en) Method for determining fusion parameters, information recommendation method and model training method
CN114329201A (en) Deep learning model training method, content recommendation method and device
CN112784591B (en) Data processing method and device, electronic equipment and storage medium
CN115293149A (en) Entity relationship identification method, device, equipment and storage medium
CN115688920A (en) Knowledge extraction method, model training method, device, equipment and medium
CN112560461A (en) News clue generation method and device, electronic equipment and storage medium
CN114299194A (en) Training method of image generation model, image generation method and device
CN114511249A (en) Grid partitioning method and device, electronic equipment and storage medium
CN110019796A (en) A kind of user version information analysis method and device
CN116578671A (en) Emotion-reason pair extraction method and device
CN115186738B (en) Model training method, device and storage medium
CN114239583B (en) Method, device, equipment and medium for training entity linking model and for entity linking
CN115640408A (en) Data-based relation extraction method, device, equipment and storage medium
CN115640399A (en) Text classification method, device, equipment and storage medium
CN116150283A (en) Knowledge graph body construction system, method, electronic equipment and storage medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN113535946A (en) Text identification method, device and equipment based on deep learning and storage medium
CN117667606B (en) High-performance computing cluster energy consumption prediction method and system based on user behaviors
CN114969273B (en) College entrance examination professional recommendation method, device, equipment and storage medium
Zhang et al. SG-RC: SG-CIM Grid Knowledge Graph Relationship Complementation Model Based on Entropy Uncertainty and Semantic Recognition
CN113360624B (en) Training method, response device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination