CN112181952B

CN112181952B - Method, system, device and storage medium for constructing data model

Info

Publication number: CN112181952B
Application number: CN202011366878.9A
Authority: CN
Inventors: 张玉天; 谈元鹏; 蒲天骄
Original assignee: China Electric Power Research Institute Co Ltd CEPRI
Current assignee: China Electric Power Research Institute Co Ltd CEPRI
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2021-12-14
Anticipated expiration: 2040-11-30
Also published as: CN112181952A

Abstract

The invention discloses a method, system, device and storage medium for constructing a data model. The method comprises the following steps: 1) taking the SG-CIM model as a reference, fusing the objects of each layer in the data structure relationships of power data models to form a big data model; 2) performing a structured representation of each object group in the big data model and then measuring the rationality of each object group; 3) constructing a generator and a discriminator, training the generator adversarially on the basis of the discriminator, the measurement results of step 2) and the head and tail entities of real object groups, and then using the trained generator to generate data structure relationships, thereby completing the construction of a data model based on generative adversarial learning and knowledge fusion.

Description

Method, system, device and storage medium for constructing data model

Technical Field

The invention belongs to the field of power systems and relates to a method, system, device and storage medium for constructing a data model.

Background

Data are generated continuously as the power system develops. Each power specialty has built data acquisition and storage systems with its own data models according to its professional requirements, and the resulting large and complex body of power-related data poses a serious challenge to data management. At the same time, with the rapid development of big data and artificial intelligence, data application techniques are becoming increasingly rich and the value of data analysis and mining is becoming increasingly prominent. Solutions based on data-driven algorithms depend heavily on data and require data that are as multi-dimensional, high-frequency and long-accumulated as possible. In practice, however, the design of data acquisition models usually lags behind applications, so data analysis and mining are constrained by whatever data happen to be available.

The existing enterprise common data model of the State Grid Corporation of China (SG-CIM) is a multi-level, full-coverage data model for power data, obtained by extending and reorganizing the Common Information Model (CIM) according to the company's actual business. However, its coverage and descriptive capability are incomplete, and the association between the model and asset attributes is not fully considered. Consequently, many studies have extended SG-CIM or proposed other data models, such as a CIM-based application model for power grid dispatching centers, a CIM-based scheme for maintaining and sharing power grid graphics, and a CIM-based decision support system for power grid planning and management, all aiming at a more complete data model. Yet whether SG-CIM or these optimized schemes is used, data model design relies on the knowledge of individuals or small groups and can hardly escape the limits of human experience. A new approach is therefore needed to design a scientific, advanced and efficient data model.

The SG-CIM model is a data model built by the State Grid Corporation of China by extending the CIM standard. No original CIM class is changed during the extension; all modifications are made in newly inherited classes. SG-CIM defines 12 first-level subject domains and a number of second-level subject domains, and classifies more than 1000 classes by business. There are two main methodologies for data model design:

Bottom-up design: the data entities accessed by the unstructured data platform of the State Grid Corporation of China and the relationships among them are sorted out; the entities are abstracted and refined; the subject domains to which they belong are analyzed and merged; and the relationships among the subject domains are analyzed to form an unstructured data association model.

Top-down design: starting from business objectives and working downward through analysis and abstraction, the unstructured data requirements of each business line are sorted out in combination with existing business systems. Key entities are abstracted according to business processes, the subject domains of the entities and the relationships among entities are analyzed, and the associations between unstructured data entities and structured data are used to form an unstructured data association model.

SG-CIM still cannot meet the requirements of a full-business data model: its coverage and descriptive capability are incomplete, and the association between the model and asset attributes is not fully considered. Many SG-CIM-based extensions and other data models have therefore been studied, such as a CIM-based application model for power grid dispatching centers, a CIM-based scheme for maintaining and sharing power grid graphics, and a CIM-based decision support system for power grid planning and management, with the aim of designing a more complete data model. However, both SG-CIM and these optimized schemes rely on the knowledge of individuals or small groups during design and can hardly escape the constraints of human experience. As a result, the coverage of objects is not comprehensive and no margin is left for future development, so a new method is needed to design an advanced data model.

Disclosure of Invention

The object of the present invention is to overcome the above shortcomings of the prior art by providing a method, system, device and storage medium for constructing a data model that make the objects covered by the data model more comprehensive and leave margin for future development.

To achieve the above object, the method for constructing a data model according to the invention comprises the following steps:

taking the SG-CIM model as a reference, fusing the objects of each layer in the data structure relationships of power data models to form a big data model;

performing a structured representation of each object group in the big data model, and measuring the rationality of each object group according to the result of the structured representation to obtain a measurement result;

acquiring the head entity and tail entity of real object groups, constructing a generator and a discriminator, training the generator adversarially on the basis of the discriminator, the measurement result and the head and tail entities of the real object groups, generating data structure relationships with the trained generator, and constructing a data model from the data structure relationships generated by the generator.

The specific operation of fusing the objects of each layer in the data structure relationships of the power data models, with the SG-CIM model as the reference, to form the big data model is as follows:

collecting the power data models and extracting the data structure relationship of each; starting from the sub-subject domains of the SG-CIM model and taking SG-CIM as the reference, calculating layer by layer, from bottom to top, the similarity between objects at each layer of the data structure relationships; and fusing two objects whose similarity is greater than or equal to a preset threshold into a new object, thereby forming the big data model.

The similarity between two objects is the sum of the similarities between their sub-layer objects.

The similarity between sub-layer objects is calculated from power-domain word vectors.
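The fusion rule can be illustrated with a minimal Python sketch. It assumes that the power-domain word vectors are available as a lookup table (the toy `WORD_VECTORS` below is hypothetical), that sub-layer similarity is cosine similarity, and that each sub-layer object of one candidate is paired with its most similar sub-layer object of the other before the similarities are summed; the document does not specify the pairing rule. Only a single merge decision at one layer is shown; the bottom-up traversal over all layers is omitted.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy power-domain word vectors; real ones would be trained on a power corpus.
WORD_VECTORS: Dict[str, List[float]] = {
    "voltage": [0.9, 0.1, 0.0],
    "rated_voltage": [0.85, 0.15, 0.05],
    "current": [0.1, 0.9, 0.0],
    "line_current": [0.12, 0.88, 0.02],
    "owner": [0.0, 0.1, 0.95],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def object_similarity(children_a: List[str], children_b: List[str]) -> float:
    """Object similarity = sum of sub-layer similarities.

    Assumption: each child of A is paired with its most similar child of B.
    """
    if not children_a or not children_b:
        return 0.0
    return sum(
        max(cosine(WORD_VECTORS[a], WORD_VECTORS[b]) for b in children_b)
        for a in children_a
    )


def fuse_if_similar(name_a: str, children_a: List[str],
                    name_b: str, children_b: List[str],
                    threshold: float) -> Dict[str, List[str]]:
    """Merge two objects into one new object when their similarity >= threshold."""
    if object_similarity(children_a, children_b) >= threshold:
        merged = list(dict.fromkeys(children_a + children_b))  # union, order preserved
        return {f"{name_a}|{name_b}": merged}
    return {name_a: children_a, name_b: children_b}


if __name__ == "__main__":
    print(fuse_if_similar("MeterA", ["voltage", "current"],
                          "MeterB", ["rated_voltage", "line_current"], threshold=1.8))
    print(fuse_if_similar("MeterA", ["voltage", "current"],
                          "Asset", ["owner"], threshold=1.8))
```

With these toy vectors the two meter objects score about 2.0 and are merged, while the meter and asset objects score about 0.12 and stay separate. Because the similarity is a sum, its scale grows with the number of sub-layer objects, so the threshold would in practice need to be chosen per layer.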

The specific operation of performing a structured representation of each object group in the big data model and measuring its rationality according to the result is as follows:

each object group in the big data model is represented in the structured form $(h, r, t)$, where $h$ is the head entity, $r$ is the relation and $t$ is the tail entity. A projection matrix $M_r \in \mathbb{R}^{d \times k}$ is defined for the relation $r$ and is used to project the head entity vector and the tail entity vector from the entity space into the subspace of the relation $r$. In the subspace, the head entity vector is $h_r = hM_r$, the tail entity vector is $t_r = tM_r$, and the relation satisfies $r \approx h_r - t_r$.

The distance between the two entities is then calculated, and the rationality of each object group is measured according to this distance.
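The projection and distance measurement can be sketched in plain Python. This is a sketch under stated assumptions: the distance is taken as the Euclidean norm of $h_r - t_r - r$, which is consistent with the relation $r \approx h_r - t_r$ given in claim 1 (the more common TransR convention writes $h_r + r \approx t_r$, which differs only in the sign of $r$); a smaller distance is treated as a more reasonable object group; and the `relations` table and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # d rows x k columns


def project(e: Sequence[float], m_r: Matrix) -> Vector:
    """Row vector e (length d) times M_r (d x k), giving a length-k vector."""
    return [sum(e[i] * m_r[i][j] for i in range(len(e))) for j in range(len(m_r[0]))]


def distance(h: Sequence[float], t: Sequence[float], r: Sequence[float], m_r: Matrix) -> float:
    """Euclidean norm of h_r - t_r - r, following r ≈ h_r - t_r."""
    h_r = project(h, m_r)
    t_r = project(t, m_r)
    return math.sqrt(sum((a - b - c) ** 2 for a, b, c in zip(h_r, t_r, r)))


def rank_by_rationality(
    groups: List[Tuple[Vector, str, Vector]],
    relations: Dict[str, Tuple[Vector, Matrix]],
) -> List[Tuple[Vector, str, Vector]]:
    """Sort (h, relation_name, t) groups from most to least reasonable (smallest distance first)."""
    def score(g: Tuple[Vector, str, Vector]) -> float:
        h, rel, t = g
        r_vec, m_r = relations[rel]  # each relation has its own vector and projection matrix
        return distance(h, t, r_vec, m_r)
    return sorted(groups, key=score)
```

In a trained system the entity vectors, relation vectors and projection matrices would be learned rather than fixed by hand.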

During adversarial learning training, the head or tail entity of a real object group is replaced by random sampling to form a perturbation, which is transformed by the metric feature encoder and then input to the generator.
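The perturbation step resembles the random head-or-tail replacement commonly used to corrupt triples in knowledge-graph embedding, and can be sketched as below. The sketch assumes that the head or the tail is chosen with equal probability and that the replacement is drawn uniformly from the other known entities; the metric feature encoder that follows is not shown, and the entity names are hypothetical.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def perturb(group: Triple, entities: List[str], rng: random.Random) -> Triple:
    """Replace either the head or the tail of an object group by a randomly sampled entity."""
    h, r, t = group
    candidates = [e for e in entities if e not in (h, t)]
    if not candidates:
        raise ValueError("need at least one entity other than the head and tail")
    replacement = rng.choice(candidates)
    if rng.random() < 0.5:
        return (replacement, r, t)  # corrupt the head
    return (h, r, replacement)      # corrupt the tail


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    entities = ["Transformer", "Winding", "Breaker", "Substation", "Meter"]
    for _ in range(3):
        print(perturb(("Transformer", "hasPart", "Winding"), entities, rng))
```

Because the replacement excludes the original head and tail, every perturbed group differs from the real one in exactly one position, which is what lets it act as a candidate for a new, not-yet-used object group.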

The loss function for the adversarial learning training is:

where $\mathbb{E}$ denotes expectation, $D_\phi$ is the feedback parameter of the discriminator, $G_\theta(T_r, z)$ denotes the generating function formed by the metric samples and a random perturbation, $T_r$ is the weighted vector sum of the preceding metrics, $z$ denotes the random perturbation, and $x$ denotes a real sample.
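The symbols listed ($\mathbb{E}$, $D_\phi$, $G_\theta(T_r, z)$, $z$, $x$) match the standard GAN minimax objective $\min_\theta \max_\phi \; \mathbb{E}_x[\log D_\phi(x)] + \mathbb{E}_z[\log(1 - D_\phi(G_\theta(T_r, z)))]$. Assuming the loss takes that standard form, the following Python sketch estimates it from discriminator probabilities on a batch. The non-saturating generator loss shown is a common substitute in GAN practice, not something the document specifies, and the function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards log(0)


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Batch estimate of V = E_x[log D(x)] + E_z[log(1 - D(G(T_r, z)))].

    d_real: discriminator probabilities D_phi(x) on real samples x
    d_fake: discriminator probabilities D_phi(G_theta(T_r, z)) on generated samples
    """
    real_term = sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """The discriminator maximizes V, i.e. minimizes -V."""
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss -E_z[log D(G(T_r, z))] (a common substitute)."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_loss([0.2, 0.1]))
```

A perfect discriminator drives $V$ to 0, while a discriminator that outputs 0.5 everywhere gives $V = -2\log 2$, the equilibrium value of the standard objective.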

A system for constructing a data model comprises:

a generating module configured to fuse, with the SG-CIM model as a reference, the objects of each layer in the data structure relationships of power data models to form a big data model;

a measurement module configured to perform a structured representation of each object group in the big data model and to measure the rationality of each object group to obtain a measurement result;

an adversarial learning module configured to acquire the head and tail entities of real object groups, construct a generator and a discriminator, train the generator adversarially on the basis of the discriminator, the measurement result of the measurement module and the head and tail entities of the real object groups, generate data structure relationships with the trained generator, and construct a data model from the data structure relationships generated by the generator.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of constructing the data model when executing the computer program.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of constructing the data model.

The invention has the following beneficial effects:

In operation, the method, system, device and storage medium for constructing a data model first fuse the objects of each layer in the data structure relationships of power data models to form a big data model and then construct a generator and a discriminator. Using adversarial learning, the generator is trained on the basis of the discriminator, the rationality measurement of each object group in the big data model, and the head and tail entities of real object groups. During this training, objects that may not yet have been used are generated, breaking through the limits of human experience, so that the resulting data model covers objects more comprehensively and leaves some margin for future development.

Further, during adversarial learning training, the head or tail entities of real object groups are replaced by random sampling to form perturbations, so that some of the generated objects are objects not used by existing data models, which improves the comprehensiveness of the objects covered by the data structure model.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:

FIG. 1 is a schematic diagram of the space projection;

FIG. 2 is a flow chart of the generative adversarial process;

FIG. 3 is a flow chart of the present invention.

Detailed Description

The present invention is described in detail below through embodiments with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with each other.

The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.

Referring to FIG. 3, the method for constructing a data model according to the present invention comprises the following steps:

1) Taking the SG-CIM model as a reference, fusing the objects of each layer in the data structure relationships of power data models to form a big data model;

specifically, the power data models are collected and their data structure relationships are extracted. Taking the SG-CIM model as the reference and starting from its sub-subject domains, the similarity between objects at each layer of the data structure relationships is calculated layer by layer from bottom to top, where the similarity between two objects is the sum of the similarities between their sub-layer objects and the similarity between sub-layer objects is calculated from power-domain word vectors. Two objects whose similarity is greater than or equal to a preset threshold are fused into a new object, forming the big data model.

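2) Performing a structured representation of each object group in the big data model, and measuring the rationality of each object group according to the result of the structured representation to obtain a measurement result;

specifically, each object group in the big data model is represented as $(h, r, t)$, where $h$ is the head entity, $r$ is the relation and $t$ is the tail entity. A projection matrix $M_r \in \mathbb{R}^{d \times k}$ is defined for the relation $r$ to project the head and tail entity vectors from the entity space into the subspace of the relation $r$. In the subspace, the head entity vector is $h_r = hM_r$, the tail entity vector is $t_r = tM_r$, and the relation satisfies $r \approx h_r - t_r$. The distance between the two entities is calculated, and the rationality of each object group is measured according to this distance.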

3) Acquiring the head and tail entities of real object groups, constructing a generator and a discriminator, training the generator adversarially on the basis of the discriminator, the measurement results of step 2) and the head and tail entities of the real object groups, generating data structure relationships with the trained generator, and constructing a data model from them, thereby completing the construction of a data model based on generative adversarial learning and knowledge fusion.

The specific process is as follows:

31) Generating samples with a generative adversarial network;

The entity pairs of the data structure relationships are input into the metric feature encoder to compute an approximate entity representation of the data structure relationships (the real data). Object groups are selected as the standard set, i.e., the training set; positive samples approximate the standard set, and negative samples lie far from it. A margin-based loss function is set on the cosine similarity between the positive samples and the standard set and on the similarity score of the negative samples; it contains the parameters to be learned and a margin.

The generator then replaces h or t in the real data samples by random sampling to generate relation embeddings (the pseudo data). Specifically, h or t in a real data sample is replaced by random sampling to form a perturbation, which is transformed by the metric feature encoder and used as the input of the generator. A generator loss function is set in which $D_\phi$ is the feedback parameter of the discriminator and $G_\theta(T_r, z)$ denotes the generating function formed by the metric samples and the random perturbation.

Finally, the discriminator distinguishes the real data from the pseudo data and assigns the correct relation type to each, with a discriminator loss function in which $\mathbb{E}$ denotes expectation and $D_\phi$ is the feedback parameter of the discriminator.
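The margin-based loss on cosine similarity for positive and negative samples might be sketched as follows. The document does not give the formula, so the sketch assumes a hinge form $\max(0, \gamma - \cos(s^+, c) + \cos(s^-, c))$, where $c$ is taken to be the centroid of the standard-set embeddings and $\gamma$ is the margin; the names `centroid` and `margin_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the standard-set embeddings (assumed representative)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def margin_loss(pos: Sequence[float], neg: Sequence[float],
                standard_set: List[Sequence[float]], margin: float) -> float:
    """Hinge loss: zero once the positive beats the negative by at least `margin`."""
    c = centroid(standard_set)
    return max(0.0, margin - cosine(pos, c) + cosine(neg, c))


if __name__ == "__main__":
    standard = [[1.0, 0.1], [0.9, 0.0]]
    print(margin_loss([1.0, 0.0], [0.0, 1.0], standard, margin=0.5))
```

In training, such a loss would be averaged over many positive/negative pairs and minimized with respect to the encoder's learnable parameters.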

h or t in the real data samples is then randomly replaced and input into the generator, and the new object groups output by the generator are screened using human knowledge. Positive object groups are retained to optimize the sample set, which is fed back into the generative adversarial learning iteration until the termination condition is met, namely that no new positive object group worth retaining appears.

It should be noted that the generative adversarial approach increases the amount of sample data. Because random perturbations are added, some samples may be objects not used by existing data models; positive samples are retained and the process is repeated, realizing iterative optimization that breaks through the limits of human experience to some extent. The final data model therefore not only fuses the objects of the various existing data models but also contains generated objects that may be useful and have not yet been used, so that it covers objects more comprehensively and leaves some margin for development, reflecting the scientific and advanced nature of the design method.
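The outer loop of this step can be sketched as below. The callables `train_and_generate` (one round of adversarial training followed by generation) and `expert_accepts` (human screening of a candidate object group) are hypothetical placeholders, and the `max_rounds` cap is an added safeguard not mentioned in the document; the loop stops, as described, when a round yields no new positive object group to retain.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def iterate_until_stable(
    sample_set: Set[Triple],
    train_and_generate: Callable[[Set[Triple]], Iterable[Triple]],
    expert_accepts: Callable[[Triple], bool],
    max_rounds: int = 100,
) -> Set[Triple]:
    """Alternate adversarial training/generation with expert screening until no new positives appear."""
    samples = set(sample_set)  # do not mutate the caller's set
    for _ in range(max_rounds):  # safety cap (assumption)
        candidates = train_and_generate(samples)
        new_positive = {g for g in candidates if g not in samples and expert_accepts(g)}
        if not new_positive:  # termination condition: nothing new worth retaining
            break
        samples |= new_positive  # optimized sample set fed into the next round
    return samples


if __name__ == "__main__":
    # Toy stand-ins: propose the "next" tail each round; the "expert" accepts tails 0-2.
    propose = lambda s: {("Line", "connects", str(len(s)))}
    accept = lambda g: int(g[2]) < 3
    print(sorted(iterate_until_stable({("Line", "connects", "0")}, propose, accept)))
```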

The system for constructing a data model comprises:

a measurement module configured to perform a structured representation of each object group in the big data model and then measure the rationality of each object group;

an adversarial learning module configured to acquire the head and tail entities of real object groups, construct a generator and a discriminator, train the generator adversarially on the basis of the discriminator, the measurement results of the measurement module and the head and tail entities of the real object groups, generate data structure relationships with the trained generator, and construct a data model from them, thereby completing the construction of a data model based on generative adversarial learning and knowledge fusion.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of constructing a data model.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A method for constructing a data model, characterized in that it comprises the following steps:

acquiring the head entity and tail entity of real object groups, constructing a generator and a discriminator, carrying out adversarial learning training on the generator on the basis of the discriminator, the acquired measurement result and the head and tail entities of the real object groups, generating data structure relationships with the trained generator, and constructing a data model from the data structure relationships generated by the generator;

collecting the power data models, extracting the data structure relationship of each power data model, calculating, with the SG-CIM model as a reference and starting from a sub-subject domain of the SG-CIM model, the similarity between objects at each layer of the data structure relationships layer by layer from bottom to top, and fusing two objects whose similarity is greater than or equal to a preset threshold into a new object to form a big data model;

carrying out a structured representation of each object group $(h, r, t)$ in the big data model, wherein $h$ is a head entity, $r$ is a relation and $t$ is a tail entity; defining a projection matrix $M_r \in \mathbb{R}^{d \times k}$ for the relation $r$ and projecting the head entity vector and the tail entity vector from the entity space into the subspace of the relation $r$, wherein the head entity vector in the subspace is $h_r = hM_r$, the tail entity vector in the subspace is $t_r = tM_r$, and the relation in the subspace satisfies $r \approx h_r - t_r$; and calculating the distance between the two entities and measuring the rationality of each object group according to that distance;

wherein the similarity between two objects is the sum of the similarities between their sub-layer objects; and

the similarity between sub-layer objects is calculated from power-domain word vectors.

2. The method for constructing a data model according to claim 1, wherein the loss function for the adversarial learning training is:

where $\mathbb{E}$ denotes expectation, $D_\phi$ is the feedback parameter of the discriminator, $G_\theta(T_r, z)$ denotes the generating function formed by the metric samples and a random perturbation, $T_r$ is the weighted vector sum of the preceding metrics, $z$ denotes the random perturbation, and $x$ denotes a real sample.

3. A system for building a data model, comprising:

an adversarial learning module configured to acquire the head and tail entities of real object groups, construct a generator and a discriminator, carry out adversarial learning training on the generator on the basis of the discriminator, the measurement results of the measurement module and the head and tail entities of the real object groups, generate data structure relationships with the trained generator, and construct a data model from the data structure relationships generated by the generator;

a measurement module configured to carry out a structured representation of each object group $(h, r, t)$ in the big data model, wherein $h$ is a head entity, $r$ is a relation and $t$ is a tail entity; to define a projection matrix $M_r \in \mathbb{R}^{d \times k}$ for the relation $r$ and project the head entity vector and the tail entity vector from the entity space into the subspace of the relation $r$, wherein the head entity vector in the subspace is $h_r = hM_r$, the tail entity vector in the subspace is $t_r = tM_r$, and the relation in the subspace satisfies $r \approx h_r - t_r$; and to calculate the distance between the two entities and measure the rationality of each object group according to that distance.

4. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of constructing a data model according to any one of claims 1 to 2 when executing the computer program.

5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of constructing a data model according to any one of claims 1 to 2.