CN113672603A

CN113672603A - Multi-source heterogeneous electric power big data automatic label implementation method and system

Info

Publication number: CN113672603A
Application number: CN202110919217.2A
Authority: CN
Inventors: 闾海荣; 许瑞坤; 王维笑; 孙艺新; 李心达; 崔维平; 黄林; 李科
Original assignee: Tsinghua University; State Grid Energy Research Institute Co Ltd; State Grid Sichuan Electric Power Co Ltd
Current assignee: Tsinghua University; State Grid Energy Research Institute Co Ltd; State Grid Sichuan Electric Power Co Ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2021-11-19

Abstract

The invention provides a method for automatically labeling multi-source heterogeneous electric power big data. Each power participant is first taken as a node to form a distributed computer model. The stored data and information data of each node are acquired, classified by topic category, and stored in a meta-information database. The data in the meta-information database are preprocessed to form normalized data. A preset automatic label generation rule or automatic label generation model is then called to create a computation container locally, and the label rule or algorithm is executed jointly across the computation containers through a MapReduce parallel framework to generate corresponding labels for the normalized data. The method thereby implements labeling of multi-source heterogeneous electric power big data, covers both equipment labels and user labels, and provides business-oriented, intelligent label services and application display for power monitoring and intelligent analysis.

Description

Multi-source heterogeneous electric power big data automatic label implementation method and system

Technical Field

The invention relates to the technical field of information management, and in particular to a method and a system for automatically labeling multi-source heterogeneous electric power big data.

Background

With the rapid growth of power demand, power grids are expanding, and the structure and management of power business systems are becoming more complex. As the amount of power equipment grows rapidly, the traditional centralized approach to managing and analyzing power data cannot meet the requirements for high parallelism, high reliability and high fault tolerance in multi-source data management. In multi-system, cross-department collaboration, data definitions are not uniform, governance standards are inconsistent, and analysis methods are incompatible. Equipment data and the corresponding user data from different organizations and stations cannot be brought together on one platform for management and analysis, systems cannot be interconnected, and the management and analysis of multi-source heterogeneous data suffer from low efficiency and data privacy and security risks.

Therefore, a method or system is needed that can manage power equipment data and user data from multiple sources across organizations.

Disclosure of Invention

In view of the above problems, the present invention aims to provide a method and a system for automatically labeling multi-source heterogeneous electric power big data, so as to solve the problems that the traditional centralized approach to power data management and analysis cannot meet the requirements for high parallelism, high reliability and high fault tolerance in multi-source data management, and that data definitions are not uniform, governance standards are inconsistent, and analysis methods are incompatible in multi-system, cross-department collaboration.

The invention provides a method for automatically labeling multi-source heterogeneous electric power big data, which comprises the following steps:

taking each power participant as a node to form a distributed computer model;

acquiring storage data and information data of each node in the distributed computer model, classifying according to the subject category, and storing the storage data and the information data classified according to the subject category in a meta-information database;

preprocessing data in the meta-information database based on an algorithm library to form normalized data;

calling a preset automatic label generation rule or an automatic label generation model to create a computation container locally, and executing the label rule or algorithm together based on the computation container through a MapReduce parallel framework to generate a corresponding label for the normalized data.

Preferably, the power participants at least comprise a station system, a business system and a production system.

Preferably, the process of acquiring the storage data and the information data of each node and storing the categorized storage data and information data in the meta information database according to the subject category includes:

collecting local data about each node, and storing the local data into a pre-constructed local database to form stored data;

having each node register with a service scheduling module of the distributed computer model, and acquiring the physical location and network address of each node to obtain the information data;

creating a topic for the stored data and the information data;

and dividing the stored data and the information data according to the data specification of the topic to which they belong to form specification data, and storing the specification data in the meta-information database corresponding to that topic.

Preferably, the data specification includes at least: data standards, data structures, data formats, data types, data precision, and the names of the devices to which the data belongs.

Preferably, the process of dividing the stored data and the information data according to the data specification includes:

each node acquires a preset theme list through the service scheduling module;

selecting a theme from the theme list for subscription, and acquiring a data specification related to the selected theme;

and dividing the stored data and the information data into subjects according to the data specification.

Preferably, before preprocessing the data in the meta-information database based on an algorithm library to form normalized data, the method further comprises:

formulating an automatic label generation rule through a rule engine, and establishing an algorithm model; wherein the algorithm model comprises a preprocessing model based on a preprocessing algorithm and an automatic label generation model based on machine learning;

and storing the automatic label generation rule and the algorithm model into an algorithm library.

Preferably, the process of preprocessing the data in the meta-information database based on an algorithm library to form normalized data comprises:

enabling each node to call the preprocessing model in the algorithm library according to the data in the meta-information database;

judging whether a preprocessing algorithm corresponding to the data in the meta-information database exists in the preprocessing model or not; wherein,

if a preprocessing algorithm corresponding to the data in the meta-information database exists, preprocessing the data in the meta-information database based on the preprocessing algorithm to form normalized data; if the preprocessing algorithm corresponding to the data in the meta-information database does not exist, a local preprocessing algorithm corresponding to the data in the meta-information database is built locally at a node, and the data in the meta-information database is preprocessed through the local preprocessing algorithm to form normalized data;

wherein the preprocessing algorithm comprises: data cleaning, data integration and data reduction.

Preferably, after jointly executing the label rule or algorithm based on the computation container through a MapReduce parallel framework to generate a corresponding label for the normalized data, the method further includes:

storing the label and a model formed after running or training according to the label into a label library;

connecting the tag library with the meta-information database, and establishing an external search query link of the tag library;

and performing label query through the search query link to call data corresponding to the label.

The invention also provides a system for automatically labeling multi-source heterogeneous electric power big data, which implements the method described above and comprises:

the computer service unit is used for forming a distributed computer model by taking each power participant as a node;

the data service unit is used for acquiring the stored data and the information data of each node in the distributed computer model, classifying the stored data and the information data according to the theme class, storing the stored data and the information data which are classified according to the theme class in a meta-information database, and preprocessing the data in the meta-information database based on an algorithm library to form normalized data;

and the tag center unit is used for calling a preset automatic tag generation rule or an automatic tag generation model to create a calculation container locally, and executing the tag rule or algorithm together based on the calculation container through a MapReduce parallel framework to generate a corresponding tag for the normalized data.

Preferably, the computer service unit comprises a service scheduling module and a storage module, wherein,

the service scheduling module is used for providing service interfaces for each node, and the service interfaces at least comprise a registration interface through which each node registers, so that each node obtains the topic interfaces of a preset topic list; the topic interfaces comprise a topic subscription interface, a topic meta-information interface, a topic specification interface and a topic modeling interface;

the storage module is used for hosting the algorithm library and storing the rule engine that formulates the automatic label generation rules; the algorithm library stores the automatic label generation rules and the algorithm model; the algorithm model comprises a preprocessing model based on a preprocessing algorithm and an automatic label generation model based on machine learning.

According to the above technical solution, in the method and system provided by the invention, each power participant is taken as a node to form a distributed computer model; the stored data and information data of each node are acquired, classified by topic category, and stored in a meta-information database; the data in the meta-information database are preprocessed based on an algorithm library to form normalized data; and a preset automatic label generation rule or automatic label generation model is then called to create a computation container locally, with the label rule or algorithm executed jointly across the computation containers through a MapReduce parallel framework to generate corresponding labels for the normalized data. This solves the problem that uniform data definitions cannot be achieved across departments and systems, breaks down information silos in data management systems, and interconnects power information management without directly exposing the original data. By combining rule-based processing and machine learning processing with a distributed storage and computing framework and containerization, the invention implements labeling of multi-source heterogeneous electric power big data, covers equipment labels and user labels, and provides business-oriented, intelligent label services and application display for power monitoring and intelligent analysis.

Drawings

Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following specification taken in conjunction with the accompanying drawings. In the drawings:

Fig. 1 is a flowchart of a method for automatically labeling multi-source heterogeneous electric power big data according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a system for automatically labeling multi-source heterogeneous electric power big data according to an embodiment of the present invention.

Detailed Description

The traditional centralized approach to power data management and analysis cannot meet the requirements for high parallelism, high reliability and high fault tolerance in multi-source data management. In multi-system, cross-department collaboration, data definitions are not uniform, governance standards are inconsistent, and analysis methods are incompatible. Equipment data and the corresponding user data from different organizations and stations cannot be brought together on one platform for management and analysis, systems cannot be interconnected, and the management and analysis of multi-source heterogeneous data suffer from low efficiency and data privacy and security risks.

In view of the above problems, the present invention provides a method and a system for automatically labeling multi-source heterogeneous electric power big data. Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

To illustrate the method and system provided by the invention, Fig. 1 shows, by way of example, the method for automatically labeling multi-source heterogeneous electric power big data according to an embodiment of the invention, and Fig. 2 shows, by way of example, the corresponding system according to an embodiment of the invention.

The following description of the exemplary embodiment(s) is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. Techniques and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be considered a part of the specification where appropriate.

As shown in Fig. 1, the method for automatically labeling multi-source heterogeneous electric power big data provided by the embodiment of the present invention includes:

S1: taking each power participant as a node to form a distributed computer model;

S2: acquiring the stored data and information data of each node in the distributed computer model, classifying them by topic category, and storing the classified stored data and information data in a meta-information database;

S3: preprocessing the data in the meta-information database based on an algorithm library to form normalized data;

S4: calling a preset automatic label generation rule or automatic label generation model to create a computation container locally, and executing the label rule or algorithm jointly across the computation containers through a MapReduce parallel framework to generate corresponding labels for the normalized data.

As shown in Fig. 1, step S1 builds the distributed computer model, that is, a computer model of the multi-source heterogeneous system as a whole. The participants (nodes) in the multi-source heterogeneous system include at least a station system, a business system and a production system. Each of these physical resources or entity nodes establishes and stores its own data, constructs a local distributed database, and stores its data in that database.

In the embodiment shown in Fig. 1, step S2 acquires the stored data and information data of each node and stores them, classified by topic category, in the meta-information database. This step includes:

S21: acquiring local data of each node, and storing the local data in a pre-constructed local database to form the stored data;

S22: having each node register with the service scheduling module of the distributed computer model, and acquiring the physical location and network address of each node to obtain the information data;

S23: establishing a topic for the stored data and the information data;

S24: dividing the stored data and the information data according to the data specification to form specification data, and storing the specification data in the meta-information database corresponding to the topic to which the specification data belongs; wherein the data specification at least comprises: data standard, data structure, data format, data type, data precision and the name of the equipment to which the data belongs.

The process of dividing the stored data and the information data according to the data specification comprises the following steps:

S241: each node acquires a preset topic list through the service scheduling module;

S242: selecting a topic from the topic list for subscription, and acquiring the data specification related to the selected topic;

S243: dividing the stored data and the information data into topics according to the data specification.

Specifically, in steps S21 and S22, the physical resources or entity nodes in the multi-source heterogeneous system first register with the service scheduling module and provide their relevant information, including physical location and node address, which forms the information data. Each node also establishes its stored data and constructs a local distributed database, and the stored data and information data in the system are then stored in the distributed database.
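By way of a non-limiting illustration, the registration in steps S21 and S22 might be sketched as follows. The class names `NodeInfo` and `ServiceScheduler`, their fields, and the sample values are assumptions introduced here for illustration only; the embodiment does not prescribe any particular interface.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """Information data a node supplies at registration (illustrative fields)."""
    node_id: str
    physical_location: str
    address: str  # network address of the node


@dataclass
class ServiceScheduler:
    """Hypothetical stand-in for the service scheduling module."""
    topics: list = field(default_factory=list)
    registry: dict = field(default_factory=dict)

    def register(self, info: NodeInfo) -> list:
        """Record the node and return the preset topic list it may subscribe to."""
        if info.node_id in self.registry:
            raise ValueError(f"node {info.node_id!r} is already registered")
        self.registry[info.node_id] = info
        return list(self.topics)


# Sample values below are invented for the example.
scheduler = ServiceScheduler(topics=["equipment", "user"])
available = scheduler.register(NodeInfo("station-01", "Chengdu", "10.0.0.11:9000"))
```

Returning the topic list from `register` reflects the later steps, in which a registered node obtains the preset topic list through the service scheduling module.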

In steps S23 and S24, topics are established for the stored data and the information data: a system administrator or project manager creates the topics required by the application and defines the relevant information for each topic, including data specifications such as the data standard, data structure, data format, data type, data precision and the name of the equipment to which the data belongs. The stored data and information data are then divided according to these data specifications and stored in the meta-information database of the corresponding topic.
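As a minimal sketch of how data might be divided by topic according to a data specification, the example below checks each record against a reduced specification (required fields, expected types and precision) and files it under the first topic it satisfies. `DataSpec`, `conforms` and `divide_by_topic` are hypothetical names, and a real specification would cover the further items listed above.

```python
from dataclasses import dataclass


@dataclass
class DataSpec:
    """Subset of a topic's data specification (illustrative fields only)."""
    topic: str
    required_fields: dict  # field name -> expected Python type
    precision: int         # decimal places kept for float values


def conforms(record: dict, spec: DataSpec) -> bool:
    """True if the record has every required field with the expected type."""
    return all(
        name in record and isinstance(record[name], typ)
        for name, typ in spec.required_fields.items()
    )


def divide_by_topic(records, specs):
    """Assign each record to the first topic whose specification it satisfies."""
    meta_db = {spec.topic: [] for spec in specs}
    unmatched = []
    for record in records:
        for spec in specs:
            if conforms(record, spec):
                # Apply the topic's precision to float values before storing.
                normalized = {
                    k: round(v, spec.precision) if isinstance(v, float) else v
                    for k, v in record.items()
                }
                meta_db[spec.topic].append(normalized)
                break
        else:
            unmatched.append(record)  # no topic specification matched
    return meta_db, unmatched
```

Records that satisfy no specification are returned separately rather than discarded, so that a node could define a new topic for them.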

In the embodiment shown in Fig. 1, step S3 preprocesses the data in the meta-information database based on the algorithm library to form normalized data. A step S0 is performed before this preprocessing, and step S0 includes:

S01: formulating an automatic label generation rule through a rule engine, and establishing an algorithm model; the algorithm model comprises a preprocessing model based on a preprocessing algorithm and an automatic label generation model based on machine learning;

S02: storing the automatic label generation rule and the algorithm model in an algorithm library.

Specifically, before step S3, rules need to be established in advance: according to the different data structures and business requirements, automatic label generation rules are formulated using the rule engine in the computation service module. In this embodiment, the algorithm models are built from expert experience and judgment and stored in the algorithm library. That is, data preprocessing algorithms, big data analysis models and machine learning models are established; a preprocessing model is formed from the preprocessing algorithms; and the big data analysis models and machine learning models are packaged as code images and stored in an image repository to form the automatic label generation model. Labels can subsequently be generated either directly by the automatic label generation rules or by the automatic label generation model.
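For illustration only, a rule-based part of such an algorithm library might be sketched as below. `LabelRule` and `AlgorithmLibrary` are hypothetical names, and the thresholds are invented for the example rather than taken from the embodiment; the container images that the embodiment stores are represented here simply as Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LabelRule:
    """A label plus the predicate that decides whether it applies."""
    label: str
    predicate: Callable[[dict], bool]


@dataclass
class AlgorithmLibrary:
    """Hypothetical algorithm library holding rules and named models."""
    rules: list = field(default_factory=list)
    models: dict = field(default_factory=dict)

    def add_rule(self, rule: LabelRule) -> None:
        self.rules.append(rule)

    def add_model(self, name: str, model: Callable) -> None:
        self.models[name] = model

    def apply_rules(self, record: dict) -> list:
        """Return every label whose rule matches the record, in rule order."""
        return [r.label for r in self.rules if r.predicate(record)]


library = AlgorithmLibrary()
# Thresholds are invented for illustration.
library.add_rule(LabelRule("overheating", lambda r: r.get("temperature", 0) > 90))
library.add_rule(LabelRule("high-consumption", lambda r: r.get("kwh", 0) > 500))
```

Keeping rules as data (a label and a predicate) lets an expert add or change rules without modifying the code that applies them.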

The process of preprocessing the data in the meta-information database based on the algorithm library to form normalized data comprises:

S31: each node calls the pre-established preprocessing model in the algorithm library according to the data in the meta-information database;

S32: judging whether a preprocessing algorithm corresponding to the data in the meta-information database exists in the preprocessing model;

S33: if a preprocessing algorithm corresponding to the data in the meta-information database exists, preprocessing the data in the meta-information database based on that preprocessing algorithm to form normalized data; if no such preprocessing algorithm exists, building a local preprocessing algorithm corresponding to the data in the meta-information database locally at the node, and preprocessing the data in the meta-information database with the local preprocessing algorithm to form normalized data;

wherein the preprocessing algorithm comprises: data cleaning, data integration and data reduction.

Specifically, to preprocess multi-source heterogeneous data, each entity node selects the corresponding data preprocessing algorithm from the preprocessing model in the algorithm library. If no such algorithm exists, the entity node can build a preprocessing algorithm itself, create an image of it, and store the image in the image repository. The data preprocessing method to use is determined by topic modeling in the service scheduling module. The data preprocessing methods comprise data cleaning, data integration, data transformation and data reduction, and mainly standardize outliers and differences in source, format and characteristic properties in the original data set.
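The lookup-with-fallback logic of steps S31 to S33 can be sketched as follows. This is a minimal sketch under stated assumptions: `PreprocessingModel`, `preprocess` and the `build_local` callback are hypothetical names, and registering the locally built algorithm stands in for creating and storing its image.

```python
from typing import Callable, Optional

Preprocessor = Callable[[list], list]


class PreprocessingModel:
    """Hypothetical registry mapping a data type to a preprocessing algorithm."""

    def __init__(self) -> None:
        self._algorithms = {}

    def register(self, data_type: str, algorithm: Preprocessor) -> None:
        self._algorithms[data_type] = algorithm

    def find(self, data_type: str) -> Optional[Preprocessor]:
        return self._algorithms.get(data_type)


def preprocess(data_type, records, shared, build_local):
    """Use the shared algorithm if present; otherwise build one locally and store it."""
    algorithm = shared.find(data_type)
    if algorithm is None:
        algorithm = build_local(data_type)     # node builds its own algorithm
        shared.register(data_type, algorithm)  # stands in for storing the new image
    return algorithm(records)
```

Because the locally built algorithm is stored after first use, a later call for the same data type finds it in the registry and does not rebuild it.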

For the data of a given topic, the entity node retrieves the preset image of the data preprocessing algorithm from the algorithm library, creates a preprocessing computation container from the data and the image, and has the container execute the specified data preprocessing algorithm. The node then retrieves the preprocessed data from the computation container and stores it in the local distributed database.
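As an illustration of what a preprocessing algorithm executed inside such a container might do, the sketch below gives one simple function for each of the four preprocessing methods named in this embodiment. The function names, field names and the choice of min-max scaling are assumptions made for the example; the embodiment does not specify particular algorithms.

```python
def clean(records, field_name, low, high):
    """Data cleaning: drop records whose value is missing or outside [low, high]."""
    return [
        r for r in records
        if r.get(field_name) is not None and low <= r[field_name] <= high
    ]


def integrate(left, right, key):
    """Data integration: merge records from two sources that share a key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def min_max_scale(records, field_name):
    """Data transformation: rescale one numeric field to the range [0, 1]."""
    if not records:
        return []
    values = [r[field_name] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field_name: (r[field_name] - lo) / span} for r in records]


def reduce_fields(records, keep):
    """Data reduction: keep only the fields needed downstream."""
    return [{k: r[k] for k in keep if k in r} for r in records]
```

The functions return new lists and dictionaries rather than modifying their inputs, so the original records in the local database are left unchanged.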

In the embodiment shown in Fig. 1, step S4 calls a preset automatic label generation rule or automatic label generation model to create a computation container locally, and executes the label rule or the automatic label generation model jointly across the computation containers through a MapReduce parallel framework to generate corresponding labels for the normalized data; this is the label generation process. Because the automatic label generation rules and the automatic label generation model were placed in the algorithm library in step S0, the entity nodes generate labels together based on the MapReduce parallel framework. Labels may be generated either by the automatic label generation rules or by machine learning processing, that is, by the automatic label generation model. In the rule-based approach, the specified label generation image is obtained from the preset image repository. Similarly, in the machine learning approach, which includes feature extraction methods, supervised learning models, clustering methods, deep learning models and the like, the specified algorithm is obtained from the preset image repository. Each node then uses the obtained rule or algorithm to create a label computation container locally and generates labels with that container.
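The joint execution in step S4 can be illustrated with an in-process imitation of the map and reduce phases. This sketch does not use a real MapReduce framework or containers: a thread pool stands in for the nodes, and the function names, record fields and threshold are assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_node(records):
    """Map phase, run on one node: emit partial kWh totals per user."""
    partial = defaultdict(float)
    for record in records:
        partial[record["user_id"]] += record["kwh"]
    return dict(partial)


def reduce_partials(partials):
    """Reduce phase: merge the per-node partial totals by key."""
    totals = defaultdict(float)
    for partial in partials:
        for user_id, kwh in partial.items():
            totals[user_id] += kwh
    return dict(totals)


def label_users(node_datasets, threshold):
    """Run the map phase in parallel, reduce, then apply a rule to each total."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_node, node_datasets))
    totals = reduce_partials(partials)
    return {
        user_id: "high-consumption" if kwh > threshold else "normal"
        for user_id, kwh in totals.items()
    }
```

In this arrangement only the per-node partial totals leave a node, not the underlying records, which is one way of generating labels jointly without directly providing the original data.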

In addition, after the tag rule or algorithm is executed together based on the computation container through the MapReduce parallel framework to generate the corresponding tag for the normalized data, the method further includes step S5, and step S5 includes:

S51: storing the labels and the models formed after running or training according to the labels in a label library;

S52: connecting the label library with the meta-information database, and establishing an external search query link for the label library;

S53: performing label queries through the search query link to retrieve the data corresponding to a label.

Specifically, after the label computation container finishes executing, the generated labels are stored in the label library of the label center, together with the models or rules produced by running, analysis or training. Label search and application services can then be provided. Labels can be searched and queried through the label center, which also provides intelligent ranking of query results, label recommendation and the like. Among the label application services, the label center provides functions such as label combination and label management according to different business requirements, and the trained label model can be used to provide label prediction, label display services and the like for newly generated data.
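A label library linked to the meta-information database, as in steps S51 to S53, might be sketched as follows. `TagLibrary` and its methods are hypothetical, the meta-information database is represented by a plain dictionary, and ranking and recommendation are omitted.

```python
from collections import defaultdict


class TagLibrary:
    """Hypothetical label library linked to a meta-information database."""

    def __init__(self, meta_db: dict) -> None:
        self._meta_db = meta_db          # record id -> record
        self._by_tag = defaultdict(set)  # label -> ids of records carrying it

    def store(self, record_id: str, tags) -> None:
        """Save the labels generated for one record."""
        if record_id not in self._meta_db:
            raise KeyError(record_id)
        for tag in tags:
            self._by_tag[tag].add(record_id)

    def query(self, *tags: str) -> list:
        """Return the records carrying all of the given labels (label combination)."""
        if not tags:
            return []
        ids = set.intersection(*(self._by_tag.get(t, set()) for t in tags))
        return [self._meta_db[i] for i in sorted(ids)]
```

Querying with several labels returns only records that carry every one of them, which corresponds to the label combination function mentioned above.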

As described above, in the method for automatically labeling multi-source heterogeneous electric power big data provided by the invention, each power participant is first taken as a node to form a distributed computer model; the stored data and information data of each node are acquired, classified by topic category, and stored in a meta-information database; the data in the meta-information database are preprocessed to form normalized data; and a preset automatic label generation rule or automatic label generation model is called to create a computation container locally, with the label rule or algorithm executed jointly across the computation containers through a MapReduce parallel framework to generate corresponding labels for the normalized data. This solves the problem that uniform data definitions cannot be achieved across departments and systems, breaks down information silos in data management systems, and interconnects power information management without directly exposing the original data. By combining rule-based processing and machine learning processing with a distributed storage and computing framework and containerization, the method implements labeling of multi-source heterogeneous electric power big data, covers equipment labels and user labels, and provides business-oriented, intelligent label services and application display for power monitoring and intelligent analysis.

As shown in Fig. 2, the present invention further provides a system 100 for automatically labeling multi-source heterogeneous electric power big data, which implements the method described above and includes:

a computer service unit 101 configured to construct a distributed computer model using each power participant as a node;

the data service unit 102 is used for acquiring the stored data and the information data of each node of the distributed computer model, classifying the stored data and the information data according to the subject categories, storing the stored data and the information data classified according to the subject categories in the meta-information database, and preprocessing the data in the meta-information database based on an algorithm library to form normalized data;

and the tag center unit 103 is used for calling a preset automatic tag generation rule or an automatic tag generation model to create a computation container locally, and executing a tag rule or algorithm together based on the computation container through a MapReduce parallel framework to generate a corresponding tag for the normalized data.

In the embodiment shown in Fig. 2, the computer service unit 101 comprises a service scheduling module 101-1 and a storage module 101-2, wherein,

the service scheduling module 101-1 is configured to provide service interfaces for each node, where the service interfaces at least include a registration interface through which each node registers, so that each node obtains the topic interfaces of a preset topic list; the topic interfaces comprise a topic subscription interface, a topic meta-information interface, a topic specification interface and a topic modeling interface;

the storage module 101-2 is used for hosting the algorithm library and storing the rule engine that formulates the automatic label generation rules; wherein the algorithm library stores the automatic label generation rules and the algorithm model; the algorithm model comprises a preprocessing model based on a preprocessing algorithm and an automatic label generation model based on machine learning.

It can be seen from the foregoing embodiment that, in the system for automatically labeling multi-source heterogeneous electric power big data provided by the present invention, the computer service unit 101 first takes each power participant as a node to form a distributed computer model. The data service unit 102 then acquires the stored data and information data of each node, stores them in the meta-information database classified by topic category, and preprocesses the data in the meta-information database to form normalized data. The tag center unit 103 then calls a preset automatic label generation rule or automatic label generation model to create a computation container locally, and executes the label rule or algorithm jointly across the computation containers through the MapReduce parallel framework to generate corresponding labels for the normalized data. This solves the problem that uniform data definitions cannot be achieved across departments and systems, breaks down information silos in data management systems, and interconnects power information management without directly exposing the original data. By combining rule-based processing and machine learning processing with a distributed storage and computing framework and containerization, the system implements labeling of multi-source heterogeneous electric power big data, covers equipment labels and user labels, and provides business-oriented, intelligent label services and application display for power monitoring and intelligent analysis.

The method and system for automatically labeling multi-source heterogeneous electric power big data provided by the invention are described above by way of example with reference to the accompanying drawings. However, those skilled in the art should understand that various modifications may be made to the method and system provided by the present invention without departing from the scope of the present invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims

1. A method for automatically labeling multi-source heterogeneous electric power big data, characterized by comprising the following steps:

taking each power participant as a node to form a distributed computer model;

2. The multi-source heterogeneous power big data automation tag implementation method of claim 1,

the power participants at least comprise a station system, a service system and a production system.

3. The multi-source heterogeneous power big data automation tag implementation method of claim 1, wherein the process of obtaining the storage data and the information data of each node and storing the classified storage data and information data in the meta information database according to the subject categories comprises:

creating a topic for the stored data and the information data;

4. The multi-source heterogeneous power big data automation tag implementation method of claim 3,

the data specification includes at least: data standards, data structures, data formats, data types, data precision, and the names of the devices to which the data belongs.

5. The multi-source heterogeneous power big data automation label implementation method according to claim 4, wherein the process of performing subject affiliated division on the storage data and the information data according to data specifications comprises:

each node acquires a preset theme list through the service scheduling module;

6. The multi-source heterogeneous power big data automation tag implementation method according to claim 1, before preprocessing the data in the meta-information database based on an algorithm library to form normalized data, further comprising:

7. The multi-source heterogeneous power big data automation tag implementation method of claim 6, wherein the process of preprocessing the data in the meta-information database based on an algorithm library to form normalized data comprises:

8. The multi-source heterogeneous power big data automation tag implementation method according to claim 6, wherein after the tag rules or algorithms are jointly executed based on the computation container through a MapReduce parallel framework to generate corresponding tags for the normalized data, further comprising:

9. A multisource heterogeneous based power big data automatic label implementation system for implementing the multisource heterogeneous based power big data automatic label implementation method according to any one of claims 1-8, comprising:

10. The multi-source heterogeneous power big data automation tag implementation system of claim 9, wherein the computer service unit comprises a service scheduling module and a storage module, wherein,