CN118035517A - Cloud source data-based data classification method and system - Google Patents


Publication number: CN118035517A
Authority: CN (China)
Prior art keywords: data, array, source, value, cloud
Legal status: Pending
Application number: CN202410136948.3A
Other languages: Chinese (zh)
Inventors: 吴隽鼎, 李曦鹏, 杨经铭, 杜东宇
Current Assignee: Guangdong Shunyou Smart Technology Co., Ltd.
Original Assignee: Guangdong Shunyou Smart Technology Co., Ltd.
Application filed by Guangdong Shunyou Smart Technology Co., Ltd.
Priority claimed from application CN202410136948.3A
Publication of CN118035517A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data classification and specifically relates to a data classification method and system based on cloud source data. User-collected data are acquired and stored in a data cloud, and the stored data are defined as source data. The source data are collected and integrated into data streams, each comprising recorded data and recorded events. The data streams are classified, an event matrix is constructed for the classified data streams, and the data streams are secondarily classified through the event matrix. By constructing data streams for the cloud-acquired data, recording the source data, data sources and record states in the data streams, and analyzing and finely classifying them, the classification of the data streams becomes more accurate, the computing power needed to process data classification in a power system is reduced, time and cost are greatly saved, and subsequent data processing is more accurate.

Description

Cloud source data-based data classification method and system
Technical Field
The invention belongs to the technical field of data classification, and particularly relates to a data classification method and system based on cloud source data.
Background
With the continuous development of electric power Internet of Things technology, the amount of data in the power grid keeps growing. Data sources in the power industry are very complex: there is equipment state information from power plants and substations, the information varies widely, the corresponding acquisition modes, acquisition equipment and data processing methods differ, and the same data may appear in different forms. Within the large volume of cloud-based data, the acquired data are difficult to analyze and calculate, and the large amount of redundant data cannot all be loaded under the computing conditions of a power system. In the existing power Internet of Things based on a cloud computing architecture, all data are transmitted to the cloud for analysis and processing, and methods such as monitoring the data acquisition and analysis equipment cause excessive data redundancy, so the accuracy of the classified data is problematic.
Disclosure of Invention
In view of the above limitations of the prior art, the present invention is directed to a data classification method and system based on cloud source data, so as to solve one or more technical problems in the prior art and at least provide a beneficial alternative.
A data classification method based on cloud source data, the method comprising the steps of:
S100: acquiring user-collected data, storing it in a data cloud, and defining the stored data as source data;
S200: collecting and integrating the source data into data streams, wherein a data stream comprises recorded data and recorded events;
S300: classifying the data streams, and constructing an event matrix for the classified data streams;
S400: and carrying out secondary classification on the data stream through the event matrix.
Further, in step S100, data in a cloud database are acquired and defined as source data, and the data source of the source data is acquired. The source data are divided into upload data and data to be processed. The data to be processed are classified according to their data sources and standardized to obtain a classified data result. The edge layer uploads the upload data and the classified data result to the cloud layer, where both are primarily classified by an LSTM-FCN data classification model and stored in the cloud database.
Further, in step S200, the classified source data are obtained and stored, and the classified data sources and the record events corresponding to the source data are generated. The source data, their data sources and the corresponding record events are integrated into a data stream, and a data stream set data is constructed. The set data has an event sequence, and each data stream is added to the set according to the event sequence in which its source data were acquired. The data characteristics of the data stream are then obtained: a data stream is denoted a_list, its source data a_data, its data source a_feature, and its record event a_event, and a classification weight value is assigned to the data stream a_list through these characteristics. For adjacent data streams in the set data, the weight proportion Qda of source data between adjacent source data is calculated (the weight formulas are rendered only as images in the source and are not reproduced here), where p is the total number of elements of the set data. A weight Qea is assigned from the share of a_event among the record events of the set data, and a characteristic data set is constructed; the characteristic data set is trained with a random forest model, the corresponding data-source proportion is output, and a weight value Qfa is assigned to the data-source proportion. The comprehensive weight value Q is then obtained from the three weight values.
Three parameters are then initialized for further classifying the data stream according to the comprehensive weight value: the source-data fluctuation difference K_A, the data-source distance difference K_B, and the recorded-event number difference K_C. K_A, K_B and K_C are converted into data, and the arrays KA, KB and KC are constructed from them.
The beneficial effect of obtaining the comprehensive weight is as follows: by assigning separate weights to the source data, the data source and the record event, and integrating them into a comprehensive weight, a more accurate weight value is obtained, which ensures the accuracy of the subsequent secondary classification.
Further, in step S300, the arrays KA, KB and KC of the data stream are read in the cloud database, and an array KD is constructed from the times at which the event recorded in the data stream changes. Array KA is read, and KAi denotes the i-th element of array KA. All elements of arrays KB, KC and KD are reordered to follow the acquisition order of array KA. Let A1 be the minimum value in array KD and A2 the current maximum value in array KD, and set B1 = A1/mean(KD) and B2 = A2/mean(KD), where mean(KD) denotes the average of all elements of KD.
Define Ki = B1 + (B2 - B1) × (KCi - min(KC)) / (max(KC) - min(KC)), where KCi is the i-th element of array KC, min(KC) is the value of the smallest element of array KC, and max(KC) is the value of the largest element of array KC. The values Ki form an array E1, i.e., Ki is the i-th element of E1, i = 1, 2, …, N;
Let KBi denote the i-th element of array KB; update each KBi to KBi/KB1, denote the updated array KB as array E2, and construct the event matrix from arrays E1 and E2;
Further, an event matrix is established through the data stream, and the secondary classification of the data stream is completed with the event matrix as follows:
S301: let S be the index of the smallest element of array E1, and let E2i denote the i-th element of array E2, i = 1, 2, …, N. Create a blank array KE and add E21, E22, …, E2S-1, E2S+1, E2S+2, …, E2N to the array KE in turn. Let KEj denote the j-th element of KE, j = 1, 2, …, N-1. Go to S302;
S302: traverse j over its range starting from j = 1, sequentially update each value KEj in the array KE to KEj - E2S, and store the updated array KE;
Select the index of the element with the smallest value in the array KE and denote it v. Let E1i denote the i-th element of array E1. Delete the elements with indices S and v from array E1, and store the array with E1S and E1v deleted as the check array. Create a blank check set, initialize an integer variable k = 1, k ∈ [1, N-1], and go to S303;
S303: starting from the first element of the check array, subtract the value of KEk from each element of the check array to obtain the N-2 values N(1), N(2), …, N(N-2). Create a variable Dw, w = 1, 2, …, N-2, and assign Dw as follows (the assignment formula is rendered only as an image in the source):
wherein L is the value of the test statistic obtained after the test of array E1 against array E2;
The N-2 values D1, D2, …, DN-2 form a weighting array in order. Denote the element with the smallest value in the weighting array as N(min) and its index in the weighting array as T. Compare the three values N(min), E2S and E1v; when N(min) < min{E2S, E1v}, add the current value of T to the test set, where min{} denotes the smallest element of a set. Go to S304;
S304: when the value of k is smaller than N-1, increase k by 1 and go to S303; when the value of k equals N-1, go to S305;
S305: select all high-weight elements in the array and mark their indices as high-weight indices. Mark the elements of array KA at the high-weight indices as risk source data and store all risk source data in a database. Mark the remaining indices as common indices, and mark the elements of array KA at the common indices as safety data.
Create a set D containing all data sources and a set E containing all record events. Initialize a distance-event matrix from sets D and E and define the matrix as E1×E2. Set the starting distance of each data source in set D to infinity, repeatedly traverse all data sources in set D, and calculate the shortest distance from each data source to the cloud. Traverse set E and judge whether the data source is the same as the data source in set E: if it is the same, output the matrix E1×E2; if different, update the matrix E1×E2.
Further, in step S400, the data stream acquired by the cloud is input into the output matrix E1×E2, whether the primary classification is correct is judged, and the secondary data classification is completed through the matrix.
A data classification system based on cloud source data, the system comprising a collecting module, a processing module and a storage module, wherein the collecting module and the processing module acquire or calculate data, the storage module stores the data in the cloud, a computer program can be run on the processing module, and the processor implements the steps of the above cloud source data-based data classification method when executing the computer program.
The collecting module is used for aggregating the raw data in the edge layer and dividing the data into upload data and data to be processed according to their sources;
The processing module is used for integrating the source data into data streams and carrying out the primary and secondary classification of the data streams;
The storage module is used for storing the classified data by category.
The beneficial effects of the invention are as follows: by constructing a data stream for the data acquired by the cloud, recording the source data, data sources and record states in the data stream, and analyzing and finely classifying them, the classification of the data streams becomes more accurate, the computing power needed to process data classification in the power system is reduced, time and cost are greatly saved, and subsequent data processing is more accurate.
Drawings
The above and other features of the present invention will become more apparent from the detailed description of the embodiments given below in conjunction with the accompanying drawings, in which like reference characters designate like or similar elements. It is evident that the drawings in the following description are merely examples of the present invention, and that other drawings may be obtained from them without inventive effort by a person of ordinary skill in the art.
In the figure:
fig. 1 is a flowchart of a data classification method based on cloud source data.
Detailed Description
The conception, specific structure, and technical effects produced by the present application will be clearly and completely described below with reference to the embodiments and the drawings to fully understand the objects, aspects, and effects of the present application. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
As shown in fig. 1, a data classification method based on cloud source data includes the following steps:
S100: acquiring user acquired data and storing a data cloud, and defining the data cloud as source data;
S200: collecting and integrating source data into a data stream, wherein the data stream comprises recorded data and recorded events;
S300: classifying the data streams, and constructing an event matrix for the classified data streams;
S400: and carrying out secondary classification on the data stream through the event matrix.
Further, in step S100, data in a cloud database are acquired and defined as source data, and the data source of the source data is acquired. The source data are divided into upload data and data to be processed. The data to be processed are classified according to their data sources and standardized to obtain a classified data result. The edge layer uploads the upload data and the classified data result to the cloud layer, where both are primarily classified by an LSTM-FCN data classification model and stored in the cloud database.
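As a minimal illustration of the edge-layer split described above, the following Python sketch divides source records into upload data and to-be-processed data by their data source and standardizes the latter. The source names and the zero-mean/unit-variance standardization are assumptions; the patent does not specify either.

```python
# Hypothetical sketch of the step-S100 edge-layer split: records are divided
# into "upload" data and "to-be-processed" data by data source, and the
# to-be-processed values are standardized before being sent to the cloud layer.
from statistics import mean, pstdev

UPLOAD_SOURCES = {"plant_sensor"}  # hypothetical: sources forwarded as-is

def split_and_standardize(records):
    """records: list of (source, value) pairs gathered at the edge layer."""
    upload = [(s, v) for s, v in records if s in UPLOAD_SOURCES]
    pending = [(s, v) for s, v in records if s not in UPLOAD_SOURCES]
    values = [v for _, v in pending]
    mu, sigma = (mean(values), pstdev(values)) if values else (0.0, 1.0)
    sigma = sigma or 1.0  # guard: constant data would give sigma = 0
    standardized = [(s, (v - mu) / sigma) for s, v in pending]
    return upload, standardized

upload, standardized = split_and_standardize(
    [("plant_sensor", 5.0), ("substation", 2.0), ("substation", 4.0)]
)
```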
Further, in step S200, the classified source data are obtained and stored, and the classified data sources and the record events corresponding to the source data are generated. The source data, their data sources and the corresponding record events are integrated into a data stream, and a data stream set data is constructed. The set data has an event sequence, and each data stream is added to the set according to the event sequence in which its source data were acquired. The data characteristics of the data stream are then obtained: a data stream is denoted a_list, its source data a_data, its data source a_feature, and its record event a_event, and a classification weight value is assigned to the data stream a_list through these characteristics. For adjacent data streams in the set data, the weight proportion Qda of source data between adjacent source data is calculated (the weight formulas are rendered only as images in the source and are not reproduced here), where p is the total number of elements of the set data. A weight Qea is assigned from the share of a_event among the record events of the set data, and a characteristic data set is constructed; the characteristic data set is trained with a random forest model, the corresponding data-source proportion is output, and a weight value Qfa is assigned to the data-source proportion. The comprehensive weight value Q is then obtained from the three weight values.
Three parameters are then initialized for further classifying the data stream according to the comprehensive weight value: the source-data fluctuation difference K_A, the data-source distance difference K_B, and the recorded-event number difference K_C. K_A, K_B and K_C are converted into data, and the arrays KA, KB and KC are constructed from them.
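The formulas for Qda, Qea, Qfa and Q are rendered only as images in the source, so only the described flow can be illustrated. A hedged Python sketch, assuming Qea is the share of the record event in the set and Q is the plain mean of the three weights (both assumptions, not taken from the patent):

```python
# Illustrative flow only: three per-stream weights (source-data weight Qda,
# record-event weight Qea, data-source weight Qfa) are combined into one
# comprehensive value Q. The actual formulas are not recoverable from the
# source, so the definitions below are assumptions.

def record_event_weight(event, events):
    """Qea (assumed): share of this record event among all events in the set."""
    return events.count(event) / len(events)

def composite_weight(qda, qea, qfa):
    """Q (assumed): equal-weight mean of the three weight values."""
    return (qda + qea + qfa) / 3.0
```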
The beneficial effect of obtaining the comprehensive weight is as follows: by assigning separate weights to the source data, the data source and the record event, and integrating them into a comprehensive weight, a more accurate weight value is obtained, which ensures the accuracy of the subsequent secondary classification.
Further, in step S300, the arrays KA, KB and KC of the data stream are read in the cloud database, and an array KD is constructed from the times at which the event recorded in the data stream changes. Array KA is read, and KAi denotes the i-th element of array KA. All elements of arrays KB, KC and KD are reordered to follow the acquisition order of array KA. Let A1 be the minimum value in array KD and A2 the current maximum value in array KD, and set B1 = A1/mean(KD) and B2 = A2/mean(KD), where mean(KD) denotes the average of all elements of KD.
Define Ki = B1 + (B2 - B1) × (KCi - min(KC)) / (max(KC) - min(KC)), where KCi is the i-th element of array KC, min(KC) is the value of the smallest element of array KC, and max(KC) is the value of the largest element of array KC. The values Ki form an array E1, i.e., Ki is the i-th element of E1, i = 1, 2, …, N;
Let KBi denote the i-th element of array KB; update each KBi to KBi/KB1, denote the updated array KB as array E2, and construct the event matrix from arrays E1 and E2;
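The E1/E2 construction above is stated explicitly, so it can be sketched directly. The guards for a constant array KC and for KB1 = 0 are assumptions; the patent does not address those degenerate cases:

```python
# Step-S300 sketch: E1 rescales the record-event-count differences KC into
# the band [B1, B2] derived from the change-time array KD; E2 normalizes the
# data-source distance differences KB by their first element.

def build_event_matrix(kb, kc, kd):
    mean_kd = sum(kd) / len(kd)
    b1 = min(kd) / mean_kd            # B1 = A1 / mean(KD)
    b2 = max(kd) / mean_kd            # B2 = A2 / mean(KD)
    span = (max(kc) - min(kc)) or 1   # guard (assumed): constant KC
    e1 = [b1 + (b2 - b1) * (x - min(kc)) / span for x in kc]
    e2 = [x / kb[0] for x in kb]      # assumes KB1 != 0
    return e1, e2                     # the two rows of the event matrix

e1, e2 = build_event_matrix(kb=[2.0, 4.0, 6.0], kc=[1, 3, 5], kd=[2.0, 3.0, 4.0])
```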
Further, an event matrix is established through the data stream, and the secondary classification of the data stream is completed with the event matrix as follows:
S301: let S be the index of the smallest element of array E1, and let E2i denote the i-th element of array E2, i = 1, 2, …, N. Create a blank array KE and add E21, E22, …, E2S-1, E2S+1, E2S+2, …, E2N to the array KE in turn. Let KEj denote the j-th element of KE, j = 1, 2, …, N-1. Go to S302;
S302: traverse j over its range starting from j = 1, sequentially update each value KEj in the array KE to KEj - E2S, and store the updated array KE;
Select the index of the element with the smallest value in the array KE and denote it v. Let E1i denote the i-th element of array E1. Delete the elements with indices S and v from array E1, and store the array with E1S and E1v deleted as the check array. Create a blank check set, initialize an integer variable k = 1, k ∈ [1, N-1], and go to S303;
S303: starting from the first element of the check array, subtract the value of KEk from each element of the check array to obtain the N-2 values N(1), N(2), …, N(N-2). Create a variable Dw, w = 1, 2, …, N-2, and assign Dw as follows (the assignment formula is rendered only as an image in the source):
wherein L is the value of the test statistic obtained after the test of array E1 against array E2;
The N-2 values D1, D2, …, DN-2 form a weighting array in order. Denote the element with the smallest value in the weighting array as N(min) and its index in the weighting array as T. Compare the three values N(min), E2S and E1v; when N(min) < min{E2S, E1v}, add the current value of T to the test set, where min{} denotes the smallest element of a set. Go to S304;
S304: when the value of k is smaller than N-1, increase k by 1 and go to S303; when the value of k equals N-1, go to S305;
S305: select all high-weight elements in the array and mark their indices as high-weight indices. Mark the elements of array KA at the high-weight indices as risk source data and store all risk source data in a database. Mark the remaining indices as common indices, and mark the elements of array KA at the common indices as safety data.
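Steps S301 and S302 above can be sketched directly; the weighting step S303 depends on a test-statistic formula that appears only as an image in the source, so it is not reproduced:

```python
# S301–S302 sketch: find the index s of the smallest element of E1, copy every
# E2 element except E2[s] into a working array KE, then shift each KE entry by
# subtracting E2[s]. The later S303 weighting is omitted (its formula is an
# image in the source).

def build_ke(e1, e2):
    s = min(range(len(e1)), key=lambda i: e1[i])  # index of smallest E1 value
    ke = [v for i, v in enumerate(e2) if i != s]  # S301: drop the s-th element
    ke = [v - e2[s] for v in ke]                  # S302: shift by E2[s]
    return s, ke

s, ke = build_ke([3.0, 1.0, 2.0], [10.0, 4.0, 7.0])
```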
Create a set D containing all data sources and a set E containing all record events. Initialize a distance-event matrix from sets D and E and define the matrix as E1×E2. Set the starting distance of each data source in set D to infinity, repeatedly traverse all data sources in set D, and calculate the shortest distance from each data source to the cloud. Traverse set E and judge whether the data source is the same as the data source in set E: if it is the same, output the matrix E1×E2; if different, update the matrix E1×E2.
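The shortest-distance computation over set D is only sketched in the text; a standard Dijkstra pass toward a hypothetical cloud node is one plausible reading, shown here under that assumption. The graph layout and node names are illustrative, not from the patent:

```python
# Assumed reading of the set-D traversal: start distances at infinity and run
# Dijkstra from the "cloud" node, yielding each data source's shortest
# distance to the cloud. `graph` maps node -> {neighbor: edge_cost}.
import heapq

def shortest_distances_to_cloud(graph, cloud="cloud"):
    dist = {node: float("inf") for node in graph}  # starting distance: infinity
    dist[cloud] = 0.0
    heap = [(0.0, cloud)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist

graph = {
    "cloud": {"a": 1.0, "b": 4.0},
    "a": {"cloud": 1.0, "b": 2.0},
    "b": {"a": 2.0, "cloud": 4.0},
}
dist = shortest_distances_to_cloud(graph)
```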
Further, in step S400, the data stream acquired by the cloud is input into the output matrix E1×E2, whether the primary classification is correct is judged, and the secondary data classification is completed through the matrix.
A data classification system based on cloud source data, the system comprising a collecting module, a processing module and a storage module, wherein the collecting module and the processing module acquire or calculate data, the storage module stores the data in the cloud, a computer program can be run on the processing module, and the processor implements the steps of the above cloud source data-based data classification method when executing the computer program.
The collecting module is used for aggregating the raw data in the edge layer and dividing the data into upload data and data to be processed according to their sources;
The processing module is used for integrating the source data into data streams and carrying out the primary and secondary classification of the data streams;
The storage module is used for storing the classified data by category.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (7)

1. The data classification method based on cloud source data is characterized by comprising the following steps of:
S100: acquiring user-collected data, storing it in a data cloud, and defining the stored data as source data;
S200: collecting and integrating the source data into data streams, wherein a data stream comprises recorded data and recorded events;
S300: classifying the data streams, and constructing an event matrix for the classified data streams;
S400: and carrying out secondary classification on the data stream through the event matrix.
2. The cloud source data-based data classification method according to claim 1, wherein in step S100, data in a cloud database are acquired and defined as source data, and the data source of the source data is acquired; the source data are divided into upload data and data to be processed; the data to be processed are classified according to their data sources and standardized to obtain a classified data result; the edge layer uploads the upload data and the classified data result to the cloud layer, where both are primarily classified by an LSTM-FCN data classification model and stored in the cloud database.
3. The method of claim 1, wherein in step S200, the classified source data are obtained and stored, and the classified data sources and the record events corresponding to the source data are generated; the source data, their data sources and the corresponding record events are integrated into a data stream, and a data stream set data is constructed; the set data has an event sequence, and each data stream is added to the set according to the event sequence in which its source data were acquired; the data characteristics of the data stream are obtained, a data stream is denoted a_list, its source data a_data and its record event a_event are acquired, and a classification weight value is assigned to the data stream a_list through these characteristics; for adjacent data streams in the set data, the weight proportion Qda of source data between adjacent source data is calculated (the weight formulas are rendered only as images in the source and are not reproduced here), where p is the total number of elements of the set data; a weight Qea is assigned from the share of a_event among the record events of the set data, and a characteristic data set is constructed; the characteristic data set is trained with a random forest model, the corresponding data-source proportion is output and given a weight value Qfa; and the comprehensive weight value Q is obtained from the three weight values.
Three parameters are then initialized for further classifying the data stream according to the comprehensive weight value: the source-data fluctuation difference K_A, the data-source distance difference K_B, and the recorded-event number difference K_C. K_A, K_B and K_C are converted into data, and the arrays KA, KB and KC are constructed from them.
4. The method of claim 1, wherein in step S300, the arrays KA, KB and KC of the data stream are read in the cloud database, and an array KD is constructed from the times at which the event recorded in the data stream changes; array KA is read, and KAi denotes the i-th element of array KA; all elements of arrays KB, KC and KD are reordered to follow the acquisition order of array KA; A1 is the minimum value in array KD, A2 is the current maximum value in array KD, B1 = A1/mean(KD) and B2 = A2/mean(KD), where mean(KD) denotes the average of all elements of KD;
Ki = B1 + (B2 - B1) × (KCi - min(KC)) / (max(KC) - min(KC)) is defined, where KCi is the i-th element of array KC, min(KC) is the value of the smallest element of array KC, and max(KC) is the value of the largest element of array KC; the values Ki form an array E1, i.e., Ki is the i-th element of E1, i = 1, 2, …, N;
and KBi denotes the i-th element of array KB, each KBi is updated to KBi/KB1, the updated array KB is denoted array E2, and the event matrix is constructed from arrays E1 and E2.
5. The cloud source data-based data classification method according to claim 1, wherein an event matrix is established through the data stream, and the secondary classification of the data stream is completed with the event matrix as follows:
S301, defining the index S of the smallest element in the array E1, recording E2i as the i-th element in the array E2, i=1, 2,..n, creating a blank array KE, writing E21, E22..e., E2S-1, E2S +1, E2S +2, (c.), E2N is added into the array INS in turn, note KEj is the j-th element in the set KE, j=1, 2,..n-1, go to S302;
S302, traversing the value range of j from j=1, sequentially updating KEj values in the array KE to KEj-E2S, and storing the updated array KE;
Screening subscript of the element with the minimum value from the array KE, marking as v, marking E1i as the ith element in the array E1, deleting the elements with the subscript S and the subscript v in the array E1, storing the array of the library lattice deleted with the subscript E1S and the subscript E1v as a check array, creating a blank check set, initializing integer variables k=1, k epsilon [1, N-1], and turning to S303;
S303, starting from the first element of the check array, the value of KEk is subtracted from each element of the check array, yielding N-2 values n(1), n(2), …, n(N-2); a variable Dw is created, w = 1, 2, …, N-2, and Dw is assigned by:
wherein L is the value of the test statistic obtained after testing the arrays E1 and E2;
the N-2 values D1, D2, …, DN-2 are formed in turn into a weighting array; the element with the smallest value in the weighting array is recorded as n(min), and the index of n(min) in the weighting array is recorded as T; the three values n(min), E2S and E1v are compared, and when n(min) < min{E2S, E1v}, the current value of T is added to the test set, where min{} denotes the smallest value in the set; go to S304;
S304, when the value of k is smaller than N-1, the value of k is increased by 1 and the process goes to S303; when the value of k is equal to N-1, go to S305;
S305, all high-weight elements are screened out, their indices are recorded as high-weight indices, and the elements of the array KA at the high-weight indices are marked as risk source data; all risk source data are stored in the database; the remaining indices are recorded as common indices, and the elements of the array KA at the common indices are marked as safety data.
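The control flow of steps S301-S305 can be sketched as below. This is a heavily hedged reading: the claim's formula assigning Dw from the test statistic L did not survive extraction, so the weighting is passed in as a caller-supplied `weight_fn` placeholder, and the mapping of indices after deletions is an assumption.

```python
import numpy as np

def secondary_classification(E1, E2, weight_fn):
    """Hedged sketch of S301-S305; weight_fn stands in for the missing
    Dw formula and is NOT taken from the claim."""
    E1 = np.asarray(E1, dtype=float)
    E2 = np.asarray(E2, dtype=float)

    # S301: S indexes the smallest element of E1; KE is E2 without E2[S].
    s = int(np.argmin(E1))
    KE = np.delete(E2, s)

    # S302: shift every KEj by E2[S].
    KE = KE - E2[s]

    # v indexes the smallest element of KE, mapped back into E1's indexing.
    v_in_KE = int(np.argmin(KE))
    v = v_in_KE if v_in_KE < s else v_in_KE + 1

    # Check array: E1 with the elements at indices s and v removed.
    check = np.delete(E1, [s, v])

    # S303-S304: for each k, difference the check array against KE[k],
    # weight the N-2 differences, and record the index T of the smallest
    # weight whenever it beats min{E2[S], E1[v]}.
    test_set = []
    for k in range(len(KE)):
        diffs = check - KE[k]          # N-2 values n(1)..n(N-2)
        weights = weight_fn(diffs)     # placeholder for D1..DN-2
        if float(np.min(weights)) < min(E2[s], E1[v]):
            test_set.append(int(np.argmin(weights)))

    # S305: indices collected here mark risk source data in KA;
    # all remaining indices mark safety data.
    return s, v, sorted(set(test_set))
```

For example, with `np.abs` standing in for the unknown weighting, `secondary_classification([3.0, 1.0, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0], np.abs)` selects S = 1 (smallest E1) and v = 0, then screens the check array against each shifted KE value.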
A set D and a set E are created, wherein the set D contains all data sources and the set E contains all recorded events; a distance event matrix, defined as E1 × E2, is initialized from the sets D and E; the starting distance of each data source in the set D is set to infinity, and all data sources in the set D are traversed repeatedly to calculate the shortest distance from every data source to the cloud; the set E is then traversed to judge whether the data source is the same as the data source in the set E; if they are the same, the matrix E1 × E2 is output, and if they are different, the matrix E1 × E2 is updated.
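The "start at infinity, traverse repeatedly until the shortest distance to the cloud is known" passage above reads like a Bellman-Ford-style relaxation; one hedged sketch of that reading follows. The edge representation `(u, v, cost)` and the undirected treatment are assumptions not given in the claim.

```python
import math

def shortest_distances_to_cloud(sources, edges, cloud="cloud"):
    """Assumed reading of the repeated traversal in claim 5: initialize
    every data source at distance infinity and relax edges until the
    shortest distance from each source to the cloud stabilizes."""
    dist = {s: math.inf for s in sources}
    dist[cloud] = 0.0

    # Repeated traversal: relax every edge until no distance improves.
    changed = True
    while changed:
        changed = False
        for u, v, cost in edges:
            # Treat edges as undirected for this sketch.
            for a, b in ((u, v), (v, u)):
                if dist.get(b, math.inf) + cost < dist.get(a, math.inf):
                    dist[a] = dist[b] + cost
                    changed = True
    return dist
```

With nonnegative edge costs the loop terminates once no relaxation changes any distance, which matches the claim's "repeatedly traverse all data sources" wording.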
6. The cloud source data-based data classification method according to claim 1, wherein in step S400, the data stream acquired by the cloud is input into the output matrix E1 × E2 for judgment, it is determined whether the primary classification is correct, and the secondary data classification is completed through the matrix.
7. A data classification system based on cloud source data, the system comprising an acquisition module, a processing module and a storage module, wherein the acquisition module and the processing module acquire or calculate data, the storage module stores the data in a cloud, the acquisition module and the storage module can run a computer program on the processing module, and the processing module, when executing the computer program, realizes the steps of the cloud source data-based data classification method according to any one of claims 1-6.
CN202410136948.3A 2024-01-31 2024-01-31 Cloud source data-based data classification method and system Pending CN118035517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410136948.3A CN118035517A (en) 2024-01-31 2024-01-31 Cloud source data-based data classification method and system

Publications (1)

Publication Number Publication Date
CN118035517A true CN118035517A (en) 2024-05-14

Family

ID=90985143

Country Status (1)

Country Link
CN (1) CN118035517A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination