CN107832170B - Method and device for recovering missing data - Google Patents
- Publication number
- CN107832170B CN201711047191.7A
- Authority
- CN
- China
- Prior art keywords
- matrix
- data
- scada data
- factor
- missing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1443—Transmit or communication errors
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for recovering missing data, which are used for recovering the missing data. The method for recovering the missing data comprises the following steps: acquiring a plurality of groups of data; carrying out probability matrix decomposition on a numerical matrix formed by the multiple groups of data; determining the location of missing data in the plurality of sets of data; solving the product of elements corresponding to the positions of the missing data in the plurality of groups of data in the result of the probability matrix decomposition as missing data; and restoring the obtained missing data to the position of the missing data in the plurality of data groups.
Description
Technical Field
The present invention relates to the field of data processing and, more particularly, to a method and a device for recovering missing data.
Background Art
In the field of data processing, processing generally has to be carried out on complete data.
Take data compression as an example. Compression techniques fall into two broad classes, lossless and lossy. Data compression algorithms based on principal component analysis (PCA) are lossy: they remove redundancy by exploiting the linear correlation between variables, thereby achieving dimensionality reduction and compression. However, most current PCA-based compression algorithms require a batch of data to be selected in advance for the principal component analysis, and the principal components must be updated whenever newly generated data cannot be reconstructed well from the current ones.
That is, when data is incomplete, for example because of a transmission error, principal component analysis cannot be performed directly; in general the incomplete portion of the data has to be removed before the principal components are calculated. This simple approach, however, may discard some of the data patterns, so that the resulting principal components are inaccurate and the reconstruction error is large.
Moreover, the problem is not limited to data compression; other data processing techniques suffer from it as well.
Summary of the Invention
The present invention has been made in view of the above problem, and its object is to provide a method and a device for recovering missing data.
According to one aspect of the present invention, a method for recovering missing data is provided, comprising: acquiring multiple groups of data; performing probability matrix decomposition on a numerical matrix composed of the multiple groups of data; determining the positions of the data missing from the multiple groups of data; calculating, as the missing data, the product of the elements in the result of the probability matrix decomposition that correspond to the positions of the data missing from the multiple groups of data; and restoring the calculated missing data to the positions of the data missing from the multiple groups of data.
According to another aspect of the present invention, a device for recovering missing data is provided, comprising: a data acquisition unit that acquires multiple groups of data; a probability matrix decomposition unit that performs probability matrix decomposition on a numerical matrix composed of the multiple groups of data; a missing position determination unit that determines the positions of the data missing from the multiple groups of data; a missing data calculation unit that calculates, as the missing data, the product of the elements in the decomposition result of the probability matrix decomposition unit that correspond to the positions of the data missing from the multiple groups of data; and a data recovery unit that restores the missing data calculated by the missing data calculation unit to the positions of the data missing from the multiple groups of data.
According to another aspect of the present invention, a computer-readable medium is provided that stores a computer program which, when executed by a processor, implements the steps of the above method for recovering missing data.
According to another aspect of the present invention, a computer device is provided, comprising: a processor; and a memory storing a computer program executable on the processor, the computer program implementing the steps of the above method for recovering missing data when executed by the processor.
According to the present invention, probability matrix decomposition (Probabilistic Matrix Factorization, PMF) iterates over the known portion of the data, so that the missing portion can be reconstructed from the known portion. As a result, no data patterns are lost.
Brief Description of the Drawings
Fig. 1 shows a flowchart of the method for recovering missing data according to Embodiment One of the present invention.
Fig. 2 shows a flowchart of the method for recovering missing data according to Embodiment Two of the present invention.
Fig. 3 shows a block diagram of the device for recovering missing data according to Embodiment Three of the present invention.
Fig. 4 shows a block diagram of the device for recovering missing data according to Embodiment Four of the present invention.
Detailed Description of the Embodiments
Embodiments of the present invention are described below with reference to the drawings.
In the present invention, missing data in multiple groups of data is recovered by analyzing the data with probability matrix decomposition.
Note that, in the present invention, "multiple groups of data" means two or more groups, each containing a plurality of data items. The data type of the data items is numeric or convertible to numeric, and the groups preferably contain the same number of data items.
Embodiment One
In this embodiment, the multiple groups of data are assumed to contain missing data.
Fig. 1 shows a flowchart of the method for recovering missing data according to Embodiment One of the present invention.
Referring to Fig. 1, in step S110 multiple groups of data are first acquired and composed into a corresponding numerical matrix. Specifically, the multiple groups of data are acquired from a data source. In one embodiment the data source is one or more monitoring devices; that is, in this step multiple groups of monitoring data are acquired in chronological order from the one or more monitoring devices as the multiple groups of data.
As an example, suppose the multiple groups of data are the SCADA (Supervisory Control And Data Acquisition) data shown in Table 1 below. In this step, the data are acquired in chronological order from a plurality of sensors serving as monitoring devices and composed into the numerical matrix A shown in formula (1). Each row of A represents the SCADA data at one moment, and each column represents the measurements of one sensor.
Table 1
Date-time | Sensor 1 | Sensor 2 | …… | Sensor n |
2016/3/15 15:25:36 | 0.5 | 0.2 | 0.9 | |
2016/3/15 15:25:45 | 0.4 | 0.2 | ? | |
2016/3/15 15:25:52 | 0.1 | ? | 0.7 | |
2016/3/15 15:25:58 | 0.9 | 0.4 | 0.2 | |
2016/3/15 15:26:06 | 0.2 | 0.0 | 0.1 | |
Here, "?" denotes a missing value.
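As an illustration only, the sample in Table 1 could be held in a NumPy array. The sketch below assumes that `np.nan` encodes the "?" cells and that only the three sensor values printed in each row are used; neither choice comes from the patent text.

```python
import numpy as np

# Readings from Table 1, one row per timestamp, one column per sensor.
# np.nan is an assumed encoding for the "?" (missing) cells.
A = np.array([
    [0.5, 0.2,    0.9],
    [0.4, 0.2,    np.nan],
    [0.1, np.nan, 0.7],
    [0.9, 0.4,    0.2],
    [0.2, 0.0,    0.1],
])

# Boolean mask of the missing entries and their (row, column) positions.
missing = np.isnan(A)
positions = np.argwhere(missing)
print(positions.tolist())  # [[1, 2], [2, 1]]
```

Keeping the mask separate from the values makes it easy to restrict later computations to the known entries.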
The above example shows a case in which the multiple groups of data are SCADA data that are already numeric. In practice, however, SCADA data may be of two kinds depending on the sensor data type: numeric and enumerated. Numeric data may be integer or floating-point, and enumerated data may be Boolean or categorical.
Therefore, so that missing data can be recovered from the acquired data, step S110 also preprocesses the multiple groups of data by data type conversion as needed. Non-numeric variables are first converted to numeric variables (for example, Boolean variables are represented by 0 and 1), and integer variables are then converted to floating-point variables. After the missing data has been recovered, the floating-point variables are converted back to their original data types.
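The conversion and its inverse might be sketched as follows. The helper names `to_float` and `from_float`, the 0.5 threshold for Booleans and the round-to-nearest rule for integers are assumptions made for this illustration; the patent does not specify how recovered values are mapped back.

```python
import numpy as np

def to_float(column):
    """Convert a Boolean or integer column to float64 and remember its original dtype."""
    arr = np.asarray(column)
    return arr.astype(np.float64), arr.dtype

def from_float(values, original_dtype):
    """Map (possibly recovered) floating-point values back to the original data type."""
    if original_dtype == np.bool_:
        return values >= 0.5                      # assumed threshold for False/True
    if np.issubdtype(original_dtype, np.integer):
        return np.rint(values).astype(original_dtype)  # assumed round-to-nearest
    return values.astype(original_dtype)

flags, flag_dtype = to_float([True, False, True])               # Boolean -> 1.0 / 0.0
counts, count_dtype = to_float(np.array([3, 7, 12], dtype=np.int32))
```

Categorical variables would need an additional encoding step, which is left out of this sketch.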
In addition to the data type conversion, this step may also normalize the multiple groups of data as actually required. Taking SCADA data as an example, normalization linearly transforms the data of each sensor into the range 0 to 1, so that rounding errors do not affect some fields more than others. In practice, mean removal is generally sufficient: for SCADA data, the mean of all data produced by a sensor is subtracted from each value of that sensor; for other types of data, the mean of each column is subtracted from every value in that column. Likewise, the normalization must be reversed after the missing data has been recovered, so key information used in the normalization, such as the mean, maximum and minimum of the data, should be saved during this process.
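A minimal sketch of the 0-to-1 normalization and its inverse is given below, assuming that missing entries are encoded as NaN and ignored when the column statistics are computed. The function names and the handling of constant columns are illustrative choices, not part of the patent.

```python
import numpy as np

def normalize(A):
    """Linearly scale each column (sensor) to [0, 1], ignoring NaN entries."""
    col_min = np.nanmin(A, axis=0)
    col_max = np.nanmax(A, axis=0)
    # Use a span of 1 for constant columns to avoid dividing by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (A - col_min) / span, (col_min, span)

def denormalize(A_norm, stats):
    """Reverse normalize() using the saved column minimum and span."""
    col_min, span = stats
    return A_norm * span + col_min

A = np.array([[10.0, 200.0],
              [20.0, np.nan],
              [30.0, 400.0]])
A_norm, stats = normalize(A)
```

The tuple `stats` plays the role of the saved key information (minimum and maximum) that the reverse transformation needs.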
Note that although the above example uses SCADA data, the invention is not limited to this. The data may come from a wide variety of sources: data correlated in time, such as human height and weight data or economic growth data; data correlated in space; or even data with no correlation at all.
Then, in step S120, probability matrix decomposition is performed on the numerical matrix.
Probability matrix decomposition is a matrix decomposition method based on a probabilistic graphical model. Unlike the singular value decomposition of the prior art, it does not have to satisfy orthogonality, and the decomposed matrices are optimized iteratively by gradient descent.
Specifically, probability matrix decomposition is a decomposition of the form shown in formula (2): for the numerical matrix A = {a_ij}, a first factor matrix U_k and a second factor matrix V_k are solved for, and the product of U_k and the conjugate transpose V_k^* of V_k is taken as the result of the probability matrix decomposition of A.
A ≈ U_k V_k^*    (2)
Note that in formula (2) the first factor matrix U_k is not necessarily a unitary matrix, whereas the second factor matrix V_k is a unitary matrix; V_k^* denotes the conjugate transpose of V_k.
As can be seen, the result of the probability matrix decomposition differs from the result A = U Σ V^* of the singular value decomposition of the prior art in that the intermediate diagonal matrix Σ is eliminated.
Furthermore, the basic idea of the probability matrix decomposition in the present invention is as follows: in the probability matrix decomposition of the numerical matrix A, the first factor matrix U_k and the second factor matrix V_k are solved for such that they minimize an objective function of each element a_ij of A and the corresponding elements of U_k and V_k.
Specifically, a dimension, namely the number of principal components k, is determined first (k may also be regarded as the first k columns of the numerical matrix A). The first factor matrix U_k and the second factor matrix V_k are then solved for iteratively so that the objective function of formula (3) is minimized, where u_i and v_j are the transposes of the i-th row vector of U_k and the j-th row vector of V_k, respectively, and λ is the weight coefficient of the regularization term.
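For orientation, a commonly used form of a regularized matrix-factorization objective is written out below. This is an assumption based on the standard PMF formulation and on the symbols defined above (u_i, v_j, λ); it is not a reproduction of the patent's formula (3). The index set Ω of known entries and the symbol φ for the objective value are notation introduced for this sketch.

```latex
% Assumed standard form; \Omega is the set of positions (i,j) whose a_{ij} is known.
\phi = \sum_{(i,j)\in\Omega} \bigl(a_{ij} - u_i^{T} v_j\bigr)^{2}
       + \lambda \Bigl( \sum_{i} \lVert u_i \rVert^{2} + \sum_{j} \lVert v_j \rVert^{2} \Bigr)

% Gradients of this assumed objective with respect to one row of each factor:
\frac{\partial \phi}{\partial u_i} = -2 \sum_{j:(i,j)\in\Omega} \bigl(a_{ij} - u_i^{T} v_j\bigr)\, v_j + 2\lambda u_i ,
\qquad
\frac{\partial \phi}{\partial v_j} = -2 \sum_{i:(i,j)\in\Omega} \bigl(a_{ij} - u_i^{T} v_j\bigr)\, u_i + 2\lambda v_j
```

Because the sum runs only over known entries, missing values never enter the objective, which is what allows the factors to be fitted to an incomplete matrix.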
Specifically, the probability matrix decomposition proceeds as follows:
(1) randomly initialize the variables u_i and v_j;
(2) calculate the gradients of the objective function with respect to u_i and v_j;
(3) update u_i and v_j according to these gradients, where α and β are the set step sizes;
(4) calculate the value φ_{t+1} of the objective function;
(5) repeat (3) and (4) until a predetermined convergence condition is reached, for example φ_{t+1} < ε or |φ_{t+1} − φ_t| < ε, where ε is a set threshold.
The above process can be implemented with alternating least squares, the Levenberg-Marquardt algorithm, the Wiberg algorithm, and so on.
Moreover, as can be seen from the above, each iteration needs only one known data item to update the parameters, so probability matrix decomposition can decompose the numerical matrix A even when A contains missing data.
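The iteration above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: it assumes the squared-error objective with an L2 regularization term that is standard for PMF, uses batch gradient descent over all known entries instead of one entry per update, works with real-valued data (so the conjugate transpose reduces to the transpose), and expects data already scaled to roughly the range 0 to 1. The function name `pmf` and all default parameter values are hypothetical.

```python
import numpy as np

def pmf(A, k, lam=0.01, alpha=0.05, beta=0.05, eps=1e-10, max_iter=20000, seed=0):
    """Factor A ~ U @ V.T using only the known (non-NaN) entries of A."""
    rng = np.random.default_rng(seed)
    known = ~np.isnan(A)
    A0 = np.where(known, A, 0.0)
    U = 0.1 * rng.standard_normal((A.shape[0], k))   # step (1): random initialization
    V = 0.1 * rng.standard_normal((A.shape[1], k))

    def objective(U, V):
        E = np.where(known, A0 - U @ V.T, 0.0)       # residuals on known entries only
        return np.sum(E ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2)), E

    phi, E = objective(U, V)
    for _ in range(max_iter):
        grad_U = -2.0 * E @ V + 2.0 * lam * U        # step (2): gradients
        grad_V = -2.0 * E.T @ U + 2.0 * lam * V
        U = U - alpha * grad_U                       # step (3): update with step sizes
        V = V - beta * grad_V
        phi_new, E = objective(U, V)                 # step (4): new objective value
        if abs(phi_new - phi) < eps:                 # step (5): convergence test
            break
        phi = phi_new
    return U, V
```

Called as `U, V = pmf(A, k=2)`, the product `U @ V.T` approximates `A` at the known positions and supplies estimates at the missing ones. Step sizes that are too large for the scale of the data can make the iteration diverge, which is one reason to normalize first.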
Then, in step S130, the positions of the data missing from the multiple groups of data are determined.
In step S140, the product of the elements in the result of the probability matrix decomposition that correspond to the positions of the missing data is calculated as the missing data.
Specifically, since the result of the probability matrix decomposition shown in formula (2) is U_k V_k^*, the missing data can be obtained by multiplying the elements of U_k and V_k^* at the positions corresponding to the missing data in matrix A.
In step S150, the calculated missing data is restored to the positions of the missing data in the multiple groups of data. The completed multiple groups of data are thereby obtained.
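Steps S130 to S150 might look like the sketch below. It assumes real-valued factors, NaN as the marker for missing entries, and a hypothetical helper name `restore_missing`; the small factor matrices are made-up values chosen so that the result is easy to check by hand.

```python
import numpy as np

def restore_missing(A, U, V):
    """Fill the NaN entries of A with the matching entries of U @ V.T."""
    missing = np.isnan(A)                          # step S130: locate missing positions
    rows, cols = np.nonzero(missing)
    restored = A.copy()
    # Step S140: each missing a_ij is the inner product of row i of U and row j of V.
    restored[rows, cols] = np.einsum("ik,ik->i", U[rows], V[cols])
    return restored                                # step S150: completed data

U = np.array([[1.0], [2.0], [3.0]])
V = np.array([[0.5], [0.1]])
A = np.array([[0.5, 0.1],
              [1.0, np.nan],
              [np.nan, 0.3]])
completed = restore_missing(A, U, V)   # fills 2*0.1 = 0.2 and 3*0.5 = 1.5
```

Only the missing positions are overwritten; known measurements are kept exactly as acquired.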
According to the method of this embodiment, because probability matrix decomposition needs only one known data item per iteration to update the parameters, the numerical matrix can be decomposed with high accuracy even when the acquired multiple groups of data contain missing data. The missing data is then calculated from the decomposition result and used to complete the acquired data, providing complete data for other data processing.
Embodiment Two
In this embodiment, the missing data in the multiple groups of data is not only recovered; the multiple groups of data are also compressed.
Fig. 2 shows a flowchart of the method for recovering missing data according to Embodiment Two of the present invention.
As shown in Fig. 2, in addition to steps S110 to S150 of Embodiment One, which recover the missing data, this embodiment includes steps S260 and S270, which compress and decompress the data. Steps S110 to S150 are not described again here.
In step S260, the multiple groups of data are compressed using the result of the probability matrix decomposition.
Specifically, based on formula (4), the result of the probability matrix decomposition of step S120 is multiplied by the second factor matrix V_k obtained in step S120, thereby performing dimensionality-reduction compression of the data:
B = (U_k V_k^*) V_k = U_k    (4)
The matrix B obtained from formula (4) is the compressed data produced by dimensionality-reduction compression of the numerical matrix A. Because decompressing B requires the conjugate transpose V_k^* of the second factor matrix V_k, that matrix must be saved.
Then, in step S270, the compressed data is decompressed when needed.
Specifically, formula (4) shows that only the first factor matrix U_k remains after dimensionality-reduction compression (in general k << m, where m is the number of columns of A). For decompression and reconstruction it therefore suffices to multiply it directly by the conjugate transpose V_k^* of the second factor matrix V_k. The data compressed in step S260 is accordingly decompressed according to formula (5):
B V_k^* = U_k V_k^*    (5)
The product B V_k^* is the decompressed matrix.
In addition, after decompression, step S270 must also carry out the inverse of the data preprocessing of step S110, converting the decompressed data back to their original types.
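The compression and decompression steps can be sketched as below for real-valued data. The description states that V_k is unitary; factors obtained by plain gradient descent generally are not, so this sketch first re-orthonormalizes the second factor with a QR decomposition. That step, like the function names `compress` and `decompress`, is an assumption made for the illustration and is not described in the patent.

```python
import numpy as np

def compress(U, V):
    """Return (B, Q) with B the compressed data and Q an orthonormal second factor."""
    Q, _ = np.linalg.qr(V)     # V = Q @ R, where Q has orthonormal columns
    A_hat = U @ V.T            # result of the decomposition, as in formula (2)
    B = A_hat @ Q              # formula (4): multiply by the second factor
    return B, Q

def decompress(B, Q):
    """Formula (5): multiply the compressed data by the transpose of the second factor."""
    return B @ Q.T

rng = np.random.default_rng(1)
U = rng.standard_normal((6, 2))    # 6 time steps, k = 2
V = rng.standard_normal((4, 2))    # 4 sensors,   k = 2
B, Q = compress(U, V)
A_rec = decompress(B, Q)
```

Here `B` has only k columns per time step instead of one per sensor, and `Q` is the matrix that has to be saved for decompression.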
Note that steps S260 and S270 need only be executed after step S120; they do not have to be executed after step S150.
According to the method of this embodiment, not only can the missing data in the multiple groups of data be recovered to provide complete data, but the multiple groups of data containing missing data can also be compressed by dimensionality reduction without losing data patterns and hence without causing a large reconstruction error. Furthermore, because multiple groups of data with missing data can be compressed substantially, storage space and transmission cost are saved.
Under the same inventive concept, the present invention provides devices corresponding to the methods of Embodiment One and Embodiment Two, which are described below.
Embodiment Three
Fig. 3 shows a block diagram of the device for recovering missing data according to Embodiment Three of the present invention.
As shown in Fig. 3, the device 300 for recovering missing data of this embodiment includes a data acquisition unit 310, a probability matrix decomposition unit 320, a missing position determination unit 330, a missing data calculation unit 340 and a data recovery unit 350.
The data acquisition unit 310 acquires multiple groups of data and composes them into a corresponding numerical matrix. Specifically, the data acquisition unit 310 acquires the multiple groups of data from a data source. In one embodiment the data source is one or more monitoring devices; that is, the data acquisition unit 310 acquires multiple groups of monitoring data in chronological order from the one or more monitoring devices as the multiple groups of data.
In addition, as needed, the data acquisition unit 310 also preprocesses the multiple groups of data, for example by data type conversion and normalization, and in the process saves key information used in the normalization, such as the mean, maximum and minimum of the data.
The probability matrix decomposition unit 320 performs probability matrix decomposition on the numerical matrix. Specifically, for the numerical matrix A = {a_ij}, the probability matrix decomposition unit 320 solves for a first factor matrix U_k and a second factor matrix V_k, and takes the product of U_k and the conjugate transpose V_k^* of V_k as the result of the probability matrix decomposition of A. In the probability matrix decomposition of A, the unit solves for the first factor matrix U_k and the second factor matrix V_k that minimize an objective function of each element a_ij of A and the corresponding elements of U_k and V_k. More specifically, the probability matrix decomposition unit 320 performs probability matrix decomposition on A according to formula (3) above and obtains a decomposition result of the form shown in formula (2). The process by which the probability matrix decomposition unit 320 performs the decomposition is the same as that of step S120 in Embodiment One and is not described in detail here.
In this embodiment, the multiple groups of data acquired by the data acquisition unit 310 are assumed to contain missing data.
The missing position determination unit 330 determines the positions of the data missing from the multiple groups of data acquired by the data acquisition unit 310.
The missing data calculation unit 340 calculates, as the missing data, the product of the elements in the decomposition result of the probability matrix decomposition unit 320 that correspond to the positions of the data missing from the multiple groups of data. Specifically, since the result of the probability matrix decomposition shown in formula (2) above is U_k V_k^*, the missing data calculation unit 340 multiplies the elements at the corresponding positions of the matrices U_k and V_k^* obtained by the probability matrix decomposition unit 320 to obtain the corresponding missing data of matrix A.
The data recovery unit 350 restores the missing data calculated by the missing data calculation unit 340 to the positions of the data missing from the multiple groups of data.
Functionally, the device for recovering missing data of this embodiment can implement the method for recovering missing data of Embodiment One.
Embodiment Four
In this embodiment, the missing data in the multiple groups of data is not only recovered; the multiple groups of data are also compressed.
Fig. 4 shows a block diagram of the device for recovering missing data according to Embodiment Four of the present invention.
As shown in Fig. 4, in addition to the data acquisition unit 310, probability matrix decomposition unit 320, missing position determination unit 330, missing data calculation unit 340 and data recovery unit 350 of the device 300 of Embodiment Three, the device 400 for recovering missing data of this embodiment includes a compression unit 460 and a decompression unit 470. Units 310 to 350 are not described again here.
The compression unit 460 compresses the multiple groups of data using the decomposition result of the probability matrix decomposition unit 320. Specifically, the compression unit 460 multiplies the result of the probability matrix decomposition by the second factor matrix V_k obtained by the decomposition, thereby obtaining the compressed data. More specifically, the compression unit 460 performs dimensionality-reduction compression of the data based on formula (4) above, and saves the decomposed matrix obtained by the probability matrix decomposition that is needed for decompression.
The decompression unit 470 decompresses the data compressed by dimensionality reduction. Specifically, the decompression unit 470 multiplies the data compressed by the compression unit 460 by the conjugate transpose V_k^* of the second factor matrix V_k, thereby obtaining the decompressed data. More specifically, the decompression unit 470 decompresses the compressed data according to formula (5) above. In addition, after decompression, the decompression unit 470 must also carry out the inverse of the preprocessing applied to the multiple groups of data by the data acquisition unit 310, converting the decompressed data back to their original types.
Functionally, the device for recovering missing data of this embodiment can implement the method for recovering missing data of Embodiment Two.
According to an embodiment of the present invention, a computer device is also provided. The computer device includes a processor and a memory; the memory stores a computer program executable on the processor, and when the computer program is executed by the processor, the steps of the method for recovering missing data according to the embodiments of the present invention are implemented.
It should also be understood that each unit of the device according to the exemplary embodiments of the present invention can be implemented as a hardware component and/or a software component. Based on the processing defined for each unit, those skilled in the art can implement the units using, for example, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In addition, the method according to the exemplary embodiments of the present invention may be implemented as a computer program on a computer-readable recording medium. Those skilled in the art can implement the computer program from the description of the method above. The method of the present invention is carried out when the computer program is executed on a computer.
Although the present invention has been particularly shown and described with reference to its exemplary embodiments, those skilled in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the claims.
Claims (12)
1. a kind of restoration methods for lacking SCADA data characterized by comprising
Obtain multiple groups SCADA data;
The pretreatment that data type conversion is carried out to the multiple groups SCADA data, is converted to integer for nonumeric type SCADA data
SCADA data, and then the integer SCADA data is converted into floating type SCADA data;
Probability matrix decomposition is carried out to numerical matrix composed by the pretreated multiple groups SCADA data;
Determine the position of the SCADA data lacked in the multiple groups SCADA data;
Find out the position pair in the result that the probability matrix decomposes with the SCADA data lacked in the multiple groups SCADA data
The product for the element answered is as missing SCADA data;And
Calculated missing SCADA data is restored to the position of the SCADA data lacked in the multiple groups SCADA data,
In the probability matrix decomposition step, factor I matrix and factor Ⅱ matrix are solved for the numerical matrix,
The product of the factor I matrix and the associate matrix of the factor Ⅱ matrix is decomposed as the probability matrix
As a result,
Factor I matrix is solved for the numerical matrix and factor Ⅱ matrix specifically includes, in the general of the numerical matrix
Rate matrix solves such factor I matrix and factor Ⅱ matrix, i.e. the factor I matrix and factor Ⅱ square in decomposing
Battle array minimizes under the respective element in each element and the factor I matrix and factor Ⅱ matrix in the numerical matrix
State objective function:
Wherein, uiFor the transposition of i-th of row vector of factor I matrix, vjFor turn of j-th of row vector of factor Ⅱ matrix
It setting, λ is specification item weight coefficient,
2. the restoration methods of missing SCADA data according to claim 1, which is characterized in that find out corresponding element
The step of product includes:
It will be corresponding with the position of the SCADA data of the missing respectively in the factor I matrix and the factor Ⅱ matrix
Element multiplication and as the missing SCADA data.
3. the restoration methods of missing SCADA data according to claim 1, which is characterized in that further include:
The compression of the multiple groups SCADA data is carried out using the result that the probability matrix decomposes.
4. the restoration methods of missing SCADA data according to claim 3, which is characterized in that described to utilize the probability
The compression that the result of matrix decomposition carries out the multiple groups SCADA data specifically includes, the result that the probability matrix is decomposed with
The factor Ⅱ matrix multiple, to obtain compressed SCADA data.
5. the restoration methods of missing SCADA data according to claim 4, which is characterized in that will be described compressed
SCADA data is multiplied with the associate matrix of the factor Ⅱ matrix, with the SCADA data after being decompressed.
6. A device for recovering missing SCADA data, characterized by comprising:
a data acquisition unit, which obtains multiple groups of SCADA data and performs data type conversion preprocessing on the multiple groups of SCADA data, converting non-numeric SCADA data into integer SCADA data and then converting the integer SCADA data into floating-point SCADA data;
a probability matrix decomposition unit, which performs probability matrix decomposition on the numerical matrix formed by the preprocessed multiple groups of SCADA data;
a missing-position determination unit, which determines the positions of the SCADA data missing from the multiple groups of SCADA data;
a missing data calculation unit, which finds, in the decomposition result of the probability matrix decomposition unit, the product of the elements corresponding to the positions of the SCADA data missing from the multiple groups of SCADA data, as the missing SCADA data; and
a data recovery unit, which restores the missing SCADA data found by the missing data calculation unit to the positions of the SCADA data missing from the multiple groups of SCADA data,
wherein the probability matrix decomposition unit solves a factor I matrix and a factor Ⅱ matrix for the numerical matrix and takes the product of the factor I matrix and the conjugate transpose of the factor Ⅱ matrix as the result of the probability matrix decomposition,
and, in the probability matrix decomposition of the numerical matrix, the probability matrix decomposition unit solves for the factor I matrix and the factor Ⅱ matrix that minimize the following objective function of each element of the numerical matrix and the corresponding elements of the factor I matrix and the factor Ⅱ matrix:
where u_i is the transpose of the i-th row vector of the factor I matrix, v_j is the transpose of the j-th row vector of the factor Ⅱ matrix, λ is the weight coefficient of the regularization term,
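The data type conversion and missing-position determination described for the device can be sketched as follows. The claim does not say how non-numeric values are mapped to integers, so this example assumes a per-column code assigned in order of first appearance. The function name `preprocess`, the use of `None` for missing values, and the sample readings are all hypothetical.

```python
def preprocess(groups):
    """Convert non-numeric values to integer codes, then every value to float.

    Returns the numerical matrix and the (row, column) positions of missing values.
    """
    codes = {}     # per-column mapping: non-numeric value -> integer code
    matrix = []
    missing = []
    for i, group in enumerate(groups):
        row = []
        for j, value in enumerate(group):
            if value is None:
                missing.append((i, j))
                row.append(None)
                continue
            if isinstance(value, str):
                column_codes = codes.setdefault(j, {})
                # First unseen value in a column gets 0, the next gets 1, and so on.
                value = column_codes.setdefault(value, len(column_codes))
            row.append(float(value))
        matrix.append(row)
    return matrix, missing

# Hypothetical SCADA groups: a switch state, a measurement, and a counter.
groups = [["ON", 3, None],
          ["OFF", 4.5, 7],
          ["ON", None, 8]]
matrix, missing = preprocess(groups)
# matrix  -> [[0.0, 3.0, None], [1.0, 4.5, 7.0], [0.0, None, 8.0]]
# missing -> [(0, 2), (2, 1)]
```

The list of missing positions is what a later recovery step would use to decide which entries to fill from the factor matrices.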
7. The device for recovering missing SCADA data according to claim 6, characterized in that the missing data calculation unit multiplies the elements in the factor I matrix and the factor Ⅱ matrix that respectively correspond to the position of the missing SCADA data, and takes the product as the missing SCADA data.
8. The device for recovering missing SCADA data according to claim 6, characterized by further comprising:
a compression unit, which compresses the multiple groups of SCADA data using the decomposition result of the probability matrix decomposition unit.
9. The device for recovering missing SCADA data according to claim 8, characterized in that the compression unit multiplies the result of the probability matrix decomposition by the factor Ⅱ matrix to obtain compressed SCADA data.
10. The device for recovering missing SCADA data according to claim 9, characterized by further comprising a decompression unit, which multiplies the compressed SCADA data by the conjugate transpose of the factor Ⅱ matrix to obtain decompressed SCADA data.
11. A computer-readable medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method for recovering missing SCADA data according to any one of claims 1 to 5 are implemented.
12. A computer device, characterized by comprising:
a processor; and
a memory storing a computer program executable on the processor, wherein, when the computer program is executed by the processor, the steps of the method for recovering missing SCADA data according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711047191.7A CN107832170B (en) | 2017-10-31 | 2017-10-31 | Method and device for recovering missing data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832170A CN107832170A (en) | 2018-03-23 |
CN107832170B CN107832170B (en) | 2019-03-12 |
Family
ID=61651164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711047191.7A Active CN107832170B (en) | 2017-10-31 | 2017-10-31 | Method and device for recovering missing data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832170B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108880620B (en) * | 2018-08-20 | 2021-06-11 | 广东石油化工学院 | Power line communication signal reconstruction method |
CN108918928B (en) * | 2018-09-11 | 2020-11-10 | 广东石油化工学院 | Power signal self-adaptive reconstruction method in load decomposition |
CN109166626B (en) * | 2018-10-29 | 2021-09-14 | 中山大学 | Method for supplementing medical index missing data of peptic ulcer patient |
CN112165403B (en) * | 2020-09-29 | 2021-04-27 | 北京视界云天科技有限公司 | UDP (user Datagram protocol) data packet recovery method and device, computer equipment and storage medium |
CN113918541B (en) * | 2021-12-13 | 2022-04-26 | 广州市玄武无线科技股份有限公司 | Preheating data processing method and device and computer readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1526103A (en) * | 2001-07-11 | 2004-09-01 | | DCT matrix decomposing method and DCT device |
CN102402569A (en) * | 2010-09-08 | 2012-04-04 | 索尼公司 | Rating prediction device, rating prediction method, and program |
CN103942545A (en) * | 2014-05-07 | 2014-07-23 | 中国标准化研究院 | Method and device for identifying faces based on bidirectional compressed data space dimension reduction |
Non-Patent Citations (1)
Title |
---|
Matrix factorization for time-series data; Huang Xiaoyu et al.; Journal of Software (软件学报); 2015-09-30; pp. 2262-2275 * |
Also Published As
Publication number | Publication date |
---|---|
CN107832170A (en) | 2018-03-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832170B (en) | Method and device for recovering missing data | |
CN107800437B (en) | Data compression method and device | |
Sun et al. | Feature selection using rough entropy-based uncertainty measures in incomplete decision systems | |
CN110175541B (en) | Method for extracting sea level change nonlinear trend | |
Lan et al. | Matrix recovery from quantized and corrupted measurements | |
CN103559205A (en) | Parallel feature selection method based on MapReduce | |
CN112862127A (en) | Sensor data exception handling method and device, electronic equipment and medium | |
Ding et al. | An improved adaptive bivariate dimension-reduction method for efficient statistical moment and reliability evaluations | |
CN115618212A (en) | Power data processing method and device, computer equipment and storage medium | |
Ritz | Goodness‐of‐fit tests for mixed models | |
CN106407620B (en) | A kind of engineering structure response surface stochastic finite element analysis processing method based on ABAQUS | |
CN109635452B (en) | Efficient multimodal random uncertainty analysis method | |
CN107766294A (en) | Method and device for recovering missing data | |
JP2017151497A (en) | Time-sequential model parameter estimation method | |
Shmueli et al. | Updating kernel methods in spectral decomposition by affinity perturbations | |
Hallmann et al. | All solutions of the stochastic fixed point equation of the Quicksort process | |
Kruzick et al. | Spectral statistics of lattice graph structured, non-uniform percolations | |
Greenwood et al. | Information bounds for Gibbs samplers | |
Demir et al. | Maximum likelihood estimation for the parameters of the generalized gompertz distribution under progressive type-ii right censored samples | |
Min et al. | Variance reduced stochastic optimization for PCA and PLS | |
Chen et al. | Bayesian hierarchical modelling on dual response surfaces in partially replicated designs | |
CN111382891A (en) | Short-term load prediction method and short-term load prediction device | |
Tsagkatakis et al. | Matrix and tensor signal modelling in cyber physical systems | |
Tanak et al. | A new lifetime distribution by maximizing entropy: properties and applications | |
de Gooijer et al. | On the choice of basis in proper orthogonal decomposition-based surrogate models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||