CN117150217B

CN117150217B - Data compression processing method based on big data

Info

Publication number: CN117150217B
Application number: CN202311441320.6A
Authority: CN
Inventors: 曲宝春; 张斌
Original assignee: Suzhou Aixiongsi Communication Technology Co ltd
Current assignee: Suzhou Aixiongsi Communication Technology Co ltd
Priority date: 2023-11-01
Filing date: 2023-11-01
Publication date: 2024-01-26
Anticipated expiration: 2043-11-01
Also published as: CN117150217A

Abstract

The invention relates to the technical field of data compression, and in particular to a data compression processing method based on big data. The method comprises the following steps: acquiring a vibration data sequence corresponding to the operation process of industrial equipment; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; obtaining the importance degree of each data according to the difference between each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data, and the variance gain; obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining an optimal threshold; and compressing the vibration data sequence with the Douglas-Peucker algorithm based on the optimal threshold to obtain compressed data. The invention ensures the compression effect of the operation data of the industrial equipment.

Description

Data compression processing method based on big data

Technical Field

The invention relates to the technical field of data compression, in particular to a data compression processing method based on big data.

Background

In industrial equipment monitoring and fault diagnosis, equipment vibration data is an important monitoring index. However, equipment vibration data typically contains a large number of sampling points and high-frequency components, resulting in a huge amount of data, which presents challenges for data transmission and storage. Data compression is therefore an important area of research for making effective use of big data while reducing the cost of storage and transmission.

Because vibration data is time-series data and its accuracy requirement is not high, it does not need to be stored losslessly and can instead be compressed and stored in a lossy manner. One such lossy method is the Douglas-Peucker algorithm. The basic idea of the algorithm is to obtain a curve similar to the original data by deleting some unimportant points, thereby compressing the data. However, when the traditional method compresses vibration data of equipment, the selected threshold is an empirical threshold, so the compression effect is poor.

Disclosure of Invention

In order to solve the problem that the existing Douglas-Peucker algorithm has a poor compression effect when compressing vibration data of industrial equipment, the invention aims to provide a data compression processing method based on big data, and the adopted technical scheme is as follows:

the invention provides a data compression processing method based on big data, which comprises the following steps:

acquiring a vibration data sequence corresponding to the operation process of industrial equipment;

respectively taking each data in the vibration data sequence as central data to construct a window corresponding to each data; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain;

obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining an optimal threshold;

and based on the optimal threshold, compressing the vibration data sequence with the Douglas-Peucker algorithm to obtain compressed data.

Preferably, the obtaining the variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data includes:

for the mth data in the vibration data sequence:

the variances of all data in the window corresponding to the mth data are marked as first variances, and the variances of all data except the mth data in the window corresponding to the mth data are marked as second variances;

and determining the absolute value of the difference between the first variance and the second variance as a variance gain corresponding to the mth data.

Preferably, the obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain includes:

for the mth data in the vibration data sequence:

determining a deviation value corresponding to the mth data according to the difference between the mth data and the normal fluctuation range of the data; taking the ratio of the deviation value corresponding to the mth data to the maximum deviation value bearable by industrial equipment as the deviation degree of the mth data;

and calculating the importance degree of the mth data according to the deviation degree of the mth data and the variance gain corresponding to the mth data.

Preferably, determining the deviation value corresponding to the mth data according to the difference between the mth data and the normal fluctuation range of the data includes:

if the m-th data is smaller than the lower limit value of the normal fluctuation range of the data, taking the difference value between the lower limit value of the normal fluctuation range of the data and the m-th data as a deviation value corresponding to the m-th data; if the m-th data is larger than or equal to the lower limit value of the normal fluctuation range of the data and smaller than or equal to the upper limit value of the normal fluctuation range of the data, enabling the deviation value corresponding to the m-th data to be 0; and if the mth data is larger than the upper limit value of the normal fluctuation range of the data, taking the difference value between the mth data and the upper limit value of the normal fluctuation range of the data as a deviation value corresponding to the mth data.

Preferably, the importance of the mth data is calculated using the following formula:

；

wherein $G_m$ is the importance degree of the mth data, $f_m$ is the deviation value corresponding to the mth data, $f_{max}$ is the maximum deviation value that the industrial equipment can bear, the two weights in the formula are the first weight coefficient and the second weight coefficient corresponding to the mth data, $\Delta d_m$ is the variance gain corresponding to the mth data, and exp() is an exponential function with the natural constant as its base.

Preferably, the obtaining of the first weight coefficient and the second weight coefficient corresponding to the mth data includes:

if the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold and the deviation degree of the mth data is 0, setting the first weight coefficient and the second weight coefficient corresponding to the mth data as basic weights;

if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is 0, carrying out negative correlation normalization processing on the variance of all data in the window corresponding to the mth data to obtain a negative correlation normalization result, and recording the product of the negative correlation normalization result, the basic weight and the preset first hyperparameter as a first characteristic value; taking the sum of the basic weight and the first characteristic value as the first weight coefficient corresponding to the mth data, and taking the difference between the basic weight and the first characteristic value as the second weight coefficient corresponding to the mth data; wherein the preset first hyperparameter is greater than 0;

if the variance gain corresponding to the mth data is greater than or equal to the preset variance gain threshold and the deviation degree of the mth data is not 0, recording the product of the basic weight, the deviation degree of the mth data and the preset first hyperparameter as a second characteristic value; taking the sum of the basic weight and the second characteristic value as the first weight coefficient corresponding to the mth data, and taking the difference between the basic weight and the second characteristic value as the second weight coefficient corresponding to the mth data;

if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is not 0, taking the sum of the basic weight, the first characteristic value and the second characteristic value as a first weight coefficient corresponding to the mth data, marking the difference value between the basic weight and the first characteristic value as a first difference value, and taking the difference value between the first difference value and the second characteristic value as a second weight coefficient corresponding to the mth data.

Preferably, the obtaining the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data includes:

for the mth data in the vibration data sequence:

taking the difference between the upper limit value and the lower limit value of the normal fluctuation range of the data as the length of the normal fluctuation interval, and taking the product of a preset second hyperparameter and the length of the normal fluctuation interval as a first product; wherein the preset second hyperparameter is greater than 0;

taking the product of the first product and the importance degree of the mth data as a loss tolerance value of the mth data.

Preferably, continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value and determining the optimal threshold includes:

performing curve fitting on the frequency of occurrence of all loss tolerance values to obtain a first curve, wherein the abscissa of the first curve is the loss tolerance value, and the ordinate is the frequency of occurrence of the loss tolerance value; obtaining an extreme point on the first curve;

acquiring a characteristic area corresponding to each extreme point; sorting the loss tolerance values corresponding to all the extreme points according to the sequence from the large characteristic area to the small characteristic area to obtain a loss tolerance value sequence;

and taking the first element in the loss tolerance value sequence as the threshold of the Douglas-Peucker algorithm, simplifying the data in the vibration data sequence with the Douglas-Peucker algorithm to obtain a simplified data sequence, and calculating the matching degree between the vibration data sequence and the simplified data sequence; if the matching degree is greater than the matching threshold, taking the second element in the loss tolerance value sequence as the threshold of the Douglas-Peucker algorithm and simplifying the data in the vibration data sequence again, and so on, until the matching degree between the vibration data sequence and the simplified data sequence is less than or equal to the matching threshold; and taking the threshold of the Douglas-Peucker algorithm at that moment as the optimal threshold.

Preferably, the obtaining the feature area corresponding to each extreme point includes:

for the nth extreme point:

taking the difference value between the loss tolerance value corresponding to the nth extreme point and the preset first value as the lower limit value of the association interval corresponding to the nth extreme point, and taking the sum of the loss tolerance value corresponding to the nth extreme point and the preset first value as the upper limit value of the association interval corresponding to the nth extreme point; acquiring an associated interval corresponding to an nth extreme point based on the lower limit value of the associated interval and the upper limit value of the associated interval; wherein the preset first value is greater than 0;

and taking the area surrounded by the association interval corresponding to the nth extreme point of the first curve and the transverse axis as the characteristic area corresponding to the nth extreme point.

Preferably, the acquiring the vibration data sequence corresponding to the operation process of the industrial equipment includes:

obtaining vibration data of each acquisition moment in the operation process of industrial equipment;

and sequencing the vibration data at all the acquisition moments according to the time sequence to obtain a vibration data sequence corresponding to the operation process of the industrial equipment.

The invention has at least the following beneficial effects:

The method first obtains a vibration data sequence corresponding to the operation process of industrial equipment, then analyzes the difference condition of the data in the window corresponding to each data in the vibration data sequence to obtain the importance degree of each data, and obtains the loss tolerance value of each data from its importance degree. Using the loss tolerance values as constraints, the vibration data sequence is simplified with the Douglas-Peucker algorithm, and the threshold of the Douglas-Peucker algorithm is continuously adjusted based on the simplified result to obtain the optimal threshold, with which the data in the vibration data sequence is compressed. Because the optimal threshold of the Douglas-Peucker algorithm is determined adaptively from the change characteristics of the vibration data of the industrial equipment, the compression effect of the vibration data of the industrial equipment is guaranteed.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a data compression processing method based on big data according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of a data compression processing method based on big data according to the present invention with reference to the accompanying drawings and the preferred embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the data compression processing method based on big data provided by the invention with reference to the accompanying drawings.

An embodiment of a data compression processing method based on big data:

The specific scenario addressed by this embodiment is as follows: the monitoring data generated during the operation of industrial equipment is large in volume and occupies a large storage space, so the collected monitoring data is often compressed to save storage space. When the existing Douglas-Peucker algorithm is used to compress the monitoring data of equipment, the selected threshold is an empirical threshold, so the compression effect of the data is poor.

The embodiment provides a data compression processing method based on big data, as shown in fig. 1, the data compression processing method based on big data in the embodiment includes the following steps:

step S1, a vibration data sequence corresponding to the operation process of industrial equipment is obtained.

First, a corresponding sensor is installed on the industrial equipment to collect vibration data during the operation of the industrial equipment. The vibration data may be vibration acceleration, vibration velocity, vibration displacement and the like: vibration acceleration is the acceleration value of the industrial equipment at a point in time; vibration velocity is the velocity of the industrial equipment at a point in time; vibration displacement is the displacement of the industrial equipment at a point in time. This embodiment takes one type of vibration data as an example; other types of vibration data can be processed by the method provided in this embodiment. In this embodiment, vibration data is collected every 0.1 seconds; in a specific application, the practitioner can set the acquisition frequency according to the specific situation. Vibration data at each acquisition moment during the operation of the industrial equipment is thus obtained.

And sequencing vibration data at all acquisition moments in the operation process of the industrial equipment according to the time sequence to obtain a vibration data sequence corresponding to the operation process of the industrial equipment.

Step S2, respectively taking each data in the vibration data sequence as central data, and constructing a window corresponding to each data; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; and obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain.

The present embodiment has obtained a vibration data sequence corresponding to an operation process of an industrial apparatus, and data in the vibration data sequence is data to be compressed, so the present embodiment will analyze the data in the vibration data sequence next.

Each data in the vibration data sequence is taken in turn as central data, and a window of size a×1 is constructed as the window corresponding to that data, so that each data is the central data of its corresponding window. In this embodiment the value of a is 7; in a specific application, the practitioner can set it according to the specific situation.

If the difference of the data in the window corresponding to a certain data is larger, the fluctuation of the data and the surrounding data is larger, namely the corresponding variance is larger.

For the mth data in the vibration data sequence:

The variance of all data in the window corresponding to the mth data is recorded as the first variance, and the variance of all data in that window except the mth data is recorded as the second variance; the absolute value of the difference between the first variance and the second variance is determined as the variance gain corresponding to the mth data. In other words, this embodiment removes the mth data from its window, calculates the variance of the remaining data, and takes the difference between this new variance and the original variance. The difference reflects the influence of the removed mth data on the variance of the data in the whole window. A larger variance gain means that the mth data has a greater influence on the variance of the data in its window, that is, the mth data is more likely to be abnormal data; a smaller variance gain means that the influence is smaller, that is, the mth data is more likely to be normal data.
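The window construction and variance gain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `window_around` and `variance_gain`, the use of population variance, and the clipping of the window at the two ends of the sequence are all assumptions, since the text does not say how boundary data are handled.

```python
from statistics import pvariance


def window_around(seq, m, a=7):
    """Return (window, index of the centre inside the window) for data m.

    Assumption: near the ends of the sequence the window is clipped
    instead of padded.
    """
    half = a // 2
    start = max(0, m - half)
    return seq[start:m + half + 1], m - start


def variance_gain(seq, m, a=7):
    """Absolute difference between the window variance with and without data m."""
    win, centre = window_around(seq, m, a)
    rest = win[:centre] + win[centre + 1:]
    if not rest:  # single-element window: nothing to compare against
        return 0.0
    first_variance = pvariance(win)    # all data in the window
    second_variance = pvariance(rest)  # window without the mth data
    return abs(first_variance - second_variance)
```

A spike in an otherwise flat window gives a large gain, while a constant window gives a gain of 0.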

The importance degree of a data is related to its own amplitude and to the change of its neighborhood data. The range of data values during normal operation of the equipment is obtained, the deviation value corresponding to the mth data is determined according to the difference between the mth data and this normal fluctuation range, and the importance degree of the data is then determined by combining the deviation value and the variance gain. Specifically, if the mth data is smaller than the lower limit value of the normal fluctuation range of the data, the difference between the lower limit value and the mth data is taken as the deviation value corresponding to the mth data; if the mth data is greater than or equal to the lower limit value and less than or equal to the upper limit value of the normal fluctuation range of the data, the deviation value corresponding to the mth data is 0; and if the mth data is greater than the upper limit value of the normal fluctuation range of the data, the difference between the mth data and the upper limit value is taken as the deviation value corresponding to the mth data. The ratio of the deviation value corresponding to the mth data to the maximum deviation value that the industrial equipment can bear is taken as the deviation degree of the mth data. The normal fluctuation range of the data is set by the practitioner according to the specific situation, with its lower limit value smaller than its upper limit value. The maximum deviation value that the industrial equipment can bear is likewise set by the practitioner according to the specific situation.
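The three-case deviation value and the deviation degree can be written compactly as below. This is a small sketch; the function and parameter names (`deviation_value`, `deviation_degree`, `lower`, `upper`, `f_max`) are illustrative and do not come from the patent.

```python
def deviation_value(x, lower, upper):
    """Distance of x from the normal fluctuation range [lower, upper]."""
    if x < lower:
        return lower - x
    if x > upper:
        return x - upper
    return 0.0  # inside the normal range


def deviation_degree(x, lower, upper, f_max):
    """Deviation value relative to the maximum deviation the equipment can bear."""
    return deviation_value(x, lower, upper) / f_max
```

For example, with a normal range of [1, 3] and `f_max = 3`, the value 4.5 has a deviation value of 1.5 and a deviation degree of 0.5.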
The specific calculation formula of the importance degree of the mth data is as follows:

；

Here $f_m / f_{max}$ indicates the deviation degree of the mth data. When the deviation degree of the mth data is larger and the variance gain corresponding to the mth data is larger, the mth data is more likely to be abnormal data, so its importance degree is larger; when the deviation degree of the mth data is smaller and the variance gain corresponding to the mth data is smaller, the mth data is more likely to be normal data, so its importance degree is smaller.

The acquisition process of the first weight coefficient and the second weight coefficient corresponding to the mth data specifically comprises the following steps:

and if the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold and the deviation degree of the mth data is 0, setting the first weight coefficient and the second weight coefficient corresponding to the mth data as basic weights. In this embodiment, the base weight is 0.5, so that the first weight coefficient and the second weight coefficient corresponding to the mth data are both 0.5 at this time. In this embodiment, the preset variance gain threshold is 0.58, and in a specific application, the practitioner can set the variance gain threshold according to the specific situation.

If the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is 0, negative correlation normalization processing is carried out on the variance of all data in the window corresponding to the mth data to obtain a negative correlation normalization result, and the product of the negative correlation normalization result, the basic weight and the preset first hyperparameter is recorded as a first characteristic value; the sum of the basic weight and the first characteristic value is taken as the first weight coefficient corresponding to the mth data, and the difference between the basic weight and the first characteristic value is taken as the second weight coefficient corresponding to the mth data. The preset first hyperparameter is greater than 0; in this embodiment it is 3, and in a specific application the practitioner can set it according to the specific situation. In this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:

$$w_{1,m} = \beta_0 + \exp(-b_m)\times(\beta_0\times\delta_1), \qquad w_{2,m} = \beta_0 - \exp(-b_m)\times(\beta_0\times\delta_1)$$

wherein $w_{1,m}$ is the first weight coefficient corresponding to the mth data, $w_{2,m}$ is the second weight coefficient corresponding to the mth data, $\beta_0$ is the basic weight, $b_m$ is the variance of all data within the window corresponding to the mth data, and $\delta_1$ is the preset first hyperparameter. $\exp(-b_m)$ represents the negative correlation normalization result of the variance of all data in the window corresponding to the mth data, and $\exp(-b_m)\times(\beta_0\times\delta_1)$ represents the first characteristic value.

If the variance gain corresponding to the mth data is greater than or equal to the preset variance gain threshold and the deviation degree of the mth data is not 0, the product of the basic weight, the deviation degree of the mth data and the preset first hyperparameter is recorded as a second characteristic value; the sum of the basic weight and the second characteristic value is taken as the first weight coefficient corresponding to the mth data, and the difference between the basic weight and the second characteristic value is taken as the second weight coefficient corresponding to the mth data. In this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:

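$$w_{1,m} = \beta_0 + \beta_0\times\frac{f_m}{f_{max}}\times\delta_1, \qquad w_{2,m} = \beta_0 - \beta_0\times\frac{f_m}{f_{max}}\times\delta_1$$

wherein $w_{1,m}$ is the first weight coefficient corresponding to the mth data, $w_{2,m}$ is the second weight coefficient corresponding to the mth data, $\beta_0$ represents the basic weight, $f_m$ is the deviation value corresponding to the mth data, $f_{max}$ represents the maximum deviation value that the equipment can bear, and $\delta_1$ is the preset first hyperparameter. $\beta_0\times\frac{f_m}{f_{max}}\times\delta_1$ represents the second characteristic value.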

If the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is not 0, taking the sum of the basic weight, the first characteristic value and the second characteristic value as a first weight coefficient corresponding to the mth data, marking the difference value between the basic weight and the first characteristic value as a first difference value, and taking the difference value between the first difference value and the second characteristic value as a second weight coefficient corresponding to the mth data; in this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:

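$$w_{1,m} = \beta_0 + \exp(-b_m)\times(\beta_0\times\delta_1) + \beta_0\times\frac{f_m}{f_{max}}\times\delta_1$$

$$w_{2,m} = \beta_0 - \exp(-b_m)\times(\beta_0\times\delta_1) - \beta_0\times\frac{f_m}{f_{max}}\times\delta_1$$

wherein $w_{1,m}$ is the first weight coefficient corresponding to the mth data, $w_{2,m}$ is the second weight coefficient corresponding to the mth data, $\beta_0$ represents the basic weight, $f_m$ is the deviation value corresponding to the mth data, $f_{max}$ represents the maximum deviation value that the equipment can bear, $b_m$ is the variance of all data within the window corresponding to the mth data, and $\delta_1$ is the preset first hyperparameter. $\beta_0 - \exp(-b_m)\times(\beta_0\times\delta_1)$ represents the first difference.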
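The four cases for the two weight coefficients can be folded into one function, because each characteristic value is simply absent (zero) when its condition does not hold. The sketch below is illustrative only: the function name and argument names are assumptions, while the defaults (basic weight 0.5, first hyperparameter 3, variance gain threshold 0.58) are the values quoted for this embodiment.

```python
import math


def weight_coefficients(gain, degree, window_variance,
                        base=0.5, delta1=3.0, gain_threshold=0.58):
    """Return (first weight, second weight) for one data.

    gain            -- variance gain of the data
    degree          -- deviation degree f_m / f_max of the data
    window_variance -- variance b_m of all data in the data's window
    """
    # First characteristic value: only used when the variance gain is small.
    first_feature = 0.0
    if gain < gain_threshold:
        first_feature = math.exp(-window_variance) * base * delta1
    # Second characteristic value: only used when the data leaves the normal range.
    second_feature = base * degree * delta1 if degree != 0 else 0.0
    return (base + first_feature + second_feature,
            base - first_feature - second_feature)
```

By construction the two coefficients always sum to twice the basic weight. Note that with the embodiment's values the second coefficient can become negative, for example when the window variance is close to 0 and the variance gain is below the threshold.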

By adopting the method, the importance degree of each data in the vibration data sequence can be obtained.

Step S3, obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; and continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining the optimal threshold.

When data in the vibration data sequence lies within $[g_1, g_2]$, the corresponding data is normal operation data, where $g_1$ represents the lower limit value of the normal fluctuation range of the data and $g_2$ represents the upper limit value of the normal fluctuation range of the data; normal operation data contains less information. The principle of the Douglas-Peucker algorithm is to approximate the original curve by retaining key data points. The threshold parameter controls the degree of simplification in the algorithm: a smaller threshold retains more detail, and a larger threshold compresses the data points to a greater degree. This embodiment has obtained the importance degree of each data in the vibration data sequence; next, it obtains the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data.

For the mth data in the vibration data sequence: the difference between the upper limit value and the lower limit value of the normal fluctuation range of the data is taken as the length of the normal fluctuation interval, and the product of a preset second hyperparameter and the length of the normal fluctuation interval is taken as a first product. The preset second hyperparameter is greater than 0; in this embodiment it is 1, and in a specific application the practitioner can set it according to the specific situation. The product of the first product and the importance degree of the mth data is taken as the loss tolerance value of the mth data.
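As a one-line illustration of this step, the loss tolerance value is the importance degree scaled by the length of the normal fluctuation interval and the second hyperparameter. The names `loss_tolerance` and `delta2` are assumptions made for this sketch; the default of 1 is the value quoted for this embodiment.

```python
def loss_tolerance(importance, lower, upper, delta2=1.0):
    """Loss tolerance value of one data from its importance degree."""
    first_product = delta2 * (upper - lower)  # hyperparameter x interval length
    return first_product * importance
```

For a normal range of [1, 3] and an importance degree of 0.25, this gives a loss tolerance value of 0.5.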

With the above method, the loss tolerance value of each data in the vibration data sequence is obtained. Equal loss tolerance values are treated as one loss tolerance value, and the occurrence frequency of each loss tolerance value is counted. Curve fitting is performed on the occurrence frequencies of all loss tolerance values, and the resulting curve is recorded as the first curve, whose abscissa is the loss tolerance value and whose ordinate is the occurrence frequency of the loss tolerance value. The extreme points on the first curve are then obtained; methods for obtaining extreme points are prior art and are not described in detail here.

For the nth extreme point: the difference between the loss tolerance value corresponding to the nth extreme point and a preset first value is taken as the lower limit value of the association interval corresponding to the nth extreme point, and the sum of the loss tolerance value corresponding to the nth extreme point and the preset first value is taken as the upper limit value of the association interval corresponding to the nth extreme point; the association interval corresponding to the nth extreme point is obtained from these lower and upper limit values; and the area enclosed by the first curve and the horizontal axis over the association interval corresponding to the nth extreme point is taken as the characteristic area corresponding to the nth extreme point. In this embodiment, the preset first value is the ratio of the first product to the constant 15; in a specific application, the practitioner may set it according to the specific situation.
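The frequency curve, its extreme points and their characteristic areas can be sketched as below. The patent does not fix the curve-fitting method or the kind of extreme point, so several choices here are assumptions made only for illustration: loss tolerance values are grouped after rounding, the "first curve" is a piecewise-linear interpolation of the counts (zero outside the observed range), extreme points are taken to be local maxima, and the area is computed with the trapezoidal rule. All function names are hypothetical.

```python
from collections import Counter


def frequency_curve(tolerances, ndigits=3):
    """Count occurrences of each (rounded) loss tolerance value."""
    counts = Counter(round(t, ndigits) for t in tolerances)
    xs = sorted(counts)
    return xs, [counts[x] for x in xs]


def curve_value(xs, ys, x):
    """Piecewise-linear stand-in for the fitted first curve."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return float(ys[-1])


def feature_area(xs, ys, x0, half_width, steps=200):
    """Area under the curve over [x0 - half_width, x0 + half_width]."""
    a, b = x0 - half_width, x0 + half_width
    h = (b - a) / steps
    vals = [curve_value(xs, ys, a + k * h) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def ranked_tolerances(tolerances, half_width):
    """Loss tolerance values at local maxima, largest characteristic area first."""
    xs, ys = frequency_curve(tolerances)
    last = len(xs) - 1
    peaks = [i for i in range(len(xs))
             if (i == 0 or ys[i] > ys[i - 1]) and (i == last or ys[i] > ys[i + 1])]
    peaks.sort(key=lambda i: feature_area(xs, ys, xs[i], half_width), reverse=True)
    return [xs[i] for i in peaks]
```

Here `half_width` plays the role of the preset first value. Because the ranking uses area rather than peak height, a lower but broader peak can rank ahead of a taller, narrower one.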

By adopting the above method, the characteristic area corresponding to each extreme point can be obtained, and the loss tolerance values corresponding to all extreme points are sorted in descending order of characteristic area to obtain a loss tolerance value sequence. The first element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, and the data in the vibration data sequence is simplified by adopting the Douglas-Peucker algorithm to obtain a simplified data sequence. The Douglas-Peucker algorithm operates as follows: a starting point P and an ending point Q are selected on the vibration data sequence curve and added to the result point set; the distances from all points on the curve to the line segment PQ are calculated, and the point M with the largest distance is found. If the distance of M is smaller than the threshold of the Douglas-Peucker algorithm, the curve is considered sufficiently simplified and the algorithm ends; if the distance of M is greater than or equal to the threshold, M is added to the result point set and the curve is divided into two segments, one from the starting point P to the point M and the other from the point M to the ending point Q. The Douglas-Peucker algorithm is then applied recursively to each of the two segments. The result point sets obtained by recursion are combined to obtain the final simplified curve, and the data on the simplified curve form the simplified data sequence.
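
The simplification procedure just described can be sketched as follows. The sketch assumes that each data is represented as a (sample index, amplitude) point and that the distance is the perpendicular distance to the line through P and Q, which is the usual choice for this algorithm; it uses an explicit stack instead of recursion so that long vibration sequences do not exhaust the call stack. The function names and the sample series are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, threshold):
    """Return the points kept by Douglas-Peucker simplification, in original order."""
    if len(points) < 3:
        return list(points)
    keep = {0, len(points) - 1}            # start point P and end point Q are always kept
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, max_idx = -1.0, None
        for i in range(start + 1, end):
            d = point_line_distance(points[i], points[start], points[end])
            if d > max_dist:
                max_dist, max_idx = d, i
        # Keep M and split the segment only if M is at least `threshold` away.
        if max_idx is not None and max_dist >= threshold:
            keep.add(max_idx)
            stack.append((start, max_idx))
            stack.append((max_idx, end))
    return [points[i] for i in sorted(keep)]


# Hypothetical vibration samples with one spike.
series = [0.0, 0.1, 0.0, 2.0, 0.0, -0.1, 0.0]
points = list(enumerate(series))
print(douglas_peucker(points, threshold=0.5))
```

A larger threshold discards more points: in this example a threshold above the spike height leaves only the two endpoints.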
The DTW distance between the vibration data sequence and the simplified data sequence is calculated, and a negatively correlated mapping of the calculated DTW distance is taken as the matching degree between the vibration data sequence and the simplified data sequence. The negatively correlated mapping can be expressed by an exponential function, for example by taking the value of the exponential function with the natural constant as the base and the negative DTW distance as the exponent as the matching degree; alternatively, the reciprocal of the DTW distance may be used as the matching degree. The calculation method of the DTW distance is prior art and is not described in detail here. If the matching degree is greater than the matching threshold, the second element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, the data in the vibration data sequence is simplified again by adopting the Douglas-Peucker algorithm to obtain a simplified data sequence, and the matching degree between the vibration data sequence and the simplified data sequence is recalculated. If the recalculated matching degree is still greater than the matching threshold, the third element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, and so on, until the matching degree between the vibration data sequence and the simplified data sequence is less than or equal to the matching threshold; the threshold of the Douglas-Peucker algorithm in use at that moment is taken as the optimal threshold.
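
The threshold search can be sketched as below. The sketch assumes the absolute difference as the local cost in the DTW recurrence and the exponential form of the matching degree; `simplify` is a hypothetical callable that takes a sequence and a threshold and returns the simplified sequence (for example a Douglas-Peucker routine), and returning the last candidate when the sequence is exhausted is an assumption, since the description does not cover that case.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def matching_degree(original, simplified):
    """Negatively correlated mapping of the DTW distance: exp(-DTW)."""
    return math.exp(-dtw_distance(original, simplified))


def select_threshold(series, candidates, simplify, match_threshold):
    """Try candidate thresholds in order until the matching degree is no longer
    greater than match_threshold; return the threshold in use at that moment."""
    chosen = None
    for threshold in candidates:
        chosen = threshold
        simplified = simplify(series, threshold)
        if matching_degree(series, simplified) <= match_threshold:
            break
    return chosen


print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Here `candidates` plays the role of the loss tolerance value sequence, already ordered by descending characteristic area.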

So far, by adopting the method provided by the embodiment, the optimal threshold value is obtained.

Step S4, based on the optimal threshold, compressing the vibration data sequence by adopting the Douglas-Peucker algorithm to obtain compressed data.

With the optimal threshold obtained, this embodiment compresses the vibration data sequence corresponding to the operation process of the industrial equipment by adopting the Douglas-Peucker algorithm based on the optimal threshold, so as to obtain compressed data.

The method provided by the embodiment finishes the compression processing of the vibration data in the running process of the industrial equipment, and ensures the compression effect of the vibration data of the equipment.

In this embodiment, the vibration data sequence corresponding to the operation process of the industrial equipment is first acquired. The difference condition of the data in the window corresponding to each data in the vibration data sequence is then analyzed to obtain the importance degree of each data, and the loss tolerance value of each data is obtained according to its importance degree. Guided by the loss tolerance values, the vibration data sequence is simplified by the Douglas-Peucker algorithm, and the threshold of the Douglas-Peucker algorithm is continuously adjusted based on the simplification result until the optimal threshold is obtained. The data in the vibration data sequence is then compressed using the optimal threshold. Because the optimal threshold of the Douglas-Peucker algorithm is determined adaptively from the change characteristics of the vibration data of the industrial equipment, the compression effect of the vibration data of the industrial equipment is ensured.

It should be noted that: the foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. The data compression processing method based on big data is characterized by comprising the following steps:

based on the optimal threshold, compressing the vibration data sequence by adopting a Douglas-Peucker algorithm to obtain compressed data;

the obtaining the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data comprises the following steps:

for the mth data in the vibration data sequence:

taking the product of the first product and the importance degree of the mth data as a loss tolerance value of the mth data;

the step of continuously adjusting the threshold value of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value to determine an optimal threshold value comprises the following steps:

performing curve fitting on the frequency of each loss tolerance value to obtain a first curve, wherein the abscissa of the first curve is the loss tolerance value, and the ordinate is the frequency of the loss tolerance value; obtaining an extreme point on the first curve;

2. The method for data compression processing based on big data according to claim 1, wherein the obtaining the variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data comprises:

for the mth data in the vibration data sequence:

3. The method for compressing data based on big data according to claim 1, wherein the obtaining the importance degree of each data according to the difference value between each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data, and the variance gain comprises:

for the mth data in the vibration data sequence:

4. A data compression processing method based on big data according to claim 3, wherein determining the deviation value corresponding to the mth data based on the difference between the mth data and the normal fluctuation range of the data comprises:

5. A data compression processing method based on big data according to claim 3, wherein the importance degree of the mth data is calculated using the following formula:

；

6. The method for data compression processing based on big data according to claim 5, wherein the obtaining of the first weight coefficient and the second weight coefficient corresponding to the mth data includes:

7. The data compression processing method based on big data according to claim 1, wherein the obtaining of the feature area corresponding to each extreme point comprises:

for the nth extreme point:

8. The method for data compression processing based on big data according to claim 1, wherein the step of acquiring the vibration data sequence corresponding to the operation process of the industrial equipment comprises the steps of: