CN117150217B - Data compression processing method based on big data - Google Patents

Data compression processing method based on big data

Info

Publication number
CN117150217B
Authority
CN
China
Prior art keywords
data
value
mth
taking
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311441320.6A
Other languages
Chinese (zh)
Other versions
CN117150217A (en)
Inventor
曲宝春
张斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Aixiongsi Communication Technology Co ltd
Original Assignee
Suzhou Aixiongsi Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Aixiongsi Communication Technology Co ltd
Priority to CN202311441320.6A
Publication of CN117150217A
Application granted
Publication of CN117150217B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to the technical field of data compression, in particular to a data compression processing method based on big data. The method comprises the following steps: acquiring a vibration data sequence corresponding to the operation process of industrial equipment; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; obtaining the importance degree of each data according to the difference between each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data, and the variance gain; obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining an optimal threshold; and compressing the vibration data sequence with the Douglas-Peucker algorithm based on the optimal threshold to obtain compressed data. The invention ensures the compression effect on the operation data of the industrial equipment.

Description

Data compression processing method based on big data
Technical Field
The invention relates to the technical field of data compression, in particular to a data compression processing method based on big data.
Background
In industrial equipment monitoring and fault diagnosis, equipment vibration data is an important monitoring index. However, device vibration data typically contains a large number of sampling points and high frequency components, resulting in a huge amount of data, which presents challenges for data transmission and storage. In order to effectively utilize large data to reduce the cost of storage and transmission, data compression is an important area of research.
Because vibration data is time-series data and its accuracy requirement is not high, it does not need to be stored losslessly and can instead be compressed and stored with lossy compression. The basic idea of the Douglas-Peucker algorithm is to delete some unimportant points and obtain a curve similar to the original data, thereby compressing the data. However, when the traditional method compresses equipment vibration data, the selected threshold is an empirical threshold, so the compression effect is poor.
Disclosure of Invention
In order to solve the problem that the existing Douglas-Peucker algorithm has a poor compression effect when compressing vibration data of industrial equipment, the invention aims to provide a data compression processing method based on big data, and the adopted technical scheme is as follows:
the invention provides a data compression processing method based on big data, which comprises the following steps:
acquiring a vibration data sequence corresponding to the operation process of industrial equipment;
respectively taking each data in the vibration data sequence as central data to construct a window corresponding to each data; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain;
obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining an optimal threshold;
and based on the optimal threshold, compressing the vibration data sequence with the Douglas-Peucker algorithm to obtain compressed data.
Preferably, the obtaining the variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data includes:
for the mth data in the vibration data sequence:
the variances of all data in the window corresponding to the mth data are marked as first variances, and the variances of all data except the mth data in the window corresponding to the mth data are marked as second variances;
and determining the absolute value of the difference between the first variance and the second variance as a variance gain corresponding to the mth data.
Preferably, the obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain includes:
for the mth data in the vibration data sequence:
determining a deviation value corresponding to the mth data according to the difference between the mth data and the normal fluctuation range of the data; taking the ratio of the deviation value corresponding to the mth data to the maximum deviation value bearable by industrial equipment as the deviation degree of the mth data;
and calculating the importance degree of the mth data according to the deviation degree of the mth data and the variance gain corresponding to the mth data.
Preferably, determining the deviation value corresponding to the mth data according to the difference between the mth data and the normal fluctuation range of the data includes:
if the m-th data is smaller than the lower limit value of the normal fluctuation range of the data, taking the difference value between the lower limit value of the normal fluctuation range of the data and the m-th data as a deviation value corresponding to the m-th data; if the m-th data is larger than or equal to the lower limit value of the normal fluctuation range of the data and smaller than or equal to the upper limit value of the normal fluctuation range of the data, enabling the deviation value corresponding to the m-th data to be 0; and if the mth data is larger than the upper limit value of the normal fluctuation range of the data, taking the difference value between the mth data and the upper limit value of the normal fluctuation range of the data as a deviation value corresponding to the mth data.
Preferably, the importance of the mth data is calculated using the following formula:
where $G_m$ is the importance degree of the mth data, $f_m$ is the deviation value corresponding to the mth data, $f_{\max}$ is the maximum deviation value that the industrial equipment can tolerate, $\Delta d_m$ is the variance gain corresponding to the mth data, and exp() is the exponential function with the natural constant as its base; the formula also uses the first weight coefficient and the second weight coefficient corresponding to the mth data.
Preferably, the obtaining of the first weight coefficient and the second weight coefficient corresponding to the mth data includes:
if the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold and the deviation degree of the mth data is 0, setting the first weight coefficient and the second weight coefficient corresponding to the mth data as basic weights;
if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is 0, carrying out negative correlation normalization processing on the variance of all data in the window corresponding to the mth data to obtain a negative correlation normalization result, and recording the product of the negative correlation normalization result, the basic weight and a preset first hyperparameter as a first characteristic value; taking the sum of the basic weight and the first characteristic value as the first weight coefficient corresponding to the mth data, and taking the difference between the basic weight and the first characteristic value as the second weight coefficient corresponding to the mth data; wherein the preset first hyperparameter is larger than 0;
if the variance gain corresponding to the mth data is greater than or equal to the preset variance gain threshold and the deviation degree of the mth data is not 0, recording the product of the basic weight, the deviation degree of the mth data and the preset first hyperparameter as a second characteristic value; taking the sum of the basic weight and the second characteristic value as the first weight coefficient corresponding to the mth data, and taking the difference between the basic weight and the second characteristic value as the second weight coefficient corresponding to the mth data;
if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is not 0, taking the sum of the basic weight, the first characteristic value and the second characteristic value as a first weight coefficient corresponding to the mth data, marking the difference value between the basic weight and the first characteristic value as a first difference value, and taking the difference value between the first difference value and the second characteristic value as a second weight coefficient corresponding to the mth data.
Preferably, the obtaining the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data includes:
for the mth data in the vibration data sequence:
taking the difference between the upper limit value and the lower limit value of the normal fluctuation range of the data as the length of the normal fluctuation interval, and taking the product of a preset second hyperparameter and the length of the normal fluctuation interval as a first product; wherein the preset second hyperparameter is larger than 0;
taking the product of the first product and the importance degree of the mth data as a loss tolerance value of the mth data.
Preferably, continuously adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value and determining the optimal threshold includes:
performing curve fitting on the frequency of occurrence of all loss tolerance values to obtain a first curve, wherein the abscissa of the first curve is the loss tolerance value, and the ordinate is the frequency of occurrence of the loss tolerance value; obtaining an extreme point on the first curve;
acquiring a characteristic area corresponding to each extreme point; sorting the loss tolerance values corresponding to all the extreme points according to the sequence from the large characteristic area to the small characteristic area to obtain a loss tolerance value sequence;
and taking the first element in the loss tolerance value sequence as the threshold of the Douglas-Peucker algorithm, simplifying the data in the vibration data sequence with the Douglas-Peucker algorithm to obtain a simplified data sequence, and calculating the matching degree between the vibration data sequence and the simplified data sequence; if the matching degree is greater than the matching threshold, taking the second element in the loss tolerance value sequence as the threshold of the Douglas-Peucker algorithm and simplifying the data in the vibration data sequence again, and so on, until the matching degree between the vibration data sequence and the simplified data sequence is less than or equal to the matching threshold; and taking the threshold of the Douglas-Peucker algorithm at that point as the optimal threshold.
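The iteration over candidate thresholds described above can be sketched as follows. This is a minimal illustration only: the function name `select_optimal_threshold` and its parameters are hypothetical, the simplification routine and the matching-degree measure are passed in as callables because this passage does not define how the matching degree is computed, and returning the last candidate tried when none meets the stopping condition is an assumption.

```python
def select_optimal_threshold(sequence, candidates, simplify, matching_degree,
                             match_threshold):
    """Try candidate thresholds in order and stop at the first one whose
    matching degree is less than or equal to match_threshold.

    candidates      -- loss tolerance values, already sorted by descending
                       characteristic area
    simplify        -- callable(sequence, threshold) -> simplified sequence
                       (for example a Douglas-Peucker routine)
    matching_degree -- callable(original, simplified) -> number
                       (hypothetical; the measure is not defined here)
    """
    chosen = None
    for threshold in candidates:
        simplified = simplify(sequence, threshold)
        chosen = threshold
        if matching_degree(sequence, simplified) <= match_threshold:
            break  # stopping condition from the text
    # Assumption: if no candidate satisfies the condition, keep the last one.
    return chosen
```

Passing the two routines as arguments keeps the loop independent of any particular simplification or similarity measure.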
Preferably, the obtaining the feature area corresponding to each extreme point includes:
for the nth extreme point:
taking the difference value between the loss tolerance value corresponding to the nth extreme point and the preset first value as the lower limit value of the association interval corresponding to the nth extreme point, and taking the sum of the loss tolerance value corresponding to the nth extreme point and the preset first value as the upper limit value of the association interval corresponding to the nth extreme point; acquiring an associated interval corresponding to an nth extreme point based on the lower limit value of the associated interval and the upper limit value of the associated interval; wherein the preset first value is greater than 0;
and taking the area surrounded by the association interval corresponding to the nth extreme point of the first curve and the transverse axis as the characteristic area corresponding to the nth extreme point.
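The frequency curve, its extreme points and the characteristic areas can be illustrated with a short sketch. Several choices here are assumptions rather than part of the method as stated: loss tolerance values are rounded before counting, the "first curve" is approximated by a piecewise-linear curve through the (value, frequency) points instead of a specific fitting method, the association interval is clipped to the sampled range, and the area is computed with the trapezoidal rule. All function names are hypothetical.

```python
import bisect
from collections import Counter


def frequency_points(tolerances, ndigits=3):
    """Count how often each (rounded) loss tolerance value occurs."""
    counts = Counter(round(t, ndigits) for t in tolerances)
    xs = sorted(counts)
    return xs, [counts[x] for x in xs]


def local_extrema(xs, ys):
    """Indices of interior points where the sampled curve changes direction."""
    return [i for i in range(1, len(ys) - 1)
            if (ys[i] - ys[i - 1]) * (ys[i + 1] - ys[i]) < 0]


def interp(xs, ys, x):
    """Piecewise-linear value of the curve at x (xs strictly increasing)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def characteristic_area(xs, ys, centre, c):
    """Area between the curve and the horizontal axis over [centre-c, centre+c]."""
    lo, hi = max(xs[0], centre - c), min(xs[-1], centre + c)
    grid = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [interp(xs, ys, x) for x in grid]
    return sum((grid[k + 1] - grid[k]) * (vals[k] + vals[k + 1]) / 2
               for k in range(len(grid) - 1))


def ranked_tolerances(xs, ys, c):
    """Loss tolerance values at the extreme points, largest area first."""
    return sorted((xs[i] for i in local_extrema(xs, ys)),
                  key=lambda v: characteristic_area(xs, ys, v, c),
                  reverse=True)
```

The list returned by `ranked_tolerances` plays the role of the loss tolerance value sequence, with `c` standing in for the preset first value.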
Preferably, the acquiring the vibration data sequence corresponding to the operation process of the industrial equipment includes:
obtaining vibration data of each acquisition moment in the operation process of industrial equipment;
and sequencing the vibration data at all the acquisition moments according to the time sequence to obtain a vibration data sequence corresponding to the operation process of the industrial equipment.
The invention has at least the following beneficial effects:
The method first acquires a vibration data sequence corresponding to the operation process of industrial equipment, then analyzes the differences among the data in the window corresponding to each data in the vibration data sequence to obtain the importance degree of each data, and obtains the loss tolerance value of each data from its importance degree. Constrained by the loss tolerance values, the vibration data sequence is simplified with the Douglas-Peucker algorithm, and the threshold of the algorithm is adjusted continuously based on the simplified result until the optimal threshold is obtained. The data in the vibration data sequence is then compressed with the optimal threshold. Because the optimal threshold of the Douglas-Peucker algorithm is determined adaptively from the change characteristics of the vibration data of the industrial equipment, the compression effect on that data is ensured.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a data compression processing method based on big data according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of a data compression processing method based on big data according to the present invention with reference to the accompanying drawings and the preferred embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the data compression processing method based on big data provided by the invention with reference to the accompanying drawings.
An embodiment of a data compression processing method based on big data:
The specific scenario addressed by this embodiment is as follows: industrial equipment generates a large volume of monitoring data during operation, and storing it occupies a large amount of storage space, so the collected monitoring data is often compressed to save space. When the existing Douglas-Peucker algorithm compresses equipment monitoring data, the selected threshold is an empirical threshold, so the compression effect is poor.
The embodiment provides a data compression processing method based on big data, as shown in fig. 1, the data compression processing method based on big data in the embodiment includes the following steps:
step S1, a vibration data sequence corresponding to the operation process of industrial equipment is obtained.
First, a corresponding sensor is installed on the industrial equipment to collect vibration data during its operation. The vibration data may be vibration acceleration, vibration velocity, vibration displacement, and the like. Vibration acceleration is the acceleration value of the industrial equipment measured at a point in time; vibration velocity is the velocity of the industrial equipment measured at a point in time; vibration displacement is the displacement of the industrial equipment measured at a point in time. This embodiment takes one type of vibration data as an example; other types of vibration data can be processed by the same method. In this embodiment, vibration data is collected every 0.1 seconds; in a specific application, the practitioner can set the collection frequency according to the specific situation. The vibration data at each acquisition time during the operation of the industrial equipment is thus obtained.
And sequencing vibration data at all acquisition moments in the operation process of the industrial equipment according to the time sequence to obtain a vibration data sequence corresponding to the operation process of the industrial equipment.
Step S2, respectively taking each data in the vibration data sequence as central data, and constructing a window corresponding to each data; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; and obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain.
The present embodiment has obtained a vibration data sequence corresponding to an operation process of an industrial apparatus, and data in the vibration data sequence is data to be compressed, so the present embodiment will analyze the data in the vibration data sequence next.
Each data in the vibration data sequence is taken in turn as the central data, and a window of size a × 1 is constructed as the window corresponding to that data, so that each data is the central data of its own window. In this embodiment a is 7; in a specific application, the implementer can set it according to the specific situation.
If the difference of the data in the window corresponding to a certain data is larger, the fluctuation of the data and the surrounding data is larger, namely the corresponding variance is larger.
For the mth data in the vibration data sequence:
The variance of all data in the window corresponding to the mth data is recorded as the first variance, and the variance of all data in that window except the mth data is recorded as the second variance; the absolute value of the difference between the first variance and the second variance is determined as the variance gain corresponding to the mth data. In other words, this embodiment removes the mth data from its window, computes the new variance of the remaining data, and takes the difference between this new variance and the original variance. The difference reflects the influence of the removed mth data on the variance of the data in the whole window: the larger the variance gain, the greater the influence of the mth data on the variance of its window, that is, the more likely the mth data is abnormal data; the smaller the variance gain, the smaller that influence, that is, the more likely the mth data is normal data.
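A minimal sketch of the window construction and the variance gain follows. Two points are assumptions, since the text does not specify them: the window is truncated at the ends of the sequence, and the population variance is used. The function names are hypothetical.

```python
from statistics import pvariance


def window(seq, m, a=7):
    """Return the a x 1 window centred on index m and the centre's position
    inside it. Assumption: the window is truncated at the sequence edges."""
    half = a // 2
    start = max(0, m - half)
    return seq[start:m + half + 1], m - start


def variance_gain(seq, m, a=7):
    """|variance of the window - variance of the window without seq[m]|."""
    win, centre = window(seq, m, a)
    rest = win[:centre] + win[centre + 1:]
    if not rest:  # a single-point sequence has nothing to compare against
        return 0.0
    first = pvariance(win)    # first variance: all data in the window
    second = pvariance(rest)  # second variance: window without the mth data
    return abs(first - second)
```

For example, for `[1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]` the gain at index 3 is large, because removing the spike leaves a constant window with zero variance.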
The importance degree of a data point is related to its own amplitude and to the change of its neighborhood data. The value range of the data when the equipment operates normally is obtained, the deviation value corresponding to the mth data is determined from the difference between the mth data and this normal fluctuation range, and the importance degree of the data is then determined by combining the deviation value with the variance gain. Specifically, if the mth data is smaller than the lower limit value of the normal fluctuation range of the data, the difference between the lower limit value and the mth data is taken as the deviation value corresponding to the mth data; if the mth data is greater than or equal to the lower limit value and less than or equal to the upper limit value, the deviation value corresponding to the mth data is 0; and if the mth data is larger than the upper limit value, the difference between the mth data and the upper limit value is taken as the deviation value corresponding to the mth data. The ratio of the deviation value corresponding to the mth data to the maximum deviation value that the industrial equipment can tolerate is taken as the deviation degree of the mth data. The implementer sets the normal fluctuation range of the data according to the specific situation, with the lower limit value smaller than the upper limit value. The implementer likewise sets the maximum deviation value that the industrial equipment can tolerate according to the specific situation.
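The three-case deviation value and the deviation degree can be written as a short sketch; the function and parameter names are hypothetical, and `f_max` stands for the maximum deviation value that the equipment can tolerate.

```python
def deviation_value(x, lower, upper):
    """Distance of x from the normal fluctuation range [lower, upper]."""
    if x < lower:
        return lower - x
    if x > upper:
        return x - upper
    return 0.0  # inside the normal range


def deviation_degree(x, lower, upper, f_max):
    """Deviation value divided by the maximum tolerable deviation f_max."""
    return deviation_value(x, lower, upper) / f_max
```

A value inside the normal range therefore always has a deviation degree of 0, which is the condition tested when the weight coefficients are chosen.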
The specific calculation formula of the importance degree of the mth data is as follows:
where $G_m$ is the importance degree of the mth data, $f_m$ is the deviation value corresponding to the mth data, $f_{\max}$ is the maximum deviation value that the industrial equipment can tolerate, $\Delta d_m$ is the variance gain corresponding to the mth data, and exp() is the exponential function with the natural constant as its base; the formula also uses the first weight coefficient and the second weight coefficient corresponding to the mth data.
$f_m / f_{\max}$ indicates the degree of deviation of the mth data. The larger the deviation degree of the mth data and the larger the variance gain corresponding to the mth data, the more likely the mth data is abnormal data, and therefore the greater its importance degree; the smaller the deviation degree of the mth data and the smaller the corresponding variance gain, the more likely the mth data is normal data, and therefore the smaller its importance degree.
The acquisition process of the first weight coefficient and the second weight coefficient corresponding to the mth data specifically comprises the following steps:
If the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold and the deviation degree of the mth data is 0, the first weight coefficient and the second weight coefficient corresponding to the mth data are both set to the basic weight. In this embodiment, the basic weight is 0.5, so the first weight coefficient and the second weight coefficient corresponding to the mth data are both 0.5 in this case. In this embodiment, the preset variance gain threshold is 0.58; in a specific application, the practitioner can set the variance gain threshold according to the specific situation.
If the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is 0, negative correlation normalization processing is carried out on the variance of all data in the window corresponding to the mth data to obtain a negative correlation normalization result, and the product of the negative correlation normalization result, the basic weight and the preset first hyperparameter is recorded as a first characteristic value; the sum of the basic weight and the first characteristic value is taken as the first weight coefficient corresponding to the mth data, and the difference between the basic weight and the first characteristic value is taken as the second weight coefficient corresponding to the mth data. The preset first hyperparameter is greater than 0; in this embodiment it is 3, and in a specific application the implementer can set it according to the specific situation. In this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:
$$\text{first weight coefficient} = \beta_0 + \exp(-b_m) \times (\beta_0 \times \delta_1)$$
$$\text{second weight coefficient} = \beta_0 - \exp(-b_m) \times (\beta_0 \times \delta_1)$$
where $\beta_0$ is the basic weight, $b_m$ is the variance of all data within the window corresponding to the mth data, and $\delta_1$ is the preset first hyperparameter. $\exp(-b_m)$ represents the negative correlation normalization result of the variance of all data in the window corresponding to the mth data, and $\exp(-b_m) \times (\beta_0 \times \delta_1)$ represents the first characteristic value.
If the variance gain corresponding to the mth data is greater than or equal to the preset variance gain threshold and the deviation degree of the mth data is not 0, the product of the basic weight, the deviation degree of the mth data and the preset first hyperparameter is recorded as a second characteristic value; the sum of the basic weight and the second characteristic value is taken as the first weight coefficient corresponding to the mth data, and the difference between the basic weight and the second characteristic value is taken as the second weight coefficient corresponding to the mth data. In this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:
$$\text{first weight coefficient} = \beta_0 + \beta_0 \times \frac{f_m}{f_{\max}} \times \delta_1$$
$$\text{second weight coefficient} = \beta_0 - \beta_0 \times \frac{f_m}{f_{\max}} \times \delta_1$$
where $\beta_0$ is the basic weight, $f_m$ is the deviation value corresponding to the mth data, $f_{\max}$ is the maximum deviation value that the equipment can withstand, and $\delta_1$ is the preset first hyperparameter. $\beta_0 \times \frac{f_m}{f_{\max}} \times \delta_1$ represents the second characteristic value.
If the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is not 0, taking the sum of the basic weight, the first characteristic value and the second characteristic value as a first weight coefficient corresponding to the mth data, marking the difference value between the basic weight and the first characteristic value as a first difference value, and taking the difference value between the first difference value and the second characteristic value as a second weight coefficient corresponding to the mth data; in this case, the specific calculation formulas of the first weight coefficient and the second weight coefficient corresponding to the mth data are as follows:
$$\text{first weight coefficient} = \beta_0 + \exp(-b_m) \times (\beta_0 \times \delta_1) + \beta_0 \times \frac{f_m}{f_{\max}} \times \delta_1$$
$$\text{second weight coefficient} = \beta_0 - \exp(-b_m) \times (\beta_0 \times \delta_1) - \beta_0 \times \frac{f_m}{f_{\max}} \times \delta_1$$
where $\beta_0$ is the basic weight, $f_m$ is the deviation value corresponding to the mth data, $f_{\max}$ is the maximum deviation value that the equipment can withstand, $b_m$ is the variance of all data within the window corresponding to the mth data, and $\delta_1$ is the preset first hyperparameter. $\beta_0 - \exp(-b_m) \times (\beta_0 \times \delta_1)$ represents the first difference.
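The four cases for the weight coefficients can be folded into one short routine, because each condition simply adds its characteristic value to the first weight and subtracts it from the second. The sketch below uses the embodiment's example settings (basic weight 0.5, variance gain threshold 0.58, first hyperparameter 3) as defaults; the function and parameter names are hypothetical.

```python
import math


def weight_coefficients(variance_gain, deviation_degree, window_variance,
                        base_weight=0.5, gain_threshold=0.58, delta1=3.0):
    """Return (first weight coefficient, second weight coefficient)."""
    # First characteristic value: exp(-b_m) * (beta_0 * delta_1)
    first_feature = math.exp(-window_variance) * (base_weight * delta1)
    # Second characteristic value: beta_0 * (f_m / f_max) * delta_1
    second_feature = base_weight * deviation_degree * delta1

    w1 = w2 = base_weight
    if variance_gain < gain_threshold:  # low variance gain
        w1 += first_feature
        w2 -= first_feature
    if deviation_degree != 0:           # outside the normal range
        w1 += second_feature
        w2 -= second_feature
    return w1, w2
```

With both conditions false the routine returns the basic weight twice, and with both true it reproduces the fourth case, including the first difference as an intermediate step for the second weight.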
By adopting the method, the importance degree of each data in the vibration data sequence can be obtained.
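The quantities that feed the importance degree (the variance gain, the deviation value and the deviation degree, as defined in claims 2 to 4) can be sketched as follows. This is an illustration only: the window half-width, the truncation of windows at the ends of the sequence, and the use of the population variance are assumptions, and the function names are hypothetical. The window is assumed to contain at least two data points.

```python
from statistics import pvariance


def variance_gain(seq, m, half_width):
    """|variance of the window - variance of the window without seq[m]|.

    The window is centred on index m; truncating it at the sequence
    ends is an assumption of this sketch.
    """
    lo = max(0, m - half_width)
    hi = min(len(seq), m + half_width + 1)
    window = seq[lo:hi]
    others = seq[lo:m] + seq[m + 1:hi]  # window with the mth data removed
    first_variance = pvariance(window)
    second_variance = pvariance(others)
    return abs(first_variance - second_variance)


def deviation_value(x, g1, g2):
    """Distance of x from the normal fluctuation range [g1, g2]."""
    if x < g1:
        return g1 - x
    if x > g2:
        return x - g2
    return 0.0


def deviation_degree(x, g1, g2, f_max):
    """Deviation value divided by the maximum bearable deviation f_max."""
    return deviation_value(x, g1, g2) / f_max
```

A single spike in an otherwise flat window yields a large variance gain, which is the behaviour the method relies on to flag informative data.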
Step S3, obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; and iteratively adjusting the threshold of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining the optimal threshold.
When a data in the vibration data sequence lies within [g_1, g_2], it is regarded as normal operation data, where g_1 represents the lower limit value and g_2 the upper limit value of the normal fluctuation range of the data; normal operation data contain little information. The Douglas-Peucker algorithm approximates the original curve by retaining key data points. Its threshold parameter controls the degree of abstraction: a smaller threshold retains more detail, whereas a larger threshold compresses the data points to a greater degree. Having obtained the importance degree of each data in the vibration data sequence, the present embodiment next obtains the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data.
For the mth data in the vibration data sequence: the difference between the upper limit value and the lower limit value of the normal fluctuation range of the data is taken as the length of the normal fluctuation interval, and the product of a preset second hyperparameter and the length of the normal fluctuation interval is taken as a first product; the preset second hyperparameter is greater than 0 and is set to 1 in this embodiment, and in a specific application the implementer can set it according to the specific situation. The product of the first product and the importance degree of the mth data is taken as the loss tolerance value of the mth data.
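The loss tolerance value is therefore a simple product of three factors. A minimal sketch, with a hypothetical function name and the second hyperparameter defaulting to 1 as in this embodiment:

```python
def loss_tolerance(importance, g1, g2, delta2=1.0):
    """Loss tolerance value = delta2 * (g2 - g1) * importance degree.

    g1 and g2 are the lower and upper limits of the normal fluctuation
    range; delta2 is the preset second hyperparameter (must be > 0).
    """
    if delta2 <= 0:
        raise ValueError("the second hyperparameter must be greater than 0")
    interval_length = g2 - g1            # length of the normal fluctuation interval
    first_product = delta2 * interval_length
    return first_product * importance
```

For instance, with a normal range of [1.0, 3.0] and an importance degree of 0.5, the loss tolerance value is 1.0.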
By adopting the method, the loss tolerance value of each data in the vibration data sequence is obtained. Identical loss tolerance values are grouped together and the occurrence frequency of each distinct loss tolerance value is counted; curve fitting is performed on the occurrence frequencies of all the loss tolerance values, and the resulting curve is recorded as a first curve, whose abscissa is the loss tolerance value and whose ordinate is the occurrence frequency of the loss tolerance value. The extreme points on the first curve are then obtained; the method for obtaining extreme points is prior art and is not described in detail here.
For the nth extreme point: the difference between the loss tolerance value corresponding to the nth extreme point and a preset first value is taken as the lower limit value of the association interval corresponding to the nth extreme point, and the sum of the loss tolerance value corresponding to the nth extreme point and the preset first value is taken as the upper limit value of that association interval; the association interval corresponding to the nth extreme point is obtained from these lower and upper limit values. The area enclosed by the first curve and the horizontal axis over the association interval corresponding to the nth extreme point is taken as the characteristic area corresponding to the nth extreme point. In this embodiment the preset first value is the ratio of the first product to the constant 15; in a specific application, the implementer can set it according to the specific situation.
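The frequency curve, its extreme points and the characteristic areas can be sketched as below. Several choices here are assumptions rather than part of the method: the fitted curve is replaced by a piecewise-linear curve through the (value, frequency) points, values are rounded before counting so that equal floats are grouped, extreme points are taken as interior points where the slope changes sign, and the association interval is clipped to the range of the curve. The final function also sorts the extreme points by descending characteristic area, which is how the method goes on to use them. All function names are hypothetical.

```python
from collections import Counter


def frequency_curve(tolerances, ndigits=6):
    """Distinct loss tolerance values (sorted) and their occurrence counts."""
    counts = Counter(round(t, ndigits) for t in tolerances)
    xs = sorted(counts)
    return xs, [counts[x] for x in xs]


def interp(xs, ys, x):
    """Piecewise-linear value of the curve at x, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def extreme_points(xs, ys):
    """Indices of interior points where the slope changes sign."""
    return [i for i in range(1, len(xs) - 1)
            if (ys[i] - ys[i - 1]) * (ys[i + 1] - ys[i]) < 0]


def feature_area(xs, ys, center, half_width):
    """Trapezoidal area under the curve over [center - hw, center + hw]."""
    lo = max(xs[0], center - half_width)
    hi = min(xs[-1], center + half_width)
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [interp(xs, ys, k) for k in knots]
    return sum((knots[i + 1] - knots[i]) * (vals[i] + vals[i + 1]) / 2
               for i in range(len(knots) - 1))


def ranked_thresholds(tolerances, first_product):
    """Extreme-point tolerance values, largest characteristic area first."""
    xs, ys = frequency_curve(tolerances)
    half_width = first_product / 15      # preset first value of this embodiment
    scored = [(feature_area(xs, ys, xs[i], half_width), xs[i])
              for i in extreme_points(xs, ys)]
    scored.sort(reverse=True)
    return [x for _, x in scored]
```

A smoother fit (for example a spline or kernel density estimate) could be substituted for the piecewise-linear curve without changing the rest of the sketch.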
By adopting the method, the characteristic area corresponding to each extreme point can be obtained, and the loss tolerance values corresponding to all the extreme points are sorted in descending order of characteristic area to obtain a loss tolerance value sequence. The first element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, and the data in the vibration data sequence are simplified by the Douglas-Peucker algorithm to obtain a simplified data sequence. The Douglas-Peucker algorithm operates as follows: a start point P and an end point Q are selected on the vibration data sequence curve and added to the result point set; the distances from all points on the curve to the line segment PQ are calculated, and the point M with the largest distance is found. If the distance of M is smaller than the threshold of the Douglas-Peucker algorithm, the whole curve is considered sufficiently simplified and the algorithm ends; if the distance of M is greater than or equal to the threshold, M is added to the result point set and the curve is divided into two segments, one from the start point P to the point M and the other from the point M to the end point Q. The Douglas-Peucker algorithm is then applied recursively to each of the two segments. The result point sets obtained by recursion are combined to obtain the final simplified curve, and the data on the simplified curve form the simplified data sequence.
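The recursion described above can be sketched as follows, treating each sample as a (time, value) point. This is a generic Douglas-Peucker sketch rather than the patent's own implementation; the function names are hypothetical, and the recursive form assumes sequences short enough not to exceed Python's recursion limit.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:              # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, threshold):
    """Simplify a polyline, always keeping its start and end points."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    # Farthest point M closer than the threshold: the segment is simple enough.
    if max_dist < threshold:
        return [start, end]
    # Otherwise keep M and recurse on P..M and M..Q.
    left = douglas_peucker(points[:index + 1], threshold)
    right = douglas_peucker(points[index:], threshold)
    return left[:-1] + right             # M appears in both halves; keep it once
```

Note that a point whose distance equals the threshold is kept, matching the "greater than or equal to" condition in the description.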
The DTW distance between the vibration data sequence and the simplified data sequence is calculated, and a negatively correlated mapping of the DTW distance is taken as the matching degree between the vibration data sequence and the simplified data sequence. The negatively correlated mapping can be expressed by an exponential function, for example by taking the value of the exponential function with the natural constant as the base and the negative DTW distance as the exponent as the matching degree; the reciprocal of the DTW distance may also be used as the matching degree. The calculation method of the DTW distance is prior art and is not described in detail here. If the matching degree is greater than the matching threshold, the second element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, the data in the vibration data sequence are simplified by the Douglas-Peucker algorithm to obtain a simplified data sequence, and the matching degree between the vibration data sequence and the simplified data sequence is calculated again; if the calculated matching degree is still greater than the matching threshold, the third element in the loss tolerance value sequence is taken as the threshold of the Douglas-Peucker algorithm, and so on, until the matching degree between the vibration data sequence and the simplified data sequence is less than or equal to the matching threshold. The threshold of the Douglas-Peucker algorithm in use at that point is taken as the optimal threshold.
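The threshold search can be sketched with a textbook DTW recurrence and the exponential mapping mentioned above. The absolute difference as the local cost, the behaviour when the candidate list is exhausted (the last candidate is returned), and the `simplify` callable (a stand-in for any Douglas-Peucker routine that returns the simplified values) are assumptions of this sketch; the names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |x - y| as the local cost."""
    m = len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for x in a:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def matching_degree(original, simplified):
    """exp(-DTW): equals 1 for identical sequences and decays with distance."""
    return math.exp(-dtw_distance(original, simplified))


def select_threshold(values, candidates, simplify, match_threshold):
    """Walk the ranked candidates until the matching degree drops to
    match_threshold or below, and return the candidate in use then."""
    chosen = None
    for threshold in candidates:
        chosen = threshold
        simplified = simplify(values, threshold)
        if matching_degree(values, simplified) <= match_threshold:
            break
    return chosen
```

Because the candidates are tried in order of descending characteristic area, the search prefers tolerance values that are representative of many data points.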
So far, by adopting the method provided by the embodiment, the optimal threshold value is obtained.
Step S4, based on the optimal threshold, compressing the vibration data sequence by adopting the Douglas-Peucker algorithm to obtain compressed data.
Having obtained the optimal threshold, the present embodiment next compresses the vibration data sequence corresponding to the operation process of the industrial equipment by adopting the Douglas-Peucker algorithm with the optimal threshold, thereby obtaining the compressed data.
The method provided by the embodiment finishes the compression processing of the vibration data in the running process of the industrial equipment, and ensures the compression effect of the vibration data of the equipment.
In this embodiment, the vibration data sequence corresponding to the operation process of the industrial equipment is first obtained; the differences among the data in the window corresponding to each data in the vibration data sequence are then analyzed to obtain the importance degree of each data; the loss tolerance value of each data is obtained from its importance degree, so that the tolerable loss is bounded; the vibration data sequence is simplified by the Douglas-Peucker algorithm, and the threshold of the Douglas-Peucker algorithm is adjusted iteratively based on the simplification results to obtain the optimal threshold; finally, the data in the vibration data sequence are compressed using the optimal threshold. Because the optimal threshold of the Douglas-Peucker algorithm is determined adaptively from the variation characteristics of the vibration data of the industrial equipment, the compression effect of the vibration data of the industrial equipment is ensured.
It should be noted that: the foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. The data compression processing method based on big data is characterized by comprising the following steps:
acquiring a vibration data sequence corresponding to the operation process of industrial equipment;
respectively taking each data in the vibration data sequence as central data to construct a window corresponding to each data; obtaining a variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data; obtaining the importance degree of each data according to the difference value of each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data and the variance gain;
obtaining a loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data; continuously adjusting the threshold value of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value, and determining an optimal threshold value;
based on the optimal threshold, compressing the vibration data sequence by adopting the Douglas-Peucker algorithm to obtain compressed data;
the obtaining the loss tolerance value of each data based on the importance degree of each data and the normal fluctuation range of the data comprises the following steps:
for the mth data in the vibration data sequence:
taking the difference value between the upper limit value and the lower limit value of the normal fluctuation range of the data as the length of a normal fluctuation interval, and taking the product of a preset second hyperparameter and the length of the normal fluctuation interval as a first product; wherein, the preset second hyperparameter is larger than 0;
taking the product of the first product and the importance degree of the mth data as a loss tolerance value of the mth data;
the step of continuously adjusting the threshold value of the Douglas-Peucker algorithm based on the occurrence frequency of each loss tolerance value to determine an optimal threshold value comprises the following steps:
performing curve fitting on the frequency of each loss tolerance value to obtain a first curve, wherein the abscissa of the first curve is the loss tolerance value, and the ordinate is the frequency of the loss tolerance value; obtaining an extreme point on the first curve;
acquiring a characteristic area corresponding to each extreme point; sorting the loss tolerance values corresponding to all the extreme points according to the sequence from the large characteristic area to the small characteristic area to obtain a loss tolerance value sequence;
and taking the first element in the loss tolerance value sequence as a threshold value of the Douglas-Peucker algorithm, simplifying and adjusting data in the vibration data sequence by adopting the Douglas-Peucker algorithm to obtain a simplified data sequence, calculating the matching degree between the vibration data sequence and the simplified data sequence, taking the second element in the loss tolerance value sequence as the threshold value of the Douglas-Peucker algorithm if the matching degree is larger than the matching threshold value, simplifying and adjusting the data in the vibration data sequence by adopting the Douglas-Peucker algorithm, and so on, until the matching degree between the vibration data sequence and the simplified data sequence is smaller than or equal to the matching threshold value, and taking the threshold value of the Douglas-Peucker algorithm corresponding at the moment as an optimal threshold value.
2. The method for data compression processing based on big data according to claim 1, wherein the obtaining the variance gain corresponding to each data according to the difference condition of the data in the window corresponding to each data comprises:
for the mth data in the vibration data sequence:
the variances of all data in the window corresponding to the mth data are marked as first variances, and the variances of all data except the mth data in the window corresponding to the mth data are marked as second variances;
and determining the absolute value of the difference between the first variance and the second variance as a variance gain corresponding to the mth data.
3. The method for compressing data based on big data according to claim 1, wherein the obtaining the importance degree of each data according to the difference value between each data and the normal fluctuation range of the data, the fluctuation condition of the data in the window corresponding to each data, and the variance gain comprises:
for the mth data in the vibration data sequence:
determining a deviation value corresponding to the mth data according to the difference between the mth data and the normal fluctuation range of the data; taking the ratio of the deviation value corresponding to the mth data to the maximum deviation value bearable by industrial equipment as the deviation degree of the mth data;
and calculating the importance degree of the mth data according to the deviation degree of the mth data and the variance gain corresponding to the mth data.
4. A data compression processing method based on big data according to claim 3, wherein determining the deviation value corresponding to the mth data based on the difference between the mth data and the normal fluctuation range of the data comprises:
if the m-th data is smaller than the lower limit value of the normal fluctuation range of the data, taking the difference value between the lower limit value of the normal fluctuation range of the data and the m-th data as a deviation value corresponding to the m-th data; if the m-th data is larger than or equal to the lower limit value of the normal fluctuation range of the data and smaller than or equal to the upper limit value of the normal fluctuation range of the data, enabling the deviation value corresponding to the m-th data to be 0; and if the mth data is larger than the upper limit value of the normal fluctuation range of the data, taking the difference value between the mth data and the upper limit value of the normal fluctuation range of the data as a deviation value corresponding to the mth data.
5. A data compression processing method based on big data according to claim 3, wherein the importance degree of the mth data is calculated using the following formula:
wherein G is m Is the importance degree of the mth data, f m For the deviation value corresponding to the mth data, f max Is the maximum difference value that can be tolerated by industrial equipment,for the first weight coefficient corresponding to the mth data,/th data>For the second weight coefficient corresponding to the mth data, Δd m For the variance gain corresponding to the mth data, exp () is an exponential function based on a natural constant.
6. The method for data compression processing based on big data according to claim 5, wherein the obtaining of the first weight coefficient and the second weight coefficient corresponding to the mth data includes:
if the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold and the deviation degree of the mth data is 0, setting the first weight coefficient and the second weight coefficient corresponding to the mth data as basic weights;
if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is 0, carrying out negative correlation normalization processing on variances of all data in a window corresponding to the mth data to obtain a negative correlation normalization result, and recording the product of the negative correlation normalization result, the basic weight and the preset first hyperparameter as a first characteristic value; taking the sum of the basic weight and the first characteristic value as a first weight coefficient corresponding to the mth data, and taking the difference value between the basic weight and the first characteristic value as a second weight coefficient corresponding to the mth data; wherein, the preset first hyperparameter is larger than 0;
if the variance gain corresponding to the mth data is greater than or equal to a preset variance gain threshold, and the deviation degree of the mth data is not 0, marking the product of the basic weight, the deviation degree of the mth data and the preset first hyperparameter as a second characteristic value; taking the sum of the basic weight and the second characteristic value as a first weight coefficient corresponding to the mth data, and taking the difference value of the basic weight and the second characteristic value as a second weight coefficient corresponding to the mth data;
if the variance gain corresponding to the mth data is smaller than the preset variance gain threshold and the deviation degree of the mth data is not 0, taking the sum of the basic weight, the first characteristic value and the second characteristic value as a first weight coefficient corresponding to the mth data, marking the difference value between the basic weight and the first characteristic value as a first difference value, and taking the difference value between the first difference value and the second characteristic value as a second weight coefficient corresponding to the mth data.
7. The data compression processing method based on big data according to claim 1, wherein the obtaining of the feature area corresponding to each extreme point comprises:
for the nth extreme point:
taking the difference value between the loss tolerance value corresponding to the nth extreme point and the preset first value as the lower limit value of the association interval corresponding to the nth extreme point, and taking the sum of the loss tolerance value corresponding to the nth extreme point and the preset first value as the upper limit value of the association interval corresponding to the nth extreme point; acquiring an associated interval corresponding to an nth extreme point based on the lower limit value of the associated interval and the upper limit value of the associated interval; wherein the preset first value is greater than 0;
and taking the area surrounded by the association interval corresponding to the nth extreme point of the first curve and the transverse axis as the characteristic area corresponding to the nth extreme point.
8. The method for data compression processing based on big data according to claim 1, wherein the step of acquiring the vibration data sequence corresponding to the operation process of the industrial equipment comprises the steps of:
obtaining vibration data of each acquisition moment in the operation process of industrial equipment;
and sequencing the vibration data at all the acquisition moments according to the time sequence to obtain a vibration data sequence corresponding to the operation process of the industrial equipment.
CN202311441320.6A 2023-11-01 2023-11-01 Data compression processing method based on big data Active CN117150217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311441320.6A CN117150217B (en) 2023-11-01 2023-11-01 Data compression processing method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311441320.6A CN117150217B (en) 2023-11-01 2023-11-01 Data compression processing method based on big data

Publications (2)

Publication Number Publication Date
CN117150217A CN117150217A (en) 2023-12-01
CN117150217B true CN117150217B (en) 2024-01-26

Family

ID=88908632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311441320.6A Active CN117150217B (en) 2023-11-01 2023-11-01 Data compression processing method based on big data

Country Status (1)

Country Link
CN (1) CN117150217B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117459073B (en) * 2023-12-26 2024-03-05 大连亚明汽车部件股份有限公司 Intelligent management method for heat pump system operation data
CN117857648A (en) * 2024-03-04 2024-04-09 广东华宸建设工程质量检测有限公司 Big data-based construction engineering management cloud server communication method
CN117954037A (en) * 2024-03-26 2024-04-30 光大宏远(天津)技术有限公司 Psychological assessment data storage method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477687A (en) * 2009-01-22 2009-07-08 上海交通大学 Checkerboard angle point detection process under complex background
CN107644069A (en) * 2017-09-11 2018-01-30 千寻位置网络有限公司 High density Monitoring Data vacuates method
CN113032378A (en) * 2021-03-05 2021-06-25 北京工业大学 Ship behavior pattern mining method based on clustering algorithm and pattern mining
CN116505953A (en) * 2023-06-30 2023-07-28 湖南腾琨信息科技有限公司 Mass map data optimization compression processing method based on BIM and GIS

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
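Douglas-Peucker compression method with automatic threshold setting; Zhao Yongqing; Journal of Shanxi Coal-Mining Administrators College; full text *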

Also Published As

Publication number Publication date
CN117150217A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN117150217B (en) Data compression processing method based on big data
CN117407700B (en) Method for monitoring working environment in live working process
WO2021126079A1 (en) Method and apparatus for storing and querying time series data, and server and storage medium thereof
CN116592951B (en) Intelligent cable data acquisition method and system
CN116320042B (en) Internet of things terminal monitoring control system for edge calculation
CN117459072B (en) Data processing method for performance test of self-oxygen generating device
CN103346797B (en) A kind of real-time compression method for gear distress signal
CN109813269B (en) On-line calibration data sequence matching method for structure monitoring sensor
CN115329910A (en) Intelligent processing method for enterprise production emission data
CN117167903A (en) Artificial intelligence-based foreign matter fault detection method for heating ventilation equipment
EP2618269A1 (en) A method for processing of measurements from several sensors
CN102595138B (en) Method, device and terminal for image compression
CN117313020B (en) Data processing method of bearing type tension sensor
CN117459073B (en) Intelligent management method for heat pump system operation data
CN117272479B (en) High-strength geomembrane bursting strength prediction method based on load time course analysis
CN116776094B (en) Crystal oscillator temperature test data intelligent analysis storage system
CN117373600B (en) Medical detection vehicle data optimal storage method
CN116975503B (en) Soil erosion information management method and system
CN112700039B (en) Steady state detection and extraction method for load operation data of thermal power plant
CN109684970B (en) Window length determination method for moving principal component analysis of structural dynamic response
CN117851414B (en) Lightning arrester aging test data storage method and system
CN110543505B (en) Monitoring system based on time series data
CN114087940B (en) Use method of multifunctional vernier caliper
CN117390379B (en) On-line signal measuring device and confidence measuring device for signal characteristics
CN117470528B (en) Performance detection method of magnetorheological damper of steel reinforced concrete structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant