CN114816825B - Internet of things gateway data error correction method - Google Patents
- Publication number
- CN114816825B (application number CN202210717724.2A)
- Authority
- CN
- China
- Prior art keywords
- optimal time
- data
- time sequence
- unit
- optimal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an error correction method for Internet of Things gateway data, belonging to the technical field of big data analysis. The method comprises the following steps: acquiring gateway sample data; dividing the sample data into a plurality of equal-length optimal time series units according to an optimal length value; calculating the autocorrelation and the normality of each optimal time series unit; determining the attention of each optimal time series unit according to its autocorrelation and normality; training a single-class support vector machine classifier using the attention of each optimal time series unit, and correcting errors in the Internet of Things gateway data using the trained classifier. Because the attention of each optimal time series unit controls how strongly that unit influences the single-class support vector machine classifier during training, the accuracy of the classifier is improved.
Description
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to an error correction method for gateway data of an Internet of things.
Background
With the expansion of the application of the internet of things in actual life and production, the characteristic of taking data as the center is increasingly prominent. Whether the internet of things can be widely applied depends on extraction of useful information in gateway data to a certain extent, namely mining of the gateway data, and the data quality directly determines the extraction efficiency of the useful information and the correctness of final decision of the internet of things, so that the function realization and the user experience of an application scene are influenced. In order to be able to extract useful information in gateway data efficiently, the quality of the data needs to be improved.
In the scene of the internet of things, abnormal data can be generated due to factors such as unstable sensor performance, data transmission network faults, interference and damage caused by human or natural environments and the like, so that the data quality is rapidly reduced, and therefore, the identification of the abnormal data in the gateway data of the internet of things is particularly important.
The single-class support vector machine (OCSVM) algorithm is a single-classification algorithm for detecting abnormal data: it can build a detection classifier from normal data alone. However, when the classifier is trained, samples in the sample data that may be abnormal interfere with the classifier's learning of the characteristics of normal data, so the classifier detects abnormal data with low accuracy.
Disclosure of Invention
The invention provides an error correction method for Internet of Things gateway data. It aims to solve the following problem: when a single-class support vector machine classifier is trained, samples in the sample data that may be abnormal interfere with the classifier's learning of the characteristics of normal data, so the classifier detects abnormal data with low accuracy.
The invention discloses an error correction method for gateway data of the Internet of things, which adopts the following technical scheme: the method comprises the following steps:
acquiring single type sample data of a gateway;
dividing the sample data according to any length numerical value in a preset time length range to obtain a plurality of time sequence units with equal length, and forming time sequence data corresponding to the length numerical value by the plurality of time sequence units with equal length;
acquiring time series data corresponding to each length value within a preset time length range, fitting the time series data corresponding to each length value, determining an optimal length value according to a fitting result, and dividing the sample data into a plurality of optimal time series units with equal length according to the optimal length value;
calculating the autocorrelation of each optimal time-series unit;
converting all the obtained optimal time sequence units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value;
determining a neighboring data set of each of the optimal time series units within the multi-dimensional space centered on each of the optimal time series units and a radius of a numerical value determined from the sample data;
determining the normality of each optimal time sequence unit according to each optimal time sequence unit and the adjacent data set corresponding to each optimal time sequence unit;
determining the attention degree of each optimal time sequence unit according to the autocorrelation of each optimal time sequence unit and the normal degree of each optimal time sequence unit;
and training a single-class support vector machine algorithm classifier by using the attention of each optimal time sequence unit, and correcting the error of the gateway data of the Internet of things by using the trained classifier.
Further, the fitting the time series data corresponding to each length value and determining an optimal length value according to a fitting result includes:
fitting the time sequence data corresponding to each length value to obtain a fitting result corresponding to each length value;
when the fitting result corresponding to any length value is larger than the threshold value determined by the length value, marking the fitting result corresponding to the length value to obtain a marked fitting result;
judging the fitting result corresponding to each length value to obtain all the post-labeling fitting results, and selecting the maximum post-labeling fitting result from all the post-labeling fitting results;
and taking the length value corresponding to the maximum value of the marked fitting result as an optimal length value.
Further, said calculating an autocorrelation of each of said optimal time series units comprises:
respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation calculation formula of each optimal time series unit is shown as the following formula:
In the formula, $R_i$ denotes the autocorrelation of the $i$-th optimal time series unit; $M^{*}$ denotes the total number of sample data in the $i$-th optimal time series unit; $x_{ij}$ denotes the true value of the $j$-th sample datum in the $i$-th optimal time series unit; $\hat{x}_{ij}$ denotes the predicted value of the $j$-th sample datum in the $i$-th optimal time series unit, obtained from the linear equation fitted by the least squares method.
Further, said determining a neighboring data set of each said optimal time series unit within said multidimensional space centered at each said optimal time series unit and having a radius of a numerical value determined from said sample data comprises:
converting all the obtained optimal time sequence units into the multidimensional space to obtain a plurality of multidimensional coordinate points;
selecting any one of the optimal time sequence units to be recorded as a first optimal time sequence unit;
selecting the maximum value of the sample data and the minimum value of the sample data in the sample data, and calculating the difference value between the maximum value of the sample data and the minimum value of the sample data;
establishing a first multi-dimensional space geometry in the multi-dimensional space with the first optimal time series unit as a center and the difference value as a radius;
taking all the multi-dimensional coordinate points contained in the first multi-dimensional space geometrical body as an adjacent data set of the first optimal time sequence unit in the multi-dimensional space, and simultaneously calculating the density and the density center of the adjacent data set;
according to the method for determining the adjacent data set of any optimal time series unit in the multidimensional space, the adjacent data set of each optimal time series unit in the multidimensional space is determined, and the density and the density center of each adjacent data set are calculated at the same time.
Further, the determining the degree of normality of each optimal time-series unit according to each optimal time-series unit and the adjacent data set corresponding to each optimal time-series unit includes:
calculating the distance between each optimal time sequence unit and the density center of the adjacent data set corresponding to each optimal time sequence unit;
and determining the normality degree of each optimal time sequence unit according to each distance and the density of the adjacent data sets corresponding to each distance.
Further, the calculation formula of the normality of each of the optimal time-series units is shown as follows:
In the formula, $Z_i$ denotes the normality of the $i$-th optimal time series unit; $M^{*}$ denotes the total number of sample data in the $i$-th optimal time series unit, which is also the dimension of the multidimensional space; $x_{ij}$ denotes the $j$-th dimension data of the $i$-th optimal time series unit; $c_{ij}$ denotes the $j$-th dimension data of the density center of the adjacent data set of the $i$-th optimal time series unit; $d_i$ denotes the distance between the $i$-th optimal time series unit and the density center of its adjacent data set; $x_{ij}^{(k)}$ denotes the $j$-th dimension data of the $k$-th datum in the adjacent data set of the $i$-th optimal time series unit; $K_i$ denotes the total number of data contained in the adjacent data set of the $i$-th optimal time series unit; $\rho_i$ denotes the density of the adjacent data set of the $i$-th optimal time series unit.
Further, the calculation formula of the attention of each of the optimal time series units is shown as follows:
In the formula, $W_i$ denotes the attention of the $i$-th optimal time series unit; $Z_i$ denotes the normality of the $i$-th optimal time series unit; $R_i$ denotes the autocorrelation of the $i$-th optimal time series unit;
when […], $W_i$ = […]; when […], $W_i$ = […].
Further, the training of the single-class support vector machine algorithm classifier by using the attention of each optimal time series unit comprises:
introducing the attention degree of each optimal time sequence unit into an optimization objective function of an OCSVM algorithm to obtain a decision function of a classifier belonging to a single-class support vector machine algorithm;
and training a single-class support vector machine algorithm classifier by using the decision function.
The invention has the beneficial effects that:
the OCSVM is a single classification algorithm which can construct an abnormal data classifier only by normal data, but when the classifier is trained, samples possibly belonging to the abnormal data in sample data influence the classifier to learn the characteristics of the normal data, so that the accuracy of the classifier in detecting the abnormal data is low. If the influence of the abnormal samples on the classifier is reduced, the classifier can better learn the characteristics of normal data, and the accuracy of detecting abnormal data by the classifier is improved.
For a small-scale Internet of Things application scenario that adopts a heterogeneous deployment strategy, the gateway data have the following characteristics: 1) The gateway data are closely connected and correlated in time; they stay relatively stable over a given period and do not change sharply, and adjacent gateway data are the most strongly related. 2) The Internet of Things gateway collects data continuously in a specific mode, so the gateway data arrive as a data stream over time and are dynamic in nature. Because of these characteristics, gateway data must be converted into time series units of a certain length before anomaly detection is performed, and the anomaly determination for gateway data depends on two aspects: 1) The correlation of the time series unit itself. Data within a time series unit are expected to follow the same trend; data that deviate from that trend may be abnormal. 2) The normality of the time series unit. Because normal data and abnormal data are produced by different mechanisms, abnormal data lie far from normal data; the farther a datum is from the cluster center, the more likely it is to be abnormal.
Therefore, the invention provides an error correction method for gateway data of the Internet of things, which is improved based on a single-type support vector machine algorithm. The method comprises the steps of converting gateway sample data into time sequence units with a certain length, determining the attention degree of each time sequence unit according to the autocorrelation of each time sequence unit and the normality characteristic of each time sequence unit, controlling the influence of each time sequence unit on a single-class support vector machine algorithm classifier in the training process according to the attention degree, and improving the accuracy of the classifier.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating the general steps of an internet-of-things gateway data error correction method according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating step S6 of the Internet of Things gateway data error correction method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in fig. 1, an embodiment of a method for correcting error in gateway data of the internet of things includes:
S1. Acquiring single-type sample data of the gateway.
The invention constructs a classifier to detect abnormal data through a single-class support vector machine algorithm. In order to ensure the accuracy of the classifier, the sample data is required to have better quality. Therefore, the classifier is trained by acquiring second-level data in the stable operation time period of the Internet of things as sample data, and the trained classifier is used for detecting abnormal data of the gateway of the Internet of things. Meanwhile, the invention acquires the sample data of a single type of the gateway. For example: in an application scene of the Internet of things, if the type of the sensor is a temperature sensor, the gateway temperature sample data is acquired by the method; if the type of sensor is a pressure sensor, then the gateway pressure sample data is obtained by the invention.
S2. Dividing the sample data according to any length value in a preset time length range to obtain a plurality of equal-length time series units, the plurality of equal-length time series units forming the time series data corresponding to that length value.
In the invention, the gateway sample data are closely connected, correlated in time, relatively stable over a given period, and do not change rapidly. Detecting anomalies in gateway data therefore requires that the data within a given period follow the same trend, so the gateway data need to be converted into time series data. The conversion requires a length for the time series data, and that length must ensure that the autocorrelation of each time series unit is large enough.
The preset time length range in the invention is […]. The sample data are divided according to any length value in this range to obtain a plurality of equal-length time series units, and these equal-length time series units form the time series data corresponding to that length value. For example: if the length value is 5 s, the sample data are divided according to the length value of 5 s to obtain a plurality of time series units of length 5 s, and all the time series units of length 5 s form the time series data corresponding to the length value of 5 s. If the length value is 10 s, the sample data are divided according to the length value of 10 s to obtain a plurality of time series units of length 10 s, and all the time series units of length 10 s form the time series data corresponding to the length value of 10 s. In the same way, the time series data corresponding to each length value in the preset time length range are obtained.
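The division in step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `split_into_units` and `series_for_each_length` are hypothetical, the data are assumed to be one sample per second so that a length value equals a sample count, and dropping an incomplete trailing unit is an assumption because the text does not say how a remainder is handled.

```python
from typing import Dict, Iterable, List, Sequence


def split_into_units(samples: Sequence[float], length: int) -> List[List[float]]:
    """Split samples into consecutive, non-overlapping units of `length` points.

    Trailing samples that do not fill a whole unit are dropped (assumption).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    n_units = len(samples) // length
    return [list(samples[k * length:(k + 1) * length]) for k in range(n_units)]


def series_for_each_length(
    samples: Sequence[float], lengths: Iterable[int]
) -> Dict[int, List[List[float]]]:
    """Build the time series data (list of units) for every candidate length."""
    return {m: split_into_units(samples, m) for m in lengths}


if __name__ == "__main__":
    data = list(range(10))  # ten one-second samples
    for m, units in series_for_each_length(data, [5, 10]).items():
        print(m, units)
```

With one sample per second, a length value of 5 s yields units of five samples each, matching the example in the text.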
S3. Acquiring the time series data corresponding to each length value in the preset time length range, fitting the time series data corresponding to each length value, determining an optimal length value according to the fitting results, and dividing the sample data into a plurality of equal-length optimal time series units according to the optimal length value.
Fitting the time series data corresponding to each length value and determining an optimal length value according to the fitting results comprises: fitting the time series data corresponding to each length value to obtain a fitting result for each length value; when the fitting result for a length value is larger than the threshold determined by that length value, labeling the fitting result to obtain a labeled fitting result; checking the fitting result of every length value in this way to obtain all labeled fitting results, and selecting the maximum among them; and taking the length value corresponding to the maximum labeled fitting result as the optimal length value.
The time series data corresponding to each length value in the preset time length range are obtained. The least squares method is then used to fit the time series data corresponding to each length value, giving a fitting result for each length value. The length value of a time series unit is denoted $M$. Because, as explained in step S1, the sample data are second-level data from a period of stable operation of the Internet of Things, the total number of sample data in each time series unit obtained by dividing according to the length value $M$ is also $M$.
The specific calculation formula of the fitting result corresponding to any length value is shown in the following formula (1):
In formula (1), $M$ denotes the total number of sample data contained in the $i$-th time series unit and is also the length value of the time series unit; $n$ denotes the number of time series units obtained by dividing the sample data according to the length $M$; $x_{ij}$ is the true value of the $j$-th datum in the $i$-th time series unit; $\hat{x}_{ij}$ is the predicted value of the $j$-th datum in the $i$-th time series unit, obtained from the linear equation fitted by the least squares method; $\bar{x}_i$ is the mean of the $M$ data contained in the $i$-th time series unit; $T_M$ denotes the threshold determined by the length value $M$.
When a linear fit is performed, the length value $M$ of the time series unit limits the achievable fitting effect, so the requirement on the fitting effect differs for different length values $M$. The threshold of the fitting effect is therefore set as a function of the length value $M$, written $T_M$. When the length value is $M$, the fitting result corresponding to $M$ is labeled, giving a labeled fitting result, only if the fitting effect is greater than $T_M$.
In the same way, the time series data corresponding to each length value in the whole preset range are fitted to obtain a plurality of fitting results. The maximum labeled fitting result is selected from all labeled fitting results, the length value corresponding to it is taken as the optimal length value, and the sample data are divided into a plurality of equal-length optimal time series units according to the optimal length value. The optimal length value is denoted $M^{*}$. Because, as explained in step S1, the sample data are second-level data from a period of stable operation of the Internet of Things, the total number of sample data in each optimal time series unit obtained by dividing according to $M^{*}$ is also $M^{*}$.
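The selection logic of step S3 can be sketched as below. Several parts are assumptions, because formula (1) and the threshold expression are not available in this text: the fitting result is taken to be the mean coefficient of determination ($R^2$) of a least-squares line over all units, which uses the same ingredients the text lists (true values, predicted values, unit mean, number of units), and the threshold is passed in as a caller-supplied function. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def split_into_units(samples: Sequence[float], length: int) -> List[List[float]]:
    """Consecutive, non-overlapping units; an incomplete tail is dropped."""
    n_units = len(samples) // length
    return [list(samples[k * length:(k + 1) * length]) for k in range(n_units)]


def r_squared(unit: Sequence[float]) -> float:
    """R^2 of a least-squares line fitted to the unit against its sample index."""
    m = len(unit)
    mean_x = (m - 1) / 2
    mean_y = sum(unit) / m
    sxx = sum((x - mean_x) ** 2 for x in range(m))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(unit))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(unit))
    ss_tot = sum((y - mean_y) ** 2 for y in unit)
    # A constant unit is fitted exactly by a horizontal line.
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot


def select_optimal_length(
    samples: Sequence[float],
    lengths: Iterable[int],
    threshold: Callable[[int], float],
) -> Optional[int]:
    """Return the length whose labeled fitting result is largest, or None."""
    best_length, best_score = None, float("-inf")
    for m in lengths:
        units = split_into_units(samples, m)
        if not units:
            continue  # length longer than the data: nothing to fit
        score = sum(r_squared(u) for u in units) / len(units)
        # Only results above the length-dependent threshold are "labeled".
        if score > threshold(m) and score > best_score:
            best_length, best_score = m, score
    return best_length


if __name__ == "__main__":
    data = [0, 1, 2, 3] * 3  # a ramp that repeats every 4 samples
    print(select_optimal_length(data, [3, 4, 6], lambda m: 0.9))
```

In this toy input only the length that matches the ramp period gives units that a straight line fits exactly, so it is the only candidate that passes the threshold.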
S4. Calculating the autocorrelation of each optimal time series unit.
Respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation calculation formula of each optimal time series unit is shown as the following formula (2):
In formula (2), $R_i$ denotes the autocorrelation of the $i$-th optimal time series unit; $M^{*}$ denotes the total number of sample data in the $i$-th optimal time series unit; $x_{ij}$ denotes the true value of the $j$-th sample datum in the $i$-th optimal time series unit; $\hat{x}_{ij}$ denotes the predicted value of the $j$-th sample datum in the $i$-th optimal time series unit, obtained from the linear equation fitted by the least squares method.
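A per-unit score in the spirit of step S4 can be sketched as follows. Formula (2) is not available in this text, so the score used here, $1 / (1 + \text{mean squared residual})$ of a least-squares line, is only a stand-in chosen because it depends on the same quantities the text names (unit size, true values, predicted values) and grows as the unit follows a linear trend more closely. The function name `unit_autocorrelation` is hypothetical.

```python
from typing import Sequence


def unit_autocorrelation(unit: Sequence[float]) -> float:
    """Stand-in autocorrelation score of one time series unit.

    Fits y = slope * index + intercept by least squares and maps the mean
    squared residual into (0, 1]; 1.0 means the unit is exactly linear.
    """
    m = len(unit)
    if m < 2:
        raise ValueError("a unit needs at least two samples to fit a line")
    mean_x = (m - 1) / 2
    mean_y = sum(unit) / m
    sxx = sum((x - mean_x) ** 2 for x in range(m))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(unit))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    mse = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(unit)) / m
    return 1.0 / (1.0 + mse)


if __name__ == "__main__":
    print(unit_autocorrelation([20.0, 20.1, 20.2, 20.3]))  # smooth drift
    print(unit_autocorrelation([20.0, 35.0, 20.1, 20.2]))  # contains a spike
```

A unit containing a spike gets a visibly lower score than a smoothly drifting one, which is the behavior the later attention step relies on.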
S5. Converting all the obtained optimal time series units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value.
In the invention, all the obtained optimal time series units are converted into a multidimensional space to obtain a plurality of multidimensional coordinate points. For example: if the optimal length value determined in the invention is $M^{*}$, then the total number of sample data in each optimal time series unit obtained by dividing according to $M^{*}$ is also $M^{*}$, and all the obtained optimal time series units are converted into an $M^{*}$-dimensional space, yielding a plurality of $M^{*}$-dimensional coordinate points.
S6. Taking each optimal time series unit as a center and the value determined from the sample data as a radius, determining the adjacent data set of each optimal time series unit in the multidimensional space.
Wherein, as shown in Fig. 2:
S61. Converting all the obtained optimal time series units into the multidimensional space to obtain a plurality of multidimensional coordinate points.
S62. Selecting any one of the optimal time series units and recording it as the first optimal time series unit.
S63. Selecting the maximum value and the minimum value of the sample data, and calculating the difference between them.
S64. Establishing a first multidimensional geometric body in the multidimensional space with the first optimal time series unit as the center and the difference as the radius.
S65. Taking all the multidimensional coordinate points contained in the first multidimensional geometric body as the adjacent data set of the first optimal time series unit in the multidimensional space, and calculating the density and the density center of that adjacent data set.
S66. Determining, by the same method, the adjacent data set of each optimal time series unit in the multidimensional space, and calculating the density and the density center of each adjacent data set.
In the invention, each optimal time series unit is mapped into the multidimensional space. The data in the $i$-th optimal time series unit are denoted $X_i = (x_{i1}, x_{i2}, \dots, x_{iM^{*}})$. The $K_i$ sample data points contained in the $M^{*}$-dimensional geometric body centered at $X_i$ with side length $\Delta$ are recorded as the adjacent data set of the $i$-th optimal time series unit $X_i$ in the $M^{*}$-dimensional space, where $\Delta$ is the difference between the maximum value and the minimum value of the single-type gateway sample data.
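Steps S61 to S66 can be sketched as follows. This is an illustration under stated assumptions: the geometric body is taken to be a Euclidean ball whose radius is the max-min difference of the sample data (following step S64), the density center is taken to be the coordinate-wise mean of the adjacent data set, and the function name `neighbor_sets` is hypothetical. The text does not define the density center or the density explicitly.

```python
import math
from typing import List, Sequence, Tuple


def neighbor_sets(
    units: Sequence[Sequence[float]],
) -> List[Tuple[List[List[float]], List[float]]]:
    """For every unit, return (adjacent data set, density center).

    Each unit is treated as one point whose coordinates are its samples.
    """
    flat = [value for unit in units for value in unit]
    radius = max(flat) - min(flat)  # S63: max-min difference of the sample data
    dims = len(units[0])
    result = []
    for center in units:
        # S64/S65: every point within `radius` of the center, center included.
        neighbors = [list(u) for u in units if math.dist(center, u) <= radius]
        # Assumed density center: coordinate-wise mean of the neighbors.
        density_center = [
            sum(p[j] for p in neighbors) / len(neighbors) for j in range(dims)
        ]
        result.append((neighbors, density_center))
    return result


if __name__ == "__main__":
    for neighbors, center in neighbor_sets([[0, 0], [1, 1], [10, 10]]):
        print(len(neighbors), center)
```

In the small example the unit at (10, 10) has only itself in its adjacent data set, which is the kind of isolation the normality step is meant to penalize.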
S7. Determining the normality of each optimal time series unit according to each optimal time series unit and its corresponding adjacent data set.
Because only a small part of the abnormal data is in the acquired gateway data, and because the abnormal data and the normal data have different forming mechanisms, the abnormal data are far away from the normal data, and therefore, the normal degree of each time series unit is judged by judging the density of the adjacent data of each time series unit in a certain neighborhood and the distance between each time series unit and the density center of the adjacent data set in the sample data. If the density of the adjacent data set of a time sequence unit in the multidimensional space is larger, and the distance between the time sequence unit and the density center of the adjacent data set is smaller, the probability that the time sequence unit belongs to normal data is larger, and the probability that the time sequence unit belongs to abnormal data is smaller.
The calculation formula of the normality of each optimal time series unit in the invention is shown as the following formula (3):
In formula (3), $Z_i$ denotes the normality of the $i$-th optimal time series unit; $M^{*}$ denotes the total number of sample data in the $i$-th optimal time series unit, which is also the dimension of the multidimensional space; $x_{ij}$ denotes the $j$-th dimension data of the $i$-th optimal time series unit; $c_{ij}$ denotes the $j$-th dimension data of the density center of the adjacent data set of the $i$-th optimal time series unit; $d_i$ denotes the distance between the $i$-th optimal time series unit and the density center of its adjacent data set; $x_{ij}^{(k)}$ denotes the $j$-th dimension data of the $k$-th datum in the adjacent data set of the $i$-th optimal time series unit; $K_i$ denotes the total number of data contained in the adjacent data set of the $i$-th optimal time series unit; $\rho_i$ denotes the density of the adjacent data set of the $i$-th optimal time series unit. The larger the normality $Z_i$, the greater the probability that the time series unit belongs to normal data and the smaller the probability that it belongs to abnormal data.
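The qualitative rule of step S7 (higher density and smaller distance to the density center mean a more normal unit) can be sketched as below. Formula (3) is not available in this text, so the score `density / (1 + distance)` is a stand-in, the density is approximated by the number of points in the adjacent data set, and the density center is assumed to be the coordinate-wise mean. The function name `normality` is hypothetical.

```python
import math
from typing import Sequence


def normality(unit: Sequence[float], neighbors: Sequence[Sequence[float]]) -> float:
    """Stand-in normality score for one unit given its adjacent data set.

    Grows with the size of the adjacent data set and shrinks with the
    distance between the unit and the density center of that set.
    """
    if not neighbors:
        raise ValueError("the adjacent data set must contain at least the unit")
    k = len(neighbors)
    dims = len(unit)
    # Assumed density center: coordinate-wise mean of the adjacent data set.
    center = [sum(p[j] for p in neighbors) / k for j in range(dims)]
    distance = math.dist(unit, center)
    density = k  # assumption: point count stands in for density
    return density / (1.0 + distance)


if __name__ == "__main__":
    cluster = [[0, 0], [0, 2], [2, 0], [2, 2], [1, 1]]
    print(normality([1, 1], cluster))       # at the center of a dense set
    print(normality([0, 0], cluster))       # on the edge of the same set
    print(normality([10, 10], [[10, 10]]))  # isolated point
```

Any monotone combination with the same two trends would serve the same illustrative purpose; the specific ratio is not taken from the source.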
S8: Determine the attention of each optimal time series unit according to its autocorrelation and its normality.
The greater the autocorrelation and the normality of an optimal time series unit, the greater the probability that it belongs to normal data and the greater its attention; the smaller the autocorrelation and the normality, the greater the probability that it belongs to abnormal data and the smaller its attention. Setting a larger attention for normal data and a smaller attention for abnormal data reduces the influence of possibly abnormal samples on the classifier during training, so that the classifier learns more of the characteristics of normal data and detects abnormal data more accurately.
The calculation formula of the attention of each optimal time series unit is shown as the following formula (4):
wherein the formula uses the attention of the time series unit, the normality of the time series unit, and the autocorrelation of the time series unit. The formula is defined piecewise over two cases.
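The monotonic relationship described in step S8 can be illustrated with a short Python sketch. Formula (4) is not reproduced in this text, so the combination below is purely an assumption: each input is min-max scaled to the range 0 to 1 and the two are multiplied. The function name `attention` and the scaling choice are hypothetical, and the patent's actual piecewise definition may differ.

```python
def attention(normality, autocorrelation):
    """ASSUMED attention: product of min-max scaled normality and autocorrelation.

    Larger normality and larger autocorrelation both give larger attention,
    which is the only property taken from the description.
    """
    def scale(xs):
        lo, hi = min(xs), max(xs)
        if hi == lo:
            # All values equal: treat every unit as equally trustworthy.
            return [1.0 for _ in xs]
        return [(x - lo) / (hi - lo) for x in xs]

    return [a * b for a, b in zip(scale(normality), scale(autocorrelation))]
```

With this choice, a unit that scores lowest on either input receives an attention of zero, so in practice a small positive floor may be preferable if every sample should retain some influence.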
S9: Train a single-class support vector machine (OCSVM) classifier using the attention of each optimal time series unit, and use the trained classifier to correct errors in the Internet of Things gateway data.
During training, the attention of each optimal time series unit is first introduced into the optimization objective function of the OCSVM algorithm to obtain the decision function of the single-class support vector machine classifier, and the classifier is then trained using this decision function.
When the trained classifier is used to correct gateway data to be detected, the data to be detected must be of the same type as the sample data used to train the single-class support vector machine classifier. The data to be detected are divided into several equal-length optimal time series units according to the optimal length value, the units are input to the decision function of the classifier, and the value of the decision function is output. Whether the data to be detected are normal is judged from that value.
If the value of the decision function is 1, the time series data to be detected are a normal sample; if the value is -1, they are an abnormal sample.
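The training and detection flow of step S9 can be sketched with scikit-learn's `OneClassSVM`. This is an illustration under stated assumptions, not the patent's implementation: the patent introduces attention into the OCSVM objective function, whereas the sketch approximates that by passing attention as `sample_weight` to `fit`. The helper names `split_units`, `train_weighted_ocsvm` and `classify`, the RBF kernel, and the value of `nu` are all hypothetical choices.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def split_units(series, length):
    """Cut a 1-D series into consecutive equal-length units, dropping any remainder."""
    count = len(series) // length
    return np.asarray(series[: count * length], dtype=float).reshape(count, length)


def train_weighted_ocsvm(units, attention, nu=0.1):
    """Fit a one-class SVM in which each unit is weighted by its attention.

    ASSUMPTION: sample_weight stands in for the attention-modified objective,
    so low-attention units influence the learned boundary less.
    """
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    clf.fit(units, sample_weight=attention)
    return clf


def classify(clf, series, length):
    """Return +1 (normal) or -1 (abnormal) for each unit of the series."""
    return clf.predict(split_units(series, length))
```

The data to be detected should be split with the same optimal length that was used for training, since the unit length is also the dimension of the feature space.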
In summary, the invention provides an error correction method for Internet of Things gateway data based on an improved single-class support vector machine algorithm. The method converts gateway sample data into time series units of a certain length, determines the attention of each time series unit from its autocorrelation and its normality, and uses the attention to control the influence of each time series unit on the single-class support vector machine classifier during training, thereby improving the accuracy of the classifier.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. An error correction method for gateway data of an internet of things is characterized by comprising the following steps:
acquiring single type sample data of a gateway;
dividing the sample data according to any length value in a preset time length range to obtain a plurality of time sequence units with equal length, and forming time sequence data corresponding to the length value by the plurality of time sequence units with equal length;
acquiring time series data corresponding to each length value within a preset time length range, fitting the time series data corresponding to each length value, determining an optimal length value according to a fitting result, and dividing the sample data into a plurality of optimal time series units with equal length according to the optimal length value;
calculating the autocorrelation of each optimal time-series unit;
converting all the obtained optimal time sequence units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value;
determining a neighboring data set of each of the optimal time series units within the multi-dimensional space centered on each of the optimal time series units and having a radius of a numerical value determined from the sample data;
determining the normality of each optimal time sequence unit according to each optimal time sequence unit and the adjacent data set corresponding to each optimal time sequence unit;
determining the attention degree of each optimal time sequence unit according to the autocorrelation of each optimal time sequence unit and the normal degree of each optimal time sequence unit;
and training a single-class support vector machine algorithm classifier by using the attention of each optimal time sequence unit, and correcting the error of the gateway data of the Internet of things by using the trained classifier.
2. The internet of things gateway data error correction method according to claim 1, wherein fitting the time series data corresponding to each length value and determining an optimal length value according to a fitting result includes:
fitting the time sequence data corresponding to each length value to obtain a fitting result corresponding to each length value;
when the fitting result corresponding to any length value is larger than the threshold value determined by the length value, marking the fitting result corresponding to the length value to obtain a marked fitting result;
judging the fitting result corresponding to each length value to obtain all marked fitting results, and selecting the maximum marked fitting result from all the marked fitting results;
and taking the length value corresponding to the maximum value of the marked fitting result as an optimal length value.
3. The method for correcting the error of the gateway data of the internet of things according to claim 1, wherein the calculating the autocorrelation of each optimal time sequence unit comprises:
respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation calculation formula of each optimal time series unit is shown as the following formula:
wherein, for a given optimal time series unit, the formula uses: the autocorrelation of the unit; the total number of said sample data within the unit; the true value of each sample data within the unit; and the predicted value of each sample data within the unit, given by the linear equation fitted according to the least squares method.
4. The method according to claim 1, wherein the determining the neighboring data set of each optimal time-series unit in the multidimensional space with each optimal time-series unit as a center and a numerical value determined according to the sample data as a radius comprises:
converting all the obtained optimal time sequence units into the multidimensional space to obtain a plurality of multidimensional coordinate points;
selecting any one of the optimal time sequence units to be recorded as a first optimal time sequence unit;
selecting the maximum value of the sample data and the minimum value of the sample data in the sample data, and calculating the difference value between the maximum value of the sample data and the minimum value of the sample data;
establishing a first multi-dimensional space geometry within the multi-dimensional space centered on the first optimal time series unit and with the difference as a radius;
taking all the multi-dimensional coordinate points contained in the first multi-dimensional space geometric body as an adjacent data set of the first optimal time sequence unit in the multi-dimensional space, and simultaneously calculating the density and the density center of the adjacent data set;
according to the determination method of the adjacent data set of any optimal time series unit in the multidimensional space, the adjacent data set of each optimal time series unit in the multidimensional space is determined, and the density and the density center of each adjacent data set are calculated at the same time.
5. The method for correcting the error of the gateway data of the internet of things according to claim 4, wherein the determining the normality of each optimal time-series unit according to each optimal time-series unit and the adjacent data set corresponding to each optimal time-series unit comprises:
calculating the distance between each optimal time sequence unit and the density center of the adjacent data set corresponding to each optimal time sequence unit;
and determining the normality degree of each optimal time sequence unit according to each distance and the density of the adjacent data sets corresponding to each distance.
6. The method for correcting error in gateway data of the internet of things as claimed in claim 5, wherein the calculation formula of the normality of each optimal time series unit is as follows:
wherein, for a given optimal time series unit, the formula uses: the normality of the unit; the total number of the sample data in the unit, which is also the dimension of the multidimensional space; the data of the unit in each dimension; the data in the corresponding dimension of the density center of the unit's adjacent data set; the distance between the unit and the density center of its adjacent data set; the data in each dimension of each data point in the unit's adjacent data set; the total number of data points contained in the adjacent data set; and the density of the adjacent data set.
7. The method for correcting the error of the gateway data of the internet of things according to claim 1, wherein a calculation formula of the attention of each optimal time sequence unit is shown as follows:
wherein, for a given optimal time series unit, the formula uses: the attention of the unit; the normality of the unit; and the autocorrelation of the unit.
8. The method for correcting errors in gateway data of the internet of things according to claim 1, wherein training a single-class support vector machine algorithm classifier by using the attention of each optimal time series unit comprises:
introducing the attention degree of each optimal time sequence unit into an optimization objective function of an OCSVM algorithm to obtain a decision function of a classifier belonging to a single-class support vector machine algorithm;
and training a single-class support vector machine algorithm classifier by using the decision function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210717724.2A CN114816825B (en) | 2022-06-23 | 2022-06-23 | Internet of things gateway data error correction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114816825A (en) | 2022-07-29 |
CN114816825B (en) | 2022-09-09 |
Family
ID=82521917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210717724.2A Active CN114816825B (en) | 2022-06-23 | 2022-06-23 | Internet of things gateway data error correction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114816825B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018163342A1 (en) * | 2017-03-09 | 2018-09-13 | 日本電気株式会社 | Abnormality detection device, abnormality detection method and abnormality detection program |
CN110807468A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for detecting abnormal mails |
CN111737099A (en) * | 2020-06-09 | 2020-10-02 | 国网电力科学研究院有限公司 | Data center anomaly detection method and device based on Gaussian distribution |
CN112148955A (en) * | 2020-10-22 | 2020-12-29 | 南京航空航天大学 | Method and system for detecting abnormal time sequence data of Internet of things |
CN112381180A (en) * | 2020-12-09 | 2021-02-19 | 杭州拓深科技有限公司 | Power equipment fault monitoring method based on mutual reconstruction single-class self-encoder |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106506556B (en) * | 2016-12-29 | 2019-11-19 | 北京神州绿盟信息安全科技股份有限公司 | A kind of network flow abnormal detecting method and device |
Non-Patent Citations (2)
Title |
---|
Smart car parking: Temporal clustering and anomaly detection in urban car parking;Yanxu Zheng等;《2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP)》;20140609;第1-6页 * |
入侵检测中的机器学习方法及其应用研究;李战春;《中国博士学位论文全文数据库 信息科技辑》;20090515(第05期);第I139-24页 * |
Also Published As
Publication number | Publication date |
---|---|
CN114816825A (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110309886B (en) | Wireless sensor high-dimensional data real-time anomaly detection method based on deep learning | |
CN108734359B (en) | Wind power prediction data preprocessing method | |
CN110224771B (en) | Spectrum sensing method and device based on BP neural network and information geometry | |
CN110826642A (en) | Unsupervised anomaly detection method for sensor data | |
CN114861788A (en) | Load abnormity detection method and system based on DBSCAN clustering | |
CN113297723B (en) | Mean shift-grey correlation analysis-based optimization method for electric spindle temperature measurement point | |
CN115878603A (en) | Water quality missing data interpolation algorithm based on K nearest neighbor algorithm and GAN network | |
CN114153888A (en) | Abnormal value detection method and device for time series data | |
CN112329944B (en) | Data flow concept drift detection method based on historical model diversity | |
CN118013323B (en) | Driving motor state analysis method and system for large-caliber electric half ball valve | |
CN111314910A (en) | Novel wireless sensor network abnormal data detection method for mapping isolation forest | |
CN115374851A (en) | Gas data anomaly detection method and device | |
CN114816825B (en) | Internet of things gateway data error correction method | |
CN115577246A (en) | Method for detecting anti-vibration performance of gas cylinder protective cover | |
CN113746813B (en) | Network attack detection system and method based on two-stage learning model | |
Wang et al. | Fault detection for the class imbalance problem in semiconductor manufacturing processes | |
US20230126258A1 (en) | Machine learning device, method for generating learning models, and program | |
CN115130496A (en) | Pipeline pressure signal anomaly detection method based on Bagging and RM-LOF integrated single classifier | |
CN111354019A (en) | Visual tracking failure detection system based on neural network and training method thereof | |
CN117909852B (en) | Monitoring data state division method for hydraulic loop ecological data analysis | |
CN111160391A (en) | Space division-based rapid relative density noise detection method and storage medium | |
CN114936203B (en) | Method based on time sequence data and business data fusion analysis | |
CN118133059B (en) | Intelligent security risk detection method and system based on digital twinning | |
CN112085053B (en) | Data drift discrimination method and device based on nearest neighbor method | |
CN112000705B (en) | Unbalanced data stream mining method based on active drift detection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Data Error Correction Method for IoT Gateway; Effective date of registration: 20231017; Granted publication date: 20220909; Pledgee: Bank of Hankou Limited by Share Ltd. Financial Services Center; Pledgor: Optical Valley Technology Co.,Ltd.; Registration number: Y2023980061529 |