CN114816825B - Internet of things gateway data error correction method - Google Patents

Info

Publication number
CN114816825B
Authority
CN
China
Prior art keywords
optimal time
data
time sequence
unit
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210717724.2A
Other languages
Chinese (zh)
Other versions
CN114816825A (en
Inventor
蔡黔江
严可达
许大为
侯金彪
占浩
刘强
涂杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Optical Valley Technology Co ltd
Original Assignee
Optical Valley Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Optical Valley Technology Co ltd
Priority to CN202210717724.2A
Publication of CN114816825A
Application granted
Publication of CN114816825B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an error correction method for gateway data of the Internet of things, belonging to the technical field of big data analysis. The method comprises the following steps: acquiring gateway sample data; dividing the sample data into a plurality of optimal time series units of equal length according to an optimal length value; calculating the autocorrelation and the normality of each optimal time series unit; determining the attention degree of each optimal time series unit according to its autocorrelation and its normality; training a single-class support vector machine algorithm classifier by using the attention degree of each optimal time series unit; and correcting errors in the gateway data of the Internet of things by using the trained classifier. Because the attention degree of each optimal time series unit controls how strongly that unit influences the single-class support vector machine algorithm classifier during training, the accuracy of the classifier is improved.

Description

Error correction method for gateway data of Internet of things
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to an error correction method for gateway data of an Internet of things.
Background
With the expansion of the application of the internet of things in actual life and production, the characteristic of taking data as the center is increasingly prominent. Whether the internet of things can be widely applied depends on extraction of useful information in gateway data to a certain extent, namely mining of the gateway data, and the data quality directly determines the extraction efficiency of the useful information and the correctness of final decision of the internet of things, so that the function realization and the user experience of an application scene are influenced. In order to be able to extract useful information in gateway data efficiently, the quality of the data needs to be improved.
In the scene of the internet of things, abnormal data can be generated due to factors such as unstable sensor performance, data transmission network faults, interference and damage caused by human or natural environments and the like, so that the data quality is rapidly reduced, and therefore, the identification of the abnormal data in the gateway data of the internet of things is particularly important.
The single-class support vector machine algorithm is a single-classification algorithm for detecting abnormal data: it can build a data detection classifier from normal data alone. However, when the classifier is trained, samples in the sample data that may actually be abnormal can interfere with the classifier's learning of the characteristics of normal data, so that the accuracy of the classifier for detecting abnormal data is low.
Disclosure of Invention
The invention provides an error correction method for gateway data of the Internet of things, and aims to solve the problem that, when a single-class support vector machine algorithm classifier is currently trained, samples in the sample data that may belong to abnormal data can interfere with the classifier's learning of the characteristics of normal data, so that the accuracy of the classifier for detecting abnormal data is low.
The invention discloses an error correction method for gateway data of the Internet of things, which adopts the following technical scheme: the method comprises the following steps:
acquiring single type sample data of a gateway;
dividing the sample data according to any length numerical value in a preset time length range to obtain a plurality of time sequence units with equal length, and forming time sequence data corresponding to the length numerical value by the plurality of time sequence units with equal length;
acquiring time series data corresponding to each length value within a preset time length range, fitting the time series data corresponding to each length value, determining an optimal length value according to a fitting result, and dividing the sample data into a plurality of optimal time series units with equal length according to the optimal length value;
calculating the autocorrelation of each optimal time-series unit;
converting all the obtained optimal time sequence units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value;
determining a neighboring data set of each of the optimal time series units within the multi-dimensional space centered on each of the optimal time series units and a radius of a numerical value determined from the sample data;
determining the normality of each optimal time sequence unit according to each optimal time sequence unit and the adjacent data set corresponding to each optimal time sequence unit;
determining the attention degree of each optimal time sequence unit according to the autocorrelation of each optimal time sequence unit and the normal degree of each optimal time sequence unit;
and training a single-class support vector machine algorithm classifier by using the attention of each optimal time sequence unit, and correcting the error of the gateway data of the Internet of things by using the trained classifier.
Further, the fitting the time series data corresponding to each length value and determining an optimal length value according to a fitting result includes:
fitting the time sequence data corresponding to each length value to obtain a fitting result corresponding to each length value;
when the fitting result corresponding to any length value is larger than the threshold value determined by the length value, marking the fitting result corresponding to the length value to obtain a marked fitting result;
judging the fitting result corresponding to each length value to obtain all the post-labeling fitting results, and selecting the maximum post-labeling fitting result from all the post-labeling fitting results;
and taking the length value corresponding to the maximum value of the marked fitting result as an optimal length value.
Further, said calculating an autocorrelation of each of said optimal time series units comprises:
respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation of each optimal time series unit is calculated by the following formula:
[Formula image not reproduced in the source text]
wherein the formula uses the following quantities:
the autocorrelation of the i-th optimal time series unit;
the total number of the sample data within the i-th optimal time series unit;
the true value of the j-th sample data in the i-th optimal time series unit;
the predicted value of the j-th sample data in the i-th optimal time series unit, obtained from the linear equation fitted according to the least squares method.
Further, said determining a neighboring data set of each said optimal time series unit within said multidimensional space centered at each said optimal time series unit and having a radius of a numerical value determined from said sample data comprises:
converting all the obtained optimal time sequence units into the multidimensional space to obtain a plurality of multidimensional coordinate points;
selecting any one of the optimal time sequence units to be recorded as a first optimal time sequence unit;
selecting the maximum value of the sample data and the minimum value of the sample data in the sample data, and calculating the difference value between the maximum value of the sample data and the minimum value of the sample data;
establishing a first multi-dimensional space geometry in the multi-dimensional space with the first optimal time series unit as a center and the difference value as a radius;
taking all the multi-dimensional coordinate points contained in the first multi-dimensional space geometrical body as an adjacent data set of the first optimal time sequence unit in the multi-dimensional space, and simultaneously calculating the density and the density center of the adjacent data set;
according to the determination method of the adjacent data set of any optimal time series unit in the multidimensional space, the adjacent data set of each optimal time series unit in the multidimensional space is determined, and the density and the density center of each adjacent data set are calculated at the same time.
Further, the determining the degree of normality of each optimal time-series unit according to each optimal time-series unit and the adjacent data set corresponding to each optimal time-series unit includes:
calculating the distance between each optimal time sequence unit and the density center of the adjacent data set corresponding to each optimal time sequence unit;
and determining the normality degree of each optimal time sequence unit according to each distance and the density of the adjacent data sets corresponding to each distance.
Further, the normality of each optimal time series unit is calculated by the following formula:
[Formula image not reproduced in the source text]
wherein the formula uses the following quantities:
the normality of the i-th optimal time series unit;
the total number of the sample data in the i-th optimal time series unit, which is also the dimension of the multidimensional space;
the j-th dimension data of the i-th optimal time series unit;
the j-th dimension data of the density center of the adjacent data set of the i-th optimal time series unit;
the distance between the i-th optimal time series unit and the density center of its adjacent data set;
the j-th dimension data of each data item in the adjacent data set of the i-th optimal time series unit;
the total number of data contained in the adjacent data set of the i-th optimal time series unit;
the density of the adjacent data set of the i-th optimal time series unit.
Further, the attention degree of each optimal time series unit is calculated by the following formula:
[Formula image not reproduced in the source text]
wherein the formula uses the following quantities:
the attention degree of the i-th optimal time series unit;
the normality of the i-th optimal time series unit;
the autocorrelation of the i-th optimal time series unit;
a judgment function, defined piecewise by the following rule:
[Formula image not reproduced in the source text]
that is, the judgment function takes one value when the first condition of the rule holds and another value when the second condition holds.
Further, the training of the single-class support vector machine algorithm classifier by using the attention degree of each optimal time series unit comprises:
introducing the attention degree of each optimal time sequence unit into an optimization objective function of an OCSVM algorithm to obtain a decision function of a classifier belonging to a single-class support vector machine algorithm;
and training a single-class support vector machine algorithm classifier by using the decision function.
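One common way to let a per-sample weight enter the optimization objective of an OCSVM is to scale each slack penalty by that weight. The sketch below writes this out under that assumption; it is not the patent's own objective, which is not reproduced in this text. All symbols are assumptions borrowed from the standard OCSVM formulation: a_i is the attention degree of the i-th optimal time series unit x_i, n is the number of units, \nu is the usual trade-off parameter, \xi_i are slack variables, \rho is the offset, and \phi is the kernel feature map.

```latex
\min_{w,\;\xi,\;\rho}\quad
  \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n} a_i\,\xi_i
  - \rho
\qquad \text{s.t.}\quad
  \langle w, \phi(x_i) \rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sign}\bigl(\langle w, \phi(x) \rangle - \rho\bigr)
```

Under this form, a unit with a small a_i pays little for violating the margin, so suspected abnormal units pull the decision boundary less than units with a large a_i.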
The invention has the beneficial effects that:
The OCSVM is a single-classification algorithm that can construct an abnormal data classifier from normal data alone. However, when the classifier is trained, samples in the sample data that may belong to abnormal data interfere with the classifier's learning of the characteristics of normal data, so that the accuracy of the classifier in detecting abnormal data is low. If the influence of the abnormal samples on the classifier is reduced, the classifier can better learn the characteristics of normal data, and the accuracy of detecting abnormal data by the classifier is improved.
For a small-scale Internet of things application scenario adopting a heterogeneous deployment strategy, the Internet of things gateway data has the following characteristics: 1) the gateway data are closely connected and correlated in time; they remain relatively stable over a certain period, do not change sharply, and adjacent gateway data are the most strongly related. 2) The Internet of things gateway collects data continuously in a specific mode, so the gateway data exist as a data stream over time and are therefore dynamic. Based on these characteristics, the gateway data needs to be converted into time series units of a certain length before anomaly detection is performed, so the anomaly determination of the gateway data depends on two aspects: 1) the correlation of the time series unit itself: the data in a time series unit should follow the same change trend, and data that deviate from that trend are possibly abnormal; 2) the normality of the time series unit: because normal data and abnormal data are formed by different mechanisms, abnormal data lie far from normal data, so the farther the data are from the cluster center, the more likely they are to be abnormal.
Therefore, the invention provides an error correction method for gateway data of the Internet of things, which is improved based on a single-type support vector machine algorithm. The method comprises the steps of converting gateway sample data into time sequence units with a certain length, determining the attention degree of each time sequence unit according to the autocorrelation of each time sequence unit and the normality characteristic of each time sequence unit, controlling the influence of each time sequence unit on a single-class support vector machine algorithm classifier in the training process according to the attention degree, and improving the accuracy of the classifier.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating the general steps of an internet-of-things gateway data error correction method according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating step S6 of the internet of things gateway data error correction method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in fig. 1, an embodiment of a method for correcting error in gateway data of the internet of things includes:
S1, acquiring the single-type sample data of the gateway.
The invention constructs a classifier to detect abnormal data by means of a single-class support vector machine algorithm. To ensure the accuracy of the classifier, the sample data must be of good quality. Therefore, data sampled at second-level granularity during a period of stable operation of the Internet of things is acquired as sample data to train the classifier, and the trained classifier is used to detect abnormal data of the Internet of things gateway. The invention acquires sample data of a single type from the gateway. For example: in an Internet of things application scenario, if the sensor is a temperature sensor, the invention acquires gateway temperature sample data; if the sensor is a pressure sensor, the invention acquires gateway pressure sample data.
S2, dividing the sample data according to any length value in a preset time length range to obtain a plurality of time series units of equal length, the plurality of equal-length time series units forming the time series data corresponding to that length value.
Because the gateway sample data are closely connected, are correlated in time, remain relatively stable over a certain period, and do not change abruptly, anomaly detection on the gateway data has to check that the data change consistently within a period of time; the gateway data therefore needs to be converted into time series data. When converting gateway data into time series data, the length of the time series units must be given, and this length must ensure that the autocorrelation of each time series unit is large enough.
The preset time length range in the invention is [formula image not reproduced in the source text]. The sample data is divided according to any length value in this range to obtain a plurality of time series units of equal length, and these equal-length time series units form the time series data corresponding to that length value. For example: if the length value is 5s, the sample data is divided according to the length value of 5s to obtain a plurality of time series units with a length of 5s, and all the time series units with a length of 5s form the time series data corresponding to the length value of 5s. If the length value is 10s, the sample data is divided according to the length value of 10s to obtain a plurality of time series units with a length of 10s, and all the time series units with a length of 10s form the time series data corresponding to the length value of 10s. Similarly, the time series data corresponding to each length value in the preset time length range is obtained.
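The division in step S2 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, one sample per second is assumed (so a 5s unit holds 5 samples), and a trailing remainder shorter than the length value is dropped, which the source does not specify.

```python
from typing import Dict, List, Sequence


def split_into_units(samples: Sequence[float], length: int) -> List[List[float]]:
    """Split samples into consecutive, non-overlapping units of `length` points.

    Assumption: a trailing remainder shorter than `length` is discarded so
    that every unit has the same length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    n_units = len(samples) // length
    return [list(samples[k * length:(k + 1) * length]) for k in range(n_units)]


def series_for_each_length(
    samples: Sequence[float], lengths: Sequence[int]
) -> Dict[int, List[List[float]]]:
    """Build the time series data (list of units) for every candidate length."""
    return {m: split_into_units(samples, m) for m in lengths}


# Toy temperature readings, one per second (assumed data).
samples = [20.0 + 0.1 * t for t in range(23)]
series = series_for_each_length(samples, [5, 10])
```

Here `series[5]` holds the units of length 5s and `series[10]` the units of length 10s, matching the two examples in the text.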
S3, acquiring time series data corresponding to each length numerical value in a preset time length range, fitting the time series data corresponding to each length numerical value, determining an optimal length numerical value according to a fitting result, and dividing the sample data into a plurality of optimal time series units with equal length according to the optimal length numerical value.
Fitting the time series data corresponding to each length value and determining an optimal length value according to the fitting result comprises: fitting the time series data corresponding to each length value to obtain a fitting result corresponding to each length value; when the fitting result corresponding to any length value is larger than the threshold determined by that length value, marking the fitting result corresponding to the length value to obtain a marked fitting result; judging the fitting result corresponding to each length value to obtain all the marked fitting results, and selecting the maximum marked fitting result from all the marked fitting results; and taking the length value corresponding to the maximum marked fitting result as the optimal length value.
According to the invention, the time series data corresponding to each length value in the preset time length range is obtained. The least squares method is used to fit the time series data corresponding to each length value, giving a fitting result for each length value. The length value of the time series units is denoted by M. Since, as explained in step S1, the invention acquires second-level data in the stable operation period of the Internet of things as sample data, the total number of sample data in a time series unit obtained by dividing with the length value M is also M. The fitting result corresponding to any length value is calculated by the following formula (1):
[Formula (1): image not reproduced in the source text]
wherein the formula uses the following quantities:
M, the total number of the sample data contained in the i-th time series unit, which is also the length value of the time series unit;
n, the number of time series units obtained by dividing the sample data according to the length M;
the true value of the j-th data in the i-th time series unit;
the predicted value of the j-th data in the i-th time series unit, derived from the linear equation fitted according to the least squares method;
the mean of the M data contained in the i-th time series unit;
the threshold determined by the length value M.
When linear fitting is carried out, the length value M of the time series unit limits the fitting effect, so the requirement on the fitting effect differs for different length values M, and the threshold of the fitting effect is set as a function of the length value M [formula image not reproduced in the source text]. Therefore, when the length value is M, the fitting result corresponding to the length M is marked, giving a marked fitting result, only if the fitting effect is greater than this threshold.
In the same way, the time series data corresponding to each length value in the whole preset time length range is fitted to obtain a plurality of fitting results; the maximum marked fitting result is selected from all the marked fitting results, the length value corresponding to it is taken as the optimal length value, and the sample data is divided into a plurality of optimal time series units of equal length according to the optimal length value. Since, as described in step S1, the invention acquires second-level data in the stable operation period of the Internet of things as sample data, the total number of the sample data in each optimal time series unit obtained by this division is also equal to the optimal length value.
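The selection procedure of step S3 can be sketched as follows. Because formula (1) and the threshold expression are not reproduced in this text, the sketch makes two loudly stated assumptions: the fitting result for a length value is taken to be the mean coefficient of determination (R squared) of a least-squares line fitted to each unit, which is consistent with the listed quantities (true value, predicted value, unit mean, number of units) but not confirmed by the source, and the threshold is passed in as a hypothetical callable `threshold(m)`. All function names are illustrative.

```python
from typing import Callable, Optional, Sequence


def r_squared(ys: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    m = len(ys)
    x_mean = (m - 1) / 2.0
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in range(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    # A constant unit is fitted perfectly by a horizontal line (assumed score 1).
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot


def fit_score(samples: Sequence[float], length: int) -> float:
    """Assumed fitting result: mean R^2 over all equal-length units."""
    n = len(samples) // length
    if length < 2 or n == 0:
        raise ValueError("need length >= 2 and at least one full unit")
    units = [samples[k * length:(k + 1) * length] for k in range(n)]
    return sum(r_squared(u) for u in units) / n


def choose_optimal_length(
    samples: Sequence[float],
    lengths: Sequence[int],
    threshold: Callable[[int], float],
) -> Optional[int]:
    """Mark scores above threshold(m); return the length with the largest marked score."""
    marked = {}
    for m in lengths:
        score = fit_score(samples, m)
        if score > threshold(m):
            marked[m] = score
    return max(marked, key=marked.get) if marked else None


# Toy signal that is exactly linear within every block of 5 samples.
samples = [float(t % 5) for t in range(20)]
best = choose_optimal_length(samples, [4, 5, 10], lambda m: 0.5)
```

A length-dependent threshold matters in practice: very short units are trivially easy to fit (any two points lie on a line), so a constant threshold like the one used in this toy call would favor them.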
S4, calculating the autocorrelation of each optimal time series unit.
Respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation of each optimal time series unit is calculated by the following formula (2):
[Formula (2): image not reproduced in the source text]
wherein the formula uses the following quantities:
the autocorrelation of the i-th optimal time series unit;
the total number of the sample data within the i-th optimal time series unit;
the true value of the j-th sample data in the i-th optimal time series unit;
the predicted value of the j-th sample data in the i-th optimal time series unit, obtained from the linear equation fitted according to the least squares method.
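The ingredients of formula (2) can be computed as sketched below: a least-squares line is fitted to one optimal time series unit, and the true values are compared with the predicted values. Since the formula itself is not reproduced in this text, the sketch stops at the mean squared residual and does not claim to produce the patent's autocorrelation value; the function names and the use of the sample index as the x-coordinate are assumptions.

```python
from typing import List, Sequence


def linear_predictions(ys: Sequence[float]) -> List[float]:
    """Predicted values from a least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    m = len(ys)
    x_mean = (m - 1) / 2.0
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in range(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    return [slope * x + intercept for x in range(m)]


def mean_squared_residual(ys: Sequence[float]) -> float:
    """Average squared gap between true values and fitted values in one unit."""
    preds = linear_predictions(ys)
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


steady = [20.0, 20.5, 21.0, 21.5, 22.0]   # follows one linear trend
spiky = [20.0, 20.5, 35.0, 21.5, 22.0]    # one reading breaks the trend
```

A unit whose readings follow a single trend, such as `steady`, leaves almost no residual, while a unit containing an outlying reading, such as `spiky`, leaves a large one; any autocorrelation measure built from these quantities would be expected to rank the first unit higher.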
S5, converting all the obtained optimal time series units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value.
In the invention, all the obtained optimal time series units are converted into a multidimensional space to obtain a plurality of multidimensional coordinate points. For example: the total number of the sample data in each optimal time series unit divided according to the optimal length value is equal to the optimal length value, so all the obtained optimal time series units are converted into a space whose dimension equals the optimal length value, giving a plurality of coordinate points of that dimension.
S6, taking each optimal time series unit as the center and a value determined according to the sample data as the radius, determining the adjacent data set of each optimal time series unit in the multidimensional space.
As shown in Fig. 2:
S61, converting all the obtained optimal time series units into the multidimensional space to obtain a plurality of multidimensional coordinate points;
S62, selecting any one of the optimal time series units as a first optimal time series unit;
S63, selecting the maximum value and the minimum value of the sample data, and calculating the difference between the maximum value and the minimum value;
S64, establishing a first multidimensional space geometry in the multidimensional space with the first optimal time series unit as the center and the difference as the radius;
S65, taking all the multidimensional coordinate points contained in the first multidimensional space geometry as the adjacent data set of the first optimal time series unit in the multidimensional space, and simultaneously calculating the density and the density center of the adjacent data set;
S66, according to the determination method of the adjacent data set of any optimal time series unit in the multidimensional space, determining the adjacent data set of each optimal time series unit in the multidimensional space, and simultaneously calculating the density and the density center of each adjacent data set.
In the invention, each optimal time series unit is mapped into the multidimensional space, where the data in a given optimal time series unit are represented as a coordinate vector. Taking this vector as the center and the difference between the maximum value and the minimum value of the single-type gateway sample data as the side length, a geometric body is established in the multidimensional space. The sample data contained in this geometric body are recorded as the adjacent data set of that optimal time series unit in the multidimensional space.
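Steps S61 to S66 can be sketched as below. The sketch rests on stated assumptions: the geometric body is treated as a Euclidean ball (the text calls the max-minus-min difference both a radius and a side length), the density center is taken to be the coordinate-wise mean of the adjacent data set, and the names `adjacent_data_set` and `density_center` are hypothetical.

```python
import math


def adjacent_data_set(points, center_index, radius):
    """Return all points within `radius` (Euclidean) of the chosen unit.

    The chosen unit itself is always included, since its distance to itself is 0.
    """
    center = points[center_index]
    return [p for p in points if math.dist(p, center) <= radius]


def density_center(neighbors):
    """Coordinate-wise mean of the adjacent data set (an assumed definition)."""
    dim = len(neighbors[0])
    return tuple(sum(p[j] for p in neighbors) / len(neighbors) for j in range(dim))


# Hypothetical single-type sample data, already split into 2-dimensional units.
samples = [1.0, 1.2, 1.1, 1.3, 1.2, 1.0, 9.0, 1.1]
points = [(1.0, 1.2), (1.1, 1.3), (1.2, 1.0), (9.0, 1.1)]

radius = max(samples) - min(samples)          # difference between maximum and minimum
neighbors = adjacent_data_set(points, 0, radius)
center = density_center(neighbors)
```

With these values the unit containing the reading 9.0 lies just outside the ball around the first unit, so the adjacent data set of the first unit holds the three remaining points.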
S7: Determine the normality of each optimal time series unit according to each optimal time series unit and its corresponding adjacent data set.
Abnormal data make up only a small part of the acquired gateway data, and because abnormal data and normal data are produced by different mechanisms, abnormal data lie far from normal data. The normality of each time series unit is therefore judged from two quantities: the density of the unit's adjacent data within a given neighborhood, and the distance between the unit and the density center of its adjacent data set. The greater the density of a unit's adjacent data set in the multidimensional space, and the smaller the distance between the unit and the density center of that set, the greater the probability that the unit belongs to normal data and the smaller the probability that it belongs to abnormal data.
The normality of each optimal time series unit is calculated according to formula (3):
[equation image: formula (3)]
wherein the symbols of formula (3) denote, in the order in which they are defined:
the normality of the optimal time series unit under consideration;
the total number of sample data in that optimal time series unit, which is also the dimension of the multidimensional space;
the data of a given dimension of that optimal time series unit;
the data of the same dimension of the density center of the adjacent data set of that unit;
the distance between that optimal time series unit and the density center of its adjacent data set;
the data of the same dimension of a given data point in the adjacent data set of that unit;
the total number of data points contained in the adjacent data set of that unit;
the density of the adjacent data set of that unit. The greater this density, the greater the probability that the time series unit belongs to normal data and the smaller the probability that it belongs to abnormal data.
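The qualitative behaviour of the normality can be illustrated with a stand-in score. The expression used here, density divided by one plus the distance to the density center, is an assumption chosen only because it grows with density and shrinks with distance; it is not formula (3). Density is likewise assumed to be the fraction of all points that fall in the adjacent data set, and the function names are hypothetical.

```python
import math


def adjacent_data_set(points, center, radius):
    """All points within `radius` (Euclidean) of `center`, including `center` itself."""
    return [p for p in points if math.dist(p, center) <= radius]


def normality_score(point, points, radius):
    """Stand-in normality: higher for dense neighborhoods and central units.

    Assumptions (not taken from formula (3)):
      density  = share of all points lying in the adjacent data set
      score    = density / (1 + distance to the density center)
    """
    neighbors = adjacent_data_set(points, point, radius)
    dim = len(point)
    center = tuple(sum(p[j] for p in neighbors) / len(neighbors) for j in range(dim))
    distance = math.dist(point, center)
    density = len(neighbors) / len(points)
    return density / (1.0 + distance)


# Hypothetical 2-dimensional units; the last one contains an outlying reading.
points = [(1.0, 1.2), (1.1, 1.3), (1.2, 1.0), (9.0, 1.1)]
radius = 8.0  # difference between the maximum and minimum sample value
scores = [normality_score(p, points, radius) for p in points]
```

In this toy data the outlying unit sits far from the density center of its own adjacent data set, so its score is lower than that of the first unit.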
S8: Determine the attention of each optimal time series unit according to the autocorrelation and the normality of each optimal time series unit.
The greater the autocorrelation and the normality of an optimal time series unit, the greater the probability that it belongs to normal data, and the greater its attention. The smaller the autocorrelation and the normality, the greater the probability that it belongs to abnormal data, and the smaller its attention. Setting a larger attention for normal data and a smaller attention for abnormal data reduces the influence that possibly abnormal samples have on the classifier during training, so that the classifier learns more of the characteristics of normal data and detects abnormal data more accurately.
The attention of each optimal time series unit is calculated according to formula (4):
[equation image: formula (4)]
wherein the symbols of formula (4) denote, in the order in which they are defined:
the attention of the time series unit;
the normality of the time series unit;
the autocorrelation of the time series unit;
a judgment function, whose specific rule is as follows:
[equation image: judgment function]
when the first condition [equation image] holds, the judgment function equals [equation image];
when the second condition [equation image] holds, the judgment function equals [equation image].
S9: Train a single-class support vector machine (OCSVM) classifier using the attention of each optimal time series unit, and use the trained classifier to correct errors in the gateway data of the Internet of Things.
In the invention, the attention of each optimal time series unit is used to train the single-class support vector machine classifier. During training, the attention of each optimal time series unit is first introduced into the optimization objective function of the OCSVM algorithm, which yields the decision function of the classifier; the classifier is then trained with this decision function.
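One common way to introduce a per-sample weight into the OCSVM objective is to scale each slack variable by that weight. The form below is a sketch under that assumption and is not quoted from the patent; the symbols (attention a_i of the i-th unit x_i, trade-off parameter ν, number of units n, feature map φ, offset ρ, slack ξ_i) are all introduced here for illustration.

```latex
\min_{w,\,\xi,\,\rho}\quad
  \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n} a_i\,\xi_i
  - \rho
\qquad\text{s.t.}\qquad
  \langle w,\phi(x_i)\rangle \ge \rho - \xi_i,\quad
  \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sgn}\bigl(\langle w,\phi(x)\rangle - \rho\bigr)
```

Under this weighting, a unit with small attention contributes little to the penalty term, so leaving it outside the learned boundary costs almost nothing, which matches the stated aim of reducing the influence of possibly abnormal samples.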
When the trained classifier is used to correct the data to be detected from the Internet of Things gateway, the data to be detected must be of the same type as the sample data used to train the single-class support vector machine classifier. The data to be detected are divided into a plurality of optimal time series units of equal length according to the optimal length value, these units are input into the decision function of the classifier, and the value of the decision function is output. Whether the data to be detected are normal is judged from this value.
If the numerical value of the decision function is 1, indicating that the time sequence data to be detected is a normal sample; and if the numerical value of the decision function is-1, indicating that the time sequence data to be detected is an abnormal sample.
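The detection step can be sketched as follows. The trained classifier is replaced by a hypothetical stand-in, `toy_decision_value`, which is not an OCSVM; it only supplies a real-valued score so that the splitting and the mapping to 1 and -1 can be shown. Treating a score of exactly zero as normal is also an assumption.

```python
def classify_units(samples, unit_length, decision_value):
    """Split the data to be detected into units and label each one.

    Returns 1 for a unit judged normal and -1 for a unit judged abnormal,
    based on the sign of a real-valued score from `decision_value`.
    """
    n_units = len(samples) // unit_length
    labels = []
    for k in range(n_units):
        unit = tuple(samples[k * unit_length:(k + 1) * unit_length])
        labels.append(1 if decision_value(unit) >= 0 else -1)
    return labels


def toy_decision_value(unit, reference=20.0, tolerance=2.0):
    """Hypothetical stand-in for a trained classifier's score.

    Positive while every reading stays within `tolerance` of `reference`.
    """
    return tolerance - max(abs(x - reference) for x in unit)


# Hypothetical data to be detected, of the same type as the training data.
to_detect = [20.1, 20.3, 20.2, 20.4, 35.0, 20.5]
labels = classify_units(to_detect, unit_length=3, decision_value=toy_decision_value)
```

The unit length passed here must be the same optimal length value that was used when the training data were divided.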
In summary, the invention provides an error correction method for gateway data of the Internet of Things that improves on the single-class support vector machine algorithm. The method converts gateway sample data into time series units of a given length, determines the attention of each time series unit from its autocorrelation and its normality, and uses the attention to control the influence of each time series unit on the single-class support vector machine classifier during training, which improves the accuracy of the classifier.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. An error correction method for gateway data of an internet of things is characterized by comprising the following steps:
acquiring single type sample data of a gateway;
dividing the sample data according to any length value in a preset time length range to obtain a plurality of time sequence units with equal length, and forming time sequence data corresponding to the length value by the plurality of time sequence units with equal length;
acquiring time series data corresponding to each length value within a preset time length range, fitting the time series data corresponding to each length value, determining an optimal length value according to a fitting result, and dividing the sample data into a plurality of optimal time series units with equal length according to the optimal length value;
calculating the autocorrelation of each optimal time-series unit;
converting all the obtained optimal time sequence units into a multidimensional space, wherein the dimension of the multidimensional space is equal to the optimal length value;
determining a neighboring data set of each of the optimal time series units within the multi-dimensional space centered on each of the optimal time series units and having a radius of a numerical value determined from the sample data;
determining the normality of each optimal time sequence unit according to each optimal time sequence unit and the adjacent data set corresponding to each optimal time sequence unit;
determining the attention degree of each optimal time sequence unit according to the autocorrelation of each optimal time sequence unit and the normal degree of each optimal time sequence unit;
and training a single-class support vector machine algorithm classifier by using the attention of each optimal time sequence unit, and correcting the error of the gateway data of the Internet of things by using the trained classifier.
2. The internet of things gateway data error correction method according to claim 1, wherein fitting the time series data corresponding to each length value and determining an optimal length value according to a fitting result includes:
fitting the time sequence data corresponding to each length value to obtain a fitting result corresponding to each length value;
when the fitting result corresponding to any length value is larger than the threshold value determined by the length value, marking the fitting result corresponding to the length value to obtain a marked fitting result;
judging the fitting result corresponding to each length value to obtain all the post-labeling fitting results, and selecting the maximum post-labeling fitting result from all the post-labeling fitting results;
and taking the length value corresponding to the maximum value of the marked fitting result as an optimal length value.
3. The method for correcting the error of the gateway data of the internet of things according to claim 1, wherein the calculating the autocorrelation of each optimal time sequence unit comprises:
respectively fitting the sample data contained in each optimal time sequence unit by using a least square method to obtain the autocorrelation of each optimal time sequence unit;
the autocorrelation of each optimal time series unit is calculated according to the following formula:
[equation image: autocorrelation formula]
wherein the symbols of the formula denote, in the order in which they are defined:
the autocorrelation of the optimal time series unit under consideration;
the total number of said sample data within that optimal time series unit;
the true value of a given sample datum within that optimal time series unit;
the predicted value of the same sample datum within that optimal time series unit, given by the linear equation fitted according to the least squares method.
4. The method according to claim 1, wherein the determining the neighboring data set of each optimal time-series unit in the multidimensional space with each optimal time-series unit as a center and a numerical value determined according to the sample data as a radius comprises:
converting all the obtained optimal time sequence units into the multidimensional space to obtain a plurality of multidimensional coordinate points;
selecting any one of the optimal time sequence units to be recorded as a first optimal time sequence unit;
selecting the maximum value of the sample data and the minimum value of the sample data in the sample data, and calculating the difference value between the maximum value of the sample data and the minimum value of the sample data;
establishing a first multi-dimensional space geometry within the multi-dimensional space centered on the first optimal time series unit and with the difference as a radius;
taking all the multi-dimensional coordinate points contained in the first multi-dimensional space geometric body as an adjacent data set of the first optimal time sequence unit in the multi-dimensional space, and simultaneously calculating the density and the density center of the adjacent data set;
determining, according to the method used for the first optimal time series unit, the adjacent data set of each optimal time series unit in the multidimensional space, and calculating the density and the density center of each adjacent data set.
5. The method for correcting the error of the gateway data of the internet of things according to claim 4, wherein the determining the normality of each optimal time-series unit according to each optimal time-series unit and the adjacent data set corresponding to each optimal time-series unit comprises:
calculating the distance between each optimal time sequence unit and the density center of the adjacent data set corresponding to each optimal time sequence unit;
and determining the normality degree of each optimal time sequence unit according to each distance and the density of the adjacent data sets corresponding to each distance.
6. The method for correcting error in gateway data of the internet of things as claimed in claim 5, wherein the calculation formula of the normality of each optimal time series unit is as follows:
[equation image: normality formula]
wherein the symbols of the formula denote, in the order in which they are defined:
the normality of the optimal time series unit under consideration;
the total number of the sample data in that optimal time series unit, which is also the dimension of the multidimensional space;
the data of a given dimension of that optimal time series unit;
the data of the same dimension of the density center of the adjacent data set of that optimal time series unit;
the distance between that optimal time series unit and the density center of its corresponding adjacent data set;
the data of the same dimension of a given data point in the adjacent data set of that optimal time series unit;
the total number of data contained in the adjacent data set of that optimal time series unit;
the density of the adjacent data set of that optimal time series unit.
7. The method for correcting the error of the gateway data of the internet of things according to claim 1, wherein a calculation formula of the attention of each optimal time series unit is shown as follows:
[equation image: attention formula]
wherein the symbols of the formula denote, in the order in which they are defined:
the attention of the optimal time series unit under consideration;
the normality of that optimal time series unit;
the autocorrelation of that optimal time series unit;
a judgment function, whose specific rule is as follows:
[equation image: judgment function]
when the first condition [equation image] holds, the judgment function equals [equation image];
when the second condition [equation image] holds, the judgment function equals [equation image].
8. The method for correcting errors in gateway data of the internet of things according to claim 1, wherein training a single-class support vector machine algorithm classifier by using the attention of each optimal time series unit comprises:
introducing the attention degree of each optimal time sequence unit into an optimization objective function of an OCSVM algorithm to obtain a decision function of a classifier belonging to a single-class support vector machine algorithm;
and training a single-class support vector machine algorithm classifier by using the decision function.
CN202210717724.2A 2022-06-23 2022-06-23 Internet of things gateway data error correction method Active CN114816825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210717724.2A CN114816825B (en) 2022-06-23 2022-06-23 Internet of things gateway data error correction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210717724.2A CN114816825B (en) 2022-06-23 2022-06-23 Internet of things gateway data error correction method

Publications (2)

Publication Number Publication Date
CN114816825A CN114816825A (en) 2022-07-29
CN114816825B true CN114816825B (en) 2022-09-09

Family

ID=82521917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210717724.2A Active CN114816825B (en) 2022-06-23 2022-06-23 Internet of things gateway data error correction method

Country Status (1)

Country Link
CN (1) CN114816825B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018163342A1 (en) * 2017-03-09 2018-09-13 日本電気株式会社 Abnormality detection device, abnormality detection method and abnormality detection program
CN110807468A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Method, device, equipment and storage medium for detecting abnormal mails
CN111737099A (en) * 2020-06-09 2020-10-02 国网电力科学研究院有限公司 Data center anomaly detection method and device based on Gaussian distribution
CN112148955A (en) * 2020-10-22 2020-12-29 南京航空航天大学 Method and system for detecting abnormal time sequence data of Internet of things
CN112381180A (en) * 2020-12-09 2021-02-19 杭州拓深科技有限公司 Power equipment fault monitoring method based on mutual reconstruction single-class self-encoder

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506556B (en) * 2016-12-29 2019-11-19 北京神州绿盟信息安全科技股份有限公司 A kind of network flow abnormal detecting method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Smart car parking: Temporal clustering and anomaly detection in urban car parking;Yanxu Zheng等;《2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP)》;20140609;第1-6页 *
入侵检测中的机器学习方法及其应用研究;李战春;《中国博士学位论文全文数据库 信息科技辑》;20090515(第05期);第I139-24页 *

Also Published As

Publication number Publication date
CN114816825A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110309886B (en) Wireless sensor high-dimensional data real-time anomaly detection method based on deep learning
CN108734359B (en) Wind power prediction data preprocessing method
CN110224771B (en) Spectrum sensing method and device based on BP neural network and information geometry
CN110826642A (en) Unsupervised anomaly detection method for sensor data
CN114861788A (en) Load abnormity detection method and system based on DBSCAN clustering
CN113297723B (en) Mean shift-grey correlation analysis-based optimization method for electric spindle temperature measurement point
CN115878603A (en) Water quality missing data interpolation algorithm based on K nearest neighbor algorithm and GAN network
CN114153888A (en) Abnormal value detection method and device for time series data
CN112329944B (en) Data flow concept drift detection method based on historical model diversity
CN118013323B (en) Driving motor state analysis method and system for large-caliber electric half ball valve
CN111314910A (en) Novel wireless sensor network abnormal data detection method for mapping isolation forest
CN115374851A (en) Gas data anomaly detection method and device
CN114816825B (en) Internet of things gateway data error correction method
CN115577246A (en) Method for detecting anti-vibration performance of gas cylinder protective cover
CN113746813B (en) Network attack detection system and method based on two-stage learning model
Wang et al. Fault detection for the class imbalance problem in semiconductor manufacturing processes
US20230126258A1 (en) Machine learning device, method for generating learning models, and program
CN115130496A (en) Pipeline pressure signal anomaly detection method based on Bagging and RM-LOF integrated single classifier
CN111354019A (en) Visual tracking failure detection system based on neural network and training method thereof
CN117909852B (en) Monitoring data state division method for hydraulic loop ecological data analysis
CN111160391A (en) Space division-based rapid relative density noise detection method and storage medium
CN114936203B (en) Method based on time sequence data and business data fusion analysis
CN118133059B (en) Intelligent security risk detection method and system based on digital twinning
CN112085053B (en) Data drift discrimination method and device based on nearest neighbor method
CN112000705B (en) Unbalanced data stream mining method based on active drift detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Data Error Correction Method for IoT Gateway

Effective date of registration: 20231017

Granted publication date: 20220909

Pledgee: Bank of Hankou Limited by Share Ltd. Financial Services Center

Pledgor: Optical Valley Technology Co.,Ltd.

Registration number: Y2023980061529