CN114528907B

CN114528907B - Industrial abnormal data detection method and device

Info

Publication number: CN114528907B
Application number: CN202111665118.2A
Authority: CN
Inventors: 朱明皓; 高勃; 荆涛; 王光宇; 柴学科; 高青鹤
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-04-07
Anticipated expiration: 2041-12-31
Also published as: CN114528907A

Abstract

The application provides an industrial abnormal data detection method and device. The method comprises the following steps: initializing collected industrial data into data points in a data space, setting a density threshold larger than the collection dimension of the industrial data, and initializing a neighborhood radius for each data point; for each data point, determining a sparse value from the differences between its neighborhood radius and the neighborhood radii of surrounding data points, determining an outlier value from its distances to the data points in its neighborhood, and taking the sparse value and the outlier value together as a target solution; for each data point, initializing an individual optimal solution with the target solution; iterating the individual optimal solutions with a particle swarm algorithm; when a preset number of iterations is reached, determining the individual optimal solution of each data point in the last iteration and deriving the corresponding neighborhood radius from it; and, for each data point, determining the data point to be an abnormal point when the number of neighborhood data points within its neighborhood radius is less than or equal to the density threshold.

Description

Industrial abnormal data detection method and device

Technical Field

The embodiment of the application relates to the technical field of industrial data processing, in particular to an industrial abnormal data detection method and device.

Background

In industrial data processing, data points in industrial big data are often checked using globally unified parameters. However, industrial big data comes from wide and dispersed sources and is closely tied to the specific industrial field, so globally unified parameters cannot effectively eliminate abnormal data points in a given field. Furthermore, industrial fault data produced during production is significant data and should not be removed, yet related industrial abnormal data detection methods cannot effectively distinguish abnormal data points that need to be removed from industrial fault data that must be kept.

Based on this, a solution that can realize accurate detection of industrial abnormal data is required.

Disclosure of Invention

In view of the above, an object of the present application is to provide a method and an apparatus for detecting industrial abnormal data.

Based on the above purpose, the present application provides an industrial abnormal data detection method, which is applied to a database, and includes:

initializing collected industrial data into data points in a data space, setting a density threshold larger than the collection dimension of the industrial data, and initializing the neighborhood radius of each data point;

for each data point, determining a sparse value of the data point by using a difference value of the data point and other surrounding data points on the neighborhood radius, determining an outlier of the data point by using a distance from a neighborhood data point, and taking the sparse value and the outlier as a target solution;

for each of the data points, initializing an individual optimal solution with the target solution; iterating the individual optimal solution by using a particle swarm algorithm;

in response to the preset iteration times, determining the individual optimal solution of each data point in the last iteration, and reversely deducing the corresponding neighborhood radius by using the individual optimal solution;

for each of the data points, determining that the data point is an outlier in response to the number of neighborhood data points within the neighborhood radius being less than or equal to the density threshold.

Based on the same inventive concept, the present application further provides an industrial abnormal data detection apparatus, which is connected to a database and includes: the device comprises an initialization module, a target solution module, an iteration module and an abnormal point detection module;

wherein the initialization module is configured to initialize the collected industrial data to data points in a data space, set a density threshold greater than an industrial data collection dimension, and initialize a neighborhood radius of each of the data points;

the target solution module is configured to determine a sparse value of each data point by using a difference value of the data point and other surrounding data points on the neighborhood radius, determine an outlier of the data point by using a distance between each data point and a neighborhood data point, and use the sparse value and the outlier as a target solution;

the iteration module is configured to initialize an individual optimal solution for each of the data points with the target solution; iterate the individual optimal solution by using a particle swarm algorithm; and, in response to reaching the preset number of iterations, determine the individual optimal solution of each data point in the last iteration and derive the corresponding neighborhood radius from the individual optimal solution;

the abnormal point detection module is configured to determine, for each data point, that the data point is an abnormal point in response to the number of neighborhood data points of the data point within the neighborhood radius being less than or equal to the density threshold.

As can be seen from the above, the industrial abnormal data detection method and device provided by the application are designed on the basis of MOPSO (multi-objective particle swarm optimization) and DBSCAN (a density-based clustering method). The different conditions of each data point are considered, and a separate neighborhood radius is set for each data point. A sparse value and an outlier value are designed for each data point as its target solution, the global optimal solution and the individual optimal solutions are selected according to the Pareto dominance principle, and the neighborhood radius of each data point is obtained through the iteration process. Each data point can then be evaluated with its own neighborhood radius, which improves the accuracy of abnormal data detection.

Furthermore, combining the MOPSO algorithm with DBSCAN allows the data to be clustered while abnormal data are being detected. Using the clustering process, correlation analysis is performed on any two attributes (that is, dimensions) of the data in the data set to find the two attributes with the strongest correlation, and it is then analyzed whether the data represented by these two correlated attributes are abnormal at the same time. If they are, the data are industrial fault data and are not removed; otherwise, the data are invalid data caused by a sensor transmission or acquisition fault and need to be removed.

Drawings

In order to more clearly illustrate the technical solutions in the present application or the related art, the drawings needed to be used in the description of the embodiments or the related art will be briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of an industrial anomaly data detection method according to an embodiment of the present application;

FIG. 2 is a block diagram of an exemplary embodiment of an apparatus for detecting industrial anomaly data;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.

It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

As described in the background section, the related industrial abnormal data detection method also has difficulty in satisfying the need for abnormal data detection of large data in industrial production.

In the process of implementing the present application, the applicant finds that the related industrial abnormal data detection method has the main problems that: in modern industrial processes, a large amount of engineering data can be collected. However, due to the change of the surrounding environment, improper manual operation, abnormal sensor and other reasons, the acquired data is abnormal, and therefore the abnormal data needs to be cleared up to be capable of carrying out effective data analysis.

The DBSCAN (density-based clustering) method for detecting abnormal data (also called abnormal points) requires two parameters: Eps (the neighborhood radius) and MinPts (the density threshold). In related methods, Eps and MinPts are usually set globally, that is, the same or similar values are used for all data in the data set to be detected. The applicant's research shows, however, that the result is highly sensitive to these parameters. Because Eps and MinPts are set globally, data points in high-density regions have the same Eps as data points in low-density regions, so a unified global Eps can mask some outlying data points, that is, possible abnormal data can go undetected.

Further, the applicant also finds that, in research, when different Eps are set for data points with different densities, outliers can be effectively detected, and the method for setting different Eps for data points with different densities is very suitable for large data detection in the industrial field based on the characteristics of industrial large data.

Specifically, industrial big data comes from wide and dispersed sources and is closely tied to the specific industrial field. The complexity of the field produces many kinds of abnormal points whose densities differ greatly, and with a uniform global Eps some abnormal data cannot be effectively removed. An independent Eps therefore needs to be set for each data point in the industrial big data set so that abnormal data can be effectively removed. At the same time, not all abnormal data are invalid: some are industrial fault data, which are of great significance for industrial fault analysis and cannot be removed. Detected abnormal data therefore cannot all be removed uniformly, and the data generated by industrial faults need to be identified and separated from the erroneous data.

Moreover, because data sets in the industrial field are very large, the processing time and processing complexity need to be kept low when abnormal data are handled. If abnormal data detection is effectively combined with data clustering, the working cost can be reduced and the processing efficiency improved.

It is to be appreciated that the method can be performed by any computing, processing capable apparatus, device, platform, cluster of devices.

Hereinafter, the technical method of the present application will be described in detail with reference to specific examples.

Referring to fig. 1, an industrial abnormal data detection method according to an embodiment of the present application includes the following steps:

step S101, initializing collected industrial data into data points in a data space, setting a density threshold larger than an industrial data collection dimension, and initializing a neighborhood radius of each data point.

In the embodiment of the application, the intelligent battery cell manufacturing data is taken as a specific example, and in the intelligent battery cell manufacturing, a winding process of a chip needs to be performed by using a winding machine, so that the positive electrode plate and the negative electrode plate are subjected to diaphragm assembly manufacturing to form a basic battery cell.

Among the factors that influence the winding process may be: positive plate length, negative plate length, insulation resistance, lower diaphragm length, first alignment degree, second alignment degree, third alignment degree, and the like.

In this embodiment, in each data acquisition of the winding process, the above-mentioned 7 factors affecting the winding process are taken as 7 data dimensions at the time of data acquisition.

Further, the data collected for each winding process is taken as one piece of data, so that each piece of data comprises 7 dimensions of single data; and, a plurality of pieces of data acquired in the multiple winding process are collectively merged into an industrial data set to be processed, and table 1 shown below is formed and stored in a database.

TABLE 1 Industrial data set of winding procedure

Length of negative plate	Insulation resistance test	Lower diaphragm length	Degree of alignment 2	Length of positive plate	Degree of alignment 1	Degree of alignment 3
8344.550	43.800	8850.640	3.1390	8074.315	0.424	1.755
8347.714	1220.00	8858.268	2.9220	8075.840	0.822	1.729
8344.965	39.100	8852.46	3.1400	8074.615	0.514	1.755
8345.936	1650.000	8857.128	2.8680	8075.031	0.947	1.756
8346.235	39.100	8853.056	3.1660	8074.708	0.448	1.728
8345.890	1700.000	8856.766	2.8670	8077.133	0.770	1.729
8344.203	1740.000	8856.214	3.2210	8075.192	0.396	1.756
8345.196	1170.000	8855.955	2.8100	8076.001	0.874	1.812
8344.342	45.400	8856.887	3.2200	8074.153	0.455	1.783
8346.027	1030.000	8855.247	2.9770	8076.348	0.728	1.812
8343.856	43.600	8856.541	3.1390	8075.239	0.396	1.783
8345.590	1550.000	8861.460	2.8370	8075.909	0.867	1.840
8345.012	1700.000	8857.629	2.9740	8074.869	0.718	1.811
8345.844	1750.000	8861.270	2.9220	8076.278	0.881	1.785
8345.035	1740.000	8858.802	3.0290	8075.447	0.773	1.840
8345.658	1740.000	8862.529	2.7280	8075.262	1.002	1.840
8344.873	1720.000	8858.198	3.0550	8075.770	0.794	1.840
8345.704	1080.000	8864.566	2.7820	8075.701	0.943	1.784
8343.926	1770.000	8859.182	2.9710	8075.078	0.641	1.840

Further, the industrial data set collected in the database is initialized to be data expressed in a data space, specifically, each piece of data is taken as a data point in the data space, and a Euclidean distance algorithm is adopted to assign a distance between every two data points.

In the example of the winding process of the present embodiment, each row in table 1 is taken as 1 data point of the industrial data set in the data space, wherein each data point has 7 dimensions.

Further, the data in the collected industrial data set is processed according to DBSCAN (density-based clustering method).

Specifically, a MinPts (density threshold) is set for the industrial data set and used as the MinPts of every data point in the set. In the DBSCAN algorithm, MinPts is set to be greater than the dimension of the industrial data set; it may be set equal to, or greater than, the data acquisition dimension plus 1.

In the example of the winding process of the present embodiment, MinPts is set equal to the data acquisition dimension plus 1, that is, MinPts = 8.

Further, the Eps values for each data point in the industrial data set are initialized to initiate the iterative process described below.

In this embodiment, the initialization of the Eps may be randomly selected for each data point by using a random function or the like, or may be an average value of the distance between each data point and other data points, and the average value is used as the initialization value of the Eps.
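
The initialization in step S101 can be sketched as follows. This is a minimal illustration, not the application's implementation: the helper names `euclidean`, `distance_matrix` and `init_eps_mean` are hypothetical, only the first three rows of Table 1 are used, and the mean-distance option is chosen for the initial Eps.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(points):
    """Pairwise Euclidean distances between all data points."""
    return [[euclidean(p, q) for q in points] for p in points]

def init_eps_mean(dist):
    """Initial Eps per point: mean distance to the other points (self-distance is 0)."""
    n = len(dist)
    return [sum(row) / (n - 1) for row in dist]

# First three rows of Table 1; each row is one 7-dimensional data point.
points = [
    (8344.550, 43.800, 8850.640, 3.1390, 8074.315, 0.424, 1.755),
    (8347.714, 1220.00, 8858.268, 2.9220, 8075.840, 0.822, 1.729),
    (8344.965, 39.100, 8852.46, 3.1400, 8074.615, 0.514, 1.755),
]
dist = distance_matrix(points)
min_pts = len(points[0]) + 1   # acquisition dimension plus 1, i.e. 8
eps = init_eps_mean(dist)      # one initial neighborhood radius per data point
```

The raw columns have very different scales, so in this sketch the insulation resistance dominates the distances; whether the dimensions are normalized beforehand is not stated in the text.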

Step S102, for each data point, determining a sparse value of the data point by using the difference between the neighborhood radius of the data point and those of other surrounding data points, determining an outlier value of the data point by using the distances between the data point and its neighborhood data points, and taking the sparse value and the outlier value as a target solution.

In the embodiment of the present disclosure, the neighborhood radius is obtained by combining the MOPSO (multi-objective particle swarm optimization) algorithm with the DBSCAN algorithm.

First, a sparse value and an outlier are designed for each data point in the industrial data set to measure the probability that each data point becomes anomalous data, which may also be referred to as an outlier or outlier data point in this application.

Further, for each data point, the measurement results of the sparse value and the outlier are jointly used as a target solution, a non-inferior solution, namely a non-dominant solution, is obtained from a plurality of target solutions based on a Pareto (Pareto) dominant principle, and the Pareto dominant principle is combined with the MOPSO to obtain an optimal neighborhood radius.

Specifically, when measuring the sparse value of a data point, a number of other data points around the data point to be measured are first determined in order of increasing distance from it. The difference between the neighborhood radius of the data point and that of each of these surrounding data points is calculated, and the differences are summed to measure the sparse value. The larger the difference in neighborhood radius between the data point to be measured and a surrounding data point, the more the two differ; the larger the sum, the more the current data point differs from the data points around it, and the more likely it is to be an abnormal point.

In this embodiment, the number of surrounding data points may be chosen according to MinPts, for example equal to the value of MinPts.

Further, based on the above discussion, the following formula is designed to calculate the sparse value:

$$F_1 = \sum_{j} \left| \mathrm{Eps}_i - \mathrm{Eps}_j \right|$$

where $\mathrm{Eps}_i$ represents the neighborhood radius of the data point to be measured and $\mathrm{Eps}_j$ represents the neighborhood radius of each of the other selected data points around it. The absolute value of the difference between the two is calculated for each surrounding data point, the absolute values are summed, and the result $F_1$ is taken as the sparse value of the current data point.

Further, the smaller $F_1$ is, the smaller the probability that the data point is an abnormal point.
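
The sparse value described above can be sketched as follows. The function name `sparse_value`, the parameter `k` (the number of surrounding points, which the embodiment suggests may equal MinPts) and the one-dimensional toy data are assumptions made for illustration.

```python
def sparse_value(i, eps, dist, k):
    """F1 for point i: sum of |Eps_i - Eps_j| over its k nearest other points.

    eps  -- list with the current neighborhood radius of every point
    dist -- precomputed pairwise distance matrix
    """
    others = sorted((j for j in range(len(eps)) if j != i),
                    key=lambda j: dist[i][j])       # nearest first
    return sum(abs(eps[i] - eps[j]) for j in others[:k])

# Toy data: four points on a line, the last one far from the rest.
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
eps = [1.0, 1.2, 0.9, 3.0]
f1 = [sparse_value(i, eps, dist, k=2) for i in range(len(eps))]
```

In this toy case the isolated point, whose radius differs most from those of its nearest points, receives the largest sparse value.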

Further, when measuring the outlier value of a data point, all other data points within the Eps of the data point to be measured are first determined, and the sum of the distances between the data point to be measured and these data points is calculated. This sum measures the outlier value of the data point: the larger the sum of distances, the more likely the data point is to be an abnormal point.

Further, based on the above discussion, the following formula is designed to calculate the outlier value:

$$F_2 = \sum_{j} d(x_i, x_j)$$

where $x_i$ represents the data point to be measured, $x_j$ represents each other data point within the Eps of the data point to be measured, $d(x_i, x_j)$ is the distance between the two, and the result $F_2$ is taken as the outlier value of the data point to be measured.

Further, the smaller $F_2$ is, the smaller the probability that the data point is an abnormal point.
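
A minimal sketch of the outlier value follows. The function name `outlier_value` is hypothetical, and treating a point at exactly distance Eps as inside the neighborhood is an assumption, since the text does not say whether the boundary is included.

```python
def outlier_value(i, eps_i, dist):
    """F2 for point i: sum of distances to the other points within radius eps_i."""
    return sum(d for j, d in enumerate(dist[i]) if j != i and d <= eps_i)

# Toy data: four points on a line.
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]

f2_wide = outlier_value(0, 2.5, dist)    # neighbors at distances 1.0 and 2.0
f2_tight = outlier_value(1, 1.0, dist)   # neighbors at distance 1.0 on both sides
```

Because the sum runs only over points inside the radius, the value depends on the current Eps of the point, which is what lets the iteration trade the two objectives against each other.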

Further, the obtained sparse value and the outlier are jointly used as a target solution to measure the possibility that the data point is an abnormal point.

Step S103, for each data point, initializing an individual optimal solution by using the target solution; and iterating the individual optimal solution by using a particle swarm algorithm.

In embodiments of the present application, an iterative process may be used with min(F₁, F₂) as the objective: the global optimal solution and the individual optimal solutions among the target solutions are iterated, the smaller or smallest target solution is taken as the individual optimal solution, and the neighborhood radius corresponding to the best individual optimal solution is obtained.

First, since the iterative process in this embodiment is based on the DBSCAN algorithm of MOPSO, the target solution of each data point can be regarded as one particle in the iteration.

Further, if the iteration is the first iteration, the individual optimal solution of each data point needs to be initialized, and the iteration of the individual optimal solution is started.

The obtained target solution of each data point can be used as the initial value of its individual optimal solution. The velocity of each particle, that is, of each target solution, is initialized with a random function; for ease of understanding, the neighborhood radius of each data point can be regarded as the position of the particle in MOPSO.

It should be noted that the neighborhood radius, the particle velocity, and the individual optimal solution may be initialized during the first iteration after the iteration starts, or each parameter may be initialized first and the iteration started afterwards.

In the embodiment of the application, a Pareto domination principle can be adopted, all non-dominated solutions are selected from all target solutions of each iteration, the non-dominated solutions form a non-dominated solution set, and a global optimal solution is selected from the non-dominated solution set.

Specifically, according to the Pareto dominance principle, if among all target solutions of an iteration one target solution attains the minimum sparse value and the minimum outlier value at the same time, that is, no other target solution has a smaller sparse value and no other target solution has a smaller outlier value, the target solution can be considered to dominate all other target solutions and is taken as the only non-dominated solution of the iteration.

Further, if no target solution of an iteration minimizes the sparse value and the outlier value at the same time, the target solutions with the minimum sparse value and the target solutions with the minimum outlier value are found.

In this case, among all target solutions of the current iteration there is at least one target solution whose sparse value is the smallest but whose outlier value is not, and at least one target solution whose outlier value is the smallest but whose sparse value is not. Such a target solution neither dominates all the others nor is dominated by any other target solution, and can therefore be regarded as a non-dominated solution.

Further, a non-dominated solution set is constructed, and all non-dominated solutions are put into the non-dominated solution set, and in the present embodiment, all non-dominated solutions obtained from past iterations are included in the non-dominated solution set.
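
The Pareto dominance test and the extraction of the non-dominated solutions can be sketched as follows. Each target solution is represented here as a tuple (F1, F2) with both objectives minimized; the function names and sample values are illustrative assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]

# (sparse value, outlier value) pairs for five particles.
solutions = [(1.0, 5.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
front = non_dominated(solutions)
```

Here (3.0, 3.0) and (5.0, 5.0) are dominated by (2.0, 2.0), while the remaining three solutions each win on one objective and lose on the other, so none of them dominates another.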

Further, in each iteration, the minimum and maximum of the calculated sparse values are used as the range of the abscissa and the minimum and maximum of the calculated outlier values are used as the range of the ordinate, forming a target space. The target space is then divided into a plurality of sub-regions by a uniform grid, where the granularity of the grid can be adjusted according to specific requirements.

When the target space is formed, the outlier may be set as the abscissa and the sparse value may be set as the ordinate.

Further, according to the respective sparse value and outlier of the particle, the position of each particle in the target space, that is, the sub-region where the particle is located, can be determined.

Further, the number of particles contained in each sub-region is determined, and the number of particles is used as the spatial density value of each particle in the sub-region, wherein the spatial density value is larger if the number of particles is larger, and is smaller otherwise.

Then, following the principle that a smaller spatial density value is better, the non-dominated solution with the smallest spatial density value is selected as the global optimal solution.
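
The grid-based selection of the global optimal solution can be sketched as follows. The names `grid_cell` and `select_global_best`, the default of 4 grid divisions per axis, and breaking ties by taking the first candidate are assumptions; the text leaves these details open.

```python
from collections import Counter

def grid_cell(sol, lows, highs, divisions):
    """Map a (F1, F2) solution to the index of its sub-region in the target space."""
    cell = []
    for value, low, high in zip(sol, lows, highs):
        width = (high - low) / divisions
        idx = int((value - low) / width) if width > 0 else 0
        cell.append(min(max(idx, 0), divisions - 1))   # clamp to the grid
    return tuple(cell)

def select_global_best(archive, particles, divisions=4):
    """Pick the archive member lying in the least crowded sub-region."""
    lows = [min(p[k] for p in particles) for k in range(2)]
    highs = [max(p[k] for p in particles) for k in range(2)]
    density = Counter(grid_cell(p, lows, highs, divisions) for p in particles)
    return min(archive,
               key=lambda s: density[grid_cell(s, lows, highs, divisions)])

# Three particles crowd one corner of the target space, one sits alone.
particles = [(0.0, 4.0), (0.1, 3.9), (0.2, 3.8), (4.0, 0.0)]
gbest = select_global_best(archive=particles, particles=particles)
```

Preferring sparsely populated sub-regions steers the swarm toward parts of the non-dominated front that have been explored less.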

In an embodiment of the present disclosure, in each iteration, the velocity of the particle may be updated with the individual optimal solution and the global optimal solution according to the MOPSO.

Specifically, the following velocity update formula may be taken:

$$V_{i+1} = \omega \times V_i + C_1 \times \mathrm{rand}() \times (\mathrm{pbest}_i - \mathrm{Eps}_i) + C_2 \times \mathrm{rand}() \times (\mathrm{gbest}_i - \mathrm{Eps}_i)$$

where $\omega$ denotes the inertia factor; $C_1$ and $C_2$ represent learning factors, both of which can take the value 2 in the present embodiment; $V_{i+1}$ indicates the velocity of this iteration; $V_i$ indicates the velocity of the last iteration; $\mathrm{gbest}_i$ represents the neighborhood radius corresponding to the global optimal solution of the last iteration; $\mathrm{pbest}_i$ represents the neighborhood radius corresponding to the individual optimal solution of the last iteration; and $\mathrm{Eps}_i$ represents the neighborhood radius of the last iteration. The neighborhood radius in the velocity update formula also represents the position of the particle.

Further, the first part of the velocity update formula, $\omega \times V_i$, may be called the memory term and represents the influence of the magnitude and direction of the previous velocity. The value of the inertia factor affects the range over which the optimal result is searched: if the value is larger, the global optimization capability is strong and the local optimization capability is weak; if the value is smaller, the global optimization capability is weak and the local optimization capability is strong. A dynamic inertia factor can be adopted to obtain a better optimization result. The second part of the formula, $C_1 \times \mathrm{rand}() \times (\mathrm{pbest}_i - \mathrm{Eps}_i)$, may be called the self-cognition term; it is a vector pointing from the current point to the particle's own best point and represents that the motion of the particle derives from its own experience. The third part of the formula, $C_2 \times \mathrm{rand}() \times (\mathrm{gbest}_i - \mathrm{Eps}_i)$, may be called the group cognition term; it is a vector pointing from the current point to the best point of the group and reflects collaboration and knowledge sharing among the particles.
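
The text does not specify how a dynamic inertia factor would be computed. One common choice in the particle swarm literature is a linear decrease over the iterations, and the sketch below assumes that schedule. The function name `inertia_factor` and the bounds 0.9 and 0.4 are illustrative assumptions, not values taken from this application.

```python
def inertia_factor(iteration, max_iterations, omega_start=0.9, omega_end=0.4):
    """Linearly decrease the inertia factor from omega_start to omega_end.

    iteration runs from 0 to max_iterations - 1. The schedule and the default
    bounds are assumptions for illustration, not values from the source text.
    """
    if max_iterations <= 1:
        return omega_end
    fraction = iteration / (max_iterations - 1)
    return omega_start + (omega_end - omega_start) * fraction

# Inertia factor for each of 11 iterations: large early, small late.
schedule = [inertia_factor(g, 11) for g in range(11)]
```

A large early value favors global search and a small late value favors local refinement, which matches the trade-off described for the inertia factor.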

Further, the position of the particle is updated by the velocity using the following position update formula:

$$\mathrm{Eps}_{i+1} = V_{i+1} + \mathrm{Eps}_i$$

where $\mathrm{Eps}_{i+1}$ represents the neighborhood radius of this iteration, $V_{i+1}$ represents the velocity of this iteration, and $\mathrm{Eps}_i$ represents the neighborhood radius of the last iteration. The neighborhood radius in the position update formula plays the role of the particle position in the MOPSO iteration.

And further, calculating and updating the sparse value and the outlier in the next iteration by using the neighborhood radius of the particles in the iteration.
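
The velocity and position updates can be sketched for a single particle as follows. The learning factors C1 = C2 = 2 follow the text; the function name `update_particle`, the inertia factor 0.5, the `rng` parameter and the sample values are assumptions.

```python
import random

def update_particle(eps, velocity, pbest_eps, gbest_eps,
                    omega=0.5, c1=2.0, c2=2.0, rng=random):
    """One MOPSO step for a single particle whose position is its Eps.

    omega=0.5 is an assumed inertia factor; c1 = c2 = 2 follows the text.
    rng only needs a random() method returning a float in [0, 1).
    """
    cognitive = c1 * rng.random() * (pbest_eps - eps)   # pull toward own best radius
    social = c2 * rng.random() * (gbest_eps - eps)      # pull toward global best radius
    new_velocity = omega * velocity + cognitive + social
    new_eps = eps + new_velocity
    return new_eps, new_velocity

rng = random.Random(0)   # seeded only to make the example repeatable
new_eps, new_velocity = update_particle(1.0, 0.2, pbest_eps=0.8,
                                        gbest_eps=1.5, rng=rng)
```

The text does not say how a non-positive radius is handled, so this sketch leaves the updated radius unbounded.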

In this embodiment, if the iteration is not the first iteration, the individual optimal solution of each iteration may be obtained by comparing the target solution of the current iteration with the individual optimal solution of the historical iterations.

Specifically, according to the Pareto dominance principle, the target solution of the data point in the current iteration is compared with the individual optimal solution of the historical iterations, the non-dominated solution of the two is taken as the individual optimal solution of the current iteration, and this solution is placed in the non-dominated solution set.

The overflow threshold may be designed for the non-dominated solution set, and when the number of non-dominated solutions in the non-dominated solution set exceeds a preset overflow threshold, no new non-dominated solution is added to the non-dominated solution set.
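
The update of the individual optimal solution and the bounded non-dominated solution set can be sketched as follows. Several details are assumptions: each solution is stored as a record holding its objectives together with the radius that produced them, the historical record is kept when neither solution dominates the other, and the overflow threshold is treated as a capacity.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update_personal_best(current, best):
    """Return the record to keep as the individual optimal solution.

    Each record is a dict with 'objectives' = (F1, F2) and 'eps', the radius
    that produced them. Keeping the historical record when neither solution
    dominates is an assumption.
    """
    if dominates(current["objectives"], best["objectives"]):
        return current
    return best

def add_to_archive(archive, record, overflow_threshold):
    """Append record unless the archive already holds overflow_threshold entries."""
    if len(archive) >= overflow_threshold:
        return False
    archive.append(record)
    return True

best = {"objectives": (2.0, 3.0), "eps": 1.5}
current = {"objectives": (1.5, 2.5), "eps": 1.2}
best = update_personal_best(current, best)
archive = []
add_to_archive(archive, best, overflow_threshold=100)
final_eps = best["eps"]   # radius read back from the individual optimal solution
```

Storing the radius next to the objectives is one simple way to recover the neighborhood radius from the individual optimal solution once the iteration stops.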

Step S104, in response to reaching the preset number of iterations, determining the individual optimal solution of each data point in the last iteration, and deriving the corresponding neighborhood radius from the individual optimal solution.

In the embodiment of the present application, when iteration is started, a maximum iteration number may also be designed for the iteration: gMAX; and when the iteration times reach gMAX, stopping the iteration of the individual optimal solution and the global optimal solution, and acquiring the individual optimal solution in the last iteration.

After iteration is completed, for each data point, a neighborhood radius corresponding to the individual optimal solution is determined, and data abnormal points are detected by utilizing the neighborhood radius.

Step S105, for each data point, determining that the data point is an abnormal point in response to the number of the neighborhood data points in the neighborhood radius being less than or equal to the density threshold.

In an embodiment of the present disclosure, for each data point, a density threshold may be used to measure whether the data point is abnormal.

Specifically, for each data point, the number of other data points in the neighborhood radius is determined by using the obtained neighborhood radius.

Further, when the number of other data points is less than or equal to the preset density threshold, the data point is considered as an abnormal point.
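
The per-point density check can be sketched as follows. The function name `is_abnormal`, the inclusive radius boundary and the one-dimensional toy data are assumptions.

```python
def is_abnormal(i, eps_i, dist, min_pts):
    """True if point i has at most min_pts other points within its own radius."""
    neighbors = sum(1 for j, d in enumerate(dist[i]) if j != i and d <= eps_i)
    return neighbors <= min_pts

# Toy data: four points close together and one far away.
positions = [0.0, 1.0, 2.0, 3.0, 50.0]
dist = [[abs(a - b) for b in positions] for a in positions]
eps = [3.5, 3.5, 3.5, 3.5, 3.5]   # per-point radii, equal here for simplicity
flags = [is_abnormal(i, eps[i], dist, min_pts=2) for i in range(len(positions))]
```

Only the distant point is flagged, since each of the other four has three neighbors inside its radius.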

In this embodiment, abnormal points may be detected by traversing the data points and comparing the neighborhood count of each one with the density threshold, or detection may be combined with clustering by using the DBSCAN algorithm.

Traversing every data point can complete abnormal point detection without clustering, but it is inefficient and computationally expensive. Combining abnormal point detection with the DBSCAN algorithm achieves clustering and abnormal detection of the data at the same time, so, given conventional industrial requirements, the approach combined with the DBSCAN algorithm is preferred in this embodiment.

Specifically, for each 1 data point, when the DBSCAN iteration is initiated, an unaccessed tag may be set, the obtained neighborhood radius of the data point is called, all first neighborhood data points of the data point within the neighborhood radius are traversed, and the number of all first neighborhood data points is determined.

Further, when the number of the first neighborhood data points is less than or equal to the density threshold, the data points can be determined as abnormal points, an abnormal data set is constructed, and the data points are placed in the abnormal data set; and when the number of the first neighborhood data points is larger than the density threshold value, not determining the data points as abnormal data, constructing a target class cluster related to the data points, and taking the data points as core data points of the first neighborhood data points.

Further, each first neighborhood data point is analyzed to complete the clustering process.

Specifically, the obtained neighborhood radius of each first neighborhood data point is called, all second neighborhood data points of each first neighborhood data point in the neighborhood radius are traversed, and the number of all second neighborhood data points is determined.

Further, for each first neighborhood data point, when the number of second neighborhood data points is less than or equal to the density threshold, the first neighborhood data point can be determined to be abnormal data and put into an abnormal data set; when the number of the second neighborhood data points is greater than the density threshold, the first neighborhood data points can be determined to be non-abnormal data, and the first neighborhood data points are placed into a target cluster, wherein the target cluster at the moment is the target cluster of the core data points corresponding to the first neighborhood data points, namely the constructed target cluster.
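The combined clustering and detection pass can be sketched as below. This is a simplified reading of the steps above rather than a full DBSCAN implementation: it expands each core data point by one level only, as the text describes, and the function names and the toy data are assumptions.

```python
import math


def neighbors(points, i, radius):
    """Indices of the other data points within `radius` of points[i]."""
    return [
        j
        for j, q in enumerate(points)
        if j != i and math.dist(points[i], q) <= radius
    ]


def cluster_and_detect(points, radii, density_threshold):
    """Return (clusters, outliers) using a per-point neighborhood radius."""
    visited = set()   # data points whose unvisited tag has been cleared
    outliers = set()  # the abnormal data set
    clusters = []     # one set of indices per target cluster
    for p in range(len(points)):
        if p in visited:
            continue
        visited.add(p)
        first = neighbors(points, p, radii[p])
        if len(first) <= density_threshold:
            outliers.add(p)
            continue
        cluster = {p}  # p acts as the core data point
        for q in first:
            if q in visited:
                continue
            visited.add(q)
            second = neighbors(points, q, radii[q])
            if len(second) <= density_threshold:
                outliers.add(q)
            else:
                cluster.add(q)
        clusters.append(cluster)
    return clusters, outliers


points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,)]
clusters, outliers = cluster_and_detect(points, [0.35] * 5, 2)
```

A conventional DBSCAN would keep expanding the cluster through every newly added core point; that extension is omitted here to stay close to the two-level description.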

In another embodiment of the present application, the detection of anomalous data is performed on a plurality of industrial data sets using an industrial anomalous data detection method.

In this embodiment, in the case of multiple dimensions and multiple industrial data sets, when abnormal data is determined, correlation analysis may be performed on multiple attributes, that is, dimensions, in all industrial data in advance, and a data point set represented by each of the multiple attributes with the strongest correlation may be determined.

Specifically, based on the industrial data set of the winding process shown in table 1, the negative electrode sheet length and the positive electrode sheet length are determined as two attributes having the greatest correlation, that is, dimensions, according to the calculation of the pearson correlation coefficient.
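Selecting the most strongly correlated pair of attributes can be sketched as follows. The column names and values are made up for illustration and are not taken from table 1; ranking by the absolute value of the Pearson coefficient is also an assumption, since the text does not say how negative correlations are treated.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def most_correlated_pair(columns):
    """Pair of column names with the largest absolute correlation."""
    return max(
        combinations(columns, 2),
        key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])),
    )


# Hypothetical attribute columns.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [5.0, 1.0, 4.0, 2.0],
}
best_pair = most_correlated_pair(columns)
```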

Further, the length of the negative electrode plate, the insulation resistance, the length of the lower diaphragm, the alignment degree 1, the alignment degree 2 and the alignment degree 3 form an industrial data set 1 with the dimension of 6, and the length of the positive electrode plate, the insulation resistance, the length of the lower diaphragm, the alignment degree 1, the alignment degree 2 and the alignment degree 3 form an industrial data set 2 with the dimension of 6.

Further, the Euclidean distance matrix D₁ of the data points in the industrial data set 1 and the Euclidean distance matrix D₂ of the data points in the industrial data set 2 are obtained through calculation, wherein D₁ and D₂ are matrices of order n, and row i of D₁ represents the distances from the i-th data point in the industrial data set 1 to the other data points.

Further, eps is calculated for each data point in the industrial data set 1 and the industrial data set 2 using the same MOPSO iterative method as in the previous embodiment.
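A distance matrix of this kind can be computed as in the following minimal sketch, in which the function name and the sample points are hypothetical. Row i holds the distances from the i-th data point to every data point, with 0 on the diagonal.

```python
import math


def distance_matrix(points):
    """n-by-n matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]


# Three hypothetical two-dimensional data points.
D = distance_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```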

Further, the same DBSCAN algorithm as in the previous embodiment is used for detection of abnormal data.

Further, when all the data in the industrial data set 1 are detected, the abnormal data points of the industrial data set 1 shown in table 2 can be obtained.

Table 2 anomaly data points for industrial dataset 1

Insulation resistance	Lower diaphragm length	Degree of alignment 2	Length of positive plate	Degree of alignment 1	Degree of alignment 3
54.8000	8851.7090	2.4620	8029.6650	1.2910	1.4080
56.6000	8850.9680	2.3270	8029.2720	1.3460	1.3200
53.9000	8851.8990	2.3560	8029.0870	1.4420	1.3530
51.9000	8851.5020	2.3480	8029.6650	1.2320	1.3200
53.7000	8853.4520	2.4270	8028.8330	1.1580	1.3790
56.8000	8851.1230	2.3900	8029.3640	1.2010	1.3200
55.4000	8853.1770	2.3880	8029.0870	1.3080	1.4260
56.5000	8853.1070	2.5250	8028.8560	1.1890	1.4340
54.7000	8854.6600	2.5510	8029.4110	1.1540	1.3750
52.9000	8854.6080	2.5510	8029.4570	1.2060	1.4590

Further, the industrial data set 2 is processed in the same manner as described above, and abnormal data points of the industrial data set 2 shown in table 3 are obtained.

Table 3 anomaly data points for industrial data set 2

Length of negative electrode plate	Insulation resistance	Lower diaphragm length	Degree of alignment 2	Degree of alignment 1	Degree of alignment 3
8322.8630	54.8000	8851.7090	2.4620	1.2910	1.4080
8322.3320	56.6000	8850.9680	2.3270	1.3460	1.3200
8322.5630	55.2000	8853.5910	2.4230	1.2110	1.4850
8321.8700	56.6000	8856.6100	2.4880	1.1110	1.4340
8321.3860	53.7000	8853.4520	2.4270	1.1580	1.3790
8321.7310	59.8000	8848.5680	1.9640	1.7400	1.1010
8322.3540	51.8000	8850.2080	1.9500	1.5410	1.0540
8322.3540	56.5000	8853.1070	2.5250	1.1890	1.4340
8322.3090	54.7000	8854.6600	2.5510	1.1540	1.3750
8321.4080	52.9000	8854.6080	2.5510	1.2060	1.4590

Further, common outliers in the industrial data set 1 and the industrial data set 2, that is, outlier data points in the data sets represented by the two attributes with the stronger association, are extracted and shown in table 4 below.

TABLE 4 common points of anomaly for Industrial data set 1 and Industrial data set 2

Length of negative plate	Insulation resistance	Lower diaphragm length	Degree of alignment 2	Length of positive plate	Degree of alignment 1	Degree of alignment 3
8322.8630	54.8000	8851.7090	2.4620	8029.6650	1.2910	1.4080
8322.3320	56.6000	8850.9680	2.3270	8029.2720	1.3460	1.3200
8321.3860	53.7000	8853.4520	2.4270	8028.8330	1.1580	1.3790
8322.3540	56.5000	8853.1070	2.5250	8028.8560	1.1890	1.4340
8322.3090	54.7000	8854.6600	2.5510	8029.4110	1.1540	1.3750
8321.4080	52.9000	8854.6080	2.5510	8029.4570	1.2060	1.4590
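Extracting the common abnormal points amounts to intersecting the two abnormal data sets. The sketch below matches records on the attributes shared by both data sets, which is an assumption: an implementation with record identifiers would more likely match on those. The field names are hypothetical, and the sample rows are the first and third rows of table 2 and of table 3.

```python
SHARED = ("insulation", "lower_diaphragm", "align1", "align2", "align3")


def common_outliers(set_a, set_b, shared=SHARED):
    """Rows present in both abnormal sets, merged into one record each."""
    def key(row):
        return tuple(row[name] for name in shared)

    index_b = {key(row): row for row in set_b}
    merged = []
    for row in set_a:
        match = index_b.get(key(row))
        if match is not None:
            merged.append({**match, **row})
    return merged


table2_rows = [
    {"insulation": 54.8, "lower_diaphragm": 8851.709, "align2": 2.462,
     "positive_len": 8029.665, "align1": 1.291, "align3": 1.408},
    {"insulation": 53.9, "lower_diaphragm": 8851.899, "align2": 2.356,
     "positive_len": 8029.087, "align1": 1.442, "align3": 1.353},
]
table3_rows = [
    {"negative_len": 8322.863, "insulation": 54.8,
     "lower_diaphragm": 8851.709, "align2": 2.462,
     "align1": 1.291, "align3": 1.408},
    {"negative_len": 8322.563, "insulation": 55.2,
     "lower_diaphragm": 8853.591, "align2": 2.423,
     "align1": 1.211, "align3": 1.485},
]
common = common_outliers(table2_rows, table3_rows)
```

Only the first row of each sample appears in both sets, and the merged record corresponds to the first row of table 4.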

Further, the common abnormal points in table 4 may be treated as industrial fault data and as important industrial analysis data, and the relevant fault analysts may analyze these common abnormal points and identify the industrial fault data among them.

Therefore, the industrial abnormal data detection method of the present application is designed on the basis of MOPSO (multi-objective particle swarm optimization) and DBSCAN (a density-based clustering method). The different conditions of each data point are considered comprehensively, and a separate neighborhood radius is set for each data point. A sparse value and an outlier are designed for each data point to serve as a target solution, a global optimal solution and an individual optimal solution are selected based on the principle of Pareto domination, and the neighborhood radius of each data point is obtained through the iteration process. Each data point can then be evaluated for abnormality using its own neighborhood radius, which improves the detection accuracy of abnormal data.

Furthermore, the method can be combined with DBSCAN so that the data are clustered effectively while abnormal data are detected. Association analysis is performed on any two attributes, namely dimensions, of the data in a data set by utilizing the clustering process to obtain the two attributes with the strongest association, and whether the data represented by the two associated attributes are abnormal at the same time is analyzed. If they are abnormal at the same time, the data are industrial fault data and cannot be removed; otherwise, the data are invalid data caused by faults in sensor transmission or collection and need to be removed.

It should be noted that the method of the embodiments of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present application, and the devices may interact with each other to complete the method.

It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Based on the same inventive concept, corresponding to any embodiment method, the embodiment of the application also provides an industrial abnormal data detection device.

Referring to fig. 2, the industrial abnormal data detecting apparatus is connected to a database, and the apparatus may include: an initialization module, a target solution module, an iteration module and an abnormal point detection module.

The initialization module 201 is configured to initialize the collected industrial data to data points in a data space, set a density threshold greater than an industrial data collection dimension, and initialize a neighborhood radius of each of the data points.

The target solution module 202 is configured to determine a sparse value of each data point by using a difference between the data point and other surrounding data points in the neighborhood radius, determine an outlier of the data point by using a distance between the data point and the neighborhood data point, and use the sparse value and the outlier as a target solution.

The iteration module 203 is configured to initialize an individual optimal solution for each of the data points using the target solution; iterate the individual optimal solution by adopting a group particle algorithm; and in response to reaching the preset iteration times, determine the individual optimal solution of each data point in the last iteration, and reversely deduce the corresponding neighborhood radius by using the individual optimal solution.

The outlier detection module 204 is configured to determine each of the data points as an outlier in response to a number of the neighborhood data points for the data point within the neighborhood radius being less than or equal to the density threshold.

Wherein, the iteration module 203 is specifically configured to: determining a global optimal solution among the target solutions;

according to a pareto domination principle, obtaining non-domination solutions in the target solution in a smaller and more optimal domination mode, keeping the non-domination solutions in iteration all the time, and determining a global optimal solution of the iteration in all the non-domination solutions of historical iteration;

initializing the speed of each target solution in the first iteration;

calculating the speed of the current iteration by using the speed of the previous iteration, the neighborhood radius, the individual optimal solution and the global optimal solution, calculating the neighborhood radius of the current iteration by using the speed of the current iteration and the neighborhood radius of the previous iteration, and updating each target solution;

and according to a pareto domination principle, determining the individual optimal solution of the current iteration in the target solution of the current iteration of each data point and the individual optimal solution of the historical iteration by adopting a smaller and more optimal domination mode, and executing the next iteration.
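The "smaller and more optimal" domination mode used above can be illustrated with a short sketch. Each target solution is taken as a tuple (sparse value, outlier); the function names are hypothetical, and the random choice between two mutually non-dominated candidates is one possible tie-breaking rule, assumed here for illustration.

```python
import random


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Smaller values are better for both objectives.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(solutions):
    """Solutions that no other solution in the list dominates."""
    return [
        s for s in solutions
        if not any(dominates(t, s) for t in solutions)
    ]


def update_pbest(old, new, rng=random):
    """Individual optimal solution after comparing old and new."""
    if dominates(new, old):
        return new
    if dominates(old, new):
        return old
    # Neither dominates the other: pick one at random (assumed rule).
    return rng.choice([old, new])


front = non_dominated([(1, 5), (2, 2), (3, 3), (5, 1)])
```

In this toy example (3, 3) is dominated by (2, 2), so the remaining three solutions form the non-dominated set.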

The outlier detection module 204 is specifically configured to: traversing a first neighborhood data point for each data point within the neighborhood radius thereof;

in response to determining that the number of the first neighborhood data points is less than or equal to the density threshold, taking the data points as the outliers and placing the outliers into an outlier dataset;

responsive to determining that the number of the first neighborhood data points is greater than the density threshold, not considering the data points as the outliers and constructing a target cluster for the data points;

for each of the first neighborhood data points, determining a number of second neighborhood data points within the neighborhood radius thereof;

in response to determining that the number of the second neighborhood data points is less than or equal to the density threshold, taking the first neighborhood data points as the abnormal points and placing the abnormal points into the abnormal data set;

responsive to determining that the number of second neighborhood data points is greater than the density threshold, not considering the first neighborhood data point as the outlier and placing the first neighborhood data point in the target cluster for the data point.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in practicing embodiments of the present application.

The apparatus of the foregoing embodiment is used to implement the corresponding method for detecting industrial abnormal data in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Based on the same inventive concept, corresponding to any of the above embodiments, the embodiments of the present application further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the industrial abnormal data detection method according to any of the above embodiments is implemented.

Fig. 3 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present application.

The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static memory device, a dynamic memory device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiment of the present application is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only the components necessary to implement the embodiments of the present application, and need not include all of the components shown in the figures.

Based on the same inventive concept, corresponding to any of the above embodiments, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the industrial abnormal data detection method according to any of the above embodiments.

Computer-readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the industrial abnormal data detection method according to any one of the above embodiments, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.

In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.

While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.

The embodiments of the present application are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims

1. An industrial abnormal data detection method is applied to a database and comprises the following steps:

wherein said determining a sparse value for said data point using a difference in said neighborhood radius from other surrounding data points comprises,

the sparse value is calculated using the formula shown below,

wherein F₁ represents the sparse value of the data point, eps_i represents the neighborhood radius of the data point, eps_j represents the neighborhood radius of another surrounding data point, x_j represents any other surrounding data point within the neighborhood radius of the data point, D represents all other surrounding data points, and abs represents taking the absolute value of the difference between the two;

said determining an outlier of a neighborhood data point using a distance to said data point comprises,

the outlier is calculated using the formula shown below:

wherein x_i represents the data point, the calculated F₂ is taken as the outlier of the data point to be measured, and Distance represents the distance between the data point and the other surrounding data points;

for each of the data points, initializing an individual optimal solution with the target solution; iterating the individual optimal solution by adopting a group particle algorithm;

wherein the employing a group particle algorithm iterates the individual optimal solutions, including,

determining a global optimal solution among the target solutions in each iteration;

initializing the speed of each target solution in the first iteration;

according to a pareto domination principle, determining the individual optimal solution of the current iteration in the target solution of the current iteration of each data point and the individual optimal solution of the historical iteration in a smaller and more optimal domination mode, and executing the next iteration;

wherein said determining, in each iteration, a globally optimal solution among said target solutions comprises,

according to a pareto domination principle, a smaller and more optimal domination mode is adopted, the non-domination solution of the current iteration is determined in the target solution of each iteration, and the non-domination solution of each iteration is put into a non-domination solution set, wherein the non-domination solution set comprises all non-domination solutions of all iterations;

establishing a target space by taking the sparse value and the outlier as coordinate axes, and equally dividing the target space into a plurality of sub-regions;

determining the position of each target solution in the target space and the number of the target solutions contained in each sub-area;

in response to determining the sub-region containing the least number of target solutions, taking the target solution in the sub-region as the global optimal solution;

said calculating said velocity of the current iteration using said velocity of the last iteration, said neighborhood radius, said individual optimal solution, and said global optimal solution, and calculating said neighborhood radius of the current iteration using said velocity of the current iteration and said neighborhood radius of the last iteration, comprising,

the velocity is calculated using the following formula:

wherein ω represents the inertia factor, C₁ and C₂ represent learning factors, V_{i+1} represents the speed of the current iteration, V_i represents the speed of the last iteration, gbest_i represents the neighborhood radius corresponding to the global optimal solution of the last iteration, pbest_i represents the neighborhood radius corresponding to the individual optimal solution of the last iteration, and Eps_i represents the neighborhood radius of the last iteration;

and calculating the neighborhood radius using the following formula:

Eps_{i+1} = V_{i+1} + Eps_i

wherein Eps_{i+1} represents the neighborhood radius of the current iteration, and V_{i+1} represents the speed of the current iteration;

2. The method of claim 1, wherein said determining that the data point is an outlier comprises:

for each said data point, traversing a first neighborhood data point for that data point within its said neighborhood radius;

3. The method of claim 1, wherein determining the sparse value of the data point using the difference in the neighborhood radius from other surrounding data points comprises:

determining a certain number of other data points around the data point according to the sequence of the distance from near to far;

the certain number is equal to the density threshold;

determining that the sparse value is smaller in response to the data point having a larger difference from the surrounding other data points in the neighborhood radius.

4. The method of claim 1, wherein determining an outlier of a neighborhood data point using a distance to the data point comprises:

determining all neighborhood data points of the data point within the neighborhood radius using the neighborhood radius;

taking a sum of distances of the data point to all other of the neighborhood data points, and determining that the outlier is smaller in response to a larger sum.

5. The method of claim 1, wherein the determining the individual optimal solution for the current iteration among the target solution for the current iteration and the individual optimal solutions for historical iterations for each of the data points comprises:

for each data point, according to the pareto dominance principle, in the individual optimal solution of the historical iteration of the data point and the target solution of the current iteration, in response to the existence of one non-dominance solution, determining the non-dominance solution as the individual optimal solution of the current iteration, and putting the non-dominance solution into the non-dominance solution set;

and in response to determining that a plurality of non-dominant solutions exist, randomly selecting one of all the non-dominant solutions as the individual optimal solution of the iteration, and putting the solution into the non-dominant solution set.

6. The method of claim 1, wherein determining the non-dominant solution of the current iteration in the target solution of each iteration comprises:

among all the target solutions of each iteration, in response to one target solution having both the smallest sparse value and the smallest outlier, determining that the target solution dominates all other target solutions and taking it as the only non-dominated solution;

among all the target solutions of each iteration, in response to no target solution having both the smallest sparse value and the smallest outlier while a plurality of target solutions each have the smallest value of either the sparse value or the outlier, determining that none of these target solutions dominates any of the others, and taking these target solutions as non-dominated solutions.

7. An apparatus for detecting industrial abnormal data, the apparatus being connected to a database and comprising: the system comprises an initialization module, a target solution module, an iteration module and an abnormal point detection module;

the initialization module is configured to initialize the collected industrial data to data points in a data space, set a density threshold larger than an industrial data collection dimension, and initialize a neighborhood radius of each data point;

the sparse value is calculated using the formula shown below,

wherein F₁ represents the sparse value of the data point, eps_i represents the neighborhood radius of the data point, eps_j represents the neighborhood radius of another surrounding data point, x_j represents any other surrounding data point within the neighborhood radius of the data point, D represents all other surrounding data points, and abs represents taking the absolute value of the difference between the two;

the outlier is calculated using the formula shown below:

initializing the speed of each target solution in the first iteration;

wherein the determining, in each iteration, a globally optimal solution among the target solutions comprises,

according to a pareto domination principle, a smaller and more optimal domination mode is adopted, non-dominated solutions of the iteration are determined in the target solution of each iteration, the non-dominated solutions of each iteration are put into a non-dominated solution set, and the non-dominated solution set comprises all non-dominated solutions of all past iterations;

in response to determining the sub-region containing the minimum number of target solutions, taking the target solution in the sub-region as the global optimal solution;

the velocity is calculated using the following formula:

wherein ω represents the inertia factor, C₁ and C₂ represent learning factors, V_{i+1} represents the speed of the current iteration, V_i represents the speed of the last iteration, gbest_i represents the neighborhood radius corresponding to the global optimal solution of the last iteration, pbest_i represents the neighborhood radius corresponding to the individual optimal solution of the last iteration, and Eps_i represents the neighborhood radius of the last iteration;

and calculating the neighborhood radius using the following formula:

Eps_{i+1} = V_{i+1} + Eps_i

wherein Eps_{i+1} represents the neighborhood radius of the current iteration, and V_{i+1} represents the speed of the current iteration;

the outlier detection module is configured to determine each of the data points as an outlier in response to a number of the neighborhood data points for the data point within the neighborhood radius being less than or equal to the density threshold.