CN114528907B - Industrial abnormal data detection method and device - Google Patents


Info

Publication number
CN114528907B
Authority
CN
China
Prior art keywords
iteration
data
data point
neighborhood
solution
Legal status
Active
Application number
CN202111665118.2A
Other languages
Chinese (zh)
Other versions
CN114528907A (en)
Inventor
朱明皓
高勃
荆涛
王光宇
柴学科
高青鹤
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Application filed by Beijing Jiaotong University
Priority to CN202111665118.2A
Publication of CN114528907A
Application granted
Publication of CN114528907B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an industrial abnormal data detection method and device. The method comprises the following steps. First, the collected industrial data are initialized as data points in a data space, a density threshold larger than the collection dimension of the industrial data is set, and a neighborhood radius is initialized for each data point. Second, for each data point, a sparse value is determined from the differences between its neighborhood radius and those of the surrounding data points, and an outlier degree is determined from its distances to the data points in its neighborhood. The sparse value and the outlier degree together form a target solution. Third, for each data point, an individual optimal solution is initialized with the target solution, and the individual optimal solutions are iterated with a particle swarm algorithm. Fourth, when the preset number of iterations is reached, the individual optimal solution of each data point in the last iteration is determined, and the corresponding neighborhood radius is recovered from it. Finally, each data point is classified as an outlier if the number of neighborhood data points within its neighborhood radius is less than or equal to the density threshold.

Description

Industrial abnormal data detection method and device
Technical Field
The embodiments of the present application relate to the technical field of industrial data processing, and in particular to an industrial abnormal data detection method and device.
Background
In industrial data processing, the data points of industrial big data are often examined with globally uniform parameters. However, industrial big data come from wide and dispersed sources and are closely tied to specific industrial settings. As a result, globally uniform parameters cannot effectively eliminate the abnormal data points that arise in these settings. Furthermore, industrial fault data produced during industrial production are significant and must not be removed. Existing industrial abnormal data detection methods cannot reliably distinguish abnormal data points that should be removed from industrial fault data that should be kept.
A solution capable of accurately detecting industrial abnormal data is therefore needed.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for detecting industrial abnormal data.
To achieve this object, the present application provides an industrial abnormal data detection method. The method is applied to a database and comprises:
initializing the collected industrial data as data points in a data space, setting a density threshold larger than the collection dimension of the industrial data, and initializing a neighborhood radius for each data point;
for each data point, determining a sparse value from the differences between its neighborhood radius and those of the surrounding data points, determining an outlier degree from its distances to its neighborhood data points, and taking the sparse value and the outlier degree as a target solution;
for each data point, initializing an individual optimal solution with the target solution, and iterating the individual optimal solutions with a particle swarm algorithm;
when the preset number of iterations is reached, determining the individual optimal solution of each data point in the last iteration, and recovering the corresponding neighborhood radius from it;
for each data point, determining that the data point is an outlier when the number of neighborhood data points within its neighborhood radius is less than or equal to the density threshold.
Based on the same inventive concept, the present application further provides an industrial abnormal data detection apparatus. The apparatus is connected to a database and comprises an initialization module, a target solution module, an iteration module and an outlier detection module;
wherein the initialization module is configured to initialize the collected industrial data as data points in a data space, set a density threshold greater than the collection dimension of the industrial data, and initialize a neighborhood radius for each data point;
the target solution module is configured to, for each data point, determine a sparse value from the differences between its neighborhood radius and those of the surrounding data points, determine an outlier degree from its distances to its neighborhood data points, and take the sparse value and the outlier degree as a target solution;
the iteration module is configured to:
initialize an individual optimal solution for each data point with the target solution;
iterate the individual optimal solutions with a particle swarm algorithm;
when the preset number of iterations is reached, determine the individual optimal solution of each data point in the last iteration and recover the corresponding neighborhood radius from it;
the outlier detection module is configured to determine, for each data point, that the data point is an outlier when the number of its neighborhood data points within its neighborhood radius is less than or equal to the density threshold.
In summary, the industrial abnormal data detection method and device provided by the application are built on MOPSO (multi-objective particle swarm optimization) and DBSCAN (a density-based clustering method). They do not rely on a single global parameter. Instead, each data point receives its own neighborhood radius, which reflects its individual conditions. A sparse value and an outlier degree are defined for every data point as its target solution. The global and individual optimal solutions are selected by the Pareto dominance principle, and the iterative process yields a neighborhood radius for each data point. Each data point is then evaluated with its own neighborhood radius, which improves the accuracy of abnormal data detection.
Furthermore, because MOPSO is combined with DBSCAN, data clustering is completed at the same time as abnormal data detection. The clustering process is used to analyze the correlation between every pair of attributes (that is, dimensions) of the data set and to identify the two most strongly correlated attributes. It is then checked whether a data point is abnormal in both of these attributes at the same time. If it is, the data point is industrial fault data and must not be removed. Otherwise, it is invalid data caused by a sensor transmission or acquisition fault and should be removed.
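The decision between fault data and invalid data described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure. Pearson correlation (via `numpy.corrcoef`) stands in for the correlation analysis. An attribute value is treated as abnormal when its absolute z-score exceeds a hypothetical threshold `z_thresh`, because the text specifies neither test. The function names are illustrative.

```python
import numpy as np


def most_correlated_pair(X):
    """Return the attribute indices (a, b) with the largest absolute Pearson correlation."""
    corr = np.corrcoef(X, rowvar=False)        # d x d correlation matrix of the columns
    np.fill_diagonal(corr, 0.0)                # ignore each attribute's self-correlation
    a, b = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    return int(min(a, b)), int(max(a, b))


def classify_anomaly(X, idx, pair, z_thresh=3.0):
    """Label the abnormal row idx as 'fault' (keep) or 'invalid' (remove).

    Hypothetical rule: a value is abnormal in an attribute when its absolute
    z-score over the whole data set exceeds z_thresh.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant columns
    z = np.abs((X[idx] - mu) / sigma)
    a, b = pair
    return "fault" if (z[a] > z_thresh and z[b] > z_thresh) else "invalid"
```

In this sketch, `most_correlated_pair` runs once on the data set, and `classify_anomaly` runs on each point already flagged as abnormal by the density test.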
Drawings
To illustrate the technical solutions of the present application or the related art more clearly, the drawings used in describing the embodiments or the related art are briefly introduced below. The drawings show only embodiments of the present application. Those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of an industrial anomaly data detection method according to an embodiment of the present application;
FIG. 2 is a block diagram of an exemplary embodiment of an apparatus for detecting industrial anomaly data;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
As described in the Background section, existing industrial abnormal data detection methods have difficulty meeting the needs of abnormal data detection for big data in industrial production.
In the course of implementing the present application, the applicant identified the main problem with related industrial abnormal data detection methods. Modern industrial processes collect large amounts of engineering data. However, changes in the surrounding environment, improper manual operation, sensor faults and other causes make some of the acquired data abnormal. The abnormal data must therefore be cleaned before effective data analysis can be carried out.
Abnormal data (also called outlier) detection based on DBSCAN (a density-based clustering method) requires two parameters: Eps (the neighborhood radius) and MinPts (the density threshold). Related methods usually set Eps and MinPts globally; that is, the same or similar values are used for all data in the data set to be examined. The applicant's research found, however, that the results are highly sensitive to these parameters. When Eps is set globally, data points in high-density regions share the same Eps as data points in low-density regions. A single global Eps can therefore mask some outlying data points, that is, hide possible abnormal data.
The applicant's research further found that setting different Eps values for data points of different densities allows outliers to be detected effectively. Given the characteristics of industrial big data, this approach is well suited to big data detection in the industrial field.
Specifically, industrial big data come from wide and dispersed sources and are closely tied to specific industrial settings. The complexity of these settings produces many kinds of abnormal data whose densities differ greatly. A uniform global Eps value therefore leaves some abnormal data unremoved, so each data point in an industrial big data set needs its own Eps for the abnormal data to be removed effectively. At the same time, not all abnormal data are invalid. Some abnormal data are industrial fault data, which are highly valuable for industrial fault analysis and must not be removed. Abnormal data therefore cannot simply all be removed. Instead, the abnormal data caused by industrial faults must be identified and separated from the erroneous data.
Moreover, data sets in the industrial field are very large, so the processing time and complexity of abnormal data handling need to be kept low. Combining abnormal data detection with data clustering reduces the workload and improves processing efficiency.
It is to be appreciated that the method can be performed by any apparatus, device, platform or device cluster with computing and processing capability.
Hereinafter, the technical solution of the present application will be described in detail with reference to specific examples.
Referring to fig. 1, an industrial abnormal data detection method according to an embodiment of the present application includes the following steps:
Step S101, initialize the collected industrial data as data points in a data space, set a density threshold larger than the collection dimension of the industrial data, and initialize the neighborhood radius of each data point.
In this embodiment, intelligent battery cell manufacturing data are taken as a specific example. In intelligent battery cell manufacturing, a winding machine performs the winding process. In this process, the positive and negative electrode plates are assembled with the diaphragm to form a basic battery cell.
The factors that influence the winding process include positive plate length, negative plate length, insulation resistance, lower diaphragm length, first alignment degree, second alignment degree and third alignment degree.
In this embodiment, these 7 factors are taken as the 7 data dimensions of each data acquisition in the winding process.
Further, the data collected in each winding process form one record, so each record has 7 dimensions. The records acquired over multiple winding processes are merged into the industrial data set to be processed, shown in Table 1 below, which is stored in the database.
Table 1. Industrial data set of the winding process
Negative plate length | Insulation resistance test | Lower diaphragm length | Alignment degree 2 | Positive plate length | Alignment degree 1 | Alignment degree 3
8344.550 | 43.800 | 8850.640 | 3.1390 | 8074.315 | 0.424 | 1.755
8347.714 | 1220.00 | 8858.268 | 2.9220 | 8075.840 | 0.822 | 1.729
8344.965 | 39.100 | 8852.46 | 3.1400 | 8074.615 | 0.514 | 1.755
8345.936 | 1650.000 | 8857.128 | 2.8680 | 8075.031 | 0.947 | 1.756
8346.235 | 39.100 | 8853.056 | 3.1660 | 8074.708 | 0.448 | 1.728
8345.890 | 1700.000 | 8856.766 | 2.8670 | 8077.133 | 0.770 | 1.729
8344.203 | 1740.000 | 8856.214 | 3.2210 | 8075.192 | 0.396 | 1.756
8345.196 | 1170.000 | 8855.955 | 2.8100 | 8076.001 | 0.874 | 1.812
8344.342 | 45.400 | 8856.887 | 3.2200 | 8074.153 | 0.455 | 1.783
8346.027 | 1030.000 | 8855.247 | 2.9770 | 8076.348 | 0.728 | 1.812
8343.856 | 43.600 | 8856.541 | 3.1390 | 8075.239 | 0.396 | 1.783
8345.590 | 1550.000 | 8861.460 | 2.8370 | 8075.909 | 0.867 | 1.840
8345.012 | 1700.000 | 8857.629 | 2.9740 | 8074.869 | 0.718 | 1.811
8345.844 | 1750.000 | 8861.270 | 2.9220 | 8076.278 | 0.881 | 1.785
8345.035 | 1740.000 | 8858.802 | 3.0290 | 8075.447 | 0.773 | 1.840
8345.658 | 1740.000 | 8862.529 | 2.7280 | 8075.262 | 1.002 | 1.840
8344.873 | 1720.000 | 8858.198 | 3.0550 | 8075.770 | 0.794 | 1.840
8345.704 | 1080.000 | 8864.566 | 2.7820 | 8075.701 | 0.943 | 1.784
8343.926 | 1770.000 | 8859.182 | 2.9710 | 8075.078 | 0.641 | 1.840
Further, the industrial data set stored in the database is initialized as data in a data space. Each record is taken as a data point, and the distance between every two data points is computed as the Euclidean distance.
In the winding-process example of this embodiment, each row of Table 1 is one data point of the industrial data set in the data space, and each data point has 7 dimensions.
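As a sketch of this initialization, the rows of Table 1 can be loaded into an array, and the pairwise Euclidean distances can then be computed. The array name `TABLE_1` and the helper `pairwise_euclidean` are illustrative. The values are used unscaled because the text describes no normalization. As a result, the insulation resistance column, which spans roughly 39 to 1770, dominates the distances.

```python
import numpy as np

# Table 1 (winding process), one row per data point. Column order follows the
# table: negative plate length, insulation resistance test, lower diaphragm
# length, alignment degree 2, positive plate length, alignment degree 1,
# alignment degree 3.
TABLE_1 = np.array([
    [8344.550, 43.800, 8850.640, 3.1390, 8074.315, 0.424, 1.755],
    [8347.714, 1220.00, 8858.268, 2.9220, 8075.840, 0.822, 1.729],
    [8344.965, 39.100, 8852.46, 3.1400, 8074.615, 0.514, 1.755],
    [8345.936, 1650.000, 8857.128, 2.8680, 8075.031, 0.947, 1.756],
    [8346.235, 39.100, 8853.056, 3.1660, 8074.708, 0.448, 1.728],
    [8345.890, 1700.000, 8856.766, 2.8670, 8077.133, 0.770, 1.729],
    [8344.203, 1740.000, 8856.214, 3.2210, 8075.192, 0.396, 1.756],
    [8345.196, 1170.000, 8855.955, 2.8100, 8076.001, 0.874, 1.812],
    [8344.342, 45.400, 8856.887, 3.2200, 8074.153, 0.455, 1.783],
    [8346.027, 1030.000, 8855.247, 2.9770, 8076.348, 0.728, 1.812],
    [8343.856, 43.600, 8856.541, 3.1390, 8075.239, 0.396, 1.783],
    [8345.590, 1550.000, 8861.460, 2.8370, 8075.909, 0.867, 1.840],
    [8345.012, 1700.000, 8857.629, 2.9740, 8074.869, 0.718, 1.811],
    [8345.844, 1750.000, 8861.270, 2.9220, 8076.278, 0.881, 1.785],
    [8345.035, 1740.000, 8858.802, 3.0290, 8075.447, 0.773, 1.840],
    [8345.658, 1740.000, 8862.529, 2.7280, 8075.262, 1.002, 1.840],
    [8344.873, 1720.000, 8858.198, 3.0550, 8075.770, 0.794, 1.840],
    [8345.704, 1080.000, 8864.566, 2.7820, 8075.701, 0.943, 1.784],
    [8343.926, 1770.000, 8859.182, 2.9710, 8075.078, 0.641, 1.840],
])


def pairwise_euclidean(X):
    """Return the n x n matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]      # shape (n, n, d) by broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))


D = pairwise_euclidean(TABLE_1)
```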
Further, the data in the collected industrial data set is processed according to DBSCAN (density-based clustering method).
Specifically, a single MinPts (density threshold) is set for the industrial data set and used for every data point in it. In the DBSCAN algorithm, MinPts is set greater than the dimension of the industrial data set. It may be set either greater than or equal to the data acquisition dimension plus 1.
In the winding-process example of this embodiment, MinPts is set equal to the data acquisition dimension plus 1, that is, MinPts = 8.
Further, the Eps values for each data point in the industrial data set are initialized to initiate the iterative process described below.
In this embodiment, Eps can be initialized in one of two ways. It may be chosen randomly for each data point, for example with a random function. Alternatively, it may be set to the average distance between the data point and all other data points.
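A minimal sketch of these two initialization options follows. The function name `init_parameters` is hypothetical. The range used for the random option (uniform between 0 and the point's mean distance) is an assumption, because the text only mentions "a random function".

```python
import numpy as np


def init_parameters(X, random_init=False, rng=None):
    """Return (MinPts, initial Eps per data point) for an n x d data matrix X."""
    n, d = X.shape
    min_pts = d + 1                             # MinPts = collection dimension + 1
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    mean_dist = D.sum(axis=1) / (n - 1)         # self-distance is 0, so divide by n - 1
    if not random_init:
        return min_pts, mean_dist               # average distance to the other points
    rng = np.random.default_rng() if rng is None else rng
    # Assumed range for the random option: uniform in [0, mean distance).
    return min_pts, rng.uniform(0.0, mean_dist)
```

For the 7-dimensional data of Table 1, this gives MinPts = 8, matching the embodiment.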
Step S102, for each data point, determine a sparse value from the differences between its neighborhood radius and those of the surrounding data points. Then determine an outlier degree from its distances to its neighborhood data points, and take the sparse value and the outlier degree as a target solution.
In the embodiment of the present disclosure, the neighborhood radius is obtained by combining the MOPSO (multi-objective particle swarm optimization) algorithm with the DBSCAN algorithm.
First, a sparse value and an outlier degree are defined for each data point in the industrial data set. Together they measure how likely the data point is to be abnormal data, which this application also calls an abnormal point or outlier data point.
Further, for each data point, the sparse value and the outlier degree together form its target solution. Non-inferior (that is, non-dominated) solutions are obtained from the target solutions by the Pareto dominance principle. This principle is combined with MOPSO to obtain the optimal neighborhood radius.
Specifically, to measure the sparse value of a data point, the surrounding data points are first taken in order of increasing distance from the data point to be measured. The difference between the neighborhood radius of the data point and that of each surrounding data point is calculated, and these differences are summed to give the sparse value. Larger differences produce a larger sum. A larger sum means that the current data point differs more from the data points around it and is more likely to be an abnormal point.
In this embodiment, the number of surrounding data points is set equal to MinPts.
Based on the above, the sparse value is calculated as follows:
$$F_1 = \sum_{j=1}^{MinPts} \left| Eps_i - Eps_j \right|$$
where Eps_i denotes the neighborhood radius of the data point to be measured, and Eps_j denotes the neighborhood radius of the j-th of the MinPts data points nearest to it. The absolute differences are summed, and the resulting F_1 is the sparse value of the current data point.
The smaller F_1 is, the less likely the data point is to be an outlier.
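The sparse value F_1 can be sketched as follows, assuming a precomputed Euclidean distance matrix `D` and one current radius per point in `eps`. The function name is hypothetical. Ties in distance are broken by index order (a stable sort), which the text does not specify.

```python
import numpy as np


def sparse_values(D, eps, min_pts):
    """F1 for every point: sum of |Eps_i - Eps_j| over its MinPts nearest neighbours.

    D   : n x n Euclidean distance matrix
    eps : current neighbourhood radius of each point (length n)
    """
    n = D.shape[0]
    F1 = np.empty(n)
    for i in range(n):
        order = np.argsort(D[i], kind="stable")      # nearest first
        neighbours = order[order != i][:min_pts]     # drop the point itself
        F1[i] = np.abs(eps[i] - eps[neighbours]).sum()
    return F1
```

A point whose radius matches those of its nearest neighbours gets F_1 = 0. A point whose radius differs from theirs accumulates a larger F_1.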
Further, to measure the outlier degree of a data point, all other data points within the Eps of the data point to be measured are first determined. The sum of the distances between the data point and these points is then calculated and used as its outlier degree. The larger this sum, the more likely the data point is to be an abnormal point.
Based on the above, the outlier degree is calculated as follows:
$$F_2 = \sum_{x_j:\ \lVert x_i - x_j \rVert \le Eps_i} \lVert x_i - x_j \rVert$$
where x_i denotes the data point to be measured, x_j denotes the other data points within the Eps of x_i, and ‖x_i − x_j‖ is the Euclidean distance between them. The resulting F_2 is the outlier degree of the data point.
The smaller F_2 is, the less likely the data point is to be an outlier.
Further, the sparse value and the outlier degree together form the target solution, which measures how likely the data point is to be an abnormal point.
Step S103, for each data point, initialize an individual optimal solution with the target solution, and iterate the individual optimal solutions with a particle swarm algorithm.
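The outlier degree F_2 can be sketched in the same setting. The function name is hypothetical, and treating "within Eps" as distance ≤ Eps_i is an assumption about the boundary.

```python
import numpy as np


def outlier_degrees(D, eps):
    """F2 for every point: sum of distances to the other points within its Eps.

    Assumption: "within Eps" is taken as distance <= Eps_i.
    """
    within = D <= eps[:, None]          # row i marks points inside radius Eps_i
    np.fill_diagonal(within, False)     # a point is not its own neighbour
    return np.where(within, D, 0.0).sum(axis=1)
```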
In this embodiment, the iterative process takes min(F_1, F_2) as its objective. The global optimal solution and the individual optimal solutions are iterated over the target solutions, and the smaller target solution is kept as the individual optimal solution. The neighborhood radius corresponding to the final individual optimal solution is then obtained.
First, the iterative process in this embodiment is a MOPSO-based DBSCAN algorithm, so the target solution of each data point is treated as one particle in the iteration.
In the first iteration, the individual optimal solution of each data point must be initialized before its iteration begins.
The target solution obtained for each data point is used as the initial value of its individual optimal solution. The velocity of each particle (that is, of each target solution) is initialized with a random function. For ease of understanding, the neighborhood radius of each data point is regarded as the position of the particle in MOPSO.
Note that the neighborhood radius, the particle velocity and the individual optimal solution may be initialized in the first iteration after the iteration starts. Alternatively, all of them may be initialized first and the iteration started afterwards.
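The initial swarm state described above can be sketched as a dictionary with one particle per data point. The function and key names are hypothetical. The symmetric velocity range `[-v_max, v_max)` is an assumption, because the text only says that velocities are initialized with a random function.

```python
import numpy as np


def init_swarm(eps0, F1, F2, v_max, rng=None):
    """Initial MOPSO state with one particle per data point."""
    rng = np.random.default_rng() if rng is None else rng
    eps0 = np.asarray(eps0, dtype=float)
    return {
        "eps": eps0.copy(),                                    # position = neighbourhood radius
        "v": rng.uniform(-v_max, v_max, size=eps0.shape),      # random initial velocity (assumed range)
        "pbest_obj": np.column_stack([F1, F2]).astype(float),  # pbest starts as the target solution
        "pbest_eps": eps0.copy(),                              # radius that produced each pbest
    }
```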
In this embodiment, the Pareto dominance principle is used to select all non-dominated solutions from the target solutions of each iteration. These form a non-dominated solution set, from which the global optimal solution is selected.
Specifically, suppose that among all target solutions of the current iteration, one target solution has both the smallest sparse value and the smallest outlier degree. That is, no other target solution has a smaller sparse value, and no other target solution has a smaller outlier degree. By the Pareto dominance principle, this target solution dominates all the others and is the only non-dominated solution of the iteration.
Otherwise, no single target solution minimizes both the sparse value and the outlier degree, and the trade-off solutions between the two objectives are sought instead.
In this case, among the target solutions of the current iteration, at least one has the smallest sparse value but not the smallest outlier degree. At least one other has the smallest outlier degree but not the smallest sparse value. Each trade-off solution of this kind improves one objective only at the expense of the other. No other target solution dominates it, so it is regarded as a non-dominated solution.
Further, a non-dominated solution set is built, and all non-dominated solutions are put into it. In this embodiment, the set also contains all non-dominated solutions obtained in previous iterations.
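The dominance test and the non-dominated filter above can be sketched for (F_1, F_2) pairs, with both objectives minimized. The function names are hypothetical. Dropping archived solutions that later become dominated is an assumption taken from common MOPSO practice. The text itself says only that the set accumulates non-dominated solutions across iterations.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Keep the (F1, F2) solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


def update_archive(archive, current):
    """Merge this iteration's solutions into the archive, keeping only non-dominated ones.

    Assumption: archived solutions that become dominated are dropped.
    """
    return non_dominated(list(archive) + list(current))
```

For example, among (1, 5), (2, 2), (5, 1), (3, 3) and (6, 6), only (3, 3) and (6, 6) are dominated, both by (2, 2).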
Further, in each iteration, a target space is formed. Its abscissa spans the range from the minimum to the maximum of the computed sparse values, and its ordinate spans the range from the minimum to the maximum of the computed outlier degrees. The target space is then divided into sub-regions by a uniform grid, whose granularity can be adjusted as required.
When forming the target space, the outlier degree may instead be used as the abscissa and the sparse value as the ordinate.
Further, the sparse value and outlier degree of each particle determine its position in the target space, that is, the sub-region in which it lies.
Further, the number of particles in each sub-region is counted and used as the spatial density value of every particle in that sub-region. The more particles a sub-region contains, the larger the spatial density value.
Then, on the principle that a smaller spatial density value is better, the particle with the smallest spatial density value among the non-dominated solutions is selected as the global optimal solution.
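The grid-based selection of the global optimal solution can be sketched as follows. This is a minimal version under two assumptions: the grid is built over the solutions passed in (for example the non-dominated set), and `n_div`, the number of divisions per axis, is a hypothetical parameter standing in for the adjustable grid size. Ties go to the first solution found.

```python
import numpy as np


def select_gbest(objectives, n_div=5):
    """Return the index of the solution lying in the least crowded grid cell.

    objectives : (m, 2) array of (F1, F2) values, e.g. the non-dominated set
    n_div      : hypothetical number of uniform grid divisions per axis
    """
    A = np.asarray(objectives, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                 # avoid division by zero
    cells = np.minimum(((A - lo) / span * n_div).astype(int), n_div - 1)
    keys = cells[:, 0] * n_div + cells[:, 1]               # one id per sub-region
    counts = np.bincount(keys, minlength=n_div * n_div)    # particles per sub-region
    density = counts[keys]                                 # spatial density of each solution
    return int(np.argmin(density))                         # ties: first index wins
```

Favouring sparsely populated cells spreads the swarm's guidance along the trade-off front, so the particles are not all drawn towards one crowded region.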
In this embodiment, the particle velocities are updated in each iteration according to MOPSO, using the individual optimal solutions and the global optimal solution.
Specifically, the following velocity update formula may be used:
$$V_{i+1} = \omega \times V_i + C_1 \times \mathrm{rand}() \times (pbest_i - Eps_i) + C_2 \times \mathrm{rand}() \times (gbest_i - Eps_i)$$
where ω denotes the inertia factor, and C_1 and C_2 denote learning factors, both of which may be set to 2 in this embodiment. V_{i+1} is the velocity in the current iteration, and V_i is the velocity in the previous iteration. gbest_i is the neighborhood radius corresponding to the global optimal solution of the previous iteration, and pbest_i is the neighborhood radius corresponding to the individual optimal solution of the previous iteration. Eps_i is the neighborhood radius in the previous iteration. In the velocity update formula, the neighborhood radius represents the position of the particle.
The first term of the velocity update formula, ω × V_i, may be called the memory term. It carries over the magnitude and direction of the previous velocity. The inertia factor controls the search range. A larger value strengthens global search and weakens local search, while a smaller value weakens global search and strengthens local search. A dynamic inertia factor can be used to obtain better results. The second term, C_1 × rand() × (pbest_i − Eps_i), may be called the self-cognition term. It is a vector pointing from the current point towards the particle's own best point, and it represents the particle learning from its own experience. The third term, C_2 × rand() × (gbest_i − Eps_i), may be called the group cognition term. It is a vector pointing from the current point towards the best point of the swarm, and it reflects cooperation and knowledge sharing among the particles.
Further, the position of the particle is updated by the velocity using the following position update formula:
$$Eps_{i+1} = V_{i+1} + Eps_i$$
where Eps_{i+1} is the neighborhood radius in the current iteration, V_{i+1} is the velocity in the current iteration, and Eps_i is the neighborhood radius in the previous iteration. In the position update formula, the neighborhood radius represents the particle position in the MOPSO iteration.
Further, the neighborhood radii obtained in this iteration are used to recalculate the sparse values and outlier degrees for the next iteration.
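The two update formulas can be sketched as one vectorised step over all particles. The function name is hypothetical. Drawing an independent rand() per particle and per term is an assumption, and the inertia factor `w` is left as a parameter because the text gives no value. As in the text, the sketch places no bounds on Eps.

```python
import numpy as np


def update_particles(eps, v, pbest_eps, gbest_eps, w, c1=2.0, c2=2.0, rng=None):
    """One MOPSO step on the neighbourhood radii.

    eps, v     : current radii (positions) and velocities, one per data point
    pbest_eps  : radius of each particle's individual optimal solution
    gbest_eps  : radius of the global optimal solution (scalar or array)
    w          : inertia factor (value not given in the text)
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(eps.shape)   # rand() in the self-cognition term
    r2 = rng.random(eps.shape)   # rand() in the group cognition term
    v_new = w * v + c1 * r1 * (pbest_eps - eps) + c2 * r2 * (gbest_eps - eps)
    eps_new = v_new + eps        # Eps_{i+1} = V_{i+1} + Eps_i
    return eps_new, v_new
```

When a particle already sits at both its pbest and the gbest, only the memory term remains, so its velocity simply decays by the factor `w`.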
In this embodiment, for every iteration after the first, the individual optimal solution of the current iteration is obtained by comparing the target solution of the current iteration with the individual optimal solution from the historical iterations.
Specifically, the target solution of the data point in the current iteration is compared with its historical individual optimal solution according to the Pareto dominance principle. The non-dominated solution is taken as the individual optimal solution of the current iteration and is placed in the non-dominated solution set.
An overflow threshold may be set for the non-dominated solution set. Once the number of non-dominated solutions in the set exceeds this preset overflow threshold, no new non-dominated solutions are added to the set.
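The following is a minimal sketch of two pieces of the procedure above: the Pareto comparison that updates the individual optimal solution, and the non-dominated solution set capped by the overflow threshold. It assumes that each target solution is the tuple (sparse value, outlier value) and that smaller values are better. The helper names are hypothetical. The random choice between mutually non-dominated solutions follows claim 5. Reading "exceeds the overflow threshold" as "already holds `overflow_threshold` entries" is an assumption.

```python
import random

def dominates(a, b):
    """True if objective vector a = (sparse value, outlier value) Pareto-dominates b,
    treating smaller values as better (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_pbest(pbest, current, rng=random):
    """Choose this iteration's individual optimal solution from the historical pbest
    and the current target solution."""
    if dominates(current, pbest):
        return current
    if dominates(pbest, current):
        return pbest
    # Mutually non-dominated: pick one at random (mirrors claim 5).
    return rng.choice([pbest, current])

def add_to_archive(archive, solution, overflow_threshold):
    """Append a non-dominated solution to the archive unless it is already full.

    Treating 'exceeds the threshold' as 'holds overflow_threshold entries' is an assumption.
    """
    if len(archive) >= overflow_threshold:
        return archive
    return archive + [solution]
```

Many MOPSO variants remove archive members that later become dominated. The sketch does not do this, because the text describes the set as retaining all non-dominated solutions of all iterations.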
Step S104: in response to reaching the preset number of iterations, determine the individual optimal solution of each data point in the last iteration, and back-derive the corresponding neighborhood radius from that individual optimal solution.
In the embodiment of the present application, a maximum number of iterations, gMAX, may also be set when the iteration starts. When the iteration count reaches gMAX, updating of the individual and global optimal solutions stops, and the individual optimal solutions from the last iteration are retrieved.
After the iteration is completed, the neighborhood radius corresponding to each data point's individual optimal solution is determined, and this radius is used to detect abnormal data points.
Step S105: for each data point, determine that the data point is an abnormal point in response to the number of neighborhood data points within its neighborhood radius being less than or equal to the density threshold.
In an embodiment of the present disclosure, for each data point, a density threshold may be used to measure whether the data point is abnormal.
Specifically, for each data point, the number of other data points within its neighborhood radius is counted using the neighborhood radius obtained above.
Further, when the number of other data points is less than or equal to the preset density threshold, the data point is considered an abnormal point.
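The per-point density test of Step S105 can be sketched as shown below. This minimal sketch makes four assumptions. Points are coordinate tuples. `eps[i]` is the neighborhood radius obtained for point i after the iteration. "Within the radius" means a Euclidean distance less than or equal to `eps[i]`. The point itself is not counted. The function name is hypothetical.

```python
import math

def detect_outliers(points, eps, density_threshold):
    """Return one flag per point: True if the number of other points within
    that point's own neighborhood radius eps[i] is <= density_threshold."""
    flags = []
    for i, p in enumerate(points):
        # Count neighbors using this point's individual radius, excluding the point itself.
        count = sum(1 for j, q in enumerate(points)
                    if j != i and math.dist(p, q) <= eps[i])
        flags.append(count <= density_threshold)
    return flags
```

This is the one-by-one traversal approach. It evaluates all pairwise distances, which is O(n²) in the number of points.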
In this embodiment, abnormal points can be detected in either of two ways. The first is to traverse every data point and compare its neighbor count with the density threshold one by one. The second is to combine clustering with detection using the DBSCAN algorithm.
Traversing every data point solely for anomaly detection lowers efficiency and raises computational cost, and it completes the detection without clustering the data at all. Combining abnormal point detection with the DBSCAN algorithm achieves clustering and anomaly detection at the same time. Given conventional industrial requirements, the DBSCAN-based approach is preferred in this embodiment.
Specifically, when the DBSCAN iteration starts, each data point may be marked as unvisited. For each data point, the neighborhood radius obtained for it is retrieved, all first neighborhood data points of the data point within that radius are traversed, and their number is determined.
Further, when the number of first neighborhood data points is less than or equal to the density threshold, the data point is determined to be an abnormal point. An abnormal data set is constructed, and the data point is placed in it. When the number of first neighborhood data points is greater than the density threshold, the data point is not treated as abnormal. Instead, a target cluster is constructed for the data point, and the data point serves as the core data point for its first neighborhood data points.
Further, each first neighborhood data point is then analyzed to complete the clustering process.
Specifically, the neighborhood radius obtained for each first neighborhood data point is retrieved, all second neighborhood data points within that radius are traversed, and their number is determined.
Further, each first neighborhood data point is checked against the density threshold. When the number of its second neighborhood data points is less than or equal to the density threshold, the first neighborhood data point is determined to be abnormal data and is placed in the abnormal data set. When the number is greater than the density threshold, the first neighborhood data point is determined to be non-abnormal and is placed in the target cluster of its corresponding core data point, that is, the target cluster constructed above.
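The DBSCAN-combined procedure above can be sketched as follows. It uses the same assumptions as the per-point density test: Euclidean distance, an inclusive radius, and the point itself not counted. It follows the described steps literally, so a core point's first neighborhood data points are checked once against their own radii and are not expanded further. Standard DBSCAN instead keeps expanding from newly found core points, and it labels low-density neighbors of a core point as border points rather than noise. The function names are hypothetical.

```python
import math

def region_query(points, idx, eps):
    """Indices of the other points within points[idx]'s own radius eps[idx]."""
    return [j for j in range(len(points))
            if j != idx and math.dist(points[idx], points[j]) <= eps[idx]]

def cluster_and_detect(points, eps, density_threshold):
    """Return (labels, anomalies): labels[i] is a cluster id or None; anomalies is a set of indices."""
    n = len(points)
    visited = [False] * n          # the 'unvisited' tag from the description
    labels = [None] * n
    anomalies = set()
    next_cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        first = region_query(points, i, eps)       # first neighborhood data points
        if len(first) <= density_threshold:
            anomalies.add(i)
            continue
        # i is a core data point: construct a target cluster for it.
        cluster_id = next_cluster
        next_cluster += 1
        labels[i] = cluster_id
        for j in first:
            if visited[j]:
                continue
            visited[j] = True
            second = region_query(points, j, eps)  # second neighborhood data points
            if len(second) <= density_threshold:
                anomalies.add(j)
            else:
                labels[j] = cluster_id
    return labels, anomalies
```

For a tight group of four points plus one distant point, each with radius 1.5 and a density threshold of 2, the four points form cluster 0 and the distant point is returned as the only anomaly.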
In another embodiment of the present application, the industrial abnormal data detection method is applied to detect abnormal data across multiple industrial data sets.
This embodiment handles multi-dimensional data split into multiple industrial data sets. Before abnormal data are determined, a correlation analysis may first be performed across the attributes (i.e., dimensions) of all industrial data to find the attributes with the strongest correlation. The set of data points represented by each of those attributes is then determined.
Specifically, for the winding-process industrial data set shown in Table 1, Pearson correlation coefficients are computed. Based on these, the negative electrode plate length and the positive electrode plate length are identified as the two attributes (i.e., dimensions) with the strongest correlation.
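Selecting the most strongly correlated pair of attributes by Pearson correlation can be sketched as follows. This sketch assumes that each attribute is a list of equal-length numeric values. It ranks pairs by the absolute value of r, which is an assumption because the text only says "strongest correlation". The test values are toy values, not the Table 1 data. A constant column would make r undefined (division by zero).

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_correlated_pair(columns):
    """columns maps attribute name -> values; return the pair of names with the largest |r|."""
    return max(itertools.combinations(columns, 2),
               key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])))
```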
Further, the negative electrode plate length, insulation resistance, lower diaphragm length, alignment degree 1, alignment degree 2 and alignment degree 3 form a 6-dimensional industrial data set 2. The positive electrode plate length, insulation resistance, lower diaphragm length, alignment degree 1, alignment degree 2 and alignment degree 3 form a 6-dimensional industrial data set 1.
Further, the Euclidean distance matrix $D_1$ of the data points in industrial data set 1 and the Euclidean distance matrix $D_2$ of the data points in industrial data set 2 are calculated. $D_1$ and $D_2$ are $n \times n$ matrices, and row $i$ of $D_1$ holds the distances from the $i$-th data point of industrial data set 1 to the other data points in that set.
Further, Eps is calculated for each data point in industrial data set 1 and industrial data set 2 using the same MOPSO iterative method as in the previous embodiment.
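The Euclidean distance matrices D1 and D2 can be built as shown below. In this minimal sketch, row i holds the distances from point i to every point, with a zero on the diagonal for the point itself. The function name is hypothetical.

```python
import math

def distance_matrix(points):
    """n x n Euclidean distance matrix: row i holds the distances from point i to every point."""
    return [[math.dist(p, q) for q in points] for p in points]
```

For example, `distance_matrix([(0, 0), (3, 4)])` returns `[[0.0, 5.0], [5.0, 0.0]]`. The matrix is symmetric, so an implementation that needs to save memory could store only one triangle.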
Further, the same DBSCAN algorithm as in the previous embodiment is used for detection of abnormal data.
Further, when all the data in the industrial data set 1 are detected, the abnormal data points of the industrial data set 1 shown in table 2 can be obtained.
Table 2 Anomaly data points of industrial data set 1
Insulation resistance | Lower diaphragm length | Alignment degree 2 | Positive plate length | Alignment degree 1 | Alignment degree 3
54.8000 | 8851.7090 | 2.4620 | 8029.6650 | 1.2910 | 1.4080
56.6000 | 8850.9680 | 2.3270 | 8029.2720 | 1.3460 | 1.3200
53.9000 | 8851.8990 | 2.3560 | 8029.0870 | 1.4420 | 1.3530
51.9000 | 8851.5020 | 2.3480 | 8029.6650 | 1.2320 | 1.3200
53.7000 | 8853.4520 | 2.4270 | 8028.8330 | 1.1580 | 1.3790
56.8000 | 8851.1230 | 2.3900 | 8029.3640 | 1.2010 | 1.3200
55.4000 | 8853.1770 | 2.3880 | 8029.0870 | 1.3080 | 1.4260
56.5000 | 8853.1070 | 2.5250 | 8028.8560 | 1.1890 | 1.4340
54.7000 | 8854.6600 | 2.5510 | 8029.4110 | 1.1540 | 1.3750
52.9000 | 8854.6080 | 2.5510 | 8029.4570 | 1.2060 | 1.4590
Further, the industrial data set 2 is processed in the same manner as described above, and abnormal data points of the industrial data set 2 shown in table 3 are obtained.
Table 3 Anomaly data points of industrial data set 2
Negative plate length | Insulation resistance | Lower diaphragm length | Alignment degree 2 | Alignment degree 1 | Alignment degree 3
8322.8630 | 54.8000 | 8851.7090 | 2.4620 | 1.2910 | 1.4080
8322.3320 | 56.6000 | 8850.9680 | 2.3270 | 1.3460 | 1.3200
8322.5630 | 55.2000 | 8853.5910 | 2.4230 | 1.2110 | 1.4850
8321.8700 | 56.6000 | 8856.6100 | 2.4880 | 1.1110 | 1.4340
8321.3860 | 53.7000 | 8853.4520 | 2.4270 | 1.1580 | 1.3790
8321.7310 | 59.8000 | 8848.5680 | 1.9640 | 1.7400 | 1.1010
8322.3540 | 51.8000 | 8850.2080 | 1.9500 | 1.5410 | 1.0540
8322.3540 | 56.5000 | 8853.1070 | 2.5250 | 1.1890 | 1.4340
8322.3090 | 54.7000 | 8854.6600 | 2.5510 | 1.1540 | 1.3750
8321.4080 | 52.9000 | 8854.6080 | 2.5510 | 1.2060 | 1.4590
Further, the abnormal points common to industrial data set 1 and industrial data set 2 are extracted and shown in Table 4 below. These are the abnormal data points in both data sets defined by the two strongly correlated attributes.
Table 4 Common anomaly points of industrial data set 1 and industrial data set 2
Negative plate length | Insulation resistance | Lower diaphragm length | Alignment degree 2 | Positive plate length | Alignment degree 1 | Alignment degree 3
8322.8630 | 54.8000 | 8851.7090 | 2.4620 | 8029.6650 | 1.2910 | 1.4080
8322.3320 | 56.6000 | 8850.9680 | 2.3270 | 8029.2720 | 1.3460 | 1.3200
8321.3860 | 53.7000 | 8853.4520 | 2.4270 | 8028.8330 | 1.1580 | 1.3790
8322.3540 | 56.5000 | 8853.1070 | 2.5250 | 8028.8560 | 1.1890 | 1.4340
8322.3090 | 54.7000 | 8854.6600 | 2.5510 | 8029.4110 | 1.1540 | 1.3750
8321.4080 | 52.9000 | 8854.6080 | 2.5510 | 8029.4570 | 1.2060 | 1.4590
Further, the common abnormal points in Table 4 may be treated as industrial fault data and as important industrial analysis data. Relevant fault analysts may analyze them, and this industrial fault data is excluded from the abnormal points that are removed.
In summary, the industrial abnormal data detection method is designed based on MOPSO (multi-objective particle swarm optimization) and DBSCAN (density-based spatial clustering of applications with noise). The method works as follows:
- It takes the different conditions of each data point into account by assigning each data point its own neighborhood radius.
- It designs a sparse value and an outlier value for each data point as its target solution.
- It selects the global and individual optimal solutions based on the Pareto dominance principle.
- It obtains each data point's neighborhood radius through the iterative process.

Each data point can then be evaluated for abnormality using its own neighborhood radius, which improves the accuracy of abnormal data detection.
Furthermore, the method can be combined with DBSCAN so that the data are effectively clustered while abnormal data are being detected. Correlation analysis is performed, with the help of the clustering process, on every pair of attributes (i.e., dimensions) of the data in a data set to find the two most strongly correlated attributes. The method then checks whether the data represented by these two attributes are abnormal at the same time. If they are, the data are industrial fault data and must not be removed. Otherwise, they are invalid data caused by sensor transmission or acquisition faults and need to be removed.
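The decision rule above can be sketched as set operations. The sketch assumes that the two data sets describe the same records, so an abnormal point can be identified by a shared record index. This is an assumption; in Table 4, the common points are recognizable by their shared attribute values. Records flagged in both data sets are kept as fault data, and records flagged in only one are treated as invalid data to be removed. The function name is hypothetical.

```python
def split_anomalies(anomalies_set1, anomalies_set2):
    """Split abnormal record indices from two data sets into fault data (abnormal in both)
    and invalid data (abnormal in only one)."""
    a, b = set(anomalies_set1), set(anomalies_set2)
    fault_data = a & b      # abnormal in both: keep for fault analysis
    invalid_data = a ^ b    # abnormal in only one: treat as sensor error and remove
    return fault_data, invalid_data
```

For example, `split_anomalies({1, 2, 5}, [2, 5, 7])` returns `({2, 5}, {1, 7})`.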
It should be noted that the method of the embodiments of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present application, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to any embodiment method, the embodiment of the application also provides an industrial abnormal data detection device.
Referring to Fig. 2, the industrial abnormal data detection apparatus is connected to a database and may include an initialization module, a target solution module, an iteration module, and an abnormal point detection module.
The initialization module 201 is configured to initialize the collected industrial data to data points in a data space, set a density threshold greater than an industrial data collection dimension, and initialize a neighborhood radius of each of the data points.
The target solution module 202 is configured to perform three steps for each data point. It determines a sparse value from the differences in neighborhood radius between the data point and other surrounding data points. It determines an outlier value from the data point's distances to its neighborhood data points. It uses the sparse value and the outlier value as a target solution.
The iteration module 203 is configured to initialize an individual optimal solution for each data point using the target solution, and to iterate the individual optimal solutions using a particle swarm algorithm. In response to reaching the preset number of iterations, it determines the individual optimal solution of each data point in the last iteration and back-derives the corresponding neighborhood radius from that individual optimal solution.
The outlier detection module 204 is configured to determine, for each data point, that the data point is an outlier in response to the number of neighborhood data points within its neighborhood radius being less than or equal to the density threshold.
The iteration module 203 is specifically configured to: determine a global optimal solution among the target solutions;
according to the Pareto dominance principle, with smaller values treated as better, obtain the non-dominated solutions among the target solutions, retain the non-dominated solutions throughout the iterations, and determine the global optimal solution of the current iteration among all non-dominated solutions of the historical iterations;
initialize the velocity of each target solution in the first iteration;
calculate the velocity of the current iteration from the velocity of the previous iteration, the neighborhood radius, the individual optimal solution, and the global optimal solution; calculate the neighborhood radius of the current iteration from the velocity of the current iteration and the neighborhood radius of the previous iteration; and update each target solution;
and, according to the Pareto dominance principle with smaller values treated as better, determine the individual optimal solution of the current iteration from each data point's target solution of the current iteration and its individual optimal solution from the historical iterations, then execute the next iteration.
The outlier detection module 204 is specifically configured to: for each data point, traverse the first neighborhood data points within its neighborhood radius;
in response to determining that the number of first neighborhood data points is less than or equal to the density threshold, take the data point as an outlier and place it in an outlier data set;
in response to determining that the number of first neighborhood data points is greater than the density threshold, not treat the data point as an outlier, and construct a target cluster for the data point;
for each first neighborhood data point, determine the number of second neighborhood data points within its neighborhood radius;
in response to determining that the number of second neighborhood data points is less than or equal to the density threshold, take the first neighborhood data point as an outlier and place it in the outlier data set;
in response to determining that the number of second neighborhood data points is greater than the density threshold, not treat the first neighborhood data point as an outlier, and place it in the target cluster of the data point.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in practicing embodiments of the present application.
The apparatus of the foregoing embodiment is used to implement the corresponding method for detecting industrial abnormal data in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the embodiments of the present application further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the industrial abnormal data detection method according to any of the above embodiments is implemented.
Fig. 3 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits. It is configured to execute related programs to implement the technical solutions provided in the embodiments of the present application.
The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solution provided by the embodiment of the present application is implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be configured as a component in the device (not shown), or it may be external to the device to provide the corresponding function. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like. Output devices may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only the components necessary to implement the embodiments of the present application, and need not include all of the components shown in the figures.
The apparatus of the foregoing embodiment is used to implement the corresponding method for detecting industrial abnormal data in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the industrial abnormal data detection method according to any of the above embodiments.
Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and they may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, the following:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technology;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the industrial abnormal data detection method according to any one of the above embodiments, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The embodiments of the present application are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims (7)

1. An industrial abnormal data detection method, applied to a database, comprising the following steps:
initializing collected industrial data into data points in a data space, setting a density threshold larger than the collection dimension of the industrial data, and initializing the neighborhood radius of each data point;
for each data point, determining a sparse value of the data point using the differences in neighborhood radius between the data point and other surrounding data points, determining an outlier value of the data point using its distances to the neighborhood data points, and taking the sparse value and the outlier value as a target solution;
wherein determining the sparse value of the data point using the differences in neighborhood radius from other surrounding data points comprises:
calculating the sparse value using the following formula:
Figure QLYQS_1
where $F_1$ is the sparse value of the data point, $Eps_i$ is the neighborhood radius of the data point, $x_j$ is any other data point around the data point within its neighborhood radius, $Eps_j$ is the neighborhood radius of that surrounding data point $x_j$, $D$ denotes the set of all other surrounding data points, and abs denotes the absolute value of the difference between the two quantities;
wherein determining the outlier value of the data point using its distances to the neighborhood data points comprises:
calculating the outlier value using the following formula:
Figure QLYQS_2
where $x_i$ denotes the data point, $F_2$ is the calculated outlier value of the data point under evaluation, and Distance denotes the distance between the data point and each of the other surrounding data points;
for each data point, initializing an individual optimal solution with the target solution, and iterating the individual optimal solutions using a particle swarm algorithm;
wherein iterating the individual optimal solutions using the particle swarm algorithm comprises:
determining a global optimal solution among the target solutions in each iteration;
according to the Pareto dominance principle, with smaller values treated as better, obtaining the non-dominated solutions among the target solutions, retaining the non-dominated solutions throughout the iterations, and determining the global optimal solution of the current iteration among all non-dominated solutions of the historical iterations;
initializing the velocity of each target solution in the first iteration;
calculating the velocity of the current iteration from the velocity of the previous iteration, the neighborhood radius, the individual optimal solution and the global optimal solution, calculating the neighborhood radius of the current iteration from the velocity of the current iteration and the neighborhood radius of the previous iteration, and updating each target solution;
according to the Pareto dominance principle, with smaller values treated as better, determining the individual optimal solution of the current iteration from each data point's target solution of the current iteration and its individual optimal solution from the historical iterations, and executing the next iteration;
wherein determining, in each iteration, the global optimal solution among the target solutions comprises:
according to the Pareto dominance principle, with smaller values treated as better, determining the non-dominated solutions of the current iteration among the target solutions of each iteration, and placing the non-dominated solutions of each iteration into a non-dominated solution set, wherein the non-dominated solution set comprises all non-dominated solutions of all iterations;
establishing a target space with the sparse value and the outlier value as coordinate axes, and dividing the target space equally into a plurality of sub-regions;
determining the position of each target solution in the target space and the number of target solutions contained in each sub-region;
in response to determining the sub-region containing the fewest target solutions, taking a target solution in that sub-region as the global optimal solution;
wherein calculating the velocity of the current iteration from the velocity of the previous iteration, the neighborhood radius, the individual optimal solution and the global optimal solution, and calculating the neighborhood radius of the current iteration from the velocity of the current iteration and the neighborhood radius of the previous iteration, comprises:
the velocity is calculated using the following formula:
$V_{i+1} = \omega \times V_i + C_1 \times rand() \times (pbest_i - Eps_i) + C_2 \times rand() \times (gbest_i - Eps_i)$
where $\omega$ is the inertia factor, $C_1$ and $C_2$ are learning factors, $V_{i+1}$ is the velocity of the current iteration, $V_i$ is the velocity of the previous iteration, $gbest_i$ is the neighborhood radius corresponding to the global optimal solution of the previous iteration, $pbest_i$ is the neighborhood radius corresponding to the individual optimal solution of the previous iteration, and $Eps_i$ is the neighborhood radius of the previous iteration;
and calculating the neighborhood radius using the following formula:
$Eps_{i+1} = V_{i+1} + Eps_i$
where $Eps_{i+1}$ is the neighborhood radius of the current iteration and $V_{i+1}$ is the velocity of the current iteration;
in response to reaching the preset number of iterations, determining the individual optimal solution of each data point in the last iteration, and back-deriving the corresponding neighborhood radius from the individual optimal solution;
for each of the data points, determining that the data point is an outlier in response to the number of neighborhood data points within the neighborhood radius being less than or equal to the density threshold.
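The grid-based selection of the global optimal solution in claim 1 can be sketched as follows. This minimal sketch makes three assumptions:
- The candidates are (sparse value, outlier value) tuples, for example the members of the non-dominated solution set.
- The target space is the bounding box of those candidates, divided into `divisions` by `divisions` equal cells.
- A random candidate is drawn when the sparsest cell holds more than one.

The parameter `divisions` and the function name are hypothetical.

```python
import random
from collections import defaultdict

def select_gbest(solutions, divisions=5, rng=random):
    """Pick a global optimal solution from the least-populated occupied cell of a grid
    over the (sparse value, outlier value) target space."""
    lo1 = min(s[0] for s in solutions)
    hi1 = max(s[0] for s in solutions)
    lo2 = min(s[1] for s in solutions)
    hi2 = max(s[1] for s in solutions)

    def cell(v, lo, hi):
        # Map a coordinate to one of `divisions` equal-width bins; the maximum goes in the last bin.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * divisions), divisions - 1)

    cells = defaultdict(list)
    for idx, (f1, f2) in enumerate(solutions):
        cells[(cell(f1, lo1, hi1), cell(f2, lo2, hi2))].append(idx)
    # Only occupied cells are stored, so this is the sparsest non-empty sub-region.
    sparsest = min(cells.values(), key=len)
    return solutions[rng.choice(sparsest)]
```

Picking from the least crowded sub-region pushes the swarm toward under-explored parts of the Pareto front. If several cells tie, this sketch keeps the first one encountered.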
2. The method of claim 1, wherein determining that the data point is an outlier comprises:
for each data point, traversing the first neighborhood data points of the data point within its neighborhood radius;
in response to determining that the number of first neighborhood data points is less than or equal to the density threshold, taking the data point as an outlier and placing it in an outlier data set;
in response to determining that the number of first neighborhood data points is greater than the density threshold, not treating the data point as an outlier, and constructing a target cluster for the data point;
for each first neighborhood data point, determining the number of second neighborhood data points within its neighborhood radius;
in response to determining that the number of second neighborhood data points is less than or equal to the density threshold, taking the first neighborhood data point as an outlier and placing it in the outlier data set;
in response to determining that the number of second neighborhood data points is greater than the density threshold, not treating the first neighborhood data point as an outlier, and placing it in the target cluster of the data point.
3. The method of claim 1, wherein determining the sparse value of the data point using the difference in the neighborhood radius from other surrounding data points comprises:
selecting a certain number of other data points around the data point, in order of increasing distance;
this number is equal to the density threshold;
in response to the data point having a larger difference in neighborhood radius from these surrounding data points, determining that its sparse value is smaller.
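Claim 3 can be illustrated with the sketch below. The published formula for $F_1$ survives only as an image reference, so the final mapping is a hypothetical stand-in. The sketch sums the absolute radius differences between the data point and its k nearest points, with k equal to the density threshold as claim 3 states. It then returns 1/(1 + S). This mapping merely satisfies the claimed behavior that a larger difference gives a smaller sparse value. The function name is also hypothetical.

```python
import math


def sparse_value(points, radii, i, density_threshold):
    """Hypothetical sparse value F1 for points[i] (stand-in formula)."""
    # The k other points closest to points[i], with k = density threshold.
    others = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    nearest = others[:density_threshold]
    # Total absolute difference between this point's radius and theirs.
    total_diff = sum(abs(radii[i] - radii[j]) for j in nearest)
    # Stand-in mapping (not the patent's formula): decreases as the difference grows.
    return 1.0 / (1.0 + total_diff)


pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
eps = [1.0, 1.0, 1.0, 5.0]
print(sparse_value(pts, eps, 0, 2))  # 1.0
print(sparse_value(pts, eps, 3, 2))  # 0.1111111111111111
```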
4. The method of claim 1, wherein determining the outlier value of the data point using its distances to its neighborhood data points comprises:
determining all neighborhood data points of the data point that lie within its neighborhood radius;
summing the distances from the data point to all of its neighborhood data points, and, in response to a larger sum, determining that the outlier value is smaller.
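Claim 4 can be sketched the same way. The published $F_2$ formula is also only an image reference, so 1/(1 + sum) is a hypothetical stand-in. It is chosen only because it matches the claimed rule that a larger distance sum gives a smaller outlier value. The Euclidean metric and the function name are assumptions.

```python
import math


def outlier_value(points, radii, i):
    """Hypothetical outlier value F2 for points[i] (stand-in formula)."""
    total = 0.0
    for j, q in enumerate(points):
        d = math.dist(points[i], q)
        # Only neighborhood data points inside points[i]'s own radius contribute.
        if j != i and d <= radii[i]:
            total += d
    # Stand-in mapping (not the patent's formula): decreases as the sum grows.
    return 1.0 / (1.0 + total)


pts = [(0, 0), (3, 0), (0, 4)]
print(outlier_value(pts, [5.0, 5.0, 5.0], 0))  # 0.125
print(outlier_value(pts, [3.5, 5.0, 5.0], 0))  # 0.25
```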
5. The method of claim 1, wherein determining, for each data point, the individual optimal solution of the current iteration from its target solution of the current iteration and its individual optimal solutions of historical iterations comprises:
for each data point, comparing the individual optimal solutions of its historical iterations with its target solution of the current iteration under the Pareto dominance principle; in response to exactly one non-dominated solution existing, taking that non-dominated solution as the individual optimal solution of the current iteration and putting it into the non-dominated solution set;
in response to determining that a plurality of non-dominated solutions exist, randomly selecting one of them as the individual optimal solution of the current iteration and putting it into the non-dominated solution set.
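The individual-best update of claim 5 can be sketched with a standard Pareto-dominance test, in which smaller objective values are better. Objective vectors are assumed to be `(sparse_value, outlier_value)` tuples. The function names, and the use of a plain list as the non-dominated solution set, are illustrative assumptions.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_pbest(history, current, archive, rng=random):
    """Choose this iteration's individual optimal solution for one data point.

    history: objective vectors of the point's past individual optimal solutions
    current: objective vector of the point's target solution this iteration
    archive: list standing in for the non-dominated solution set (appended to)
    """
    candidates = list(history) + [current]
    # Keep the candidates that no other candidate dominates.
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    # A single survivor is taken directly; several survivors are sampled at random.
    best = front[0] if len(front) == 1 else rng.choice(front)
    archive.append(best)
    return best


archive = []
print(update_pbest([(0.5, 0.5)], (0.4, 0.4), archive))  # (0.4, 0.4)
```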
6. The method of claim 1, wherein determining the non-dominated solutions of the current iteration among the target solutions of each iteration comprises:
among all target solutions of an iteration, in response to one target solution having both the smallest sparse value and the smallest outlier value, determining that this target solution dominates all other target solutions and taking it as the only non-dominated solution;
among all target solutions of an iteration, in response to no target solution having both the smallest sparse value and the smallest outlier value, and a plurality of target solutions each having the smallest value of either the sparse value or the outlier value, determining that none of these target solutions dominates any other and taking them as non-dominated solutions.
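Determining the non-dominated solutions of an iteration (claim 6) amounts to a Pareto filter. The sketch below uses the general pairwise dominance test, with smaller values treated as better. This is a swapped-in standard technique rather than a transcription of the claim's two cases. In the first case of the claim, it returns the single solution that is smallest in both objectives.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions that no other solution in the list dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Objective vectors are (sparse_value, outlier_value) pairs.
print(non_dominated([(1, 1), (2, 3), (3, 2)]))          # [(1, 1)]
print(non_dominated([(1, 3), (3, 1), (2, 2), (3, 3)]))  # [(1, 3), (3, 1), (2, 2)]
```

In the second example, the general test also keeps the intermediate trade-off `(2, 2)`, a case the claim's wording does not spell out.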
7. An apparatus for detecting industrial abnormal data, the apparatus being connected to a database and comprising an initialization module, a target solution module, an iteration module and an abnormal point detection module;
the initialization module is configured to initialize the collected industrial data as data points in a data space, set a density threshold greater than the dimension of the collected industrial data, and initialize the neighborhood radius of each data point;
the target solution module is configured to determine the sparse value of each data point from the differences between its neighborhood radius and those of the surrounding data points, determine the outlier value of the data point from the distances between the data point and its neighborhood data points, and take the sparse value and the outlier value together as the target solution;
wherein said determining a sparse value for said data point using a difference in said neighborhood radius from other surrounding data points comprises,
the sparse value is calculated using the formula shown below,
[Formula QLYQS_3 (image): sparse value $F_1$]
where $F_1$ denotes the sparse value of the data point, $Eps_i$ denotes the neighborhood radius of the data point, $Eps_j$ denotes the neighborhood radius of a surrounding data point, $x_j$ denotes any other data point within the neighborhood radius of the data point, $D$ denotes the set of all such surrounding data points, and abs denotes the absolute value of the difference between the two radii;
said determining the outlier value of the data point using its distances to its neighborhood data points comprises,
the outlier value is calculated using the formula shown below:
[Formula QLYQS_4 (image): outlier value $F_2$]
where $x_i$ denotes the data point under test, $F_2$ denotes the outlier value calculated for it, and Distance denotes the distance between the data point and another surrounding data point;
the iteration module is configured to initialize the individual optimal solution of each data point with its target solution; iterate the individual optimal solutions using a particle swarm optimization algorithm; and, upon reaching the preset number of iterations, determine the individual optimal solution of each data point in the last iteration and derive the corresponding neighborhood radius back from that individual optimal solution;
wherein iterating the individual optimal solutions using the particle swarm optimization algorithm comprises:
determining a global optimal solution among the target solutions in each iteration;
according to the Pareto dominance principle, with smaller values treated as better, obtaining the non-dominated solutions among the target solutions, retaining these non-dominated solutions throughout the iterations, and determining the global optimal solution of the current iteration among all non-dominated solutions of past iterations;
initializing the velocity of each target solution in the first iteration;
calculating the velocity of the current iteration from the velocity of the last iteration, the neighborhood radius, the individual optimal solution and the global optimal solution; calculating the neighborhood radius of the current iteration from the velocity of the current iteration and the neighborhood radius of the last iteration; and updating each target solution;
according to the Pareto dominance principle, with smaller values treated as better, determining for each data point the individual optimal solution of the current iteration from its target solution of the current iteration and its individual optimal solutions of past iterations, and then executing the next iteration;
wherein determining the global optimal solution among the target solutions in each iteration comprises:
according to the Pareto dominance principle, with smaller values treated as better, determining the non-dominated solutions among the target solutions of each iteration and putting them into a non-dominated solution set, which thus contains all non-dominated solutions of all past iterations;
establishing a target space with the sparse value and the outlier value as coordinate axes, and dividing the target space into a plurality of equal sub-regions;
determining the position of each target solution in the target space and the number of the target solutions contained in each sub-area;
determining the sub-region containing the fewest target solutions, and taking the target solution in that sub-region as the global optimal solution;
wherein calculating the velocity of the current iteration from the velocity of the last iteration, the neighborhood radius, the individual optimal solution and the global optimal solution, and calculating the neighborhood radius of the current iteration from the velocity of the current iteration and the neighborhood radius of the last iteration, comprises:
the velocity is calculated using the following formula:
$V_{i+1} = \omega \times V_i + C_1 \times \mathrm{rand}() \times (pbest_i - Eps_i) + C_2 \times \mathrm{rand}() \times (gbest_i - Eps_i)$
where $\omega$ denotes the inertia factor, $C_1$ and $C_2$ denote the learning factors, $V_{i+1}$ denotes the velocity of the current iteration, $V_i$ denotes the velocity of the last iteration, $gbest_i$ denotes the neighborhood radius corresponding to the global optimal solution of the last iteration, $pbest_i$ denotes the neighborhood radius corresponding to the individual optimal solution of the last iteration, and $Eps_i$ denotes the neighborhood radius of the last iteration;
and the neighborhood radius is calculated using the following formula:
$Eps_{i+1} = V_{i+1} + Eps_i$
where $Eps_{i+1}$ denotes the neighborhood radius of the current iteration and $V_{i+1}$ denotes the velocity of the current iteration;
the abnormal point detection module is configured to determine, for each data point, that the data point is an abnormal point in response to the number of its neighborhood data points within its neighborhood radius being less than or equal to the density threshold.
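The global-best selection and the update formulas of claim 7 can be sketched together as shown below. The sketch makes several assumptions that the claims do not state:

- The archive stores each non-dominated solution as a `((sparse_value, outlier_value), eps)` pair, so that its neighborhood radius can serve as $gbest_i$.
- The target space is the bounding box of the archive, split into `divisions` equal intervals per axis.
- A random member of the least crowded non-empty sub-region is chosen.
- The default values of `omega`, `c1` and `c2` are placeholders.

The sketch models `rand()` with Python's `random.random()`, which draws uniformly from [0, 1).

```python
import random
from collections import defaultdict


def select_gbest(archive, divisions=4, rng=random):
    """Pick a global optimal solution from the least crowded sub-region.

    archive: list of ((sparse_value, outlier_value), eps) entries
    divisions: hypothetical number of equal intervals per objective axis
    """
    xs = [obj[0] for obj, _ in archive]
    ys = [obj[1] for obj, _ in archive]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def cell(v, lo, hi):
        # Index of the equal-width interval containing v; the top edge is clamped.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * divisions), divisions - 1)

    grid = defaultdict(list)
    for entry in archive:
        (x, y), _ = entry
        grid[(cell(x, x_lo, x_hi), cell(y, y_lo, y_hi))].append(entry)
    # Only occupied sub-regions exist in `grid`, so this is the least crowded non-empty one.
    least = min(grid.values(), key=len)
    return rng.choice(least)


def update_particle(eps, velocity, pbest_eps, gbest_eps,
                    omega=0.5, c1=1.5, c2=1.5, rng=random):
    """Compute V_{i+1}, then Eps_{i+1} = V_{i+1} + Eps_i; return (eps, velocity)."""
    v_next = (omega * velocity
              + c1 * rng.random() * (pbest_eps - eps)
              + c2 * rng.random() * (gbest_eps - eps))
    return eps + v_next, v_next


archive = [((0.0, 1.0), 0.5), ((0.1, 0.9), 0.6), ((1.0, 0.0), 2.0)]
gbest = select_gbest(archive, divisions=2)
print(gbest)  # ((1.0, 0.0), 2.0)
new_eps, new_v = update_particle(1.0, 0.2, pbest_eps=1.2, gbest_eps=gbest[1])
```

The claims place no bounds on $Eps$, so the sketch does not clamp the updated radius.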
CN202111665118.2A 2021-12-31 2021-12-31 Industrial abnormal data detection method and device Active CN114528907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111665118.2A CN114528907B (en) 2021-12-31 2021-12-31 Industrial abnormal data detection method and device

Publications (2)

Publication Number Publication Date
CN114528907A CN114528907A (en) 2022-05-24
CN114528907B true CN114528907B (en) 2023-04-07

Family

ID=81621400

Country Status (1)

Country Link
CN (1) CN114528907B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN113743478A (en) * 2021-08-18 2021-12-03 深圳前海微众银行股份有限公司 Abnormal data detection method, electronic device, and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7636700B2 (en) * 2004-02-03 2009-12-22 Hrl Laboratories, Llc Object recognition system incorporating swarming domain classifiers
CN110232416B (en) * 2019-06-13 2022-10-11 中国人民解放军空军工程大学 Equipment fault prediction method based on HSMM-SVM
CN110288212B (en) * 2019-06-14 2021-07-02 石家庄铁道大学 Improved MOPSO-based electric taxi newly-built charging station site selection method

Similar Documents

Publication Publication Date Title
CN109074464B (en) Iterative reweighted least squares for differential privacy
CN114861788A (en) Load abnormity detection method and system based on DBSCAN clustering
CN111898247A (en) Landslide displacement prediction method, equipment and storage medium
CN110858072A (en) Method and device for determining running state of equipment
CN116257663A (en) Abnormality detection and association analysis method and related equipment for unmanned ground vehicle
CN113946983A (en) Method and device for evaluating weak links of product reliability and computer equipment
CN114528907B (en) Industrial abnormal data detection method and device
CN114049463A (en) Binary tree data gridding and grid point data obtaining method and device
JP5715445B2 (en) Quality estimation apparatus, quality estimation method, and program for causing computer to execute quality estimation method
CN114449439A (en) Method and device for positioning underground pipe gallery space
CN110210092B (en) Body temperature data processing method and device, storage medium and terminal equipment
CN116630320A (en) Method and device for detecting battery pole piece, electronic equipment and storage medium
CN112765362A (en) Knowledge graph entity alignment method based on improved self-encoder and related equipment
CN115689061B (en) Wind power ultra-short term power prediction method and related equipment
CN114417964B (en) Satellite operator classification method and device and electronic equipment
CN111798263A (en) Transaction trend prediction method and device
CN113890833B (en) Network coverage prediction method, device, equipment and storage medium
CN116486146A (en) Fault detection method, system, device and medium for rotary mechanical equipment
CN112474435B (en) Rapid sorting method and device for battery modules
CN115879031A (en) Load classification method for adjustable load area and related equipment
CN115081742A (en) Ultra-short-term power prediction method for distributed wind power plant and related equipment
CN112528500B (en) Evaluation method and evaluation equipment for scene graph construction model
US20150142433A1 (en) Irregular Pattern Identification using Landmark based Convolution
CN114330719A (en) Method and electronic equipment for discovering association rule from time sequence chart of events
CN114185882A (en) Bad data correction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant