CN113919235A

CN113919235A - Method and medium for detecting abnormal emission of mobile source pollution based on LSTM evolution clustering

Info

Publication number: CN113919235A
Application number: CN202111269866.9A
Authority: CN
Inventors: 许镇义; 王仁军; 康宇; 曹洋; 王瑞宾
Original assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Current assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-11
Anticipated expiration: 2041-10-29
Also published as: CN113919235B

Abstract

The invention discloses a method and medium for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering. The method takes motor vehicle OBD time-series data as its research object and comprises the following steps: extracting an OBD time-series data set of the motor vehicle; analyzing the correlation of the factors influencing motor vehicle exhaust pollutant emission; constructing the time-series driving conditions of the motor vehicle; and constructing an unsupervised detection model of vehicle exhaust emission. The method uses an evolutionary algorithm to optimize the weights of the time steps of the input data, which helps the LSTM attend to individual time steps and thereby improves the accuracy of pollutant concentration prediction. The method can help technicians analyze and handle abnormal vehicle emissions, and provides a feasible approach to reducing urban air pollution.

Description

Method and medium for detecting abnormal emission of mobile source pollution based on LSTM evolution clustering

Technical Field

The invention relates to the technical field of environmental monitoring, in particular to a mobile source pollution abnormal emission detection method and medium based on LSTM evolution clustering.

Background

Existing prediction of mobile-source pollution emission usually adopts a static analysis method: a number of historical emission records of a vehicle are analyzed together to obtain a prediction model of future emission, but the dynamic variation of the data over time is ignored. Meanwhile, in actual driving scenarios, the nitrogen oxide (NOx) emissions of a mobile source are affected by various indicators (such as actual output torque percentage, engine water temperature, engine fuel temperature, and the like), so time-series prediction of vehicle pollutant emission is highly complex. In this regard, it is effective to study the problem with a network model that has a temporal attention mechanism. LSTM is a network model with a long short-term memory mechanism and is well suited to time-series prediction of mobile-source pollutant emission. However, its attention is focused on the variable features of the input data, and the weighting of the data across time steps is partly lost, so the predicted pollutant emission concentration is not accurate enough.

The emission level of mobile-source pollution is generally classified by threshold comparison: a vehicle is judged to be high-emission or low-emission by comparing the predicted pollutant emission concentration with a threshold. This detection method lacks a rigorous scientific basis and cannot truly reflect the emission characteristics of vehicles running on the road. The invention therefore adopts a new, effective unsupervised detection approach.

Disclosure of Invention

The invention provides a mobile source pollution abnormal emission detection method and medium based on LSTM evolution clustering, which can at least solve one of technical problems in the background technology.

In order to achieve the purpose, the invention adopts the following technical scheme:

A method for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering comprises the following steps executed by computer equipment:

collecting monitoring data from the on-board diagnostics system of a road mobile source, and inputting the monitoring data into a preset LSTM evolution-optimized emission prediction model of the pollutant NOx to detect abnormal pollution emission;

the LSTM evolution-optimized emission prediction model of the pollutant NOx is constructed by the following steps:

S1: extracting an OBD data set of the motor vehicle: collecting monitoring data from the on-board diagnostics system of a road mobile source, the monitoring data comprising the pollutant NOx in the exhaust gas and other vehicle attribute data, and preprocessing the data set;

S2: analyzing the correlation of pollutant emission influence factors: performing Spearman correlation analysis on the attribute data, calculating the correlation coefficient between each attribute and the pollutant NOx, and screening out the specified influence attributes;

S3: constructing the time-series dynamic driving conditions: forming a multi-dimensional time-series working-condition data set from the pollutant NOx and the specified influence attributes of the vehicle, and dividing it into a training set, a test set and a validation set;

S4: constructing an unsupervised exhaust emission detection model: constructing an LSTM evolution-optimized emission prediction model of the pollutant NOx, and identifying the high-emission category with an unsupervised clustering algorithm.

Further, the step S1 is specifically subdivided into the following steps:

S11: collecting data from the OBD (on-board diagnostics) system of a diesel vehicle at a sampling interval of 5 s, wherein the sampled attributes comprise engine speed, actual output torque percentage, engine water temperature, engine oil temperature, aftertreatment downstream NOx value, aftertreatment downstream oxygen value, atmospheric pressure, ambient temperature, aftertreatment exhaust gas mass flow, urea tank level percentage, urea tank temperature, vehicle speed and accelerator pedal opening;

S12: preprocessing the collected OBD data, including filling missing values and deleting irrelevant attributes, wherein missing values are filled using adjacent values.
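The adjacent-value filling of step S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `fill_with_adjacent` is hypothetical, and the rule "use the previous observation, and for leading gaps the next one" is an assumption, since the text only says that adjacent values are used.

```python
from typing import List, Optional


def fill_with_adjacent(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries from neighbouring observations (assumed rule)."""
    filled = list(values)

    # Forward pass: copy the most recent observed value into each gap.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward pass: gaps that remain lie before the first observation,
    # so they take the next observed value instead.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# One OBD column sampled every 5 s, with gaps marked as None.
nox_column = [None, 1.0, None, None, 4.0, None]
print(fill_with_adjacent(nox_column))
```

A column with no observations at all stays unfilled, which makes such columns easy to drop in the irrelevant-attribute step.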

Further, the step S2 is specifically subdivided into the following steps:

S21: the Spearman correlation coefficient ρ between the exhaust pollutant NOx and an influencing factor is calculated as follows:

$$\rho = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2\,\sum_{i}(y_i-\bar{y})^2}}$$

wherein $x_i$ is the i-th sample value of the influencing factor, $\bar{x}$ is the mean value of that attribute, $y_i$ is the i-th sample value of the pollutant, and $\bar{y}$ is its mean value;

s22: selecting a main influence attribute according to a calculation result of the correlation coefficient rho, wherein an expression is as follows:

|ρ|≥0.4
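The screening of steps S21 and S22 can be sketched in a few lines. The sketch assumes the usual Spearman procedure of applying the correlation formula to rank-transformed values, with tied values sharing their average rank; the function names and the sample attribute names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: undefined, treated as 0 here (assumption)
    return cov / sqrt(var_x * var_y)


def select_attributes(attrs: Dict[str, Sequence[float]],
                      nox: Sequence[float],
                      threshold: float = 0.4) -> List[str]:
    """Keep attributes whose |rho| against NOx reaches the threshold."""
    return [name for name, col in attrs.items()
            if abs(spearman_rho(col, nox)) >= threshold]


attrs = {"vehicle_speed": [1, 2, 3, 4], "atmospheric_pressure": [1, 1, 1, 1]}
print(select_attributes(attrs, nox=[2, 4, 6, 8]))
```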

Further, step S3, constructing and dividing the time-series dynamic driving condition data set, is specifically subdivided into the following steps:

S31: determining the total number of samples n, the time step t and the attribute dimension m, and constructing the time-series attribute data set $X = \{X^1, X^2, \ldots, X^p, \ldots, X^{n-t+1}\}$ and its corresponding label data set $y = \{y^1, y^2, \ldots, y^p, \ldots, y^{n-t+1}\}$, wherein $X^p$ is the p-th sample of t consecutive time steps over the m attributes and $y^p$ is its label;

S32: dividing the time-series data set into a training set, a test set and a validation set in the ratio 7:2:1.
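A sliding-window construction consistent with the n − t + 1 samples of step S31, followed by the 7:2:1 division of step S32, might look like the sketch below. Two points are assumptions rather than statements of the source: each window is labeled with the NOx value at its last time step, and the split is chronological (no shuffling). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(rows: Sequence[Sequence[float]],
                 labels: Sequence[float],
                 t: int) -> Tuple[List[Window], List[float]]:
    """Build n - t + 1 windows of t consecutive rows (each row has m attributes)."""
    n = len(rows)
    X = [[list(r) for r in rows[p:p + t]] for p in range(n - t + 1)]
    # Assumed labeling: the target of a window is the label at its last step.
    y = [labels[p + t - 1] for p in range(n - t + 1)]
    return X, y


def split_721(X: List[Window], y: List[float]):
    """Split in order into training, test and validation parts (7:2:1)."""
    n = len(X)
    a = n * 7 // 10   # end of the training part
    b = n * 9 // 10   # end of the test part
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])


rows = [[float(i), 2.0 * i] for i in range(12)]   # n = 12 samples, m = 2
labels = [float(i) for i in range(12)]
X, y = make_windows(rows, labels, t=3)            # 12 - 3 + 1 = 10 windows
train, test, val = split_721(X, y)
print(len(train[0]), len(test[0]), len(val[0]))
```

Integer arithmetic is used for the split points to avoid floating-point rounding at the boundaries.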

Further, the step S4: the construction of the exhaust emission unsupervised detection model can be subdivided into the following steps:

s41: constructing an LSTM evolution optimization model, and optimizing attention level weight parameters of the LSTM model by using an evolution algorithm;

S42: after the model of S41 yields the predicted pollutant concentration, a data set consisting of the pollutant concentration prediction error and the influence attributes is standardized in preparation for unsupervised K-means clustering;

the standardization is calculated as follows:

$$X^{*} = \frac{X-\mu}{\sigma}$$

wherein μ represents the mean value of the column where X is located, and σ represents the standard deviation of the column where X is located;
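Column-wise standardization can be sketched as below, assuming the usual z-score form in which each value is centered on its column mean and divided by the column standard deviation (population form). The function name and the handling of constant columns are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def standardize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return a copy of rows with every column scaled to zero mean, unit spread."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    return [
        # A constant column has zero spread; it is mapped to 0.0 (assumption).
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
        for r in rows
    ]


# Columns: prediction error, one influence attribute.
data = [[1.0, 5.0], [3.0, 5.0]]
print(standardize_columns(data))
```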

s43: performing K-means clustering on the standardized data set obtained in the step S42;

S44: for the clustering of step S43, determining the optimal cluster number k (k ∈ {2, 3, 4, 5, 6}) of the K-means clustering algorithm by using the DBI index; the DBI is calculated as follows:

$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j\neq i}\frac{\mathrm{avg}(C_i)+\mathrm{avg}(C_j)}{d_{cen}(u_i,u_j)}$$

wherein k represents the number of clusters, $\mathrm{avg}(C_i)$ represents the average Euclidean distance from the sample points of the i-th cluster to its cluster center $u_i$, and $d_{cen}(u_i,u_j)$ represents the Euclidean distance between the i-th cluster center $u_i$ and the j-th cluster center $u_j$;
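The DBI can be evaluated directly from the definitions of avg(C_i) and d_cen(u_i, u_j) given above. The sketch below assumes the standard Davies-Bouldin form (the mean, over clusters, of the worst ratio of summed within-cluster scatter to center distance) and that no cluster is empty; the function name is hypothetical. A lower value indicates more compact, better separated clusters, so the candidate k with the smallest DBI would be chosen.

```python
from math import dist
from typing import Sequence


def davies_bouldin(points: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   centers: Sequence[Sequence[float]]) -> float:
    """Davies-Bouldin index for a given clustering (assumes no empty cluster)."""
    k = len(centers)
    # avg(C_i): mean Euclidean distance from the members of cluster i to u_i.
    avg = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        avg.append(sum(dist(p, centers[i]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster j.
        total += max((avg[i] + avg[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 1.0)]
print(davies_bouldin(points, labels, centers))
```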

s45: after the optimal cluster number k is obtained in step S44, the high emission class is discriminated by calculating the score; the cluster i score is calculated as follows:

wherein 1 ≤ i ≤ k, $\mu_i$ represents the mean value of the pollutant prediction error of cluster i, $\sigma_i$ represents the standard deviation of the pollutant prediction error of cluster i, and $\theta_i$ represents the proportion of samples belonging to cluster i, $0 < \theta_i < 1$;

S46: from the score set $S = \{S_1, S_2, \ldots, S_k\}$ calculated in step S45, selecting the maximum value $S_{max}$; the corresponding category is the high-emission category.

Further, the step S41 specifically includes:

the LSTM network has three gates, namely an input gate, an output gate and a forgetting gate;

for the LSTM network, let $f_t$, $i_t$ and $o_t$ denote the values of the forgetting gate, the input gate and the output gate at time t, respectively, calculated as follows:

$$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}C_{t-1} + b_f)$$

$$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}C_{t-1} + b_i)$$

$$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + W_{co}C_{t-1} + b_o)$$

wherein $x_t$ represents the data input at time t, $h_{t-1}$ represents the output value at time t-1, $C_{t-1}$ represents the cell memory value at time t-1, $W_{**}$ represents a weight coefficient, $b_*$ represents an offset vector, and σ represents the sigmoid function, whose expression is as follows:

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

wherein $\sigma(x) \in (0, 1)$.
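All three gates share one form: a sigmoid applied to a weighted sum of the current input, the previous output and the previous cell memory, plus an offset. The following sketch evaluates that form with plain Python lists; the helper names `matvec` and `gate` and the sample weights are hypothetical, and a trained network would supply its own weight matrices.

```python
from math import exp
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    """Logistic function, written to stay stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product using plain lists."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def gate(W_x: Matrix, W_h: Matrix, W_c: Matrix, b: Sequence[float],
         x_t: Sequence[float], h_prev: Sequence[float],
         c_prev: Sequence[float]) -> List[float]:
    """sigma(W_x x_t + W_h h_{t-1} + W_c C_{t-1} + b), element by element."""
    zx, zh, zc = matvec(W_x, x_t), matvec(W_h, h_prev), matvec(W_c, c_prev)
    return [sigmoid(a + h + c + off) for a, h, c, off in zip(zx, zh, zc, b)]


# Hypothetical forget-gate parameters: 2 inputs, 1 hidden unit.
f_t = gate(W_x=[[0.5, -0.2]], W_h=[[0.1]], W_c=[[0.3]], b=[0.0],
           x_t=[1.0, 2.0], h_prev=[0.4], c_prev=[0.6])
print(f_t)
```

The same `gate` function yields $i_t$ and $o_t$ when called with the corresponding weight matrices and offsets.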

further, the specific process of optimizing the LSTM by the evolutionary algorithm in step S41 is as follows:

(1) initializing the population and individuals: 0/1 encoding is performed for the training set with input step length t; assuming a population size of n, n individuals are encoded, and the encoding length of each individual is 6 × t;

(2) calculating weights and fitness values: every 6 bits of an individual's 0/1 code are read as a binary number and converted into a decimal number, giving t values in total, which serve as the weights of the t items in a time step; the weights are multiplied element-wise with the training set of step length t, the result is fed into the LSTM network, and the prediction error of the network is taken as the fitness value of the individual; the fitness values of the n individuals in the current population are thereby obtained;

(3) selection: randomly dividing the population into m groups, where 1 < m < n and n is divisible by m, and selecting the individual with the best fitness in each group;

(4) crossover and mutation: pairing the individuals selected in step (3) two by two, and then performing the crossover operation to generate offspring individuals, subject to a certain mutation probability;

(5) generating a new population: forming the new population from the individuals selected in step (3) and the offspring newly generated in step (4), and eliminating the remaining individuals; the number of offspring generated in step (4) must be such that the new population has the same size as the original population;

(6) repeating steps (2) to (5) p times, that is, evolving for p generations, and selecting the individual with the best fitness in the last generation as the optimal solution for the attention weights.
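Steps (1) to (6) can be sketched as a small loop. This is an illustration under stated assumptions, not the patented implementation: the fitness function is passed in as a parameter and lower values are treated as better (it stands for the LSTM prediction error); `count_ones` is a hypothetical stand-in so that the sketch runs without training a network; uniform crossover and a per-bit mutation rate of 0.05 are assumed, since the text does not fix these details.

```python
import random
from typing import Callable, List

Individual = List[int]


def evolve(fitness: Callable[[Individual], float], t: int, n: int, m: int,
           p: int, mutation_rate: float = 0.05, seed: int = 0) -> Individual:
    """Evolve n bit strings of length 6*t for p generations; return the best one."""
    if not (1 < m < n and n % m == 0):
        raise ValueError("need 1 < m < n with n divisible by m")
    rng = random.Random(seed)
    # (1) Random 0/1 encoding of each individual.
    population = [[rng.randint(0, 1) for _ in range(6 * t)] for _ in range(n)]
    size = n // m
    for _ in range(p):
        # (2)+(3) Shuffle into m equal groups; keep the lowest-error member of each.
        rng.shuffle(population)
        parents = [min(population[g * size:(g + 1) * size], key=fitness)
                   for g in range(m)]
        # (4) Pair parents and cross them, flipping bits with small probability.
        offspring = []
        while len(offspring) < n - m:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # (5) Parents plus offspring restore the original population size.
        population = parents + offspring
    # (6) Best individual of the final generation.
    return min(population, key=fitness)


def count_ones(individual: Individual) -> float:
    """Hypothetical stand-in for the LSTM prediction error."""
    return float(sum(individual))


best = evolve(count_ones, t=4, n=12, m=4, p=30)
print(best, count_ones(best))
```

Because the best member of every group is carried over unchanged, the best fitness in the population never gets worse from one generation to the next.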

Further, after the LSTM model obtains the prediction result, the prediction error is calculated to determine whether to continue optimizing with the EA (evolutionary algorithm); the specific prediction error index, RMSE, is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$

where y represents the true value vector, y' represents the predicted value vector, and n is their dimension.
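As a minimal sketch, the RMSE of a prediction can be computed as follows; the function name `rmse` and the input checks are illustrative choices.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Two samples with errors 3 and 4: sqrt((9 + 16) / 2).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```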

Further, the K-means clustering algorithm in step S43 includes the following steps:

S431: randomly selecting k samples as the initial cluster centers;

S432: calculating the distances from each sample point to all cluster centers, and assigning the point to the cluster whose center is nearest;

S433: calculating the sample mean of each of the k clusters, and taking these means as the new k cluster centers;

S434: repeating steps S432 to S433 until the cluster centers no longer change;

S435: the algorithm stops.
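Steps S431 to S435 correspond to the standard K-means iteration, sketched below with the standard library only. The iteration cap `max_iter`, the seed, and the choice to keep the previous center when a cluster becomes empty are assumptions added to make the sketch robust; they are not stated in the text.

```python
import random
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Cluster points into k groups; return (labels, centers)."""
    rng = random.Random(seed)
    # S431: pick k samples at random as the initial centers.
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # S432: assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # S433: recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        # S434/S435: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, k=2)
print(labels, sorted(centers))
```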

In yet another aspect, the present invention also discloses a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method as described above.

In yet another aspect, the present invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above method.

According to the above technical scheme, the method and medium for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering use an evolutionary algorithm to optimize the weights of the time steps of the input data, which helps the LSTM attend to the time steps and thereby improves the accuracy of pollutant concentration prediction.

The method obtains a mobile-source pollutant concentration emission model through deep network learning and training with evolutionary optimization, and in the time-series dimension the model has higher prediction accuracy than a conventional non-optimized model. Furthermore, unsupervised detection and identification of abnormal mobile-source emission is realized, which helps technicians analyze and handle abnormal vehicle emissions and provides a feasible approach to reducing urban air pollution.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a LSTM evolution optimization model structure;

FIG. 3 is a schematic diagram of an LSTM gated structure;

FIG. 4 is a diagram showing the predicted effect of the LSTM evolution optimization model on the validation set;

FIG. 5 is a clustering result visualization;

FIG. 6 is a graph of scores for each cluster;

FIG. 7 is an anomaly visualization of the true NOx concentration and the prediction error.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

In order to solve the problem of time sequence detection of mobile source pollution emission, the invention provides a road mobile source pollution emission time sequence analysis and unsupervised detection method based on an optimization model combining an evolutionary algorithm and an LSTM, which can predict the emission of tail gas and judge the pollution type of a vehicle by using an unsupervised method.

As shown in FIG. 1, the invention takes the OBD time series data of the motor vehicle as a research object, and comprises the following steps:

S1: extracting an OBD data set of the motor vehicle; collecting on-board diagnostics (OBD) monitoring data of a road mobile source, wherein the monitoring data comprise the pollutant NOx in the exhaust gas and other vehicle attribute data, and preprocessing the data set;

s2: analyzing the correlation of pollutant emission influence factors; performing Spearman correlation analysis on the multiple attribute data, calculating the correlation coefficient of each attribute and pollutant NOx, and screening out main influence attributes;

s3: constructing a time sequence dynamic driving condition; forming a multi-dimensional time sequence working condition data set by the pollutant NOx and the main influence attributes of the vehicle, and dividing the multi-dimensional time sequence working condition data set into a training set, a testing set and a verification set;

s4: constructing an unsupervised tail gas emission detection model; and constructing an LSTM evolution optimization emission prediction model of the pollutant NOx, and aggregating high emission categories by adopting an unsupervised clustering algorithm.

Further, the above step S1: the method comprises the following steps of collecting the OBD data of the tail gas of the motor vehicle, preprocessing the collected data, and specifically:

S11: collecting OBD data from diesel vehicles; the data set comes from tests of diesel vehicles in Hefei in 2021 and contains 2121 records in total, sampled at an interval of 5 s, wherein the sampled attributes comprise engine speed, actual output torque percentage, engine water temperature, engine oil temperature, aftertreatment downstream NOx value, aftertreatment downstream oxygen value, atmospheric pressure, ambient temperature, aftertreatment exhaust gas mass flow, urea tank level percentage, urea tank temperature, vehicle speed, accelerator pedal opening, and the like.

S12: preprocessing operations such as missing value filling and irrelevant attribute deleting are carried out on the collected OBD data (wherein the missing value data is filled by using adjacent values).

Further, the above step S2, performing Spearman correlation analysis on the factors influencing pollutant emission and calculating the correlation coefficients, is specifically subdivided into the following steps:

wherein $x_i$ is the i-th sample value of the influencing factor and $\bar{x}$ is its mean value;

|ρ| ≥ 0.4

S31: determining the total number of samples n, the time step t and the attribute dimension m, and constructing the time-series attribute data set $X = \{X^1, X^2, \ldots, X^p, \ldots, X^{n-t+1}\}$.

Further, the above step S4: the construction of the exhaust emission unsupervised detection model can be subdivided into the following steps:

s41: and (3) constructing an LSTM evolution optimization model, namely optimizing the attention layer weight parameters of the LSTM model by using an evolutionary algorithm, and reducing the prediction error of the LSTM model. The structure diagram of the LSTM evolution optimization model is shown in figure 2;

s411: specifically, in step S41, the LSTM network has three gates, which are an input gate, an output gate, and a forgetting gate, and a schematic diagram of a network structure thereof is shown in fig. 3;

S412: for the LSTM network, let $f_t$, $i_t$ and $o_t$ denote the values of the forgetting gate, the input gate and the output gate at time t, respectively, calculated as follows:

$$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}C_{t-1} + b_f)$$

$$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}C_{t-1} + b_i)$$

$$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + W_{co}C_{t-1} + b_o)$$

wherein $x_t$ represents the data input at time t, $h_{t-1}$ represents the output value at time t-1, $C_{t-1}$ represents the cell memory value at time t-1, $W_{**}$ represents a weight coefficient, $b_*$ represents an offset vector, and σ represents the sigmoid function, whose expression is as follows:

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

wherein $\sigma(x) \in (0, 1)$.

s413: the main flow of the evolutionary algorithm is as follows:

(1) initializing the population;

(2) calculating fitness, that is, measuring how good each individual in the population is;

(3) selection, that is, selecting individuals for inheritance by the next generation according to their fitness function values;

(4) crossover, that is, selecting two individuals as parents, and randomly selecting code values at specific positions of the two parents as the gene information at the corresponding positions of the offspring individual;

(5) mutation, that is, changing the code value at a random position of the offspring individual;

(6) generating a new generation of the population;

(7) judging whether the specified condition is met; if not, repeating steps (2) to (7); if so, outputting the optimal result.

S414: specifically, the principle by which the evolutionary algorithm optimizes the LSTM is as follows: an attention layer is added before the LSTM to remedy the LSTM's bias in attention across time steps and to improve the prediction accuracy of the model. The invention solves for the (approximately) optimal weights of the attention layer using the EA principle: excellent individuals (solutions) in each generation are selected through a competitive elimination mechanism, offspring individuals are generated by the crossover and mutation operators, and random probability is introduced to maintain diversity among individuals and avoid convergence to a local optimum. After many generations of evolution and reproduction, the population arrives at a globally (approximately) optimal solution for the attention layer. The specific process is as follows:

(1) Initializing the population and individuals. 0/1 encoding is performed for the training set with input step length t; assuming a population size of n, n individuals are encoded, and the encoding length of each individual is 6 × t.

(2) Calculating weights and fitness values. Every 6 bits of an individual's 0/1 code are read as a binary number and converted into a decimal number, giving t values in total, which serve as the weights of the t items in a time step. The weights are multiplied element-wise with the training set of step length t, the result is fed into the LSTM network, and the prediction error of the network is taken as the fitness value of the individual. The fitness values of the n individuals in the current population are thereby obtained.

(3) Selection. The population is randomly divided into m groups (1 < m < n, and n is divisible by m), and the individual with the best fitness in each group is selected.

(4) Crossover and mutation. The individuals selected in step (3) are paired two by two, and the crossover operation is then performed to generate offspring individuals, subject to a certain mutation probability.

(5) Generating a new population. The new population is formed from the individuals selected in step (3) and the offspring newly generated in step (4), and the remaining individuals are eliminated. Specifically, the number of offspring generated in step (4) must be such that the new population has the same size as the original population.

(6) Repeating steps (2) to (5) p times, that is, evolving for p generations. The individual with the best fitness in the last generation is selected as the (approximately) optimal solution for the attention weights.
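The decoding in step (2) can be illustrated as follows: each 6-bit chunk becomes an integer between 0 and 63, and the t resulting values scale the t time steps of an input window. The sketch assumes most-significant-bit-first ordering and uses the raw integers as weights without normalization, neither of which is specified in the text; the function names are hypothetical.

```python
from typing import List, Sequence


def decode_weights(individual: Sequence[int], t: int) -> List[int]:
    """Read each 6-bit chunk as a binary number, giving one weight per time step."""
    if len(individual) != 6 * t:
        raise ValueError("encoding length must be 6 * t")
    return [int("".join(str(bit) for bit in individual[6 * s:6 * s + 6]), 2)
            for s in range(t)]


def apply_weights(window: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[List[float]]:
    """Multiply every attribute at time step s by weights[s]."""
    return [[w * v for v in row] for row, w in zip(window, weights)]


# t = 2 time steps: chunks 000001 -> 1 and 111111 -> 63.
individual = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
weights = decode_weights(individual, t=2)
window = [[1.0, 2.0], [3.0, 4.0]]          # t = 2 steps, m = 2 attributes
print(weights, apply_weights(window, weights))
```

The weighted windows would then be fed to the LSTM, whose prediction error serves as the fitness value of that individual.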

S415: in the structure of FIG. 1, after the LSTM model obtains the prediction result, the prediction error is calculated to determine whether to continue optimizing with the EA; the specific prediction error index, RMSE, is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$

where y denotes the true value vector (n-dimensional) and y' denotes the predicted value vector (n-dimensional).

S42: after the model of S41 yields the predicted pollutant concentration (the prediction effect is shown in FIG. 4), the data set consisting of the pollutant concentration prediction error and the influence attributes is standardized in preparation for unsupervised K-means clustering. The standardization is calculated as follows:

$$X^{*} = \frac{X-\mu}{\sigma}$$

wherein μ represents the mean value of the column where X is located, and σ represents the standard deviation of that column.

s43: k-means clustering is performed on the normalized data set obtained in step S42. The cluster visualization effect of the experimental data set is shown in fig. 5; the K-means clustering algorithm mainly comprises the following steps:

s431: randomly initializing samples into k cluster centers;

s435: the algorithm stops.

S44: for the clustering of step S43, the DBI (Davies-Bouldin Index) is used to determine the optimal cluster number k (k ∈ {2, 3, 4, 5, 6}) of the K-means clustering algorithm. The DBI is calculated as follows:

$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j\neq i}\frac{\mathrm{avg}(C_i)+\mathrm{avg}(C_j)}{d_{cen}(u_i,u_j)}$$

wherein k represents the number of clusters, $\mathrm{avg}(C_i)$ represents the average Euclidean distance from the sample points of the i-th cluster to its cluster center $u_i$, and $d_{cen}(u_i,u_j)$ represents the Euclidean distance between the i-th cluster center $u_i$ and the j-th cluster center $u_j$.

S45: after the optimal cluster number k is obtained in step S44, the high emission class is discriminated by calculating the score. The cluster i score is calculated as follows:

wherein 1 ≤ i ≤ k, $\mu_i$ represents the mean value of the pollutant prediction error of cluster i, $\sigma_i$ represents the standard deviation of the pollutant prediction error of cluster i, and $\theta_i$ represents the proportion of samples belonging to cluster i ($0 < \theta_i < 1$).

The statistics of the number ratio θ of each cluster are as follows (the optimal cluster number k in this experiment is 3):

Cluster	Number ratio (%)
clu0	8
clu1	35
clu2	57
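The three per-cluster quantities that enter the score (mean prediction error, its standard deviation, and the number ratio) can be gathered as in the sketch below. The score formula itself is not reproduced in this text, so the sketch stops at these inputs and does not combine them. The function name is hypothetical, and the population form of the standard deviation and non-empty clusters are assumptions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def cluster_statistics(errors: Sequence[float], labels: Sequence[int],
                       k: int) -> List[Dict[str, float]]:
    """Return mu, sigma and theta of the prediction error for each cluster."""
    n = len(errors)
    stats = []
    for i in range(k):
        members = [e for e, lab in zip(errors, labels) if lab == i]
        mu = sum(members) / len(members)
        sigma = sqrt(sum((e - mu) ** 2 for e in members) / len(members))
        # theta: share of all samples that fall into cluster i.
        stats.append({"mu": mu, "sigma": sigma, "theta": len(members) / n})
    return stats


errors = [1.0, 3.0, 10.0, 10.0]   # prediction errors of four samples
labels = [0, 0, 1, 1]             # cluster assignment from K-means
for i, s in enumerate(cluster_statistics(errors, labels, k=2)):
    print(i, s)
```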

S46: from the score set $S = \{S_1, S_2, \ldots, S_k\}$ calculated in step S45, the maximum value $S_{max}$ is selected; the corresponding category is the high-emission category. The scores are shown in FIG. 6 (the optimal cluster number k in this experiment is 3).

By observing the points labeled abnormal in the true pollutant emission concentration in FIG. 7 (upper), it can be seen that, in the time dimension, the pollutant concentration at an abnormal emission point changes markedly and abruptly relative to the normal emission period immediately preceding it. The authenticity of the abnormal emission can thus be judged visually, which verifies the effectiveness of the unsupervised abnormal emission detection of the invention. On this basis, once the moments of abnormal emission are known, the causes of abnormal mobile-source emission can be further investigated, helping to reduce urban air pollution.

It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.

The embodiment of the application also provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus,

a memory for storing a computer program;

the processor, which, when executing the program stored in the memory, implements the method for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering, the method comprising the following steps:

S4: constructing an unsupervised exhaust emission detection model; namely, an LSTM evolution-optimized emission prediction model of the pollutant NOx is constructed, and the high-emission category is identified with an unsupervised clustering algorithm.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned moving source pollution abnormal emission detection methods based on LSTM evolutionary clustering.

In yet another embodiment provided by the present application, there is further provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the above-described methods for mobile-sourced pollutant abnormal emission detection based on LSTM evolutionary clustering.

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A mobile source pollution abnormal emission detection method based on LSTM evolutionary clustering is characterized in that the following steps are executed by computer equipment,

2. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 1, characterized in that: the step S1 is specifically subdivided into the following steps:

3. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 2, characterized in that: the step S2 is specifically subdivided into the following steps:

wherein $x_i$ is the i-th sample value of the influencing factor and $\bar{x}$ is its mean value;

$|\rho| \ge 0.4$.
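The correlation formula itself does not appear in the claim text as reproduced here. As a minimal sketch, assuming $\rho$ is the Pearson correlation coefficient between an influencing factor and a pollutant concentration, screening factors by $|\rho| \ge 0.4$ might look as follows. The function names, factor names, and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def select_factors(factors, target, threshold=0.4):
    """Keep the factors whose |rho| with the target reaches the threshold."""
    return [name for name, values in factors.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical OBD-style samples (illustrative values only)
factors = {
    "engine_speed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ambient_noise": [1.0, -1.0, 0.0, 1.0, -1.0],
}
nox = [2.0, 4.1, 5.9, 8.2, 9.8]
selected = select_factors(factors, nox)
```

Here `selected` retains only the factor that is strongly correlated with the pollutant series; the weakly correlated one falls below the 0.4 threshold and is dropped.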

4. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 3, characterized in that: step S3, constructing the time-series dynamic driving condition data set, is subdivided into the following steps:

S32: the time-series data set is divided into a training set, a test set and a validation set in the ratio 7:2:1.
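The 7:2:1 division in S32 can be sketched as below. The claim does not state whether the samples are shuffled, so this sketch assumes a chronological split (earliest samples for training), which is a common choice for time-series data. The function name `split_7_2_1` is hypothetical.

```python
def split_7_2_1(samples):
    """Split an ordered sequence into training, test and validation parts (7:2:1)."""
    n = len(samples)
    n_train = n * 7 // 10
    n_test = n * 2 // 10
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    # The validation set takes the remainder, so no sample is dropped
    # when n is not a multiple of 10.
    validation = samples[n_train + n_test:]
    return train, test, validation

# Stand-in for 100 time-ordered samples
train, test, validation = split_7_2_1(list(range(100)))
```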

5. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 4, characterized in that: step S4, constructing the unsupervised exhaust emission detection model, is subdivided into the following steps:

the normalization is calculated as follows:

wherein $1 \le i \le k$; $\mu_i$ denotes the mean of the pollutant prediction error for cluster $i$; $\sigma_i$ denotes the standard deviation of the pollutant prediction error for cluster $i$; and $\theta_i$ denotes the number associated with cluster $i$, with $0 < \theta_i < 1$;

6. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 5, characterized in that: the step S41 specifically includes:

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} C_{t-1} + b_f)$$

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} C_{t-1} + b_i)$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} C_{t-1} + b_o)$$

wherein $\sigma(x) \in (0, 1)$.
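The three gate equations share one form: a sigmoid applied to a weighted sum of the current input $x_t$, the previous hidden state $h_{t-1}$, the previous cell state $C_{t-1}$, and a bias. The sketch below evaluates that form in plain Python, assuming $\sigma$ is the logistic sigmoid and treating $W_{cf}$, $W_{ci}$, $W_{co}$ as full matrices as written in the claim. The dimensions, random weights, and helper names are hypothetical, and the cell-state and hidden-state updates are not shown because they do not appear in the text reproduced here.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(W_x, W_h, W_c, bias, x_t, h_prev, c_prev):
    """sigma(W_x x_t + W_h h_{t-1} + W_c C_{t-1} + b), applied elementwise."""
    terms = zip(matvec(W_x, x_t), matvec(W_h, h_prev), matvec(W_c, c_prev), bias)
    return [sigmoid(sum(t)) for t in terms]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 3 input features, 2 hidden units
n_in, n_hid = 3, 2
x_t = [0.5, -1.2, 0.3]
h_prev = [0.0, 0.1]
c_prev = [0.2, -0.4]

# Forget (f), input (i) and output (o) gates, each with its own parameters
gates = {}
for name in ("f", "i", "o"):
    gates[name] = gate(rand_matrix(n_hid, n_in), rand_matrix(n_hid, n_hid),
                       rand_matrix(n_hid, n_hid), [0.0] * n_hid,
                       x_t, h_prev, c_prev)
```

Each entry of `gates` is a vector with one value per hidden unit, and every value falls in the open interval (0, 1), consistent with $\sigma(x) \in (0, 1)$.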

7. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 6, characterized in that: the specific process by which the evolutionary algorithm optimizes the LSTM in step S41 is as follows:

(3) Selection: the population is randomly divided into m groups, where 1 < m < n and n is divisible by m, and the individual with the best fitness in each group is selected;
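The selection step can be sketched as follows. This assumes that n is the population size and that a lower fitness value is better (for example, a prediction error); neither is stated in the portion of the claim reproduced here. The function name and the sample population are hypothetical.

```python
import random

def group_selection(population, fitness, m, rng=random):
    """Randomly split the population into m equal groups and keep each group's best."""
    n = len(population)
    if not (1 < m < n) or n % m != 0:
        raise ValueError("require 1 < m < n and n divisible by m")
    indices = list(range(n))
    rng.shuffle(indices)          # random assignment of individuals to groups
    size = n // m
    winners = []
    for g in range(m):
        group = indices[g * size:(g + 1) * size]
        # Assumption: smaller fitness is better (e.g., an error measure)
        best = min(group, key=lambda idx: fitness[idx])
        winners.append(population[best])
    return winners

# Hypothetical individuals (e.g., candidate time-step weight vectors) and fitness values
population = ["a", "b", "c", "d", "e", "f"]
fitness = [0.9, 0.2, 0.5, 0.7, 0.1, 0.4]
winners = group_selection(population, fitness, m=3, rng=random.Random(42))
```

Because each group contributes exactly one winner, the step returns m individuals, and the globally best individual always survives since it is the best in whichever group it lands in.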

8. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 7, characterized in that:

after the LSTM model produces a prediction result, the prediction error is calculated to decide whether to continue optimizing with the evolutionary algorithm (EA); the prediction error metric is the root mean square error (RMSE), calculated as follows:
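The RMSE formula is not reproduced in the claim text here. Under its standard definition, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the observed value and $\hat{y}_i$ the predicted value, a minimal sketch is given below. The function name and the sample values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Illustrative pollutant concentrations: observed vs. predicted
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```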

9. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 8, characterized in that:

the K-means clustering algorithm in step S43 comprises the following steps:

S431: randomly select k samples as the initial cluster centers;

S435: the algorithm stops.
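Only S431 and S435 appear in the claim text reproduced here. The sketch below fills the gap with the standard K-means iteration (assign each sample to its nearest center, recompute each center as the mean of its cluster, stop when the centers no longer change); these intermediate steps are an assumption and are not taken from the claim. The function name and the sample points are hypothetical.

```python
import random

def kmeans(points, k, max_iter=100, rng=random):
    """Standard K-means on a list of equal-length tuples of floats."""
    centers = rng.sample(points, k)  # S431: k random samples as initial centers
    dim = len(points[0])
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assumed step: assign each sample to the nearest center (squared Euclidean)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Assumed step: move each center to the mean of its cluster
        new_centers = []
        for c in range(k):
            if clusters[c]:
                new_centers.append(tuple(
                    sum(p[d] for p in clusters[c]) / len(clusters[c])
                    for d in range(dim)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # S435: centers unchanged, the algorithm stops
        centers = new_centers
    return centers, clusters

# Hypothetical 2-D samples forming two well-separated groups
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4),
          (5.0, 5.0), (5.4, 4.8), (4.7, 5.3)]
centers, clusters = kmeans(points, k=2, rng=random.Random(0))
```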

10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 9.