CN113919235A - Method and medium for detecting abnormal emission of mobile source pollution based on LSTM evolution clustering - Google Patents

Publication number
CN113919235A
Authority
CN
China
Prior art keywords
lstm
emission
pollutant
value
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111269866.9A
Other languages
Chinese (zh)
Other versions
CN113919235B (en
Inventor
许镇义
王仁军
康宇
曹洋
王瑞宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202111269866.9A
Publication of CN113919235A
Application granted
Publication of CN113919235B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and medium for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering. The method takes motor-vehicle OBD time-series data as its research object and comprises the following steps: extracting a motor-vehicle OBD time-series data set; analyzing the correlation of the factors influencing motor-vehicle exhaust pollutant emissions; constructing the time-series driving conditions of the motor vehicle; and constructing an unsupervised detection model of vehicle exhaust emissions. The method uses an evolutionary algorithm to optimize the weights assigned to the time steps of the input data, which helps the LSTM attend to individual time steps and thereby improves the accuracy of pollutant concentration prediction. The method can help technicians analyze and handle abnormal vehicle emissions and provides a feasible means of reducing urban air pollution.

Description

Method and medium for detecting abnormal emission of mobile source pollution based on LSTM evolution clustering
Technical Field
The invention relates to the technical field of environmental monitoring, in particular to a mobile source pollution abnormal emission detection method and medium based on LSTM evolution clustering.
Background
Existing predictions of mobile-source pollution emissions usually adopt a static analysis method: multiple historical emission records of a vehicle are analyzed together to obtain a prediction model of future emissions, but the dynamic variation of the data over time is ignored. Moreover, in actual driving, the nitrogen oxide (NOx) emissions of a mobile source are affected by many indicators (such as actual output torque percentage, engine water temperature and engine fuel temperature), so time-series prediction of vehicle pollutant emissions is highly complex. A network model with a temporal attention mechanism is therefore an effective tool for this problem. LSTM (long short-term memory) is a network model with long- and short-term memory and is well suited to time-series prediction of mobile-source emissions. However, its attention is concentrated on the variable features of the input data, and the weighting across time steps is partly lost, so the predicted pollutant emission concentration is not accurate enough.
The emission level of a mobile source is usually classified by threshold comparison: the predicted pollutant emission concentration is compared with a threshold to judge whether the vehicle is a high-emission or low-emission type. This detection method lacks a rigorous scientific basis and cannot truly reflect the emission characteristics of vehicles running on the road. The invention instead adopts a new, effective unsupervised detection approach.
Disclosure of Invention
The invention provides a method and medium for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering, which solve at least one of the technical problems described in the background.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering comprises the following steps, executed by computer equipment:
collecting monitoring data from the on-board diagnostics (OBD) system of an on-road mobile source, and inputting the monitoring data into a preset LSTM evolution-optimized emission prediction model of the pollutant NOx to detect abnormal pollution emissions;
the LSTM evolution-optimized emission prediction model of the pollutant NOx is constructed by the following steps:
S1: extracting the motor-vehicle OBD data set: OBD monitoring data of on-road mobile sources are collected, including the pollutant NOx in the exhaust and other vehicle attribute data, and the data set is preprocessed;
S2: analyzing the correlation of the factors influencing pollutant emissions: Spearman correlation analysis is performed on the attribute data, the correlation coefficient between each attribute and the pollutant NOx is calculated, and the specified influence attributes are screened out;
S3: constructing the time-series dynamic driving conditions: the pollutant NOx and the specified influence attributes of the vehicle form a multi-dimensional time-series driving-condition data set, which is divided into a training set, a test set and a verification set;
S4: constructing the unsupervised exhaust emission detection model: an LSTM evolution-optimized emission prediction model of the pollutant NOx is constructed, and the high-emission category is identified with an unsupervised clustering algorithm.
Further, the step S1 is specifically subdivided into the following steps:
S11: collecting data from the OBD (on-board diagnostics) system of a diesel vehicle at a sampling interval of 5 s, wherein the sampled attributes comprise engine speed, actual output torque percentage, engine water temperature, engine oil temperature, aftertreatment downstream NOx value, aftertreatment downstream oxygen value, atmospheric pressure, ambient temperature, aftertreatment exhaust gas mass flow, urea tank liquid level percentage, urea tank temperature, vehicle speed and accelerator pedal opening;
S12: preprocessing the collected OBD data, for example by filling missing values and deleting irrelevant attributes, wherein missing values are filled with adjacent values.
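The adjacent-value filling in S12 can be sketched as follows. The text does not say whether the earlier or the later neighbour is used, so this sketch assumes the previous valid reading is preferred and the next one is used only for gaps at the start of the series; `fill_with_adjacent` is a hypothetical helper name.

```python
from typing import List, Optional


def fill_with_adjacent(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries with the nearest earlier reading, else the nearest later one."""
    filled = list(values)

    # Forward pass: copy the previous valid reading into each gap.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward pass: leading gaps have no earlier reading, so use the next one.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# One OBD attribute column sampled every 5 s, with gaps marked as None.
nox = [None, 1.0, None, None, 4.0, None]
print(fill_with_adjacent(nox))
```

A column with no valid reading at all is returned unchanged, which makes it easy to spot and drop as an irrelevant attribute.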
Further, the step S2 is specifically subdivided into the following steps:
S21: the Spearman correlation coefficient ρ between the exhaust pollutant NOx and each influencing factor is calculated as:
$$\rho = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2\,\sum_{i}(y_i-\bar{y})^2}}$$
where $x_i$ is the i-th sample value of the influencing factor, $\bar{x}$ is its mean, $y_i$ is the i-th sample value of the pollutant, and $\bar{y}$ is its mean;
S22: selecting the main influence attributes according to the calculated correlation coefficient ρ, using the condition:
$|\rho| \ge 0.4$
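Steps S21 and S22 can be sketched in plain Python as below. Spearman's coefficient is conventionally obtained by applying the correlation expression to the ranks of the samples rather than to the raw values, and this sketch assumes that convention (with tied values sharing an average rank). The helper names and the toy attribute values are illustrative, not taken from the patent.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation of the ranks of x and y; returns 0.0 if either is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def screen_attributes(attrs: Dict[str, List[float]], nox: List[float],
                      threshold: float = 0.4) -> List[str]:
    """Keep the attributes whose |rho| with NOx reaches the threshold (S22)."""
    return [name for name, col in attrs.items()
            if abs(spearman(col, nox)) >= threshold]


attrs = {"speed": [1.0, 2.0, 3.0, 4.0], "pressure": [1.0, 2.0, 2.0, 1.0]}
print(screen_attributes(attrs, [5.0, 6.0, 7.0, 9.0]))
```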
Further, step S3, constructing the time-series dynamic driving-condition data set, is subdivided into the following steps:
S31: determining the total number of samples n, the time step t and the attribute dimension m, and constructing the time-series attribute data set $X = \{X_1, X_2, \ldots, X_p, \ldots, X_{n-t+1}\}$, where
Figure BDA0003327723330000034
and its corresponding label data set $y = \{y_1, y_2, \ldots, y_p, \ldots, y_{n-t+1}\}$, where
Figure BDA0003327723330000035
S32: dividing the time-series data set into a training set, a test set and a verification set in the ratio 7:2:1.
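The windowing and splitting in S31 and S32 can be sketched as below. Two points are assumptions, because the defining equation images are not reproduced in this text: each window is labelled with the NOx value at its last time step, and the 7:2:1 split is taken in chronological order. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_windows(rows: Sequence[Sequence[float]], labels: Sequence[float],
                  t: int) -> Tuple[list, list]:
    """Slide a window of t consecutive samples over n rows of m attributes.

    Returns n - t + 1 windows (each t x m) and one label per window.
    """
    n = len(rows)
    X = [[list(r) for r in rows[p:p + t]] for p in range(n - t + 1)]
    # Assumption: the label of window p is the NOx value at its last step.
    y = [labels[p + t - 1] for p in range(n - t + 1)]
    return X, y


def split_7_2_1(X: List, y: List):
    """Chronological 7:2:1 split into training, test and verification sets."""
    n = len(X)
    a, b = n * 7 // 10, n * 9 // 10  # integer arithmetic avoids float rounding
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])


rows = [[i, i * 10] for i in range(12)]  # n = 12 samples, m = 2 attributes
labels = list(range(12))
X, y = build_windows(rows, labels, t=3)
train, test, verify = split_7_2_1(X, y)
print(len(X), len(train[0]), len(test[0]), len(verify[0]))
```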
Further, the step S4: the construction of the exhaust emission unsupervised detection model can be subdivided into the following steps:
S41: constructing an LSTM evolution optimization model, in which an evolutionary algorithm optimizes the attention-layer weight parameters of the LSTM model;
S42: after the model of S41 produces the pollutant concentration prediction, standardizing the data set composed of the pollutant concentration prediction error and the influence attributes, in preparation for the unsupervised K-means clustering algorithm;
the standardization is calculated as:
$$X^{*} = \frac{X-\mu}{\sigma}$$
where μ is the mean of the column containing X and σ is the variance of that column;
S43: performing K-means clustering on the standardized data set obtained in step S42;
S44: in step S43, determining the optimal number of clusters k (k ∈ {2, 3, 4, 5, 6}) for the K-means clustering algorithm using the Davies-Bouldin index (DBI), calculated as:
$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j\ne i}\frac{\mathrm{avg}(C_i)+\mathrm{avg}(C_j)}{d_{cen}(u_i,u_j)}$$
where k is the number of clusters, $\mathrm{avg}(C_i)$ is the average Euclidean distance from the sample points of cluster i to the cluster center $u_i$, and $d_{cen}(u_i,u_j)$ is the Euclidean distance between the i-th cluster center $u_i$ and the j-th cluster center $u_j$;
S45: after the optimal cluster number k is obtained in step S44, identifying the high-emission category by calculating a score for each cluster; the score of cluster i is calculated as:
Figure BDA0003327723330000043
where $1 \le i \le k$, $\mu_i$ is the mean pollutant prediction error of cluster i, $\sigma_i$ is the standard deviation of the pollutant prediction error of cluster i, and $\theta_i$ is the proportion of samples belonging to cluster i, $0 < \theta_i < 1$;
S46: from the score set $S = \{S_1, S_2, \ldots, S_k\}$ calculated in step S45, selecting the maximum value $S_{max}$; the corresponding cluster is the high-emission category.
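The column-wise standardization of S42 can be sketched as follows. The text describes σ as the variance of the column; this sketch assumes the usual z-score convention, in which each value is divided by the column's (population) standard deviation, so treat that choice as an assumption. `standardize` is a hypothetical helper name.

```python
from math import sqrt
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-score: subtract the column mean, divide by its spread.

    Assumption: the divisor is the population standard deviation.
    Constant columns carry no information and are mapped to 0.0.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]


# Each row: [prediction error, influence attribute]
data = [[1.0, 10.0], [3.0, 10.0]]
print(standardize(data))
```

Standardizing first keeps attributes with large numeric ranges from dominating the Euclidean distances used by K-means.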
Further, the step S41 specifically includes:
The LSTM network has three gates: an input gate, an output gate and a forget gate;
for the LSTM network, let $f_t$, $i_t$ and $o_t$ denote the values of the forget gate, the input gate and the output gate at time t, calculated as:
$$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}C_{t-1} + b_f)$$
$$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}C_{t-1} + b_i)$$
$$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + W_{co}C_{t-1} + b_o)$$
where $x_t$ is the input at time t, $h_{t-1}$ is the output at time t-1, $C_{t-1}$ is the cell memory value at time t-1, $W_{**}$ denotes a weight coefficient, $b_*$ denotes a bias vector, and σ denotes the sigmoid function:
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
where $x \in \mathbb{R}$ and $\sigma(x) \in (0,1)$.
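The three gate expressions share one form, which the scalar sketch below evaluates. Real LSTM layers use weight matrices and vectors; scalars are used here only to keep the arithmetic visible. The parameter values and the `gates` helper are illustrative assumptions.

```python
from math import exp
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written so that exp() never receives a large positive value."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    e = exp(x)
    return e / (1.0 + e)


def gates(x_t: float, h_prev: float, c_prev: float,
          params: Dict[str, Tuple[float, float, float, float]]) -> Dict[str, float]:
    """Evaluate the forget (f), input (i) and output (o) gates for one time step.

    params maps each gate name to (W_x, W_h, W_c, b), all scalars in this sketch.
    """
    out = {}
    for name in ("f", "i", "o"):
        w_x, w_h, w_c, b = params[name]
        out[name] = sigmoid(w_x * x_t + w_h * h_prev + w_c * c_prev + b)
    return out


# Hypothetical parameter values, chosen only for illustration.
params = {"f": (0.0, 0.0, 0.0, 0.0), "i": (1.0, 0.0, 0.0, -1.0), "o": (2.0, 0.0, 0.0, 0.0)}
print(gates(x_t=1.0, h_prev=0.0, c_prev=0.0, params=params))
```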
Further, the specific process by which the evolutionary algorithm optimizes the LSTM in step S41 is as follows:
(1) Initializing the population and individuals: 0/1 encoding is applied for a training set with input step length t; assuming a population size of n, n individuals are encoded, each with an encoding length of 6 × t;
(2) Calculating weights and fitness values: each group of 6 consecutive bits of an individual is read as a binary number and converted to decimal, giving t values in total that serve as the weights of the t items in a time step; the weights are multiplied element-wise with the training set of step length t, the result is fed into the LSTM network, and the prediction error of the network is taken as the fitness value of the individual; the fitness values of all n individuals in the current population are obtained in this way;
(3) Selection: the population is randomly divided into m groups, where 1 < m < n and n is divisible by m, and the individual with the best fitness in each group is selected;
(4) Crossover and mutation: the individuals selected in step (3) are paired two by two, and crossover is applied to each pair to generate offspring individuals, subject to a certain mutation probability;
(5) Generating a new population: the individuals selected in step (3) and the offspring generated in step (4) form the new population, and the remaining individuals are eliminated; the number of offspring generated in step (4) must be such that the new population has the same size as the original population;
(6) Repeating steps (2) to (5) p times, that is, evolving for p generations, and selecting the individual with the best fitness in the last generation as the optimal solution for the attention weights.
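Steps (1) to (6) can be sketched as the loop below. Training an LSTM per individual is out of scope for a short example, so the fitness is any callable that maps the decoded weights to an error; `toy_error` is a hypothetical stand-in for the network's prediction error. Uniform crossover, the mutation rate and the default sizes are also assumptions, since the text does not fix them.

```python
import random
from typing import Callable, List

BITS = 6  # bits per time-step weight, as stated in the text


def decode(chromosome: List[int], t: int) -> List[int]:
    """Read each 6-bit group as a binary number (0..63): one weight per time step."""
    weights = []
    for i in range(t):
        group = chromosome[i * BITS:(i + 1) * BITS]
        weights.append(int("".join(str(b) for b in group), 2))
    return weights


def evolve(fitness: Callable[[List[int]], float], t: int, n: int = 12, m: int = 4,
           generations: int = 20, p_mut: float = 0.05, seed: int = 0) -> List[int]:
    """Return the decoded weights of the best individual after `generations` rounds."""
    assert 1 < m < n and n % m == 0
    rng = random.Random(seed)
    # (1) random 0/1 individuals of length 6 * t
    pop = [[rng.randint(0, 1) for _ in range(BITS * t)] for _ in range(n)]

    def error(ch: List[int]) -> float:
        return fitness(decode(ch, t))  # (2) lower error means better fitness

    for _ in range(generations):  # (6) evolve for p generations
        rng.shuffle(pop)
        size = n // m
        # (3) keep the best individual of each of m random groups
        parents = [min(pop[g * size:(g + 1) * size], key=error) for g in range(m)]
        children = []
        # (4)-(5) pair parents, cross, mutate, until the population size is n again
        while len(parents) + len(children) < n:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return decode(min(pop, key=error), t)


def toy_error(weights: List[int]) -> float:
    """Hypothetical fitness: squared distance to an arbitrary target weighting."""
    target = [10, 20, 30]
    return float(sum((w - g) ** 2 for w, g in zip(weights, target)))


print(evolve(toy_error, t=3))
```

Because the selected individuals survive unchanged into the next generation, the best error in the population never gets worse from one generation to the next.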
Further, after the LSTM model produces its prediction, the prediction error is calculated to decide whether to continue optimizing with the evolutionary algorithm (EA). The prediction error metric is the root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
where y is the vector of true values, y' is the vector of predicted values, and n is their dimension.
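A direct transcription of the RMSE metric, included as a small sketch for checking predictions by hand; the sample values are made up.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


print(rmse([0.0, 0.0], [3.0, 4.0]))
```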
Further, the K-means clustering algorithm in step S43 includes the following steps:
S431: randomly selecting k samples as the initial cluster centers;
S432: calculating the distance from each sample point to every cluster center, and assigning the point to the cluster whose center is nearest;
S433: calculating the sample mean of each of the k clusters and taking these means as the new k cluster centers;
S434: repeating steps S432 to S433 until the cluster centers no longer change;
S435: the algorithm stops.
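Steps S431 to S435 can be sketched in plain Python as follows. The iteration cap and the handling of an empty cluster (its center is simply kept) are assumptions added to make the loop safe; neither is specified in the text.

```python
import random
from math import dist  # Python 3.8+
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Plain K-means following S431-S435; returns (centers, labels)."""
    rng = random.Random(seed)
    # S431: pick k samples at random as the initial centers
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # S432: assign every point to its nearest center
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # S433: recompute each center as the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # assumption: keep an empty cluster's center
        # S434-S435: stop once the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, k=2)
print(labels)
```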
In yet another aspect, the present invention also discloses a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method as described above.
In yet another aspect, the present invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above method.
According to the above technical scheme, the method and medium for detecting abnormal mobile-source pollution emissions based on LSTM evolutionary clustering use an evolutionary algorithm to optimize the weights assigned to the time steps of the input data, which helps the LSTM attend to individual time steps and thereby improves the accuracy of pollutant concentration prediction.
The method obtains a mobile-source pollutant concentration emission model by training a deep network with evolutionary optimization; in the time-series dimension, this model has higher prediction accuracy than a conventional model without the optimization. Furthermore, the method realizes unsupervised detection and identification of abnormal mobile-source emissions, helps technicians analyze and handle abnormal vehicle emissions, and provides a feasible means of reducing urban air pollution.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a LSTM evolution optimization model structure;
FIG. 3 is a schematic diagram of an LSTM gated structure;
FIG. 4 is a diagram showing the predicted effect of the LSTM evolution optimization model on the validation set;
FIG. 5 is a clustering result visualization;
FIG. 6 is a graph of scores for each cluster;
FIG. 7 is an anomaly visualization of the true NOx concentration and the prediction error.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
In order to solve the problem of time sequence detection of mobile source pollution emission, the invention provides a road mobile source pollution emission time sequence analysis and unsupervised detection method based on an optimization model combining an evolutionary algorithm and an LSTM, which can predict the emission of tail gas and judge the pollution type of a vehicle by using an unsupervised method.
As shown in FIG. 1, the invention takes the OBD time series data of the motor vehicle as a research object, and comprises the following steps:
S1: extracting the motor-vehicle OBD data set: on-board diagnostics (OBD) monitoring data of on-road mobile sources are collected, including the pollutant NOx in the exhaust and other vehicle attribute data, and the data set is preprocessed;
S2: analyzing the correlation of the factors influencing pollutant emissions: Spearman correlation analysis is performed on the attribute data, the correlation coefficient between each attribute and the pollutant NOx is calculated, and the main influence attributes are screened out;
S3: constructing the time-series dynamic driving conditions: the pollutant NOx and the main influence attributes of the vehicle form a multi-dimensional time-series driving-condition data set, which is divided into a training set, a test set and a verification set;
S4: constructing the unsupervised exhaust emission detection model: an LSTM evolution-optimized emission prediction model of the pollutant NOx is constructed, and the high-emission category is identified with an unsupervised clustering algorithm.
Further, step S1, collecting the motor-vehicle exhaust OBD data and preprocessing the collected data, is subdivided into the following steps:
S11: data are collected from the OBD system of a diesel vehicle. The data set comes from diesel-vehicle tests in Hefei in 2021 and contains 2121 records sampled at 5 s intervals. The sampled attributes include engine speed, actual output torque percentage, engine water temperature, engine oil temperature, aftertreatment downstream NOx value, aftertreatment downstream oxygen value, atmospheric pressure, ambient temperature, aftertreatment exhaust gas mass flow, urea tank liquid level percentage, urea tank temperature, vehicle speed, accelerator pedal opening, and the like.
S12: the collected OBD data are preprocessed, for example by filling missing values and deleting irrelevant attributes (missing values are filled with adjacent values).
Further, the above step S2: carrying out Spearman correlation analysis on the influence factors of pollutant emission and calculating a correlation coefficient, and specifically subdividing the steps into the following steps:
S21: the Spearman correlation coefficient ρ between the exhaust pollutant NOx and each influencing factor is calculated as:
$$\rho = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2\,\sum_{i}(y_i-\bar{y})^2}}$$
where $x_i$ is the i-th sample value of the influencing factor, $\bar{x}$ is its mean, $y_i$ is the i-th sample value of the pollutant, and $\bar{y}$ is its mean;
S22: selecting the main influence attributes according to the calculated correlation coefficient ρ, using the condition:
$|\rho| \ge 0.4$
Further, step S3, constructing the time-series dynamic driving-condition data set, is subdivided into the following steps:
S31: determining the total number of samples n, the time step t and the attribute dimension m, and constructing the time-series attribute data set $X = \{X_1, X_2, \ldots, X_p, \ldots, X_{n-t+1}\}$, where
Figure BDA0003327723330000094
and its corresponding label data set $y = \{y_1, y_2, \ldots, y_p, \ldots, y_{n-t+1}\}$, where
Figure BDA0003327723330000095
S32: dividing the time-series data set into a training set, a test set and a verification set in the ratio 7:2:1.
Further, the above step S4: the construction of the exhaust emission unsupervised detection model can be subdivided into the following steps:
S41: constructing the LSTM evolution optimization model, that is, using an evolutionary algorithm to optimize the attention-layer weight parameters of the LSTM model and reduce its prediction error. The structure of the LSTM evolution optimization model is shown in FIG. 2;
S411: specifically, in step S41, the LSTM network has three gates: an input gate, an output gate and a forget gate. A schematic diagram of its gated structure is shown in FIG. 3;
S412: for the LSTM network, let $f_t$, $i_t$ and $o_t$ denote the values of the forget gate, the input gate and the output gate at time t, calculated as:
$$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}C_{t-1} + b_f)$$
$$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}C_{t-1} + b_i)$$
$$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + W_{co}C_{t-1} + b_o)$$
where $x_t$ is the input at time t, $h_{t-1}$ is the output at time t-1, $C_{t-1}$ is the cell memory value at time t-1, $W_{**}$ denotes a weight coefficient, $b_*$ denotes a bias vector, and σ denotes the sigmoid function:
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
where $x \in \mathbb{R}$ and $\sigma(x) \in (0,1)$.
S413: the main flow of the evolutionary algorithm is as follows:
(1) initializing the population;
(2) calculating fitness, that is, measuring how good each individual in the population is;
(3) selection, that is, choosing individuals for inheritance by the next generation according to their fitness values;
(4) crossover, that is, choosing two individuals as parents and randomly taking the encoded values at particular positions of the two parents as the gene information at the corresponding positions of the offspring individual;
(5) mutation, that is, changing the encoded value at a random position of the offspring individual;
(6) generating the new generation of the population;
(7) judging whether the termination condition is met; if not, repeating steps (2) to (7); if so, outputting the optimal result.
S414: specifically, the principle by which the evolutionary algorithm optimizes the LSTM is as follows. An attention layer is added before the LSTM to remedy the LSTM's lack of time-step attention weighting and to improve the prediction accuracy of the model. The invention uses the EA to solve for the (approximately) optimal weights of the attention layer: a competitive elimination mechanism selects the best individuals (solutions) in each generation, crossover and mutation operators generate offspring individuals, and random perturbation maintains diversity among individuals so that the search does not settle on a local optimum. After evolving over many generations, the population approaches a globally (approximately) optimal solution for the attention layer. The specific process is as follows:
(1) Initializing the population and individuals. 0/1 encoding is applied for a training set with input step length t; assuming a population size of n, n individuals are encoded, each with an encoding length of 6 × t.
(2) Calculating weights and fitness values. Each group of 6 consecutive bits of an individual is read as a binary number and converted to decimal, giving t values in total that serve as the weights of the t items in a time step. The weights are multiplied element-wise with the training set of step length t, the result is fed into the LSTM network, and the prediction error of the network is taken as the fitness value of the individual. The fitness values of all n individuals in the current population are obtained in this way.
(3) Selection. The population is randomly divided into m groups (1 < m < n, and n is divisible by m), and the individual with the best fitness in each group is selected.
(4) Crossover and mutation. The individuals selected in step (3) are paired two by two, and crossover is applied to each pair to generate offspring individuals, subject to a certain mutation probability.
(5) Generating a new population. The individuals selected in step (3) and the offspring generated in step (4) form the new population, and the remaining individuals are eliminated. Specifically, the number of offspring generated in step (4) must be such that the new population has the same size as the original population.
(6) Repeating steps (2) to (5) p times, that is, evolving for p generations. The individual with the best fitness in the last generation is selected as the (approximately) optimal solution for the attention weights.
S415: in the structure of FIG. 1, after the LSTM model produces its prediction, the prediction error is calculated to decide whether to continue optimizing with the EA. The prediction error metric is the root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
where y is the vector of true values (n-dimensional) and y' is the vector of predicted values (n-dimensional).
S42: after the model of S41 produces the pollutant concentration prediction (the prediction effect is shown in FIG. 4), the data set composed of the pollutant concentration prediction error and the influence attributes is standardized in preparation for the unsupervised K-means clustering algorithm. The standardization is calculated as:
$$X^{*} = \frac{X-\mu}{\sigma}$$
where μ is the mean of the column containing X and σ is the variance of that column;
S43: K-means clustering is performed on the standardized data set obtained in step S42. The cluster visualization for the experimental data set is shown in FIG. 5. The K-means clustering algorithm mainly comprises the following steps:
S431: Randomly select k samples as the initial cluster centers;
S432: Calculate the distance from each sample point to every cluster center, and assign each sample point to the cluster whose center is nearest;
S433: Calculate the sample mean of each of the k clusters, and take these means as the new k cluster centers;
S434: Repeat steps S432 to S433 until the cluster centers no longer change;
S435: The algorithm stops.
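Steps S431 to S435 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the iteration cap `max_iter`, the fixed random `seed`, and the rule of keeping the old center when a cluster becomes empty are assumptions added for robustness.

```python
import numpy as np


def kmeans(data, k, max_iter=100, seed=0):
    """Plain K-means following S431 to S435; returns (labels, centers)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # S431: pick k distinct samples as the initial cluster centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # S432: Euclidean distance of every sample to every center, nearest center wins.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # S433: the mean of each cluster becomes its new center (old center if empty).
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # S434 / S435: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The result depends on the random initialization, so in practice the procedure is often run several times and the best partition kept.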
S44: For the clustering in step S43, the Davies-Bouldin Index (DBI) is used to determine the optimal number of clusters k (k ∈ {2, 3, 4, 5, 6}) of the K-means clustering algorithm. The DBI is calculated as follows:
$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{\mathrm{avg}(C_i) + \mathrm{avg}(C_j)}{d_{cen}(u_i, u_j)}$$
where k denotes the number of clusters, avg(C_i) denotes the average Euclidean distance from the sample points of cluster i to its cluster center u_i, and d_cen(u_i, u_j) denotes the Euclidean distance between the i-th cluster center u_i and the j-th cluster center u_j.
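The index can be computed directly from a labeled partition, as in the sketch below. It assumes the standard Davies-Bouldin definition built from the quantities defined above (within-cluster average distance and center-to-center distance) and that no cluster is empty; the function name and argument layout are illustrative.

```python
import numpy as np


def davies_bouldin(data, labels, centers):
    """Davies-Bouldin index of a partition; smaller values mean better-separated clusters."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    k = len(centers)
    # avg(C_i): mean Euclidean distance from the samples of cluster i to its center u_i.
    scatter = np.array([
        np.linalg.norm(data[labels == i] - centers[i], axis=1).mean() for i in range(k)
    ])
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster j.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
            for j in range(k) if j != i
        ]
        total += max(ratios)
    return total / k
```

To choose the number of clusters, one would evaluate this index for each candidate k in {2, 3, 4, 5, 6} and keep the k with the smallest value.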
S45: After the optimal number of clusters k is obtained in step S44, the high-emission class is identified by calculating a score for each cluster. The score of cluster i is calculated as follows:
Figure BDA0003327723330000132
where 1 ≤ i ≤ k, μ (respectively μ_i) denotes the mean of the pollutant prediction error (respectively, within cluster i), σ (respectively σ_i) denotes the standard deviation of the pollutant prediction error (respectively, within cluster i), and θ_i denotes the proportion of samples that fall in cluster i (0 < θ_i < 1).
The proportion θ of samples in each cluster is as follows (the optimal number of clusters in this experiment is k = 3):
Cluster    Proportion of samples (%)
clu0       8
clu1       35
clu2       57
Step S46: the score set S ═ { S } calculated in step S451,S2,...,SkH, selecting the maximum value S in the setmaxThe corresponding category is a high emission category. The scoring is shown in fig. 6 (the optimal cluster number k is 3 in this experiment).
By observing the points labeled as anomalies in the true pollutant emission concentration values in FIG. 7 (upper), it can be seen that, along the time dimension, the pollutant concentration at an abnormal emission point changes markedly and abruptly relative to the preceding period of normal emission. The abnormal emissions can therefore be confirmed visually, which verifies the effectiveness of the unsupervised abnormal emission detection of the present invention. On this basis, once the moments of abnormal emission are known, the causes of abnormal mobile-source emissions can be investigated further, helping to reduce urban air pollution.
In yet another aspect, the present invention also discloses a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method as described above.
In yet another aspect, the present invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above method.
It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.
The embodiment of the application also provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus,
a memory for storing a computer program;
the processor is used for realizing the detection method for abnormal emission of the mobile source pollution based on the LSTM evolutionary clustering when executing the program stored in the memory, and the method comprises the following steps:
S1: Extract the OBD data set of the motor vehicle: collect monitoring data from the on-board diagnostics (OBD) system of on-road mobile sources, including the pollutant NOx in the exhaust gas and other vehicle attribute data, and preprocess the data set;
S2: Analyze the correlation of the factors influencing pollutant emission: perform Spearman correlation analysis on the attribute data, calculate the correlation coefficient between each attribute and the pollutant NOx, and screen out the specified influence attributes;
S3: Construct the time-series dynamic driving condition: the pollutant NOx and the specified vehicle influence attributes form a multi-dimensional time-series operating-condition data set, which is divided into a training set, a test set and a validation set;
S4: Construct the unsupervised exhaust emission detection model: build the LSTM evolutionary-optimization emission prediction model for the pollutant NOx, and identify the high-emission class using an unsupervised clustering algorithm.
The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned moving source pollution abnormal emission detection methods based on LSTM evolutionary clustering.
In yet another embodiment provided by the present application, there is further provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the above-described methods for mobile-sourced pollutant abnormal emission detection based on LSTM evolutionary clustering.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A mobile source pollution abnormal emission detection method based on LSTM evolutionary clustering, characterized in that the following steps are executed by a computer device:
collecting monitoring data from the on-board diagnostics (OBD) system of an on-road mobile source, and inputting the monitoring data into a preset LSTM evolutionary-optimization emission prediction model for the pollutant NOx to detect abnormal pollution emission;
wherein the LSTM evolutionary-optimization emission prediction model for the pollutant NOx is constructed by the following steps:
S1: extracting the OBD data set of the motor vehicle: collecting monitoring data from the on-board diagnostics system of on-road mobile sources, including the pollutant NOx in the exhaust gas and other vehicle attribute data, and preprocessing the data set;
S2: analyzing the correlation of the factors influencing pollutant emission: performing Spearman correlation analysis on the attribute data, calculating the correlation coefficient between each attribute and the pollutant NOx, and screening out the specified influence attributes;
S3: constructing the time-series dynamic driving condition: the pollutant NOx and the specified vehicle influence attributes form a multi-dimensional time-series operating-condition data set, which is divided into a training set, a test set and a validation set;
S4: constructing the unsupervised exhaust emission detection model: building the LSTM evolutionary-optimization emission prediction model for the pollutant NOx, and identifying the high-emission class using an unsupervised clustering algorithm.
2. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 1, characterized in that: the step S1 is specifically subdivided into the following steps:
S11: collecting data from the OBD (on-board diagnostics) system of a diesel vehicle at a sampling interval of 5 s, wherein the sampled attributes comprise engine speed, actual output torque percentage, engine water temperature, engine oil temperature, after-treatment downstream NOx value, after-treatment downstream oxygen value, atmospheric pressure, ambient temperature, after-treatment exhaust gas mass flow, urea tank liquid level percentage, urea tank temperature, vehicle speed and accelerator pedal opening;
S12: preprocessing the collected OBD data, including filling missing values and deleting irrelevant attributes, wherein missing values are filled with adjacent values.
3. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 2, characterized in that: the step S2 is specifically subdivided into the following steps:
S21: the Spearman correlation coefficient ρ between the exhaust pollutant NOx and an influencing factor is calculated as follows:
$$\rho = \frac{\sum_{i}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i}\left(x_i - \bar{x}\right)^2 \sum_{i}\left(y_i - \bar{y}\right)^2}}$$
wherein x_i is the i-th sample value of the influencing factor, \bar{x} is the mean value of that attribute, y_i is the i-th sample value of the pollutant, and \bar{y} is its mean value;
S22: selecting the main influence attributes according to the calculated correlation coefficient ρ, using the criterion:
|ρ| ≥ 0.4.
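By way of illustration only, the screening in S21 and S22 can be sketched as follows. The sketch assumes the usual reading of the Spearman coefficient, in which the correlation formula is applied to the ranks of the samples; the tie-breaking rule, the function names, and the dictionary layout of the attribute columns are all assumptions made for this example.

```python
import numpy as np


def ranks(values):
    """Ordinal ranks 1..n (assumption: ties are broken by position, not averaged)."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r


def spearman(x, y):
    """Correlation coefficient of the rank-transformed samples."""
    rx, ry = ranks(x), ranks(y)
    dx, dy = rx - rx.mean(), ry - ry.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def select_attributes(columns, nox, threshold=0.4):
    """Keep the attribute names whose |rho| against NOx reaches the threshold."""
    return [name for name, col in columns.items() if abs(spearman(col, nox)) >= threshold]
```

With hypothetical columns `{"speed": [1, 2, 3, 4, 5], "noise": [3, 1, 5, 2, 4]}` and NOx values `[2, 4, 6, 8, 10]`, the first column has ρ = 1 and is kept, while the second has ρ = 0.3 and is dropped.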
4. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 3, characterized in that: the construction of the time-series dynamic driving condition data set in step S3 is subdivided into the following steps:
S31: determining the total number of samples n, the time step t and the attribute dimension m, and constructing the time-series attribute data set X = {X_1, X_2, ..., X_p, ..., X_{n-t+1}}, wherein
Figure FDA0003327723320000024
and the corresponding label data set y = {y_1, y_2, ..., y_p, ..., y_{n-t+1}}, wherein
Figure FDA0003327723320000025
S32: dividing the time-series data set into a training set, a test set and a validation set in the ratio 7:2:1.
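By way of illustration only, a sliding-window construction that yields n - t + 1 samples and a 7:2:1 split can be sketched as below. The exact contents of X_p and y_p are given in formula images that are not reproduced here, so this sketch assumes that X_p holds t consecutive rows of the m attributes and that y_p is the pollutant value at the last time step of window p; the chronological (unshuffled) split is also an assumption.

```python
import numpy as np


def make_windows(features, target, t):
    """Build n - t + 1 windows of shape (t, m) and one label per window (assumed layout)."""
    features = np.asarray(features, dtype=float)  # shape (n, m)
    target = np.asarray(target, dtype=float)      # shape (n,)
    n = len(features)
    X = np.stack([features[p:p + t] for p in range(n - t + 1)])
    y = target[t - 1:]  # assumption: label is the value at the end of each window
    return X, y


def split_7_2_1(X, y):
    """Chronological split into training, test and validation parts in the ratio 7:2:1."""
    n = len(X)
    a, b = n * 7 // 10, n * 9 // 10
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])
```

For n = 12 samples with m = 2 attributes and t = 3, this produces 10 windows of shape (3, 2), split into 7, 2 and 1 windows.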
5. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 4, characterized in that: the above step S4: the construction of the exhaust emission unsupervised detection model can be subdivided into the following steps:
S41: constructing the LSTM evolutionary-optimization model, in which an evolutionary algorithm optimizes the attention-layer weight parameters of the LSTM model;
S42: after the model of S41 produces the predicted pollutant concentration, standardizing the data set consisting of the pollutant concentration prediction error and the influence attributes before the unsupervised K-means clustering algorithm is applied;
the standardization is calculated as follows:
$$X' = \frac{X - \mu}{\sigma}$$
wherein μ denotes the mean of the column containing X, and σ denotes the standard deviation of that column;
S43: performing K-means clustering on the standardized data set obtained in step S42;
S44: for the clustering in step S43, determining the optimal number of clusters k (k ∈ {2, 3, 4, 5, 6}) of the K-means clustering algorithm using the DBI; the DBI is calculated as follows:
$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{\mathrm{avg}(C_i) + \mathrm{avg}(C_j)}{d_{cen}(u_i, u_j)}$$
wherein k denotes the number of clusters, avg(C_i) denotes the average Euclidean distance from the sample points of cluster i to its cluster center u_i, and d_cen(u_i, u_j) denotes the Euclidean distance between the i-th cluster center u_i and the j-th cluster center u_j;
S45: after the optimal number of clusters k is obtained in step S44, identifying the high-emission class by calculating a score for each cluster; the score of cluster i is calculated as follows:
Figure FDA0003327723320000032
wherein 1 ≤ i ≤ k, μ (respectively μ_i) denotes the mean of the pollutant prediction error (respectively, within cluster i), σ (respectively σ_i) denotes the standard deviation of the pollutant prediction error (respectively, within cluster i), and θ_i denotes the proportion of samples that fall in cluster i, 0 < θ_i < 1;
S46: from the score set S = {S_1, S_2, ..., S_k} calculated in step S45, selecting the maximum value S_max, the corresponding cluster being the high-emission class.
6. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 5, characterized in that: the step S41 specifically includes:
the LSTM network has three gates, namely an input gate, an output gate and a forget gate;
for the LSTM network, let f_t, i_t and o_t denote the values of the forget gate, the input gate and the output gate at time t, respectively, calculated as follows:
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} C_{t-1} + b_f\right)$$
$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} C_{t-1} + b_i\right)$$
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} C_{t-1} + b_o\right)$$
wherein x_t denotes the data input at time t, h_{t-1} denotes the output value at time t-1, C_{t-1} denotes the cell memory value at time t-1, W_{**} denotes a weight coefficient, b_* denotes a bias vector, and σ denotes the sigmoid function, whose expression is:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
wherein
Figure FDA0003327723320000042
and σ(x) ∈ (0, 1).
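By way of illustration only, the three gate equations can be evaluated with NumPy as sketched below. The dictionary layout of the weights `W` and biases `b`, and the use of full matrices for the cell-state terms, are assumptions made for this example; the claim itself only fixes the form of the equations.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gates(x_t, h_prev, c_prev, W, b):
    """Forget, input and output gate values at time t.

    W maps 'xf', 'hf', 'cf', 'xi', 'hi', 'ci', 'xo', 'ho', 'co' to weight matrices;
    b maps 'f', 'i', 'o' to bias vectors (hypothetical containers for this sketch).
    """
    out = {}
    for g in ("f", "i", "o"):
        out[g] = sigmoid(
            W["x" + g] @ x_t + W["h" + g] @ h_prev + W["c" + g] @ c_prev + b[g]
        )
    return out["f"], out["i"], out["o"]
```

With all weights and biases set to zero, every gate evaluates to exactly 0.5, the midpoint of the sigmoid.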
7. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 6, characterized in that: the specific process by which the evolutionary algorithm optimizes the LSTM in step S41 is as follows:
(1) initializing the population and individuals: 0/1 encoding is applied for the training set with input time step t; assuming the population size is n, n individuals are encoded, each with an encoding length of 6 × t;
(2) calculating weights and fitness values: each run of 6 unit lengths of 0/1 code on an individual is treated as a binary number and converted to a decimal number, giving t values in total, which correspond to the weights of the t items in the time step; the weights are multiplied element-wise with the training set of step length t, the result is fed into the LSTM network, and the prediction error of the network is taken as the fitness value of the individual; the fitness values of the n individuals in the current population are thereby obtained;
(3) selection: the population is randomly divided into m groups, where 1 < m < n and n is divisible by m, and the individual with the best fitness in each group is selected;
(4) crossover and mutation: the individuals selected in step (3) are paired two by two in different combinations, and crossover is then performed on each pair to generate offspring, with a certain mutation probability;
(5) generating a new population: the individuals selected in step (3) and the offspring newly generated in step (4) form the new population, and the remaining individuals are eliminated; the number of offspring generated in step (4) must satisfy the condition that the new population has the same size as the original population;
(6) repeating steps (2) to (5) p times, i.e. evolving for p generations; the individual with the best fitness in the population of the last generation is taken as the optimal solution for the attention weights.
8. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 7, characterized in that:
after the LSTM model produces its prediction, the prediction error is calculated to decide whether to continue optimizing with the EA, the prediction error index being the root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
wherein y denotes the vector of true values, y' denotes the vector of predicted values, and n is their dimension.
9. The LSTM evolutionary clustering-based mobile source pollutant abnormal emission detection method according to claim 8, characterized in that:
the K-means clustering algorithm in step S43 comprises the following steps:
S431: randomly selecting k samples as the initial cluster centers;
S432: calculating the distance from each sample point to every cluster center, and assigning each sample point to the cluster whose center is nearest;
S433: calculating the sample mean of each of the k clusters, and taking these means as the new k cluster centers;
S434: repeating steps S432 to S433 until the cluster centers no longer change;
S435: the algorithm stops.
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 9.
CN202111269866.9A 2021-10-29 2021-10-29 Mobile source pollution abnormal emission detection method and medium based on LSTM evolution clustering Active CN113919235B (en)


Publications (2)

Publication Number Publication Date
CN113919235A true CN113919235A (en) 2022-01-11
CN113919235B CN113919235B (en) 2024-06-21

Family

ID=79243534



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant