CN110175195A - Mixed gas detection model construction method based on extreme random tree - Google Patents
- Publication number: CN110175195A (application number CN201910329097.3A)
- Authority: CN (China)
- Prior art keywords: feature, mixed gas, gas, extreme random, algorithm
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G01N33/0034: Gas analysers with a detector comprising two or more sensors, e.g. a sensor array, comprising neural networks or related mathematical techniques
- G01N33/004: CO or CO2
- G01N33/0047: Organic compounds
- G06F16/2474: Sequence data queries, e.g. querying versioned data
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F18/2135: Feature extraction based on approximation criteria, e.g. principal component analysis
- G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06F18/24323: Tree-organised classifiers
Abstract
The invention discloses a mixed gas detection model construction method based on extreme random tree. The method comprises: acquiring data from a mixed gas to obtain a data set containing at least three gas signal time series; computing the optimal warping path of the gas signal time series and screening the series with it; extracting gas features from the screened time series by principal component analysis; and building a model with the extreme random tree algorithm to classify the target mixed gas. The proposed method substantially improves classification accuracy and time efficiency.
Description
Technical field
The present invention relates to the field of machine olfaction, and in particular to a mixed gas detection model construction method based on extreme random tree.
Background art
In the current field of mixed gas detection, many researchers have achieved good classification results with algorithms such as support vector machines (SVM), artificial neural networks (ANN), and k-nearest neighbors (KNN). To improve classification accuracy, some researchers proposed an optimized Adaboost.M2 model that combines multiple classifiers and applied it to drug classification experiments; with different fusion rules, the highest recognition accuracy reached 91.75%. Others used posterior probability estimates extracted from SVM to detect 10 bacterial components in human blood by machine olfaction, with high recognition accuracy but a large time cost. Another group used probabilistic Bayesian processing to handle the uncertainty in gas source localization and, together with a path planning algorithm based on a Markov decision process, improved the efficiency of locating gas in practice. Combining PCA with artificial neural networks (ANN) can improve the discrimination of soil moisture content, but ANN lacks interpretability, converges slowly, and is inefficient.
In the prior art, no algorithm raises detection accuracy above 99%, and no researcher has considered the accuracy of the data produced by the gas sensors themselves. The traditional feature extraction method, PCA, suits high-dimensional data; when the dimensionality is low, features must first be constructed. Among classification algorithms, there is little support for methods that combine strong resistance to overfitting, fast training, and high classification accuracy. Moreover, no existing patent applies a model based on the extreme random tree algorithm, an improvement on random forest, to the problems of mixed gas detection.
Therefore, providing a mixed gas detection model construction method based on extreme random tree with high measurement accuracy is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
Classification accuracy for two kinds of mixed gases is low, and traditional models such as support vector machines (SVM) do not achieve sufficient classification accuracy or time efficiency. The present invention therefore proposes a mixed gas detection model construction method based on extreme random tree that substantially improves classification accuracy and time efficiency. The specific scheme is as follows:
S1. Acquire data from the mixed gas to obtain a data set containing at least three gas signal time series; compute the optimal warping path of the gas signal time series and use it to screen the gas signal time series;
S2. Extract gas features from the screened gas signal time series using principal component analysis;
S3. Build a model using the extreme random tree algorithm and classify the target mixed gas.
Preferably, the optimal warping path of the gas signal time series in S1 is computed as follows:
S11. Construct the distance matrix of two gas signal time series. Let the two time series be $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$, with lengths $m$ and $n$. $D_{m\times n}$ is the $m \times n$ distance matrix constructed from the two series:
$$D_{m\times n} = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{bmatrix}$$
where each element $d_{ij}$ of $D_{m\times n}$ is the coordinate distance between $x_i$ and $y_j$:
$$d_{ij} = \lVert x_i - y_j \rVert_w$$
When $w = 2$ this is the Euclidean distance (2-norm); $1 \le i \le m$, $1 \le j \le n$.
S12. Find in $D_{m\times n}$ the warping path $p_{\min}$ with the smallest distance, i.e. the optimal warping path:
$$p_{\min} = \{p_1, p_2, \ldots, p_d, \ldots, p_k\}, \qquad \max(m,n) \le k \le m+n-1$$
where $p_d$ is the current cumulative distance of the warping path when the search reaches point $d_{ij}$; $p_{d+1}$ is then computed as:
$$p_{d+1} = p_d + \min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right]$$
S13. Discard the two gas signal time series with the largest $P_{\min}$; the remaining gas signal time series are the input data of step S2.
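A minimal NumPy sketch of S11 and S12, assuming each gas signal time series is an array with one row per time step (a plain 1-D array for a single sensor channel). The function names `distance_matrix` and `greedy_warping_path` are hypothetical. The path routine applies the $p_{d+1}$ step formula literally, moving greedily to the cheapest of the three neighboring cells; a full dynamic-programming DTW over a cumulative cost matrix can find a smaller total, so treat this as an illustration of the step rule rather than a reference DTW implementation.

```python
import numpy as np


def distance_matrix(x, y, w=2):
    """S11 (hypothetical helper): m x n matrix with d_ij = ||x_i - y_j||_w.

    x has shape (m,) or (m, c) and y has shape (n,) or (n, c),
    one row per time step.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    diff = x[:, None, :] - y[None, :, :]            # shape (m, n, c)
    return np.linalg.norm(diff, ord=w, axis=2)      # w = 2 gives the Euclidean norm


def greedy_warping_path(d):
    """S12 step rule applied literally (hypothetical helper): start at d_11,
    repeatedly move to the cheapest of d_(i+1)j, d_(i+1)(j+1), d_i(j+1),
    and stop at d_mn. Returns (cumulative distance, list of visited cells)."""
    m, n = d.shape
    i, j = 0, 0
    p = float(d[0, 0])                               # cumulative distance p_1
    path = [(0, 0)]
    while (i, j) != (m - 1, n - 1):
        candidates = []
        if i + 1 < m:
            candidates.append((d[i + 1, j], i + 1, j))
        if i + 1 < m and j + 1 < n:
            candidates.append((d[i + 1, j + 1], i + 1, j + 1))
        if j + 1 < n:
            candidates.append((d[i, j + 1], i, j + 1))
        cost, i, j = min(candidates)                 # p_{d+1} = p_d + min[...]
        p += float(cost)
        path.append((i, j))
    return p, path


d = distance_matrix([0.0, 0.0, 1.0], [0.0, 1.0])
p, path = greedy_warping_path(d)
```

Because every move goes right, up, or diagonally by one cell, the returned path automatically has between $\max(m,n)$ and $m+n-1$ points.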
Preferably, S2 specifically includes:
S21. Construct the original features of the gas signal: use the interaction feature method to obtain multidimensional original features of the gas signal;
S22. Reduce the dimensionality of the multidimensional original features of the gas signal using principal component analysis to obtain the original data samples.
Preferably, S3 specifically includes:
S31. In the extreme random tree classification model, train each base classifier on all of the original data samples, where the original data set is D, the number of samples is N, and the number of features is M;
S32. Generate each decision tree with the CART algorithm. At each split node, randomly select m features from the M features; randomly assign some classes to one branch and the remaining classes to the other branch; compute the best split value of each node, select the optimal attribute for the split, and do not prune. Iterate on the resulting subsets until a preset value is reached, generating one decision tree;
S33. Repeat steps S31 and S32 K times to generate an extreme random tree model consisting of K decision trees;
S34. Test the trained extreme random tree model; the final classification result is produced by voting.
Compared with the prior art, the present invention has the following advantages:
The invention applies the dynamic time warping (DTW) algorithm to screen the data, which improves classification accuracy by 26.87%. Original feature construction combined with principal component analysis improves classification accuracy by 25.8%. Finally, the extreme random tree algorithm addresses the time efficiency problem of the random forest algorithm: the final classification accuracy reaches 99.17%, and time efficiency is 66.85% better than that of random forest, at only 103.2568 seconds. For the mixed gas classification problem, the proposed method substantially improves on the random forest algorithm, raises the classification accuracy of machine olfaction systems, and provides a theoretical basis for algorithms that simulate the olfactory nervous system. Because the extreme random tree algorithm produces its prediction by voting, its generalization ability is stronger; because each base classifier is trained on all original data samples, training accuracy is higher; and because splits are chosen at random during node splitting, randomness is greatly enhanced.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of the mixed gas detection model construction method based on extreme random tree according to the present invention;
Fig. 2 is a response diagram of the gas data acquired by the sensors of the present invention;
Fig. 3 shows the dynamic response curves of sensor TGS2602 under the Et_L_Me_H condition;
Fig. 4 is the abstracted three-dimensional feature plot from the feature engineering of the present invention;
Fig. 5 is a schematic diagram of the extreme random tree algorithm of the present invention;
Fig. 6 shows the 10-fold cross-validation accuracy after DTW;
Fig. 7 shows the cross-validation accuracy after feature construction;
Fig. 8 compares the running times of the algorithm models.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
This embodiment provides a mixed gas detection model construction method based on extreme random tree.
S1 Dynamic time warping algorithm (DTW)
This embodiment detects mixed gases obtained by mixing ethylene with methane (CH4) and ethylene with carbon monoxide (CO). Six experiments under each label form different data sets, where each label denotes one gas mixture class. The data sampling stage lasts 300 seconds. No gas is introduced during the first 60 seconds; the mixed gas with the set concentration ratio is introduced at 60 seconds and flows for 180 seconds; no mixed gas is introduced during the last 60 seconds. The sensor array consists of 8 sensors sampled at 50 Hz, and the mixed gas data set is acquired by these 8 sensors. The data are stored in time order, and each data set contains 11 columns: time (s), temperature, humidity (%), and the readings of sensors TGS2600, TGS2612, TGS2611, TGS2610, TGS2602, TGS2602, TGS2620, and TGS2620. The raw sensor reading, denoted A, is converted to a unified resistance value by Rs (kΩ) = 10 × (3110 − A) / A. Fig. 2 shows the sensor responses for one experiment, taking the Et_H_Me_n condition as an example: Et denotes ethylene, H denotes high concentration, Me denotes methane, and n denotes zero concentration. The abscissa is time and the ordinate is the converted sensor reading.
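The conversion formula applies element-wise to the eight sensor columns. A small sketch, assuming the raw readings are positive numbers held in a NumPy array with one column per sensor; the function name `reading_to_resistance_kohm` is hypothetical.

```python
import numpy as np


def reading_to_resistance_kohm(a):
    """Convert raw sensor readings A to Rs in kOhm: Rs = 10 * (3110 - A) / A."""
    a = np.asarray(a, dtype=float)
    if np.any(a <= 0):
        raise ValueError("raw readings A must be positive")
    return 10.0 * (3110.0 - a) / a


# Hypothetical block of raw readings: rows are samples, columns the 8 sensors.
raw = np.array([[311.0] * 8, [1555.0] * 8])
rs = reading_to_resistance_kohm(raw)   # first row -> 90.0 kOhm, second row -> 10.0 kOhm
```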
To investigate the data acquired by the sensors, the response curves of TGS2602 under the same label (the Et_M_Me_M label) were analyzed; Fig. 3 (1)-(6) show the dynamic response curves of TGS2602 under the Et_L_Me_H condition. The figure shows that, under the same condition, the response of the same sensor varies to different degrees; in particular, the response curves of the last two experiments differ clearly from the earlier ones. It can therefore be inferred that problems such as the configuration of the experimental conditions introduce data inconsistencies of varying degrees.
This analysis shows that the data require effective preprocessing. Because the mixed gas data are time-series gas signal response curves, dynamic time warping is applied to the data sets. Dynamic time warping is an algorithm based on dynamic programming (DP) that corrects the misalignment of characteristic parameters in time; its basic principle is to find the optimal warping path between time series. For each data point in one sequence, the point with the most similar features is found in the other sequence according to the coordinate values, and the distance between the matched points is computed; the matching that minimizes the sum of these distances over the two time series defines the optimal warping path.
Suppose the two time series are $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$, with lengths $m$ and $n$, and $D_{m\times n}$ is the $m \times n$ distance matrix constructed from them.
Each element $d_{ij}$ of $D_{m\times n}$ is the coordinate distance between $x_i$ and $y_j$:
$$d_{ij} = \lVert x_i - y_j \rVert_w$$
When $w = 2$ this is the Euclidean distance (2-norm). The warping path $p_{\min}$ with the smallest distance found in $D_{m\times n}$ gives the DTW distance between the two time series:
$$p_{\min} = \{p_1, p_2, \ldots, p_d, \ldots, p_k\}, \qquad \max(m,n) \le k \le m+n-1$$
where $p_d$ is the current cumulative distance of the warping path when the search reaches point $d_{ij}$.
The search for $p_{\min}$ must satisfy three conditions:
1) Boundary condition: the path starts at $d_{11}$ and ends at $d_{mn}$.
2) Monotonicity: if the current search point is $d_{ij}$ with cumulative distance $p_d$, and $p_{d+1} = p_d + d_{i'j'}$, then $i' \ge i$ and $j' \ge j$.
3) Continuity: with the same notation, $i' \le i + 1$ and $j' \le j + 1$.
Under these three conditions, condition 1 fixes the start of the search path, and conditions 2 and 3 restrict the next point of the path to the right, upper, or upper-right neighbor of the current point. If the current cumulative distance is $p_d$ and the current search point is $d_{ij}$, then
$$p_{d+1} = p_d + \min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right]$$
Finally $p_{\min}$ is obtained. Because sequences of different lengths produce cumulative distances that are not directly comparable, the cumulative distance is averaged over the path length:
$$D = p_{\min} / k$$
where $D$ is the averaged cumulative distance between the two sequences.
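Since the text describes DTW as a dynamic-programming method, a conventional DP formulation is sketched here for comparison. It fills a cumulative cost matrix $g(i,j) = d_{ij} + \min[g(i-1,j),\ g(i-1,j-1),\ g(i,j-1)]$, which enforces the boundary, monotonicity, and continuity conditions and guarantees the minimal total, then backtracks from $d_{mn}$ to count the path length $k$ and returns the averaged distance $D = p_{\min}/k$. This is the textbook formulation swapped in for illustration, not necessarily the patent's exact procedure; the function name `dtw_normalized` and the tie-breaking rule used while backtracking are assumptions.

```python
import numpy as np


def dtw_normalized(x, y, w=2):
    """Textbook DP dynamic time warping (hypothetical helper).

    Returns (p_min, k, D) where p_min is the minimal cumulative distance,
    k the number of cells on the optimal path, and D = p_min / k.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], ord=w, axis=2)
    m, n = d.shape
    g = np.full((m, n), np.inf)      # g[i, j]: minimal cumulative distance ending at d_ij
    g[0, 0] = d[0, 0]                # boundary: every path starts at d_11
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            # Monotonicity + continuity: predecessors are left, below, lower-left.
            best_prev = min(
                g[i - 1, j] if i > 0 else np.inf,
                g[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                g[i, j - 1] if j > 0 else np.inf,
            )
            g[i, j] = d[i, j] + best_prev
    # Backtrack from d_mn to d_11 to count the path length k
    # (ties broken by the smaller (i, j) index, an arbitrary choice).
    i, j, k = m - 1, n - 1, 1
    while (i, j) != (0, 0):
        moves = []
        if i > 0:
            moves.append((g[i - 1, j], i - 1, j))
        if i > 0 and j > 0:
            moves.append((g[i - 1, j - 1], i - 1, j - 1))
        if j > 0:
            moves.append((g[i, j - 1], i, j - 1))
        _, i, j = min(moves)
        k += 1
    p_min = float(g[m - 1, n - 1])
    return p_min, k, p_min / k
```

For example, `dtw_normalized([1, 2, 3], [2, 2, 2])` returns a minimal cumulative distance of 2 over a 3-cell path, so the averaged distance is 2/3.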
Because of the three constraints, the DTW algorithm traverses all observation points, and every point in each original series finds a corresponding point. Using the dynamic time warping (DTW) algorithm, the samples are then preliminarily screened from the raw data to further improve classification.
Each label in the raw data set has 6 repeated experiments; that is, each sensor performs 6 acquisitions for each mixed gas class, yielding 6 gas signal time series. $P_{\min}$ is computed with the DTW algorithm, the two experiments with the largest $P_{\min}$ are discarded, and the remaining data serve as the input to S2.
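A sketch of the screening step, assuming one converted 1-D time series per repeated experiment of a label. The text does not state what the $P_{\min}$ of a single experiment is measured against, so this sketch assumes a hypothetical rule: each experiment is scored by the sum of its DTW costs to the other experiments of the same label, and the `n_drop = 2` highest-scoring experiments are discarded. The DTW cost uses the conventional dynamic-programming recurrence with the absolute difference as local distance (the $w$-norm of scalar values). The names `dtw_cost` and `screen_runs` are hypothetical.

```python
import numpy as np


def dtw_cost(x, y):
    """Minimal cumulative |x_i - y_j| distance (P_min) between two 1-D series."""
    d = np.abs(np.subtract.outer(np.asarray(x, float), np.asarray(y, float)))
    m, n = d.shape
    g = np.full((m + 1, n + 1), np.inf)   # padded cumulative cost matrix
    g[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            g[i, j] = d[i - 1, j - 1] + min(g[i - 1, j], g[i - 1, j - 1], g[i, j - 1])
    return float(g[m, n])


def screen_runs(runs, n_drop=2):
    """Drop the n_drop runs whose summed P_min to the other runs is largest.

    ASSUMPTION: scoring each run against all other runs of the same label
    is a hypothetical reading of the screening rule.
    """
    count = len(runs)
    cost = np.zeros((count, count))
    for a in range(count):
        for b in range(a + 1, count):
            cost[a, b] = cost[b, a] = dtw_cost(runs[a], runs[b])
    score = cost.sum(axis=1)
    drop = set(np.argsort(score)[-n_drop:].tolist())
    kept = [r for idx, r in enumerate(runs) if idx not in drop]
    return kept, sorted(drop)


# Six hypothetical repeated runs: four consistent, two inconsistent.
base = np.linspace(0.0, 1.0, 20)
runs = [base, base + 0.01, base - 0.01, base + 0.02, base + 5.0, base * 10.0]
kept, dropped = screen_runs(runs)      # dropped == [4, 5]
```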
S2 Data selection and feature extraction:
Original feature construction
In the comparative experiments, the designed classifier and the comparison classifiers are each trained on the original features and on the constructed features, and the results are compared. The original data set has 8-dimensional features. To improve classification accuracy, new features are constructed and different feature sets are compared to find the one that gives the best classification. Feature construction is worthwhile because the training data determine the highest accuracy that can be reached; constructing features can compensate for a recognizer's weak learning ability. Classification accuracy is therefore improved by building new features on top of the original ones.
A common feature construction method is interaction features: for features A and B, create A*B, A−B, A/B, and A+B, which can make the feature space explode. In this embodiment, 8 sensors acquire the gas signal data, so there are 8 features, and only A−B and A/B are created. After these interaction features are created, the multidimensional original gas signal features number 56.
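A sketch of the interaction-feature construction, assuming the input is the eight converted sensor columns and that A−B and A/B are formed once per unordered sensor pair. With 8 columns this gives 28 pairs × 2 operations = 56 features, which matches the count above; the exact pairing used in the patent is not stated, so this is an assumption, and the function name `interaction_features` is hypothetical.

```python
from itertools import combinations

import numpy as np


def interaction_features(x):
    """Build A - B and A / B for every unordered pair of sensor columns.

    ASSUMPTION: 8 columns -> C(8, 2) = 28 pairs x 2 operations = 56 features.
    Division assumes strictly positive inputs (converted resistances).
    """
    x = np.asarray(x, dtype=float)
    cols = []
    for a, b in combinations(range(x.shape[1]), 2):
        cols.append(x[:, a] - x[:, b])   # difference feature A - B
        cols.append(x[:, a] / x[:, b])   # ratio feature A / B
    return np.column_stack(cols)


X = np.arange(1, 17, dtype=float).reshape(2, 8)   # 2 hypothetical samples, 8 sensors
F = interaction_features(X)                       # shape (2, 56)
```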
Implementation steps of principal component analysis (PCA)
Principal component analysis is implemented as follows:
(1) Standardize the original data
PCA is based on the covariance matrix of the data. Because the features differ in scale, the original feature data are standardized first so that all dimensions are comparable: subtract each dimension's mean and divide by its standard deviation,
$$X_i^{*} = \frac{X_i - E(X_i)}{\sqrt{D(X_i)}}$$
where $E(X_i)$ is the mean of the data and $D(X_i)$ is its variance.
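(2) Compute the covariance matrix of the data
The covariance matrix of the standardized data is the correlation matrix of the original features. For $N$ samples of $p$ standardized features collected in the $N \times p$ matrix $X^{*}$, the correlation matrix $R$ can be expressed as
$$R = \frac{1}{N-1} X^{*\top} X^{*} = (r_{ij})_{p \times p}$$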
(3) Compute the eigenvalues and eigenvectors of the correlation matrix $R$
Solve the characteristic equation $\lvert R - \lambda E \rvert = 0$ to obtain the eigenvalues $\lambda_i$ $(i = 1, 2, \ldots, p)$ of the correlation matrix, sorted in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Substitute each $\lambda_i$ into $(R - \lambda_i E)x = 0$ to solve for the eigenvector $a_i$, and normalize $a_i$ to the unit vector $e_i$.
(4) Determine the principal components from the cumulative contribution rate
Compute the cumulative contribution rate of the sorted eigenvalues,
$$\frac{\sum_{i=1}^{t} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$$
Generally, when the cumulative contribution rate of the first $t$ eigenvalues reaches 85% to 95%, these $t$ components are taken as the principal components. In this embodiment $t = 3$, at which point the cumulative contribution rate reaches 90%.
(5) Compute the principal component loadings
Using the unit eigenvectors, the 8-dimensional data are expressed as linear combinations of the 8 standardized variables, $y_j = e_j^{\top} x^{*}$ for a standardized sample $x^{*}$, giving the principal components $Y = (y_1, y_2, \ldots, y_m)^{T}$.
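The five steps map onto a few NumPy calls. A minimal sketch, assuming the input is an $N \times p$ array of constructed features with no constant columns; `threshold` is the target cumulative contribution rate (0.90 in this embodiment), and the function name `pca_by_correlation` is hypothetical. scikit-learn's `StandardScaler` followed by `PCA` yields the same principal directions up to sign.

```python
import numpy as np


def pca_by_correlation(x, threshold=0.90):
    """PCA following steps (1)-(5). Returns (scores, sorted eigenvalues, t)."""
    x = np.asarray(x, dtype=float)
    # (1) Standardize each column: (X_i - E(X_i)) / sqrt(D(X_i)).
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # (2) Covariance of the standardized data = correlation matrix R (p x p).
    r = z.T @ z / (len(z) - 1)
    # (3) Eigen-decomposition of the symmetric R; eigh returns unit eigenvectors
    #     in ascending order, so reorder to lambda_1 >= ... >= lambda_p.
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Smallest t whose cumulative contribution rate reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    # (5) Principal components y_j = e_j^T x* for j = 1..t, for every sample.
    return z @ eigvecs[:, :t], eigvals, t


# Hypothetical data: 8 columns driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(200, 8))
Y, eigvals, t = pca_by_correlation(X, threshold=0.90)
```

Because $R$ is a correlation matrix, its eigenvalues always sum to $p$ (here 8), which is a handy sanity check on step (2).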
To illustrate how dispersed the data features are, the data of each class are abstracted into three-dimensional features, as shown in Fig. 4, which presents the original 8-dimensional feature data reduced to 3 dimensions in a three-dimensional plot; X, Y, and Z denote the coordinate axes. The features are clearly dispersed, so a single traditional algorithm cannot complete the classification.
S3 Extreme random tree algorithm:
Extreme random tree
Extreme random tree (ET, also known as extremely randomized trees or extreme random forest) is similar to the random forest algorithm: it is an ensemble of many decision trees and therefore shares many of its advantages, such as strong classification performance and high accuracy, the ability to handle high-dimensional data without feature selection, and parallel computation with high execution efficiency. Ensemble learning algorithms achieve high classification accuracy in mixed gas detection and classification. However, each decision tree in the extreme random tree algorithm uses all of the original data, whereas random forest generates its training samples by bootstrap sampling. In addition, the extreme random tree chooses splits at random during node splitting instead of selecting the best split threshold or feature. Fig. 5 is a schematic diagram of the extreme random tree algorithm.
Differences between the extreme random tree and random forest algorithms:
First, the training samples of random forest are generated by bootstrap sampling, whereas every decision tree in the extreme random tree uses all of the original training samples, which helps reduce model bias.
Second, at each node split, random forest first selects a subset of features from all features and then chooses the best split among them according to a split criterion (such as the Gini index) to grow the decision tree, whereas the extreme random tree chooses the split at random. Specifically, for a categorical split, some categories are randomly assigned to one branch and the remaining categories to the other; for a numerical split, a threshold is drawn at random between the minimum and maximum values, data greater than the threshold go to one branch and data less than the threshold go to the other, so the sample data are divided between the two branches. For the classification problem here, the split value is computed with the Gini index. All features of the node are traversed to obtain the split value of every feature, and the feature with the largest split value is selected for the split (for regression problems, the split value is computed with the mean squared error).
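A minimal sketch of the numerical split rule described above: draw $m$ candidate features at random, draw one threshold uniformly between each candidate's minimum and maximum at the node, score each split by Gini impurity, and keep the best candidate. Assumptions: the "split value" is read as the Gini decrease (parent impurity minus the weighted impurity of the two children), so "largest split value" means largest decrease; samples equal to the threshold go to the left branch; categorical splits are not covered; the names `gini` and `random_split` are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of an array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def random_split(x, y, n_features, rng):
    """Extremely randomized split for numerical features (hypothetical helper).

    Draws n_features candidate features without replacement, draws one random
    threshold per candidate between its min and max at this node, and returns
    (feature, threshold, gini_decrease) for the best candidate.
    """
    parent = gini(y)
    best = (None, None, -np.inf)
    for f in rng.choice(x.shape[1], size=n_features, replace=False):
        lo, hi = x[:, f].min(), x[:, f].max()
        if lo == hi:
            continue                          # constant at this node: no split possible
        thr = rng.uniform(lo, hi)             # random cut-point, not the optimal one
        left = x[:, f] <= thr                 # at or below threshold -> left branch
        w_left = left.mean()
        children = w_left * gini(y[left]) + (1.0 - w_left) * gini(y[~left])
        decrease = parent - children
        if decrease > best[2]:
            best = (int(f), float(thr), decrease)
    return best


# Feature 0 separates the two classes perfectly; feature 1 is noise.
rng = np.random.default_rng(0)
x = np.column_stack([np.r_[np.zeros(10), np.ones(10)], rng.normal(size=20)])
y = np.r_[np.zeros(10), np.ones(10)]
feature, threshold, decrease = random_split(x, y, n_features=2, rng=rng)
```

Only the threshold is random; the choice among the $m$ candidates is still made by the Gini criterion, which is what distinguishes this rule from a purely random tree.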
In the extreme random tree algorithm, all training data samples are treated as OOB (out-of-bag) samples, so the prediction error of the extreme random tree is computed on these OOB samples. The research in this project found that the extreme random tree outperforms the random forest algorithm in training time efficiency, classification accuracy, and use of the training data.
Implementation steps of the extreme random tree algorithm
The extreme random tree algorithm is denoted $\{E(K, X, D)\}$, where $E$ is the classifier model, $D$ the original data samples, and $K$ the number of decision trees. Each decision tree produces a prediction from the input sample $X = \{x_1, x_2, \dots, x_m\}$, and the final classification decision is obtained by a voting rule. The specific steps of the extreme random tree algorithm are as follows:
(1): In the extreme random tree classification model, each base classifier is trained on all training samples (OOB samples). Assume the raw data set is $D$, with sample size $N$ and $M$ features.
(2): Decision trees are generated according to the CART algorithm. At each split node, $m$ features are selected at random from the $M$ features. Some categories are chosen at random and placed in one branch, and the remaining categories go to the other branch. The best split value of each node is computed at the same time and the optimal attribute is selected for the split. No pruning is performed during splitting. The resulting subsets are split iteratively until a preset value is reached, which produces one decision tree.
(3): Steps (1) and (2) are repeated $K$ times, yielding an extreme random tree model consisting of $K$ decision trees.
(4): The trained extreme random tree model is tested on test data, and the final classification result is produced by voting.
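Steps (1) to (4) can be sketched as a small pure-Python ensemble. The sketch makes several assumptions and is not the patent's code:
- Features are numeric only.
- The names `fit_extra_trees` and `predict` are hypothetical.
- $m = \lfloor\sqrt{M}\rfloor$ by default, a common choice that the text does not state.
- Splitting stops at a hypothetical `min_size`.
- Random splits are scored by Gini decrease.

Every tree sees all $N$ training samples, with no bootstrap and no pruning.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, m, rng, min_size=2):
    """Grow one unpruned tree on all samples it receives (no bootstrap).

    Internal nodes are tuples (feature, threshold, left, right); leaves are
    plain class labels, so labels are assumed not to be tuples.
    """
    if len(set(y)) == 1 or len(y) < min_size:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), m):         # m of the M features
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue                                   # constant here, cannot split
        t = rng.uniform(lo, hi)                        # random threshold
        left = [i for i, v in enumerate(column) if v < t]
        right = [i for i, v in enumerate(column) if v >= t]
        if not left or not right:
            continue
        yl, yr = [y[i] for i in left], [y[i] for i in right]
        score = gini(y) - (len(yl) * gini(yl) + len(yr) * gini(yr)) / len(y)
        if best is None or score > best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], m, rng, min_size),
            build_tree([X[i] for i in right], [y[i] for i in right], m, rng, min_size))

def predict_tree(node, x):
    """Follow thresholds down to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
    return node

def fit_extra_trees(X, y, K=10, m=None, seed=0):
    """Steps (1)-(3): K trees, each grown on the full training set D."""
    rng = random.Random(seed)
    M = len(X[0])
    m = m if m is not None else max(1, int(M ** 0.5))
    return [build_tree(X, y, m, rng) for _ in range(K)]

def predict(forest, x):
    """Step (4): majority vote over the K trees."""
    return majority(predict_tree(tree, x) for tree in forest)
```

Usage: `forest = fit_extra_trees(X_train, y_train, K=100)` followed by `predict(forest, x)` for each test sample. A production system would more likely use a library implementation of extremely randomized trees.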
To verify the effectiveness of the proposed classifier, we analyzed and validated the model with 10-fold cross-validation. The data were the original mixed-gas samples of ethylene with methane and of ethylene with carbon monoxide. The classification results and analysis are as follows.
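A minimal sketch of 10-fold cross-validation is shown below. It assumes any classifier exposed as a pair of hypothetical `fit` and `predict` callables, and at least 10 samples so that no fold is empty. The fold assignment (shuffle, then deal indices round-robin) is one common choice and is not taken from the text.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]           # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over the k held-out folds (requires len(y) >= k)."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k
```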
Using the dynamic time warping (DTW) algorithm, we set the basic DTW parameter num to 1, 2, and 3, and also tested the case without DTW. The results are shown in Figure 6 of the description.
Table 1: 10-fold cross-validation accuracy
As Fig. 6 and Table 1 show, with num = 3 the mean cross-validation accuracy is 26.87% higher than with num = 0 (no DTW). For time efficiency, the model with num = 3 runs 56.04% more efficiently than with num = 0. DTW therefore clearly improves the model and raises classification accuracy. Figure 7 of the description shows the DTW model running time. Over several repeated experiments, the running time is shortest when the DTW parameter is set to 3, at 103.2568 seconds.
We use feature construction to increase the dimensionality of the data and then select the optimal features for training. The analysis results are shown in Figure 7. If the feature dimension is kept unchanged, the recognition accuracy is only 73.37%. Raising the dimension with the A-B construction to 28-dimensional features increases the recognition accuracy to 87.50%. This indicates that purposefully raising the dimension of specific features improves their separability. Accordingly, raising the features to 56 dimensions improves the recognition rate by 18.97% over the unchanged dimension. After dimensionality reduction with PCA, the final recognition rate is 99.17%.
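The text does not define the "A-B" construction. One plausible reading, assumed here and labeled hypothetical, is that A-B denotes the difference between two base features, taken over every pair. Under that reading, 8 base features would give C(8, 2) = 28 difference features. The sketch implements only this assumed operator.

```python
from itertools import combinations

def pairwise_difference_features(row):
    """Hypothetical 'A-B' features: row[a] - row[b] for every pair a < b."""
    return [row[a] - row[b] for a, b in combinations(range(len(row)), 2)]

def augment(rows):
    """Append the (assumed) difference features to each original feature vector."""
    return [list(row) + pairwise_difference_features(row) for row in rows]
```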
We further compared the extreme random tree algorithm with random forest and XGBoost. The extreme random tree achieves both higher accuracy and higher time efficiency, giving it a double advantage.
A comparison of many algorithms shows that random forest is currently the most widely used ensemble-learning classification algorithm and performs well. Comparative experiments were therefore run between random forest and the extreme random tree algorithm, which improves on it. XGBoost and GBDT were also compared for accuracy and time efficiency. Table 2 shows that the extreme random tree algorithm improves accuracy by 4.42% over random forest, by 5.00% over XGBoost, and by 7.99% over GBDT.
Table 2: Classification accuracy comparison of the algorithms
Figure 8 of the description compares the running times of the classification models. In the run-time experiment, the extreme random tree algorithm has the shortest running time, only 103.2568 seconds, a 66.85% improvement in time efficiency over random forest. XGBoost, whose model is the most complex, has the longest running time. The proposed extreme random tree algorithm therefore clearly improves both accuracy and time efficiency.
The mixed gas detection model construction method based on the extreme random tree provided by the present invention has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of these embodiments is only intended to help understand the method and its core idea. A person of ordinary skill in the art may, following the idea of the invention, change the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the invention.
Herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. The terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. In the absence of further restrictions, an element introduced by the phrase "including a ..." does not exclude other identical elements in the process, method, article, or device that includes it.
Claims (4)
1. A mixed gas detection model construction method based on extreme random tree, characterized in that it comprises the following steps:
S1: acquiring data from the mixed gas to obtain a data set comprising at least three gas signal time series, calculating the optimal warping path of the gas signal time series, and screening the gas signal time series using the optimal warping path;
S2: extracting gas features from the screened gas signal time series using principal component analysis;
S3: building a model with the extreme random tree algorithm and classifying the target mixed gas.
2. The mixed gas detection model construction method based on extreme random tree according to claim 1, characterized in that the optimal warping path of the gas time series in S1 is calculated as follows:
S11: constructing the distance matrix of two gas signal time series. The two time series are $X = (x_1, x_2, \dots, x_m)$ and $Y = (y_1, y_2, \dots, y_n)$, of lengths $m$ and $n$ respectively, and $D_{m \times n}$ is the $m \times n$ distance matrix constructed from them:
$$D_{m \times n} = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{pmatrix}$$
where each element $d_{ij}$ of $D_{m \times n}$ is the coordinate distance between $x_i$ and $y_j$:
$$d_{ij} = \lVert x_i - y_j \rVert_w, \quad 1 \le i \le m,\ 1 \le j \le n,$$
which is the Euclidean distance (2-norm) when $w = 2$;
S12: finding in $D_{m \times n}$ the warping path $p_{\min}$ with the smallest distance, i.e., the optimal warping path:
$$p_{\min} = \{p_1, p_2, \dots, p_d, \dots, p_k\}, \quad \max(m, n) \le k \le m + n - 1,$$
where $p_d$ is the current cumulative distance of the warping path when the search reaches the point $d_{ij}$, and $p_{d+1}$ is computed as
$$p_{d+1} = p_d + \min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right];$$
S13: discarding the two groups of gas signal time series with the largest $p_{\min}$ and using the remaining gas signal time series as the input data of step S2.
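The path search in S11 to S13 can be sketched as below. The code follows the recurrence in S12 literally: starting at the top-left cell, it moves greedily to the cheapest of the three neighbouring cells. This differs from classical DTW, which fills a full cumulative-cost table by dynamic programming. S13 does not say which series each path is measured against, so the sketch assumes a single `reference` series. All function names are hypothetical.

```python
def distance_matrix(X, Y, w=2):
    """D[i][j] = ||x_i - y_j||_w; samples may be scalars or equal-length vectors."""
    def norm(a, b):
        a = a if isinstance(a, (list, tuple)) else [a]
        b = b if isinstance(b, (list, tuple)) else [b]
        return sum(abs(p - q) ** w for p, q in zip(a, b)) ** (1.0 / w)
    return [[norm(x, y) for y in Y] for x in X]

def greedy_warping_path(D):
    """Literal S12 recurrence: p_{d+1} = p_d + min of the three forward neighbours."""
    m, n = len(D), len(D[0])
    i = j = 0
    total = D[0][0]
    path = [(0, 0)]
    while (i, j) != (m - 1, n - 1):
        candidates = []                                # (cost, next_i, next_j)
        if i + 1 < m:
            candidates.append((D[i + 1][j], i + 1, j))
        if i + 1 < m and j + 1 < n:
            candidates.append((D[i + 1][j + 1], i + 1, j + 1))
        if j + 1 < n:
            candidates.append((D[i][j + 1], i, j + 1))
        cost, i, j = min(candidates)                   # ties broken by (i, j)
        total += cost
        path.append((i, j))
    return total, path                                 # max(m,n) <= len(path) <= m+n-1

def screen(series_list, reference, drop=2, w=2):
    """S13 sketch: drop the `drop` series with the largest path distance."""
    scores = [greedy_warping_path(distance_matrix(s, reference, w))[0]
              for s in series_list]
    order = sorted(range(len(series_list)), key=lambda k: scores[k])
    keep = sorted(order[:len(series_list) - drop])     # preserve original order
    return [series_list[k] for k in keep]
```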
3. The mixed gas detection model construction method based on extreme random tree according to claim 1, characterized in that S2 specifically comprises:
S21: constructing the original features of the gas signals, using the interaction feature method to obtain multidimensional original gas signal features;
S22: reducing the dimensionality of the multidimensional original gas signal features by principal component analysis to obtain the initial data samples.
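Step S22 can be illustrated with a dependency-free PCA sketch. It centres the features, forms the sample covariance matrix, and extracts the top `k` eigenvectors by power iteration with deflation. The iteration count and seed are arbitrary assumptions. In practice a linear-algebra library routine would normally be used instead.

```python
import math
import random

def pca(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components.

    Returns (scores, components, mean). Components are unit vectors found by
    power iteration on the sample covariance matrix, with deflation after each.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    Z = [[r[j] - mean[j] for j in range(d)] for r in rows]            # centred data
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)]    # covariance
         for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break                                   # no variance left
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove the found direction before searching for the next.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(z[j] * c[j] for j in range(d)) for c in components] for z in Z]
    return scores, components, mean
```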
4. The mixed gas detection model construction method based on extreme random tree according to claim 1, characterized in that S3 specifically comprises:
S31: in the extreme random tree classification model, training each base classifier on all initial data samples, where the raw data set is D, the sample size is N, and the number of features is M;
S32: generating decision trees according to the CART algorithm. At each split node, m features are selected at random from the M features. Several categories are chosen at random and placed in one branch, and the remaining categories go to the other branch. The best split value of each node is computed at the same time, the optimal attribute is selected for the split, and no pruning is performed during splitting. The resulting subsets are split iteratively until a preset value is reached, yielding one decision tree;
S33: repeating steps S31 and S32 K times to generate an extreme random tree model consisting of K decision trees;
S34: testing the trained extreme random tree model, with the final classification result produced by voting.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329097.3A CN110175195B (en) | 2019-04-23 | 2019-04-23 | Mixed gas detection model construction method based on extreme random tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110175195A | 2019-08-27 |
CN110175195B | 2022-11-29 |
Family ID: 67689897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910329097.3A Active CN110175195B (en) | 2019-04-23 | 2019-04-23 | Mixed gas detection model construction method based on extreme random tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110175195B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0604663D0 (en) * | 2006-01-13 | 2006-04-19 | Cytokinetics Inc | Random forest modeling of cellular phenotypes |
US20140279780A1 (en) * | 2013-03-12 | 2014-09-18 | Xerox Corporation | Method and system for recommending crowdsourcing platforms |
CN204666549U (en) * | 2015-05-14 | 2015-09-23 | Ordnance Engineering College of the Chinese People's Liberation Army | Based on the mixed gas detection system of BP neural network |
CN105809191A (en) * | 2016-03-07 | 2016-07-27 | Sichuan University | Random tree chronic nephrosis by-stage predication algorithm integrated with Bagging algorithm |
CN107563425A (en) * | 2017-08-24 | 2018-01-09 | Chang'an University | A kind of method for building up of the tunnel operation state sensor model based on random forest |
CN108446656A (en) * | 2018-03-28 | 2018-08-24 | Xijia Intelligent *** (Shenzhen) Co., Ltd. | A kind of parser carrying out Selective recognition to kitchen hazardous gas |
CN109409672A (en) * | 2018-09-25 | 2019-03-01 | Shenzhen Launch Tech Co., Ltd. | A kind of auto repair technician classifies grading modeling method and device |
CN109473148A (en) * | 2018-10-26 | 2019-03-15 | Wuhan Institute of Technology | A kind of ion concentration prediction technique, device and computer storage medium |
Non-Patent Citations (5)
Title |
---|
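Yonghui Xu, Xi Zhao, Yinsheng Chen, and Zixuan Yang: "Research on a Mixed Gas Classification Algorithm Based on Extreme Random Tree", Applied Sciences * |
Zhang Liping et al.: "Transformer oil-paper insulation assessment method using kernel principal component analysis and random forest algorithm", Sichuan Electric Power Technology * |
Xu Yonghui et al.: "Research on a binary mixed gas detection method for MOS sensor arrays", Chinese Journal of Scientific Instrument * |
Zhao Xi: "Research on mixed gas classification and concentration prediction algorithms based on ensemble learning", CNKI * |
Wei Haiyu et al.: "Abnormal network traffic classification based on improved extremely randomized trees", Computer Engineering * |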
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110805534A (en) * | 2019-11-18 | 2020-02-18 | Changsha University of Science and Technology | Fault detection method, device and equipment of wind driven generator |
CN111210871A (en) * | 2020-01-09 | 2020-05-29 | Qingdao University of Science and Technology | Protein-protein interaction prediction method based on deep forest |
CN111210871B (en) * | 2020-01-09 | 2023-06-13 | Qingdao University of Science and Technology | Protein-protein interaction prediction method based on deep forests |
CN111862264B (en) * | 2020-06-09 | 2023-03-31 | Kunming University of Science and Technology | Multiphase mixed flow type cooperative regulation and control method |
CN111862264A (en) * | 2020-06-09 | 2020-10-30 | Kunming University of Science and Technology | Multiphase mixed flow type cooperative regulation and control method |
CN112163376A (en) * | 2020-10-09 | 2021-01-01 | Jiangnan University | Extreme random tree furnace temperature prediction control method based on longicorn stigma search |
CN112163376B (en) * | 2020-10-09 | 2024-03-12 | Jiangnan University | Extreme random tree furnace temperature prediction control method based on longhorn beetle whisker search |
CN114660231A (en) * | 2020-12-22 | 2022-06-24 | China Petroleum & Chemical Corporation | Gas concentration prediction method, system, machine readable storage medium and processor |
CN114660231B (en) * | 2020-12-22 | 2023-11-24 | China Petroleum & Chemical Corporation | Gas concentration prediction method, system, machine-readable storage medium and processor |
CN112712046A (en) * | 2021-01-06 | 2021-04-27 | Zhejiang University | Wireless charging equipment authentication method based on equipment hardware fingerprint |
CN113177594A (en) * | 2021-04-29 | 2021-07-27 | Zhejiang University | Air conditioner fault diagnosis method based on Bayesian optimization PCA-extreme random tree |
CN115964853A (en) * | 2022-11-22 | 2023-04-14 | Capital Normal University | Novel simulation method for representing ground settlement time sequence evolution |
CN115964853B (en) * | 2022-11-22 | 2023-08-04 | Capital Normal University | Novel simulation method for representing ground subsidence time sequence evolution |
CN117370899A (en) * | 2023-12-08 | 2024-01-09 | China University of Geosciences (Wuhan) | Ore control factor weight determining method based on principal component-decision tree model |
CN117370899B (en) * | 2023-12-08 | 2024-02-20 | China University of Geosciences (Wuhan) | Ore control factor weight determining method based on principal component-decision tree model |
Also Published As
Publication number | Publication date |
---|---|
CN110175195B (en) | 2022-11-29 |
Similar Documents
Publication | Title |
---|---|
CN110175195A (en) | Mixed gas detection model construction method based on extreme random tree | |
Amra et al. | Students performance prediction using KNN and Naïve Bayesian | |
Priyam et al. | Comparative analysis of decision tree classification algorithms | |
CN109919184A (en) | A kind of more well complex lithology intelligent identification Methods and system based on log data | |
Liu et al. | Spectrum of variable-random trees | |
CN105938116A (en) | Gas sensor array concentration detection method based on fuzzy division and model integration | |
CN107016416B (en) | Data classification prediction method based on neighborhood rough set and PCA fusion | |
CN110346831A (en) | A kind of intelligent earthquake Fluid Identification Method based on random forests algorithm | |
CN110309867A (en) | A kind of Mixed gas identification method based on convolutional neural networks | |
CN110880369A (en) | Gas marker detection method based on radial basis function neural network and application | |
Barnett et al. | Endnote: Feature-based classification of networks | |
Ikawati et al. | Student behavior analysis to predict learning styles based felder silverman model using ensemble tree method | |
Kinalwa et al. | Determination of protein fold class from Raman or Raman optical activity spectra using random forests | |
Zhang et al. | Research and application of grade prediction model based on decision tree algorithm | |
Maletzke et al. | The Importance of the Test Set Size in Quantification Assessment. | |
CN111105041B (en) | Machine learning method and device for intelligent data collision | |
CN111026075A (en) | Error matching-based fault detection method for medium-low pressure gas pressure regulator | |
Patidar et al. | Decision tree C4. 5 algorithm and its enhanced approach for educational data mining | |
AU2021101882A4 (en) | Extremely randomized tree (et)–based construction method for gas mixture detection model | |
Yuan et al. | Classifications Based Decision Tree and Random Forests for Fanjing Mountains’ Tea | |
CN104636636B (en) | The long-range homology detection method of protein and device | |
CN114219157A (en) | Alkane gas infrared spectrum measurement method based on optimal decision and dynamic analysis | |
Chen et al. | A mixed gas composition identification method based on sample augmentation | |
CN103411913B (en) | A kind of fish oil infrared spectrum PLS recognition methods based on the adaptively selected waypoint of genetic algorithm | |
US20060287973A1 (en) | Method, apparatus and program recorded medium for information processing |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |