CN110175195A - Mixed gas detection model construction method based on extreme random tree - Google Patents

Info

Publication number
CN110175195A
Authority
CN
China
Prior art keywords
feature
mixed gas
gas
extreme random
algorithm
Legal status
Granted
Application number
CN201910329097.3A
Other languages
Chinese (zh)
Other versions
CN110175195B
Inventor
许永辉
孙超
赵玺
杨子萱
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN201910329097.3A
Publication of CN110175195A
Application granted
Publication of CN110175195B
Legal status: Active

Classifications

    • G01N33/0034 General constructional details of gas analysers, e.g. portable test equipment concerning the detector comprising two or more sensors, e.g. a sensor array comprising neural networks or related mathematical techniques
    • G01N33/004 CO or CO2
    • G01N33/0047 Organic compounds
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/24323 Tree-organised classifiers

Abstract

The invention discloses a mixed gas detection model construction method based on extreme random tree. The method comprises: acquiring data from a mixed gas to obtain a data set containing at least three gas signal time series; calculating the optimal warping path of the gas signal time series and using it to screen the series; extracting gas features from the screened time series by principal component analysis; and building a model with the extreme random tree algorithm to classify the target mixed gas. The proposed method substantially improves classification accuracy and time efficiency.

Description

Mixed gas detection model construction method based on extreme random tree
Technical field
The present invention relates to the field of machine olfaction, and in particular to a method for constructing a mixed gas detection model based on extreme random tree.
Background art
In the current field of mixed gas detection, many researchers have achieved good classification results using algorithms such as support vector machines (SVM), artificial neural networks (ANN) and k-nearest neighbours (KNN). To improve classification accuracy, some researchers proposed an optimized Adaboost.M2 model that fuses multiple classifiers for drug classification experiments; with different fusion rules, the highest recognition accuracy reached 91.75%. Others used a posterior probability estimation algorithm derived from SVM to detect 10 kinds of bacterial components in human blood by machine olfaction; the recognition accuracy was high, but the time cost was large. Other researchers used probabilistic Bayesian processing to handle the uncertainty in gas source localization and, combined with a path planning algorithm based on a Markov decision process, improved the efficiency of locating gas in practice. Applying PCA together with an artificial neural network (ANN) can improve the discrimination of soil moisture content, but the ANN algorithm lacks interpretability, converges slowly and is inefficient. In the prior art, no algorithm brings detection accuracy to 99% or above. Moreover, no researcher has considered the accuracy of the data produced by the gas sensors themselves. PCA, the traditional feature extraction method, is suited to high-dimensional data; when the dimension is low, features must first be constructed. There is also a lack of classification algorithms that combine strong resistance to overfitting, fast training and high classification accuracy. Finally, no existing patent applies a model based on the extreme random tree algorithm, an improvement on the random forest algorithm, to the problem of mixed gas detection.
Therefore, how to provide a mixed gas detection model construction method based on extreme random tree with high measurement accuracy is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The present invention addresses the low classification accuracy for two kinds of mixed gases and the insufficient classification accuracy and time efficiency of traditional models such as support vector machines (SVM). It therefore proposes a mixed gas detection model construction method based on extreme random tree that substantially improves classification accuracy and time efficiency. The specific scheme is as follows:
S1, acquire data from the mixed gas to obtain a data set containing at least three gas signal time series, calculate the optimal warping path of the gas signal time series, and screen the gas signal time series using the optimal warping path;
S2, extract gas features from the screened gas signal time series using principal component analysis;
S3, build a model using the extreme random tree algorithm and classify the target mixed gas.
Preferably, in S1 the optimal warping path of the gas signal time series is calculated as follows:
S11, construct the distance matrix of two gas signal time series. Let the two time series be $X=(x_1, x_2, \ldots, x_m)$ and $Y=(y_1, y_2, \ldots, y_n)$, of lengths $m$ and $n$, and let $D_{m\times n}$ be the $m \times n$ distance matrix constructed from them.
Each element $d_{ij}$ of $D_{m\times n}$ is the distance between $x_i$ and $y_j$:
$$d_{ij} = \lVert x_i - y_j \rVert_w, \quad 1 \le i \le m,\ 1 \le j \le n$$
When $w = 2$, this is the Euclidean distance (2-norm);
S12, find in $D_{m\times n}$ the warping path $p_{\min}$ with the smallest distance, i.e. the optimal warping path:
$$p_{\min} = \{p_1, p_2, \ldots, p_d, \ldots, p_k\}$$
$$\max(m, n) \le k \le m + n - 1$$
where $p_d$ is the current cumulative distance of the warping path when the search reaches point $d_{ij}$; $p_{d+1}$ is then calculated as:
$$p_{d+1} = p_d + \min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right];$$
S13, discard the two groups of gas signal time series with the largest $p_{\min}$; the remaining gas signal time series serve as the input data of S2.
Preferably, S2 specifically comprises:
S21, construct the original features of the gas signal: obtain multidimensional original features of the gas signal by the interaction feature method;
S22, reduce the dimension of the multidimensional original features using principal component analysis to obtain the original data samples.
Preferably, S3 specifically comprises:
S31, in the classification model of the extreme random tree, train each base classifier on all original data samples, where the original data set is D, the number of samples is N and the number of features is M;
S32, generate a decision tree according to the CART algorithm. At each node split, randomly select m features from the M features, randomly put some categories into one branch and the remaining categories into the other branch, calculate the best split value of each node, select the optimal attribute for splitting, and perform no pruning. The split subsets are iterated until a preset value is reached, generating one decision tree;
S33, repeat steps S31 and S32 K times, finally generating an extreme random tree model consisting of K decision trees;
S34, test the trained extreme random tree model and generate the final classification result by voting.
Compared with the prior art, the present invention has the following advantages:
The invention applies the dynamic time warping (DTW) algorithm, which improves classification accuracy by 26.87%. Original feature construction combined with principal component analysis improves classification accuracy by 25.8%. Finally, the extreme random tree algorithm overcomes the time efficiency problem of the random forest algorithm: the final classification accuracy reaches 99.17%, and the running time, only 103.2568 seconds, is 66.85% better than that of the random forest algorithm. The proposed method thus solves the classification problem for mixed gases, substantially improves on the random forest algorithm, raises the classification accuracy of machine olfaction systems, and provides a theoretical basis for algorithms that simulate the olfactory nervous system. Because the extreme random tree algorithm produces its prediction by voting, its generalization ability is stronger; because every base classifier is trained on all original data samples, the training result is more accurate; and because node splits are chosen at random, randomness is greatly enhanced.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are only embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the mixed gas detection model construction method based on extreme random tree according to the present invention;
Fig. 2 shows the gas data responses acquired by the sensors of the present invention;
Fig. 3 shows the dynamic response curves of sensor TGS2602 in the Et_L_Me_H case;
Fig. 4 shows the three-dimensional feature plot abstracted by the feature engineering of the present invention;
Fig. 5 is a schematic diagram of the extreme random tree algorithm of the present invention;
Fig. 6 shows the ten-fold cross-validation accuracy after DTW;
Fig. 7 shows the cross-validation accuracy after feature construction;
Fig. 8 compares the running times of the algorithm models.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
This embodiment provides a mixed gas detection model construction method based on extreme random tree, comprising the following steps.
S1 Dynamic time warping (DTW)
This embodiment detects the mixed gases obtained by mixing ethylene with CH4 and ethylene with CO. Six experiments under each label form different data sets, where each label denotes one gas mixture class. The data sampling stage lasts 300 seconds: no gas is introduced during the first 60 seconds; at 60 seconds the mixed gas with the set concentration ratio is introduced for 180 seconds; and no mixed gas is introduced during the last 60 seconds. The sensor array consists of 8 sensors sampled at 50 Hz, and the mixed gas data set is acquired by these 8 sensors. The data are stored in time order, and each data set contains 11 columns: time (s), temperature, humidity (%) and the readings of the sensors TGS2600, TGS2612, TGS2611, TGS2610, TGS2602, TGS2602, TGS2620 and TGS2620. The raw reading of each sensor, denoted A, is converted to a uniform value, its resistance, by Rs (kOhm) = 10 × (3110 − A) / A. The sensor responses of one experiment are shown in Fig. 2 for the Et_H_Me_n case, where Et denotes ethylene, H denotes high concentration, Me denotes methane and n denotes zero concentration; the abscissa is time and the ordinate is the converted sensor reading.
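The reading-to-resistance conversion and the 11-column row layout described above can be sketched as follows. This is a minimal illustration assuming each stored row is a plain sequence in the stated column order; the function names are hypothetical and not part of the patent.

```python
from typing import List, Sequence

# Sensor order as listed in the description (3 leading columns + 8 sensors = 11).
SENSORS = ["TGS2600", "TGS2612", "TGS2611", "TGS2610",
           "TGS2602", "TGS2602", "TGS2620", "TGS2620"]


def reading_to_resistance_kohm(a: float) -> float:
    """Convert a raw sensor reading A to Rs (kOhm) = 10 * (3110 - A) / A."""
    if a <= 0:
        raise ValueError("raw reading A must be positive")
    return 10.0 * (3110.0 - a) / a


def convert_row(row: Sequence[float]) -> List[float]:
    """Convert one stored row: time (s), temperature, humidity (%), 8 raw readings.

    Hypothetical helper: the first three columns pass through unchanged and
    each sensor reading is replaced by its resistance in kOhm.
    """
    if len(row) != 3 + len(SENSORS):
        raise ValueError(f"expected {3 + len(SENSORS)} columns, got {len(row)}")
    time_s, temperature, humidity, *raw = row
    return [time_s, temperature, humidity] + [reading_to_resistance_kohm(a) for a in raw]
```

For example, a raw reading of A = 1555 converts to 10 × 1555 / 1555 = 10 kOhm.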
To examine the data acquired by the sensors, the response curves of TGS2602 under the same label (i.e. the Et_M_Me_M label) were analysed; Fig. 3 (1) to (6) shows the dynamic response curves of TGS2602 in the Et_L_Me_H case. The figure shows that the responses of the same sensor under the same conditions vary to different degrees; in particular, the response curves of the last two experiments differ clearly from the earlier ones. It can therefore be concluded that, owing to factors such as the configuration of the experimental conditions, the data are inconsistent to varying degrees across experiments.
The above analysis shows that the data require effective preprocessing. Because the mixed gas data are gas signal response curves based on time series, dynamic time warping is applied to the data sets. Dynamic time warping is an algorithm based on dynamic programming (DP) that corrects misalignment of feature parameters; its basic principle is to find the optimal warping path between time series. For each data point of one sequence, the point with the most similar feature in the other sequence is found from its coordinate value; the distances between the matched points are then calculated, and the resulting sum of distances between the two time series defines the optimal warping path.
Let the two time series be $X=(x_1, x_2, \ldots, x_m)$ and $Y=(y_1, y_2, \ldots, y_n)$, of lengths $m$ and $n$, and let $D_{m\times n}$ be the $m \times n$ distance matrix constructed from them.
Each element $d_{ij}$ of $D_{m\times n}$ is the distance between $x_i$ and $y_j$:
$$d_{ij} = \lVert x_i - y_j \rVert_w$$
When $w = 2$, this is the Euclidean distance (2-norm). The warping path $p_{\min}$ with the smallest distance found in $D_{m\times n}$ gives the DTW distance between the two time series:
$$p_{\min} = \{p_1, p_2, \ldots, p_d, \ldots, p_k\}$$
$$\max(m, n) \le k \le m + n - 1$$
where $p_d$ is the current cumulative distance of the warping path when the search reaches point $d_{ij}$.
The search for $p_{\min}$ must satisfy three conditions:
1) Boundary: the path starts at $d_{11}$ and ends at $d_{mn}$.
2) Monotonicity: if the current search point is $d_{ij}$ with cumulative distance $p_d$, and $p_{d+1} = p_d + d_{i'j'}$, then $i' \ge i$ and $j' \ge j$.
3) Continuity: under the same notation, $i' \le i + 1$ and $j' \le j + 1$.
Condition 1 fixes the starting position of the search path, and conditions 2 and 3 restrict the next point of the path to the right of, above, or to the upper right of the current point. If the current cumulative distance is $p_d$ and the current search point is $d_{ij}$, then:
$$p_{d+1} = p_d + \min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right]$$
After $p_{\min}$ is obtained, the cumulative distance is averaged over the path length, which removes the differences in cumulative distance caused by different sequence lengths:
$$D = \frac{p_{\min}}{k}$$
where $D$ is the length-normalized cumulative distance between the two sequences.
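A minimal Python sketch of the DTW computation, assuming scalar samples (so $\lVert x_i - y_j \rVert_w$ reduces to $\lvert x_i - y_j \rvert$). The recurrence above is written as a step-by-step extension of the path; this sketch instead uses the standard dynamic-programming cumulative-cost table, which is guaranteed to return the minimum-distance path, and then backtracks along it to count the path length $k$ for $D = p_{\min}/k$. The function name `dtw_normalized` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw_normalized(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Return (p_min, D, k) for two scalar time series.

    p_min: cumulative distance of the optimal warping path
    k:     number of points on that path, max(m, n) <= k <= m + n - 1
    D:     p_min / k, the length-normalized distance
    """
    m, n = len(x), len(y)
    if m == 0 or n == 0:
        raise ValueError("both series must be non-empty")
    inf = math.inf
    # cost[i][j] = smallest cumulative distance of a path from (0, 0) to (i, j)
    cost: List[List[float]] = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = abs(x[i] - y[j])  # ||x_i - y_j||_w for scalar samples
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Boundary, monotonicity and continuity: only left, lower, diagonal predecessors.
            best_prev = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best_prev
    # Walk back from (m-1, n-1) along minimum-cost predecessors to count k.
    i, j, k = m - 1, n - 1, 1
    while i > 0 or j > 0:
        steps = []
        if i > 0 and j > 0:
            steps.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        k += 1
    p_min = cost[m - 1][n - 1]
    return p_min, p_min / k, k
```

For two curves where one simply repeats a sample, such as `dtw_normalized([0, 1, 2, 3], [0, 1, 1, 2, 3])`, the warping path absorbs the repeat, giving $p_{\min} = 0$ with $k = 5$.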
Because of the three constraints, the DTW path passes through all observation points, so every point of each original series finds a corresponding point. Applying the dynamic time warping (DTW) algorithm thus gives a preliminary screening of the samples in the original data, which further improves the classification effect.
Each label of the original data set contains 6 repeated experiments, i.e. each sensor acquires 6 groups of data for one mixed gas class, giving 6 gas signal time series. $p_{\min}$ is calculated by the DTW algorithm, the two experiments with the largest $p_{\min}$ are discarded, and the remaining data serve as the input of S2.
S2 Data selection and feature extraction:
Original feature construction
In the comparative tests, both the designed classifier and the comparison classifiers are trained on the original features and on the constructed features, and the results are compared. The original data set has 8-dimensional features; to improve classification accuracy, new features are constructed from them and the different feature sets are compared to find the one that gives the best classification. Feature construction is needed because, once the training data are fixed, the highest attainable accuracy is fixed as well; constructing features can compensate for a weak learning ability of the recognition algorithm. New features are therefore built on top of the original features to raise the accuracy of the classification algorithm.
A common feature construction method is the interaction feature: for features A and B, new features such as A*B, A-B, A/B and A+B are created, which can cause the feature space to explode. In this embodiment, gas signal data are acquired by 8 sensors, so the applied features are 8-dimensional. The created features are A-B and A/B; creating these interaction features for every pair of sensors gives the multidimensional original features of the gas signal, and the number of features becomes 2 × C(8, 2) = 2 × 28 = 56.
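The pairwise A-B and A/B construction can be sketched as follows, assuming the 8 inputs are the converted sensor resistances of one sample. Taking each unordered pair once gives 28 differences and 28 ratios, i.e. the 56 features stated above. Naming features by sensor index is a hypothetical convention (TGS2602 and TGS2620 each appear twice, so model names alone would collide).

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def interaction_features(row: Sequence[float]) -> Tuple[List[str], List[float]]:
    """Build A-B and A/B for every unordered pair of sensor columns.

    Returns (names, values); with 8 inputs this gives 28 + 28 = 56 features.
    """
    names: List[str] = []
    values: List[float] = []
    pairs = list(combinations(range(len(row)), 2))
    for a, b in pairs:  # differences A - B
        names.append(f"s{a}-s{b}")
        values.append(row[a] - row[b])
    for a, b in pairs:  # ratios A / B
        if row[b] == 0:
            raise ZeroDivisionError(f"sensor {b} reading is zero; A/B is undefined")
        names.append(f"s{a}/s{b}")
        values.append(row[a] / row[b])
    return names, values
```

Using only the 28 differences corresponds to the 28-dimensional A-B case discussed in the results.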
Implementation steps of principal component analysis (PCA)
The steps of principal component analysis are as follows:
(1) Standardize the original data
PCA is based on the covariance matrix of the data. Because the features have different scales, the original feature data are first standardized to make the dimensions consistent: the mean of each dimension is subtracted and the result is divided by the standard deviation of that dimension:
$$X_i^{*} = \frac{X_i - E(X_i)}{\sqrt{D(X_i)}}$$
where $E(X_i)$ denotes the mean and $D(X_i)$ the variance of the data.
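(2) Calculate the covariance matrix of the data
After standardization, the covariance matrix of the data is the correlation matrix of the original features. For $n$ samples of $p$ standardized features $x^{*}_{ki}$, the correlation matrix $R$ can be expressed as
$$R = (r_{ij})_{p \times p}, \quad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} x^{*}_{ki}\, x^{*}_{kj}$$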
(3) Calculate the eigenvalues and eigenvectors of the correlation matrix R
The eigenvalues $\lambda_i$ $(i = 1, 2, \ldots, p)$ of the correlation matrix are obtained from the characteristic equation $\lvert \lambda E - R \rvert = 0$ and sorted in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Substituting each $\lambda_i$ into $(R - \lambda_i E)x = 0$ gives the eigenvector $a_i$, which is normalized to the unit vector $e_i$.
(4) Determine the principal components from the cumulative contribution rate
The cumulative contribution rate of the sorted eigenvalues is $\sum_{i=1}^{t}\lambda_i \big/ \sum_{i=1}^{p}\lambda_i$. Generally, when the cumulative contribution rate of the first $t$ eigenvalues reaches 85% to 95%, these $t$ components are taken as the principal components. In this embodiment $t = 3$, for which the cumulative contribution rate of the eigenvalues reaches 90%.
(5) Calculate the principal component loadings
The 8-dimensional standardized data $X^{*}$ are converted into linear combinations of the 8 variables using the unit eigenvectors, $y_i = e_i^{T} X^{*}$, giving the principal components $Y = (y_1, y_2, \ldots, y_t)^{T}$.
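Steps (1) to (5) can be sketched with NumPy as below. This is an illustration under stated assumptions, not the patent's code: it uses the sample standard deviation (`ddof=1`), `numpy.linalg.eigh` because $R$ is symmetric (it returns unit-norm eigenvectors, which covers the normalization in step (3)), and a cumulative-contribution threshold of 0.90, within the 85% to 95% range given above. The function name is hypothetical.

```python
import numpy as np


def pca_by_contribution(X, threshold=0.90):
    """Standardize, build R, eigen-decompose, pick t by cumulative
    contribution rate, and project onto the first t unit eigenvectors.

    Returns (Y, eigenvalues sorted descending, t).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    std = X.std(axis=0, ddof=1)
    if np.any(std == 0):
        raise ValueError("constant column: cannot standardize")
    Z = (X - X.mean(axis=0)) / std             # (1) standardize
    R = Z.T @ Z / (n - 1)                       # (2) correlation matrix
    vals, vecs = np.linalg.eigh(R)              # (3) symmetric eigen-decomposition
    order = np.argsort(vals)[::-1]              # sort eigenvalues descending
    vals = np.clip(vals[order], 0.0, None)      # remove tiny negative round-off
    vecs = vecs[:, order]                       # column i is the unit vector e_i
    cum = np.cumsum(vals) / vals.sum()          # (4) cumulative contribution rate
    t = min(int(np.searchsorted(cum, threshold)) + 1, p)
    Y = Z @ vecs[:, :t]                         # (5) y_i = e_i^T x*
    return Y, vals, t
```

Because the projections use the eigenvectors of $R$, the sample covariance of the returned components is diagonal with the retained eigenvalues on the diagonal.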
To illustrate how scattered the data features are, each class of the data is abstracted into three-dimensional features, as shown in Fig. 4, which presents the original 8-dimensional feature data abstracted into 3 dimensions; X, Y and Z denote the three coordinate axes. The features are clearly scattered, so a single traditional algorithm cannot complete the classification.
S3 Extreme random tree algorithm:
Extreme random tree
Extreme random tree (ET for short, also called extremely randomized forest) is similar to the random forest algorithm: it is an ensemble of multiple decision trees and therefore shares many of its advantages, such as good classification performance and high accuracy, the ability to handle high-dimensional feature data without feature selection, and high execution efficiency through parallel computation. For mixed gas detection and classification, ensemble learning algorithms achieve high classification accuracy. However, every decision tree in the extreme random tree algorithm uses all of the original data, whereas the random forest algorithm generates training samples by bootstrap sampling. In addition, the extreme random tree selects split points at random during node splitting instead of choosing the best split threshold or feature. Fig. 5 is a schematic diagram of the extreme random tree algorithm.
Differences between the extreme random tree and random forest algorithms:
First, the training samples of the random forest algorithm are generated by bootstrap sampling, whereas every decision tree in the extreme random tree uses all of the original training samples, which helps reduce the bias of the model.
Second, when splitting a node, the random forest algorithm first selects a subset of features from all features and then chooses, among them, the best split (e.g. by the Gini index) to generate the decision tree. The extreme random tree algorithm instead chooses the split at random. Specifically, for a categorical split, some categories are randomly put into one branch and the remaining categories into the other; for a numerical split, a threshold is randomly chosen between the minimum and maximum values, data greater than the threshold are put into one branch and data less than the threshold into the other, so the sample data are divided between the two branches. For the classification problem here, the split value is calculated with the Gini index. All features of the node are traversed to obtain the split value of every feature, and the feature with the largest split value is chosen for splitting (for regression problems, the split value is calculated with the mean squared error).
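The random split rule described above can be sketched as follows: for each candidate feature, one random split is drawn (a random category subset for categorical data, a random threshold between the minimum and maximum for numerical data), each is scored by its Gini gain (impurity decrease), and the feature with the largest gain is kept. Scoring by Gini decrease, so that the "largest split value" is the best one, is an assumption about how the split value is defined; all names are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(parent, left, right) -> float:
    """Impurity decrease of a split; larger is better."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def random_split(values, labels, categorical: bool, rng: random.Random):
    """Draw one random split for a single feature; None if it cannot split."""
    if categorical:
        cats = sorted(set(values))
        if len(cats) < 2:
            return None
        # Random non-empty, proper subset of categories goes to the left branch.
        chosen = set(rng.sample(cats, rng.randint(1, len(cats) - 1)))
        goes_left = [v in chosen for v in values]
        rule = ("in", chosen)
    else:
        lo, hi = min(values), max(values)
        if lo == hi:
            return None
        thr = rng.uniform(lo, hi)  # random threshold between min and max
        goes_left = [v < thr for v in values]
        rule = ("<", thr)
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    return rule, gini_gain(labels, left, right)


def choose_node_split(columns, labels, categorical_flags, rng):
    """One random split per feature; keep the one with the largest Gini gain.

    Returns (feature index, rule, gain), or None if no feature can split.
    """
    best = None
    for f, (col, is_cat) in enumerate(zip(columns, categorical_flags)):
        cand = random_split(col, labels, is_cat, rng)
        if cand is not None and (best is None or cand[1] > best[2]):
            best = (f, cand[0], cand[1])
    return best
```

The only difference from a random-forest node is the `rng.uniform` / `rng.sample` draw: a random forest would instead search all thresholds or subsets of each candidate feature for the best one.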
In the extreme random tree algorithm, since all training data samples are OOB (out-of-bag) data samples, the prediction error of the extreme random tree is calculated on these OOB samples. The research in this project found that the extreme random tree is superior to the random forest algorithm in training time efficiency, classification accuracy and use of the training data.
Implementation steps of the extreme random tree algorithm
The extreme random tree algorithm is denoted {E(K, X, D)}, where E denotes the classifier model, D the original data samples and K the number of decision trees. Each decision tree produces a prediction from the input sample X = {x1, x2, ..., xm}, and the final classification decision is obtained by the voting rule. The specific steps of the extreme random tree algorithm are as follows:
(1): In the classification model of the extreme random tree, each base classifier is trained on all training samples (OOB samples); let the original data set be D, the number of samples N and the number of features M.
(2): Generate a decision tree according to the CART algorithm. At each node split, randomly select m features from the M features, randomly put some categories into one branch and the remaining categories into the other, calculate the best split value of each node, select the optimal attribute for splitting, and perform no pruning. The split subsets are iterated until a preset value is reached, generating one decision tree.
(3): Repeat steps (1) and (2) K times to generate an extreme random tree model consisting of K decision trees.
(4): Test the trained extreme random tree model on the test data and generate the final classification result by voting.
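Steps (1) to (4) can be sketched as a small pure-Python classifier for numerical features. Assumptions, all hypothetical: $m$ defaults to $\sqrt{M}$ rounded, features that are constant within a node are skipped before sampling, and the "preset value" that stops splitting is taken to be `min_samples_split`. For production work, scikit-learn's `ExtraTreesClassifier` (which by default also trains every tree on the whole sample, `bootstrap=False`) is the usual choice.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def _gini(labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


class ExtremeRandomTreeSketch:
    """K unpruned trees, each grown on the whole training set (no bootstrap).

    At each node, m of the M features are drawn at random (among non-constant
    ones), one random threshold is drawn per feature, and the split with the
    largest Gini decrease is kept. Prediction is a majority vote over the trees.
    """

    def __init__(self, n_trees: int = 25, m_features: int = 0,
                 min_samples_split: int = 2, seed: int = 0):
        self.n_trees = n_trees                      # K
        self.m_features = m_features                # m; 0 means round(sqrt(M))
        self.min_samples_split = min_samples_split  # stands in for the "preset value"
        self.rng = random.Random(seed)
        self.trees: list = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence) -> "ExtremeRandomTreeSketch":
        self._X, self._y = X, y
        M = len(X[0])
        self._m = self.m_features or max(1, round(math.sqrt(M)))
        all_rows = list(range(len(X)))              # step (1): every tree sees all N samples
        self.trees = [self._grow(all_rows) for _ in range(self.n_trees)]  # step (3)
        return self

    def _leaf(self, rows):
        return ("leaf", Counter(self._y[r] for r in rows).most_common(1)[0][0])

    def _grow(self, rows: List[int]):
        X, y = self._X, self._y
        labels = [y[r] for r in rows]
        if len(set(labels)) == 1 or len(rows) < self.min_samples_split:
            return self._leaf(rows)
        usable = [f for f in range(len(X[0]))
                  if min(X[r][f] for r in rows) < max(X[r][f] for r in rows)]
        if not usable:
            return self._leaf(rows)
        best = None
        for f in self.rng.sample(usable, min(self._m, len(usable))):  # step (2)
            col = [X[r][f] for r in rows]
            thr = self.rng.uniform(min(col), max(col))  # random threshold
            left = [r for r in rows if X[r][f] < thr]
            right = [r for r in rows if X[r][f] >= thr]
            if not left or not right:
                continue
            gain = _gini(labels) - (len(left) * _gini([y[r] for r in left])
                                    + len(right) * _gini([y[r] for r in right])) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, f, thr, left, right)
        if best is None:
            return self._leaf(rows)
        _, f, thr, left, right = best
        return ("node", f, thr, self._grow(left), self._grow(right))  # no pruning

    @staticmethod
    def _predict_one(tree, x):
        while tree[0] == "node":
            _, f, thr, left, right = tree
            tree = left if x[f] < thr else right
        return tree[1]

    def predict(self, X):
        # Step (4): majority vote over the K trees.
        return [Counter(self._predict_one(t, x) for t in self.trees).most_common(1)[0][0]
                for x in X]
```

Trained on ten points where points 0 to 4 are class 0 and points 5 to 9 are class 1, each unpruned tree fits the training set exactly, so the vote reproduces the training labels.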
To verify the effect of the proposed classifier, model analysis and validation are carried out by ten-fold cross-validation on the mixed gas samples of ethylene with methane and of ethylene with carbon monoxide. The classification results and analysis are as follows.
For the dynamic time warping (DTW) algorithm, the basic DTW parameter num was set to 1, 2 and 3, and the case without DTW was also tested; the results are shown in Fig. 6.
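Ten-fold cross-validation can be sketched as below: the sample indices are shuffled, dealt into 10 near-equal folds, and each fold serves once as the test set while the remaining folds train the model. The `fit_predict` callable and the function names are hypothetical placeholders for any classifier.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # deal indices round-robin into k folds
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def cross_val_accuracy(fit_predict: Callable, X: Sequence, y: Sequence,
                       k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)
```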
1 10 folding cross validation accuracy rate of table
From Fig. 6 and table 1 as can be seen that as num=3, five folding cross validation mean value accuracy rate ratio num=0 are to mention It is high by 26.87%.From the point of view of time efficiency, num=3 ratio num=0 model running time efficiency improves 56.04%.Therefore As can be seen that modelling effect is obviously improved, while improving the accuracy rate of classification after DTW.It is attached referring to specification Fig. 7 is the DTW model running time, after experiment is repeated several times, when the parameter of DTW is set as 3, and time model operation Time is most short, is 103.2568 seconds.
We use feature construction to increase the dimension of the data and select the optimal features for training; the results are shown in Fig. 7. Fig. 7 shows that if the feature dimension is kept unchanged, the recognition accuracy is only 73.37%, whereas after the dimension is increased with the A-B method to 28-dimensional features, the accuracy rises to 87.50%. Purposefully raising the dimension of specific features therefore improves the discreteness (separability) of the features. After the features are raised to 56 dimensions, the recognition rate improves by 18.97% over the unchanged dimension, and after dimensionality reduction with the PCA algorithm the final recognition rate is 99.17%.
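The description does not define the "A-B" construction. One plausible reading, used in the sketch below purely as an assumption, is the pairwise difference x_a - x_b between every pair of sensor channels: with 8 channels this gives C(8, 2) = 28 features, which matches the 28 dimensions quoted. The optional product terms, which double the count to 56, are likewise a hypothetical choice, not the patent's stated method.

```python
from itertools import combinations


def interaction_features(row, with_products=False):
    """Hypothetical 'A-B' construction for one sample of sensor readings."""
    pairs = list(combinations(range(len(row)), 2))
    feats = [row[a] - row[b] for a, b in pairs]       # A-B differences
    if with_products:
        feats += [row[a] * row[b] for a, b in pairs]  # optional A*B terms (assumption)
    return feats
```

For instance, `interaction_features([3, 1, 2])` yields the differences for the pairs (0, 1), (0, 2) and (1, 2), i.e. `[2, 1, -1]`.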
We then compared the extreme random tree algorithm with other algorithms, namely random forest and XGBoost. Its accuracy and time efficiency are both higher, giving it a double advantage.
Among ensemble classification algorithms, random forest is currently the most widely used and performs well. We therefore ran comparative experiments between random forest and the extreme random tree algorithm, which improves on it, and also compared XGBoost and GBDT in accuracy and time efficiency. Table 2 shows that the extreme random tree algorithm improves accuracy by 4.42% over random forest, by 5.00% over XGBoost and by 7.99% over GBDT.
Table 2. Classification accuracy of the compared algorithms
Fig. 8 of the description compares the running times of the classification algorithms. In the running-time experiment, the extreme random tree algorithm has the shortest running time, only 103.2568 s, which is 66.85% better than random forest, while XGBoost, the most complex model, has the longest running time. The proposed extreme random tree algorithm is therefore clearly better in both accuracy and time efficiency.
The mixed gas detection model construction method based on the extreme random tree provided by the present invention has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention; the embodiments are described only to help understand the method and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the invention.
Herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.

Claims (4)

1. A mixed gas detection model construction method based on an extreme random tree, characterized by comprising the following steps:
S1: acquiring data from the mixed gas to obtain a data set containing at least three gas signal time series, computing the optimal warping path of the gas signal time series, and screening the gas signal time series using the optimal warping path;
S2: extracting gas features from the screened gas signal time series using principal component analysis;
S3: building a model with the extreme random tree algorithm and classifying the target mixed gas.
2. The mixed gas detection model construction method based on an extreme random tree according to claim 1, characterized in that the optimal warping path of the gas time series in S1 is computed as follows:
S11: constructing the distance matrix of two gas signal time series. Let the two series be $X=(x_1,x_2,\ldots,x_m)$ and $Y=(y_1,y_2,\ldots,y_n)$, of lengths $m$ and $n$. The $m\times n$ distance matrix constructed from them is

$$D_{m\times n}=\begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n}\\ d_{21} & d_{22} & \cdots & d_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ d_{m1} & d_{m2} & \cdots & d_{mn}\end{pmatrix}$$

where the element $d_{ij}$ is the distance between $x_i$ and $y_j$:

$$d_{ij}=\lVert x_i-y_j\rVert_w,\qquad 1\le i\le m,\ 1\le j\le n$$

For $w=2$ this is the Euclidean distance (2-norm);
S12: searching $D_{m\times n}$ for the warping path with the smallest distance, $p_{\min}$, i.e. the optimal warping path:

$$p_{\min}=\{p_1,p_2,\ldots,p_d,\ldots,p_k\},\qquad \max(m,n)\le k\le m+n-1$$

where $p_d$ is the current cumulative distance of the warping path when the search reaches the point $d_{ij}$; $p_{d+1}$ is then computed as

$$p_{d+1}=p_d+\min\left[d_{(i+1)j},\ d_{(i+1)(j+1)},\ d_{i(j+1)}\right];$$

S13: discarding the two gas signal time series with the largest $p_{\min}$; the remaining gas signal time series are the input data of step S2.
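Steps S11 to S13 can be sketched as below. Read literally, the S12 recurrence is a greedy walk that always takes the cheapest of the three allowed moves; classic DTW instead fills a cumulative-cost table by dynamic programming, which guarantees the minimum-cost path. Both are shown so they can be compared; the DP version is a swapped-in standard technique, not necessarily the patent's. The S13 screening rule needs a second series to compare against, so the `reference` argument below is a hypothetical choice, as are all function names.

```python
import math


def distance_matrix(X, Y, w=2):
    """d_ij = ||x_i - y_j||_w; scalar samples reduce to |x_i - y_j|."""
    def norm(a, b):
        a = a if isinstance(a, (list, tuple)) else [a]
        b = b if isinstance(b, (list, tuple)) else [b]
        return sum(abs(p - q) ** w for p, q in zip(a, b)) ** (1.0 / w)
    return [[norm(x, y) for y in Y] for x in X]


def greedy_warp_cost(D):
    """Literal S12 walk from d_11 to d_mn; returns (cumulative cost, path length k)."""
    m, n = len(D), len(D[0])
    i = j = 0
    cost, steps = D[0][0], 1
    while (i, j) != (m - 1, n - 1):
        moves = []
        if i + 1 < m:
            moves.append((D[i + 1][j], i + 1, j))
        if i + 1 < m and j + 1 < n:
            moves.append((D[i + 1][j + 1], i + 1, j + 1))
        if j + 1 < n:
            moves.append((D[i][j + 1], i, j + 1))
        step_cost, i, j = min(moves)            # cheapest admissible neighbour
        cost += step_cost
        steps += 1
    return cost, steps


def dtw_cost(D):
    """Classic DTW: C(i,j) = d_ij + min(C(i-1,j), C(i-1,j-1), C(i,j-1))."""
    m, n = len(D), len(D[0])
    C = [[math.inf] * (n + 1) for _ in range(m + 1)]
    C[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = D[i - 1][j - 1] + min(C[i - 1][j], C[i - 1][j - 1], C[i][j - 1])
    return C[m][n]


def screen_series(series, reference, drop=2):
    """Hypothetical S13: drop the `drop` series with the largest DTW cost to `reference`."""
    costs = [dtw_cost(distance_matrix(s, reference)) for s in series]
    worst = sorted(range(len(series)), key=lambda i: costs[i], reverse=True)[:drop]
    return [i for i in range(len(series)) if i not in worst]
```

Because the greedy walk produces one admissible path, its cost can never be lower than the DP optimum; the two agree on easy cases such as `[1, 2, 3]` against `[1, 2, 2, 3]`, where both find a zero-cost path of length k = 4.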
3. The mixed gas detection model construction method based on an extreme random tree according to claim 1, characterized in that S2 specifically comprises:
S21: constructing the original features of the gas signal, using the interaction feature method to obtain multidimensional original features of the gas signal;
S22: reducing the dimension of the multidimensional original features of the gas signal with principal component analysis to obtain the initial data samples.
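The PCA reduction in S22 can be sketched with NumPy as follows: centre the features, eigendecompose their covariance matrix, and project onto the leading eigenvectors. The number of retained components is not given in the patent, so `n_components` is a free, hypothetical parameter here.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each feature
    cov = np.cov(Xc, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()  # explained-variance ratios
    return Xc @ components, explained
```

The explained-variance ratios are a common way to choose how many components to keep; the sign of each projected component is arbitrary, since eigenvectors are only defined up to sign.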
4. The mixed gas detection model construction method based on an extreme random tree according to claim 1, characterized in that S3 specifically comprises:
S31: in the extreme random tree classification model, training each base classifier on all of the initial data samples, where the original data set is D, the sample size is N and the number of features is M;
S32: generating each decision tree with the CART algorithm; at each split node, m features are selected at random from the M features, some classes are randomly assigned to one branch and the remaining classes to the other branch, the best split value of each node is computed, and the optimal attribute is chosen for the split, with no pruning; the resulting subsets are split iteratively until a preset value is reached, yielding one decision tree;
S33: repeating steps S31 and S32 K times to produce an extreme random tree model made up of K decision trees;
S34: testing the trained extreme random tree model and obtaining the final classification result by voting.
CN201910329097.3A 2019-04-23 2019-04-23 Mixed gas detection model construction method based on extreme random tree Active CN110175195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910329097.3A CN110175195B (en) 2019-04-23 2019-04-23 Mixed gas detection model construction method based on extreme random tree

Publications (2)

Publication Number Publication Date
CN110175195A 2019-08-27
CN110175195B 2022-11-29

Family

ID=67689897

Country Status (1)

Country Link
CN (1) CN110175195B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant