CN107563425A - Method for establishing a tunnel operation state perception model based on random forest - Google Patents

Method for establishing a tunnel operation state perception model based on random forest

Info

Publication number
CN107563425A
Authority
CN
China
Prior art keywords
random forest
ntree
operation state
mtry
tunnel operation
Legal status
Pending
Application number
CN201710737045.0A
Other languages
Chinese (zh)
Inventor
陈建勋
钱超
罗彦斌
张馨予
李伟
吉祥
Current Assignee
Changan University
Original Assignee
Changan University
Priority date
2017-08-24
Filing date
2017-08-24
Application filed by Changan University
Priority to CN201710737045.0A
Publication of CN107563425A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for establishing a tunnel operation state perception model based on a random forest. ntree new bootstrap sample sets are randomly drawn and ntree decision trees are built; at each node of a decision tree, mtry features are randomly selected and one of them is chosen for branch growth; the unbiased estimate of the random forest generalization error is obtained and the program running time is calculated. All ntree and mtry parameter combinations are run iteratively, the unbiased estimates and running times of all combinations are output, the optimal ntree and mtry combination is determined, and the tunnel operation state perception model is established. The invention improves the ability to analyze data with complex correlations and is not prone to overfitting. Actual prediction results show that its average perception precision, recall, and F-measure match or exceed those of the comparison models, so it adapts better to changes in the tunnel operation state and can provide accurate real-time perception and prediction of that state.

Description

Method for establishing tunnel operation state perception model based on random forest
Technical Field
The invention relates to the field of tunnel engineering, in particular to a method for establishing a tunnel operation state perception model based on a random forest.
Background
In recent years, a large number of extra-long road tunnels have been built and put into operation, and road tunnels are gradually shifting from a construction peak to an operation peak. However, owing to the traffic composition and traffic volume, pollutants accumulate continuously in a tunnel and are relatively difficult to discharge or dilute, so ventilation has become the foremost problem during the operation period and a challenge for tunnel operation management. Traffic flow data and environmental monitoring data in the tunnel therefore need to be analyzed so that, once the tunnel operation state is determined, corresponding operation control measures can be taken. The soundness of the tunnel operation state perception model directly determines the effectiveness of these control measures. However, the operation state of an extra-long road tunnel results from the interaction and superposition of traffic factors such as people, vehicles, roads, and the environment; the influencing factors are numerous, their evolution is complex, and there is currently no scientific classification method or uniform classification standard. No published literature or patent discloses research that analyzes the operation state by jointly using real-time traffic flow information and ventilation environment information in the tunnel.
Random forest is a supervised ensemble learning classification technique whose model consists of a set of decision tree classifiers. The decision tree is a classical data mining algorithm that essentially classifies data recursively through a series of rules; popular decision tree algorithms include ID3, C4.5, and CART. Because a single decision tree suffers from low accuracy and a tendency to overfit, combining multiple learners through ensemble learning has become a research hotspot in machine learning.
In 2001, Breiman combined his Bagging theory with the CART decision tree and the random subspace method proposed by Ho, yielding a nonparametric classification and regression algorithm: the random forest. The basic idea of the random forest is shown in FIG. 1. First, the bootstrap resampling technique is used to randomly draw, with replacement, multiple subsamples from the training sample set to generate new training sample sets; then a decision tree is constructed from each bootstrap sample set, and together these trees form the random forest; finally, each input sample to be classified or regressed is predicted by all the trees and their outputs are combined (by voting for classification, by averaging for regression). A large body of theoretical research and case studies in recent years shows that the random forest has many advantages, such as a strong ability to analyze data with complex correlations, high prediction accuracy, and resistance to overfitting.
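The bootstrap-and-aggregate idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: `bootstrap_indices` and `majority_vote` are hypothetical helper names, and hard majority voting is used for simplicity (step 6) of the method instead sums class probabilities over the trees).

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]  # never drawn this round
    return in_bag, out_of_bag

def majority_vote(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

if __name__ == "__main__":
    rng = random.Random(0)
    in_bag, oob = bootstrap_indices(10, rng)
    print("in-bag:", in_bag)
    print("out-of-bag:", oob)
    print(majority_vote(["light", "moderate", "light"]))  # light
```

Each tree of a forest would be trained on one `in_bag` list, and its `out_of_bag` rows are the OOB data used later for error estimation.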
At present, however, no model adapts well to changes in the tunnel operation state while also providing accurate real-time perception and prediction of that state; a method for establishing a tunnel operation state perception model based on a random forest is therefore needed.
Disclosure of Invention
The invention aims to provide a method for establishing a tunnel operation state perception model based on a random forest.
To achieve this purpose, the invention adopts the following technical solution:
A method for establishing a tunnel operation state perception model based on a random forest comprises the following steps:
step 1): determining the number N of samples and the number M of variables in the tunnel operation monitoring training set;
step 2): determining the combination range and initial values of the random forest parameters ntree and mtry, where ntree is the number of decision trees in the random forest and mtry is the number of variables randomly drawn at each split node, with mtry < M;
step 3): using the Bootstrap method to randomly draw, with replacement, ntree new bootstrap sample sets and construct ntree decision trees; the samples not drawn each time form ntree out-of-bag data sets;
step 4): growing each bootstrap sample set into a decision tree: at each node of the tree, mtry features are randomly selected and one of them is chosen for branch growth;
step 5): predicting the input training set samples with the ntree generated decision trees, and calculating the out-of-bag error of each decision tree;
step 6): aggregating the predictions of the decision trees, i.e., outputting the class with the largest summed prediction probability over all trees as the final classification result; averaging the out-of-bag errors of all trees to obtain an unbiased estimate of the random forest generalization error; and calculating the program running time;
step 7): repeating steps 2)-7) to run all ntree and mtry parameter combinations iteratively, and outputting the unbiased estimate of the random forest generalization error and the running time for every combination;
step 8): determining the optimal ntree and mtry parameter combination for the random forest, and establishing the random-forest-based tunnel operation state perception model.
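Steps 3) to 6) can be illustrated with a compact, dependency-free Python sketch. It is a hypothetical illustration, not the inventors' code: the function names, the `min_size` stopping rule, and the toy data are assumptions; splits use the Gini impurity; and the OOB error is computed by aggregating each sample's out-of-bag votes across trees (the per-sample form of formula (1)) rather than by averaging per-tree errors.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def grow_tree(X, y, mtry, rng, min_size=2):
    """Step 4): grow a binary tree; each node tries only mtry random features."""
    if len(set(y)) == 1 or len(y) < min_size:
        return Counter(y)                             # leaf: class counts
    best = None
    for f in rng.sample(range(len(X[0])), mtry):      # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:  # thresholds keep both sides non-empty
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                  # chosen features are constant here
        return Counter(y)
    _, f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng, min_size),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng, min_size))

def predict_proba(tree, x):
    """Descend to a leaf and return its class-probability dict."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    total = sum(tree.values())
    return {c: k / total for c, k in tree.items()}

def random_forest(X, y, ntree, mtry, seed=0):
    """Steps 3)-6): bootstrap, grow ntree trees, estimate the OOB error."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    oob_votes = [Counter() for _ in range(n)]         # summed OOB probabilities per sample
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]    # step 3): bootstrap sample
        tree = grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):            # this tree's out-of-bag rows
            oob_votes[i].update(predict_proba(tree, X[i]))
    scored = [i for i in range(n) if oob_votes[i]]
    wrong = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    oob_error = wrong / len(scored) if scored else float("nan")
    return trees, oob_error

def forest_predict(trees, x):
    """Step 6): the class with the largest summed predicted probability."""
    total = Counter()
    for tree in trees:
        total.update(predict_proba(tree, x))
    return total.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical toy data (2 features, 2 states), not the patent's measurements.
    X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
    y = ["light", "light", "light", "heavy", "heavy", "heavy"]
    trees, err = random_forest(X, y, ntree=25, mtry=1, seed=1)
    print(forest_predict(trees, [0.05, 1.0]), round(err, 3))
```

With real monitoring data, each row of `X` would hold the five retained variables of one sample and `y` its state label; `mtry` must not exceed the number of features.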
In a further refinement of the invention, the number N of samples in the tunnel operation monitoring training set in step 1) is determined as follows:
First, the tunnel operation state perception problem is defined. Given a training set $T = \{(x_1, y_1), \ldots, (x_N, y_N)\} \in (X^5 \times Y)^N$, where N is the number of samples in the training set; $x_i \in X^5$ is the i-th sample of the tunnel operation monitoring data set used as model input, consisting of the monitored values of CO, NO2, wind speed, fine particulate matter, and heavy-duty vehicles; $y_i \in Y = \{c_1, c_2, c_3, c_4\}$ indicates which of the four states (light, moderate, heavy, or severe pollution) the sample corresponds to; and i = 1, 2, 3, …, N is the sample index in the training set. The goal is to search the space $X^5$ for a decision function $f: X^5 \to Y$ that infers the tunnel operation state corresponding to any monitoring sample.
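The formal definition above maps naturally onto a small data structure. The following Python sketch is illustrative only: the class names, the field names, and the absence of units are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List

class TunnelState(Enum):
    """Y = {c1, c2, c3, c4}."""
    LIGHT = 1      # c1: light pollution
    MODERATE = 2   # c2: moderate pollution
    HEAVY = 3      # c3: heavy pollution
    SEVERE = 4     # c4: severe pollution

@dataclass(frozen=True)
class MonitoringSample:
    """One x_i in X^5; field names (and units) are illustrative assumptions."""
    co: float
    no2: float
    wind_speed: float
    fine_particulates: float
    hgv: float

    def as_vector(self) -> List[float]:
        """Feature vector in a fixed order, e.g. for a tree-based classifier."""
        return [self.co, self.no2, self.wind_speed, self.fine_particulates, self.hgv]

# The decision function f: X^5 -> Y that the perception model is meant to learn.
DecisionFunction = Callable[[MonitoringSample], TunnelState]
```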
In a further refinement of the invention, N is greater than 500.
In a further refinement of the invention, in step 4) one of the mtry features is selected for branch growth according to the principle of minimum node impurity.
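The minimum-impurity rule can be illustrated with the Gini index, the impurity measure used by CART. In this hypothetical Python sketch, `columns` holds the candidate (mtry) feature columns of one node; each candidate threshold is scored by the size-weighted Gini impurity of the two children, and the lowest score wins. Using Gini rather than another impurity measure is an assumption.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for an empty node)."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the children of the split value <= threshold."""
    left = [c for v, c in zip(values, labels) if v <= threshold]
    right = [c for v, c in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def choose_split(columns, labels, candidates):
    """Among the mtry candidate features, return (impurity, feature, threshold) with minimum impurity."""
    best = None
    for f in candidates:
        for t in sorted(set(columns[f]))[:-1]:
            score = split_impurity(columns[f], labels, t)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

if __name__ == "__main__":
    labels = ["c1", "c1", "c2", "c2"]
    columns = {"NO2": [1, 2, 8, 9], "wind": [3, 1, 2, 4]}   # hypothetical values
    print(gini(labels))                                    # 0.5
    print(choose_split(columns, labels, ["wind", "NO2"]))  # (0.0, 'NO2', 2)
```

In the example the node impurity is 0.5, and the NO2 threshold that separates the two classes gives a split impurity of 0, so NO2 is chosen over wind speed.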
In a further refinement of the invention, the out-of-bag error of each decision tree in step 5) is calculated as follows. For a training set T of N samples, the probability that a given sample is never drawn into a bootstrap sample set is $(1 - 1/N)^N$; as N tends to infinity, this OOB proportion converges to 1/e ≈ 0.368, i.e., about 37% of the data samples are not drawn when each decision tree is constructed. Since the OOB data are not used to build the decision tree, they can be used for prediction as follows:
Suppose $O_b$ denotes the OOB data of the b-th decision tree among the ntree decision trees, where 0 < b ≤ ntree. Each sample $x_i$ (i = 1, 2, 3, …, N) of the training set T is out of bag for, on average, a fraction 1/e ≈ 0.368 of the trees, so it can be predicted by aggregating the trees whose OOB data contain it. The classification error rate ER is estimated according to the following formula:
$$ER \approx ER_{OOB} = \frac{1}{N}\sum_{i=1}^{N} I\left(\hat{Y}_i^{OOB} \neq Y_i\right) \qquad (1)$$
In formula (1), I(·) denotes the indicator function, ER is the classification error rate, $ER_{OOB}$ is the out-of-bag error, $\hat{Y}_i^{OOB}$ is the out-of-bag prediction result for sample $x_i$, and $Y_i$ is the actual result.
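The convergence of the OOB proportion $(1 - 1/N)^N$ to $1/e$ can be checked numerically. This Python sketch compares the closed form with a Monte Carlo estimate; the function names and trial counts are illustrative choices.

```python
import math
import random

def oob_probability(n):
    """Probability that a given sample is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def empirical_oob_fraction(n, trials, seed=0):
    """Monte Carlo estimate of the out-of-bag fraction of one bootstrap sample."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        drawn = {rng.randrange(n) for _ in range(n)}
        missed += n - len(drawn)
    return missed / (n * trials)

if __name__ == "__main__":
    for n in (10, 100, 500, 10_000):
        print(n, round(oob_probability(n), 4))
    print("1/e =", round(1 / math.e, 4))
    print("simulated, N = 500:", round(empirical_oob_fraction(500, 200), 4))
```

The closed form increases monotonically toward 1/e, so for the training-set sizes considered here (N > 500) the OOB share per tree is already very close to 36.8%.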
In a further refinement of the invention, in step 8) the ntree and mtry combination with the smallest unbiased estimate of the random forest generalization error and the shortest running time is selected as the optimal ntree and mtry combination.
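Minimum error and shortest running time can point to different combinations, so some tie-breaking rule is needed. The sketch below assumes a lexicographic rule: smallest OOB error first, then the shortest running time among results within a tolerance `tol` of that error. The rule, the `Result` record, and `tol` are assumptions, not taken from the patent.

```python
from typing import List, NamedTuple

class Result(NamedTuple):
    ntree: int
    mtry: int
    oob_error: float    # unbiased estimate of the generalization error
    runtime_s: float    # program running time in seconds

def select_optimal(results: List[Result], tol: float = 0.0) -> Result:
    """Smallest OOB error first; among results within `tol` of it, the fastest."""
    best_err = min(r.oob_error for r in results)
    near_best = [r for r in results if r.oob_error <= best_err + tol]
    return min(near_best, key=lambda r: r.runtime_s)

if __name__ == "__main__":
    # Hypothetical results, not measurements from the patent.
    runs = [Result(500, 1, 0.068, 0.90), Result(200, 1, 0.070, 0.40), Result(100, 2, 0.090, 0.20)]
    print(select_optimal(runs).ntree)             # 500
    print(select_optimal(runs, tol=0.005).ntree)  # 200
```

With `tol = 0` this reduces to pure minimum-error selection, with running time acting only as a tie-breaker.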
Compared with the prior art, the invention has the following beneficial effects: the perception model is built on a random forest, and the optimal ntree and mtry combination is determined through parameter tuning that weighs perception accuracy against computation time, which improves the ability to analyze data with complex correlations while making overfitting unlikely. Actual prediction results show that the average perception precision, recall, and F-measure of the method match or exceed those of a Naive Bayes model and an SVM model, so the method adapts better to changes in the tunnel operation state and can provide accurate real-time perception and prediction of it. The method thereby provides a theoretical basis and a scientific approach for designing the ventilation facilities and the intelligent traffic management and control schemes of extra-long road tunnels.
Furthermore, by applying big data analysis techniques, traffic flow data (such as the traffic composition and traffic volume in the tunnel) and ventilation environment data (such as the concentrations of various pollutants and the wind speed) are deeply fused to build a data-driven operation state perception model, enabling automatic identification of and early warning about the tunnel operation state and improving how that state is assessed.
Drawings
FIG. 1 is a schematic diagram of the random forest algorithm;
FIG. 2 is a flow chart of the method for establishing a random-forest-based tunnel operation state perception model;
FIG. 3 is a schematic structural view of the south bore of Qinling No. 1 Tunnel according to an embodiment of the present invention;
FIG. 4 is a graph of the effect of ntree and mtry parameter combinations on the out-of-bag error according to an embodiment of the present invention;
FIG. 5 is a graph of the effect of ntree and mtry parameter combinations on the running time according to an embodiment of the present invention;
FIG. 6 is a diagram of the importance indexes of the feature variables according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Each decision tree in the random forest is a binary tree grown by top-down recursive splitting, i.e., the training set is partitioned successively starting from the root node. The root node contains all the training data, which are split into a left node and a right node according to the principle of minimum node impurity; each child node contains a subset of the training data, and the nodes continue to be split by the same rule until a stopping rule is met and growth stops.
Referring to FIG. 2, the method for establishing a tunnel operation state perception model based on a random forest comprises determining the number of training samples N and the number of variables M, determining the parameter ranges and initial values of the number of decision trees ntree and the number of randomly drawn variables mtry, constructing the random forest, and determining the optimal parameter combination; specifically, it includes the following steps:
step 1): determining the number N of samples and the number M of variables in a tunnel operation monitoring training set;
First, the tunnel operation state perception problem is defined. Given a training set $T = \{(x_1, y_1), \ldots, (x_N, y_N)\} \in (X^5 \times Y)^N$, where N is the number of samples in the training set and is greater than 500; $x_i \in X^5$ is the i-th sample of the tunnel operation monitoring data set used as model input, consisting of the monitored values of CO, NO2, wind speed, fine particulate matter, and heavy-duty vehicles (HGV); $y_i \in Y = \{c_1, c_2, c_3, c_4\}$ indicates which of the four states (light, moderate, heavy, or severe pollution) the sample corresponds to; and i = 1, 2, 3, …, N is the sample index in the training set. The goal is to search the space $X^5$ for a decision function $f: X^5 \to Y$ that infers the tunnel operation state corresponding to any monitoring sample.
Step 2): determining the combination range and initial values of the random forest parameters ntree (the number of decision trees in the random forest) and mtry (the number of variables randomly drawn at each split node, with mtry < M);
step 3): using the Bootstrap method to randomly draw, with replacement, ntree new bootstrap sample sets and construct ntree decision trees; the samples not drawn each time form ntree out-of-bag (OOB) data sets;
step 4): growing each bootstrap sample set into a decision tree: at each node of the tree, mtry features are randomly selected and the one giving the minimum node impurity is chosen for branch growth;
step 5): predicting the input training set samples with the ntree generated decision trees, and calculating the out-of-bag error (OOB error rate) of each decision tree;
For a training set T of N samples, the probability that a given sample is never drawn into a bootstrap sample set is $(1 - 1/N)^N$. As N approaches infinity, this OOB proportion converges to 1/e ≈ 0.368, i.e., about 37% of the data samples are not drawn when each decision tree is constructed. Since the OOB data are not used to build the decision tree, they can be used for prediction as follows:
Suppose $O_b$ denotes the OOB data of the b-th decision tree (0 < b ≤ ntree) among the ntree decision trees. Each sample $x_i$ (i = 1, 2, 3, …, N) of the training set T is out of bag for, on average, a fraction 1/e ≈ 0.368 of the trees, so it can be predicted by aggregating the trees whose OOB data contain it. The estimated classification error rate (ER) is calculated according to the following formula:
$$ER \approx ER_{OOB} = \frac{1}{N}\sum_{i=1}^{N} I\left(\hat{Y}_i^{OOB} \neq Y_i\right) \qquad (1)$$
In formula (1), I(·) denotes the indicator function, ER is the classification error rate, $ER_{OOB}$ is the out-of-bag error, $\hat{Y}_i^{OOB}$ is the out-of-bag prediction result for sample $x_i$, and $Y_i$ is the actual result.
Step 6): aggregating the predictions of the decision trees, i.e., outputting the class with the largest summed prediction probability over all trees as the final classification result; averaging the out-of-bag errors (OOB error rate) of all trees to obtain an unbiased estimate of the random forest generalization error (OOB estimate of error rate); and calculating the program running time;
step 7): repeating steps 2)-7) to run all ntree and mtry parameter combinations iteratively, and outputting the unbiased estimate of the random forest generalization error and the running time for every combination;
step 8): determining the optimal ntree and mtry parameter combination in the random forest, i.e., selecting the combination with the smallest unbiased estimate of the random forest generalization error and the shortest running time, and establishing the random forest perception model of the tunnel operation state.
From the above analysis, two important parameters for constructing the random forest perception model are:
(1) ntree: the number of decision trees in the random forest;
(2) mtry: the number of variables randomly drawn at each split node.
Here, ntree determines the overall size of the random forest, while mtry governs how each individual decision tree is grown.
For the set of four tunnel operation state classes $Y = \{c_1, c_2, c_3, c_4\}$, namely light, moderate, heavy, and severe pollution, the confusion matrix of the perception results can be represented as in Table 1.
Table 1 tunnel operation state confusion matrix
As shown in Table 1, $n_{i,j}$ denotes the number of samples of state class $c_i$ that are identified as class $c_j$ (i = 1, …, 4; j = 1, …, 4). The confusion matrix reflects how the samples are distributed over the state class space Y and shows the recognition performance of the classifier: the i-th row reflects the recall of class $c_i$, and the j-th column reflects the precision of class $c_j$. Thus, for a particular tunnel operation state $c_j$, its individual precision $P_j$ and recall $R_j$ are calculated according to equations (2) and (3), respectively:
$$P_j = \frac{n_{j,j}}{\sum_{i=1}^{4} n_{i,j}} \qquad (2)$$
$$R_j = \frac{n_{j,j}}{\sum_{k=1}^{4} n_{j,k}} \qquad (3)$$
Combining precision and recall through their harmonic mean gives a further statistic, the F-measure:
$$F_j = \frac{2 P_j R_j}{P_j + R_j} \qquad (4)$$
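Equations (2) to (4) can be computed directly from a confusion matrix whose rows are actual classes and whose columns are predicted classes, as in Table 1. This Python sketch is illustrative; returning 0 when a denominator is zero and using an unweighted (macro) mean for the "average" metrics are assumptions, since the patent does not state how classes without samples are handled or how the averages are formed.

```python
def per_class_metrics(cm):
    """cm[i][j]: samples of actual class i identified as class j.
    Returns a list of (precision, recall, F) per class, following eqs. (2)-(4)."""
    k = len(cm)
    out = []
    for j in range(k):
        predicted_j = sum(cm[i][j] for i in range(k))   # column sum
        actual_j = sum(cm[j])                            # row sum
        p = cm[j][j] / predicted_j if predicted_j else 0.0
        r = cm[j][j] / actual_j if actual_j else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        out.append((p, r, f))
    return out

def macro_average(metrics):
    """Unweighted mean of precision, recall and F over the classes."""
    k = len(metrics)
    return tuple(sum(m[idx] for m in metrics) / k for idx in range(3))

if __name__ == "__main__":
    cm = [[50, 2, 0, 0],      # hypothetical counts, not the data behind Table 3
          [3, 40, 1, 0],
          [0, 2, 30, 1],
          [0, 0, 1, 10]]
    for j, (p, r, f) in enumerate(per_class_metrics(cm), start=1):
        print(f"c{j}: P={p:.3f} R={r:.3f} F={f:.3f}")
    print("macro average:", [round(v, 3) for v in macro_average(per_class_metrics(cm))])
```

Because F is computed per class before averaging, the macro-averaged F is generally not the harmonic mean of the macro-averaged precision and recall.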
This is illustrated below with a specific example.
The south bore of Qinling No. 1 Tunnel is taken as the engineering case: a real-time tunnel operation monitoring experiment was carried out there, and the specific implementation of the invention is further explained below with reference to the drawings.
As shown in FIG. 3, Qinling No. 1 Tunnel is a separated twin-bore, four-lane tunnel. The south bore is 6102 m long, with an altitude of 1322 m at the entrance and 1391 m at the exit and an average longitudinal slope of +2.58%; it has 11 emergency stopping areas (ESA-1 to ESA-11), 30 jet fans in total, and one reserved supply-and-exhaust ventilation inclined shaft. Because the tunnel uses full-jet longitudinal ventilation, the pollutant concentration follows a triangular distribution along the tunnel, i.e., it is lowest at the entrance and highest at the exit. The stopping area ESA-11, where pollution is most serious, was therefore selected as the monitoring site for this study.
As shown in Table 2, seven kinds of data in two categories were collected: tunnel environment data and traffic volume survey data. The traffic volume survey divides vehicles into three types: passenger cars (PCs), light-duty vehicles (LDVs), and heavy-duty vehicles (HGVs). During the monitoring period, PCs, LDVs, and HGVs accounted for 29.46%, 3.21%, and 67.32% of the traffic, respectively; because the LDV share is low, its influence on the operation state can be ignored. Pearson correlation coefficients between the variables show that CO and NO2 are strongly correlated with HGVs and only weakly correlated with PCs, so the influence of PCs on the operation state can also be ignored and only HGVs are retained. Five types of data, namely CO, NO2, wind speed, fine particulate matter, and HGV, are therefore finally selected as the sample data set for this study.
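The Pearson correlation coefficient used for this variable screening is r = cov(x, y) / (σx σy). A minimal standard-library Python sketch follows; on Python 3.10 and later, `statistics.correlation` computes the same quantity. The example series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r = cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)   # undefined (ZeroDivisionError) for a constant series

if __name__ == "__main__":
    # Hypothetical hourly series, not the monitoring data of Table 2.
    hgv = [120, 150, 90, 200, 170]
    co = [40, 48, 33, 61, 55]
    print(round(pearson(hgv, co), 3))
```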
Table 2 tunnel operation monitoring data provided in the embodiment of the present invention
Next, outlier detection is performed on the monitored sample data set to check for missing values and noisy data. Missing values are mainly caused by failures of the gas monitoring equipment or the vehicle detectors, while noisy data are mainly pollutant concentrations or traffic volumes outside reasonable ranges. Because abnormal values account for only a small proportion of all samples, all abnormal data are removed. Parameter tuning is then performed before the perception model is established: first, the cleaned tunnel operation monitoring sample data set is randomly divided into a training data set and a testing data set according to the proportion of 7.
As shown in FIG. 4 and FIG. 5, ntree and mtry are considered jointly, and the optimal parameter combination is determined by the principle of minimum out-of-bag error (OOB error). With ntree = 10, 20, …, 500 and mtry = 1, 2, 3, 4, 5, all 250 ntree and mtry combinations are run iteratively on the training data set to obtain the influence of each combination on the out-of-bag error and the running time of the random forest classifier.
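The 50 × 5 = 250 combinations and the per-run timing can be sketched as a simple grid loop. In this Python sketch, `evaluate(ntree, mtry)` is a hypothetical callback that would train a forest and return its OOB error; `time.perf_counter` measures the wall-clock time of each run.

```python
import time

NTREE_VALUES = range(10, 501, 10)   # ntree = 10, 20, ..., 500 (50 values)
MTRY_VALUES = range(1, 6)           # mtry = 1, 2, 3, 4, 5

def grid_search(evaluate):
    """Run evaluate(ntree, mtry) -> OOB error for every combination, timing each run."""
    results = []
    for ntree in NTREE_VALUES:
        for mtry in MTRY_VALUES:
            start = time.perf_counter()
            oob_error = evaluate(ntree, mtry)
            elapsed = time.perf_counter() - start
            results.append((ntree, mtry, oob_error, elapsed))
    return results

if __name__ == "__main__":
    def fake_eval(ntree, mtry):
        # Hypothetical stand-in: an error curve that falls as ntree grows.
        return 0.06 + 0.5 / ntree + 0.001 * mtry

    res = grid_search(fake_eval)
    print(len(res))                              # 250
    print(min(res, key=lambda r: r[2])[:2])      # (500, 1)
```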
Analysis of FIG. 4 and FIG. 5 shows that the out-of-bag error is affected mainly by ntree and decreases as ntree increases, i.e., the classification becomes more accurate; the running time increases linearly with ntree but remains at the millisecond level, so the computation time is negligible. mtry has little influence on the OOB error. When ntree > 200, the OOB error has essentially converged, and further increasing ntree has almost no effect on classification accuracy. This verifies that the random forest does not overfit: its classification error converges as the number of decision trees grows. Weighing classification accuracy against computation time, the invention selects ntree = 500 and mtry = 1 as the optimal parameter combination, for which the unbiased OOB error estimate is 6.8%.
As shown in FIG. 6, a random forest model can also measure the importance of the feature variables. This property can be used, on the one hand, to rank the feature variables by importance and, on the other hand, to select an optimal feature subset for dimensionality reduction (avoiding the curse of dimensionality), which lowers computational complexity and improves the performance of the learning algorithm. Random-forest-based feature importance measures include the Mean Decrease Accuracy (MDA) and the Mean Decrease Gini (MDG). The former is defined as the average decrease in classification accuracy on the out-of-bag data after the values of a variable are randomly perturbed, compared with the accuracy before perturbation; the latter is defined as the average decrease in the Gini index. The larger the MDA and MDG, the more important the variable.
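The MDA idea can be sketched as permutation importance: shuffle one variable's values and measure how much accuracy drops. This simplified Python sketch permutes a column of a single evaluation set and scores one `model` callable, whereas random forest implementations typically compute MDA per tree on that tree's OOB data and average over trees; the function names and toy model are hypothetical.

```python
import random

def accuracy(model, X, y):
    """Share of samples whose predicted class equals the actual class."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, repeats=10, seed=0):
    """MDA-style score: mean accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)                       # break the feature-label link
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(model, X_perm, y))
    return sum(drops) / repeats

if __name__ == "__main__":
    # Hypothetical model and data: the state depends only on feature 0.
    def model(x):
        return "heavy" if x[0] > 0.5 else "light"

    X = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
    y = ["light", "heavy", "light", "heavy"]
    for f in (0, 1):
        print(f, permutation_importance(model, X, y, f))
```

A variable the model ignores (feature 1 here) scores exactly 0, while a variable the predictions depend on scores above 0.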
Further, in the embodiment of the present invention, the importance index of each variable is calculated on the training data set with the optimal parameters. The analysis shows that, in descending order of importance for changes in the tunnel operation state, the variables are NO2, CO, HGV, fine particulate matter, and wind speed; the two gaseous pollutants NO2 and CO are thus the core factors affecting changes in the tunnel operation state.
Finally, a Naive Bayes model, a support vector machine (SVM), and the random forest perception model constructed according to the embodiment of the invention are each used to predict the sample states in the test data set, confusion matrices are built against the actual classification results, and the resulting evaluation indexes are shown in Table 3.
TABLE 3 comparison of evaluation indexes of different perception models
Note: one pollution state is absent from the sample set.
Calculation shows that the average perception precision, recall, and F-measure are 96.72%, 89.83%, and 92.77% for the Naive Bayes model; 98.83%, 94.43%, and 96.41% for the SVM model; and 98.83%, 95.52%, and 97.07% for the random forest perception model. The random forest perception model therefore performs best of the three (it matches the SVM in precision and exceeds it in recall and F-measure), better meets the requirements of changing tunnel operation states, and provides accurate prediction of the tunnel operation state to improve the accuracy of tunnel operation control.
The above description explains the invention in detail with reference to specific examples and should not be construed as limiting its implementation. For those skilled in the art to which the invention pertains, equivalent substitutions or obvious modifications with the same properties or uses, made without departing from the inventive concept, shall be considered to fall within the scope of patent protection of the invention as determined by the submitted claims.

Claims (6)

1. A method for establishing a tunnel operation state perception model based on a random forest, characterized by comprising the following steps:
step 1): determining the number N of samples and the number M of variables in a tunnel operation monitoring training set;
step 2): determining the combination range and initial values of the random forest parameters ntree and mtry, wherein ntree is the number of decision trees in the random forest, mtry is the number of variables randomly drawn at each split node, and mtry is less than M;
step 3): using the Bootstrap method to randomly draw, with replacement, ntree new bootstrap sample sets and construct ntree decision trees, wherein the samples not drawn each time form ntree out-of-bag data sets;
step 4): growing each bootstrap sample set into a decision tree, wherein mtry features are randomly selected at each node of the decision tree and one of the mtry features is chosen for branch growth;
step 5): predicting the input training set samples with the ntree generated decision trees, and calculating the out-of-bag error of each decision tree;
step 6): aggregating the predictions of the decision trees, namely outputting the class with the largest summed prediction probability over all trees as the final classification result, averaging the out-of-bag errors of all trees to obtain an unbiased estimate of the random forest generalization error, and calculating the program running time;
step 7): repeating steps 2)-7) to run all ntree and mtry parameter combinations iteratively, and outputting the unbiased estimate of the random forest generalization error and the running time for every combination;
step 8): determining the optimal ntree and mtry parameter combination in the random forest, and establishing the random-forest-based tunnel operation state perception model.
2. The method for establishing the tunnel operation state perception model based on the random forest as claimed in claim 1, wherein the specific process of determining the number N of samples in the tunnel operation monitoring training set in step 1) is as follows:
first, the tunnel operation state perception problem is defined: given a training set $T = \{(x_1, y_1), \ldots, (x_N, y_N)\} \in (X^5 \times Y)^N$, wherein N is the number of samples in the training set; $x_i \in X^5$ is the i-th sample of the tunnel operation monitoring data set used as model input, consisting of the monitored values of CO, NO2, wind speed, fine particulate matter, and heavy-duty vehicles; $y_i \in Y = \{c_1, c_2, c_3, c_4\}$ indicates which of the four states (light, moderate, heavy, or severe pollution) the sample corresponds to; and i = 1, 2, 3, …, N is the sample index in the training set; the space $X^5$ is searched for a decision function $f: X^5 \to Y$ that infers the tunnel operation state corresponding to any monitoring sample.
3. A method for establishing a tunnel operation state perception model based on a random forest as claimed in claim 1, wherein a value of N is greater than 500.
4. The method for establishing the random-forest-based tunnel operation state perception model according to claim 1, wherein in step 4) one of the features is selected for branch growth according to the principle of minimum node impurity.
5. The method for establishing the random forest-based tunnel operation state perception model according to claim 1, wherein the out-of-bag (OOB) data error of each decision tree in step 5) is calculated as follows: when a bootstrap sample of size N is drawn with replacement from the training set T, the probability that a given sample is never drawn, i.e. is out of bag, is $(1-1/N)^N$; as N tends to infinity this proportion converges to $1/e\approx 0.368$, so about 37% of the data samples are not drawn when each decision tree is constructed. Since the OOB data are not used in constructing the decision tree, they are used for prediction as follows:
let the OOB data of the b-th of the ntree decision trees be given, where $0<b\le \mathrm{ntree}$; each sample $x_i$ ($i=1,2,3,\dots,N$) of the training set T is out of bag for, on average, a proportion $1/e\approx 0.368$ of the trees, and it is predicted jointly by those trees from their OOB data. The classification error rate ER is then estimated by the following equation:
$$ER \approx ER_{\mathrm{OOB}} = \frac{1}{N}\sum_{i=1}^{N} I\left(\hat{Y}_i^{\mathrm{OOB}} \neq Y_i\right) \qquad (1)$$
in formula (1): $I(\cdot)$ denotes the indicator function, ER is the classification error rate, $ER_{\mathrm{OOB}}$ is the out-of-bag data error, $\hat{Y}_i^{\mathrm{OOB}}$ is the out-of-bag prediction for sample i, and $Y_i$ is the actual result.
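The 1/e limit and the out-of-bag error of formula (1) can both be checked numerically. This sketch assumes each tree contributes a hard class prediction and that the joint OOB prediction is a simple majority vote among the trees for which the sample was not drawn; the function names are hypothetical. Unlike formula (1), which averages over all N samples, the sketch skips any sample that is in bag for every tree; that happens with probability of roughly $0.632^{\mathrm{ntree}}$, which is negligible for realistic forest sizes.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Set


def oob_probability(n: int) -> float:
    """Probability that a given sample is never drawn in n draws with
    replacement from n samples: (1 - 1/n)^n, which tends to 1/e."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """One bootstrap sample: n indices drawn uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def oob_error(
    y_true: Sequence[int],
    in_bag: Sequence[Set[int]],
    tree_preds: Sequence[Sequence[int]],
) -> float:
    """Estimate ER_OOB as the share of samples whose OOB majority vote is wrong.

    tree_preds[b][i] is tree b's predicted class for sample i, and in_bag[b]
    holds the indices used to grow tree b. Only trees for which sample i is
    out of bag vote on it; samples that are in bag for every tree are skipped.
    """
    errors = counted = 0
    for i, y in enumerate(y_true):
        votes = Counter(preds[i] for b, preds in enumerate(tree_preds)
                        if i not in in_bag[b])
        if not votes:
            continue
        y_hat = votes.most_common(1)[0][0]  # OOB prediction for sample i
        errors += int(y_hat != y)           # indicator I(y_hat != Y_i)
        counted += 1
    return errors / counted if counted else float("nan")


if __name__ == "__main__":
    for n in (10, 100, 500, 10_000):
        print(n, round(oob_probability(n), 4), round(1 / math.e, 4))
```

Printing `oob_probability` for growing N shows the OOB proportion approaching 1/e from below, consistent with the figure of about 37% stated in claim 5.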
6. The method for establishing the tunnel operation state perception model based on the random forest as claimed in claim 1, wherein in step 8), the combination of ntree and mtry values that gives the smallest unbiased estimate of the random forest generalization error and the shortest running time is selected as the optimal combination.
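Claim 6 names two criteria but does not say how to resolve a conflict between lowest error and shortest running time. One reasonable reading, assumed here, is lexicographic: first find the minimum OOB error, then among combinations whose error lies within a tolerance of that minimum, take the fastest. The tolerance `tol` and the function `select_optimal` are hypothetical; with the default `tol = 0` running time only breaks exact ties.

```python
from typing import Dict, Sequence


def select_optimal(results: Sequence[Dict[str, float]], tol: float = 0.0) -> Dict[str, float]:
    """Return the (ntree, mtry) row with minimum OOB error; among rows whose
    error is within tol of that minimum, prefer the shortest running time."""
    if not results:
        raise ValueError("no parameter combinations were evaluated")
    best_err = min(r["oob_error"] for r in results)
    near_best = [r for r in results if r["oob_error"] <= best_err + tol]
    return min(near_best, key=lambda r: r["seconds"])
```

A nonzero `tol` lets a noticeably cheaper forest win when its OOB error is practically indistinguishable from the best one, which matters when ntree is large.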
CN201710737045.0A 2017-08-24 2017-08-24 A kind of method for building up of the tunnel operation state sensor model based on random forest Pending CN107563425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710737045.0A CN107563425A (en) 2017-08-24 2017-08-24 A kind of method for building up of the tunnel operation state sensor model based on random forest

Publications (1)

Publication Number Publication Date
CN107563425A true CN107563425A (en) 2018-01-09

Family

ID=60976016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710737045.0A Pending CN107563425A (en) 2017-08-24 2017-08-24 A kind of method for building up of the tunnel operation state sensor model based on random forest

Country Status (1)

Country Link
CN (1) CN107563425A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372748A (en) * 2016-08-29 2017-02-01 上海交通大学 Hard-rock tunnel boring machine boring efficiency prediction method
CN106548022A (en) * 2016-11-03 2017-03-29 上海隧道工程有限公司 The Forecasting Methodology and prognoses system of shield tunnel construction carbon emission amount

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
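SONG Yuan et al.: "Feature selection based on a statistical-feature random forest algorithm", Journal of Computer Applications *
DONG Shishi et al.: "A brief analysis of random forest theory", Journal of Integration Technology *
QIAN Chao et al.: "Random-forest-based imputation method for missing highway tunnel operation data", Journal of Transportation Systems Engineering and Information Technology *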

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446433A (en) * 2018-02-07 2018-08-24 广东省生态环境技术研究所 A kind of recognition methods, system and the device of soil acidification driving force
CN108388919A (en) * 2018-02-28 2018-08-10 大唐高鸿信息通信研究院(义乌)有限公司 The identification of vehicle-mounted short haul connection net security feature and method for early warning
CN108388919B (en) * 2018-02-28 2021-08-10 大唐高鸿信息通信(义乌)有限公司 Vehicle-mounted short-distance communication network safety feature identification and early warning method
CN108846338A (en) * 2018-05-29 2018-11-20 南京林业大学 Polarization characteristic selection and classification method based on object-oriented random forest
CN108846338B (en) * 2018-05-29 2022-04-15 南京林业大学 Polarization feature selection and classification method based on object-oriented random forest
CN109063433A (en) * 2018-07-09 2018-12-21 中国联合网络通信集团有限公司 Recognition methods, device and the readable storage medium storing program for executing of fictitious users
CN109063433B (en) * 2018-07-09 2021-04-30 中国联合网络通信集团有限公司 False user identification method and device and readable storage medium
CN108596409A (en) * 2018-07-16 2018-09-28 江苏智通交通科技有限公司 The method for promoting traffic hazard personnel's accident risk prediction precision
CN108596409B (en) * 2018-07-16 2021-07-20 江苏智通交通科技有限公司 Method for improving accident risk prediction precision of traffic hazard personnel
CN109255159A (en) * 2018-08-17 2019-01-22 东南大学 A kind of circuit paths delay volatility forecast method based on machine learning
CN109346182B (en) * 2018-08-28 2021-06-18 昆明理工大学 CS-RF-based risk early warning method for thalassemia
CN109300545A (en) * 2018-08-28 2019-02-01 昆明理工大学 A kind of method for prewarning risk of the thalassemia based on RF
CN109346182A (en) * 2018-08-28 2019-02-15 昆明理工大学 A kind of method for prewarning risk of the thalassemia based on CS-RF
CN109283378A (en) * 2018-08-30 2019-01-29 番禺珠江钢管(珠海)有限公司 A kind of rotating arc welding is seamed into shape parameter detection method, system, device and medium
CN109508817A (en) * 2018-10-23 2019-03-22 上海同岩土木工程科技股份有限公司 A kind of plant disease prevention method based on tunnel environment information
CN109598048A (en) * 2018-11-27 2019-04-09 杭州市地铁集团有限责任公司 A kind of lubrication degradation prediction technique of track vehicle door system
CN110096967A (en) * 2019-04-10 2019-08-06 同济大学 A kind of road anger driver's hazardous act characteristic variable screening technique based on random forests algorithm
CN110175195B (en) * 2019-04-23 2022-11-29 哈尔滨工业大学 Mixed gas detection model construction method based on extreme random tree
CN110175195A (en) * 2019-04-23 2019-08-27 哈尔滨工业大学 Mixed gas detection model construction method based on extreme random tree
CN110318327A (en) * 2019-06-10 2019-10-11 长安大学 A kind of surface evenness prediction technique based on random forest
CN110457781B (en) * 2019-07-24 2022-12-23 中南大学 Passenger comfort-oriented train tunnel-passing time length calculation method
CN110457781A (en) * 2019-07-24 2019-11-15 中南大学 Train towards passenger comfort crosses tunnel duration calculation method
CN110751192A (en) * 2019-09-27 2020-02-04 南京大学 Random forest decision tree reasoning system and method based on CART algorithm
CN110795846A (en) * 2019-10-29 2020-02-14 东北财经大学 Construction method of boundary forest model, updating method of multi-working-condition soft computing model for complex industrial process and application of updating method
CN111352365A (en) * 2020-02-27 2020-06-30 益阳精锐科技有限公司 Dustproof ventilation type electric power and electrical equipment cabinet and control method
CN112381332A (en) * 2020-12-02 2021-02-19 中国科学院空天信息创新研究院 Population spatial distribution prediction method based on settlement object
CN113392880A (en) * 2021-05-27 2021-09-14 扬州大学 Traffic flow short-time prediction method based on deviation correction random forest
CN113392880B (en) * 2021-05-27 2021-11-23 扬州大学 Traffic flow short-time prediction method based on deviation correction random forest
CN113392885A (en) * 2021-05-31 2021-09-14 东南大学 Traffic accident space-time hot spot distinguishing method based on random forest theory
CN113642241A (en) * 2021-08-17 2021-11-12 北京航空航天大学 Road network fine particle research method based on traffic running state
CN113642241B (en) * 2021-08-17 2023-10-31 北京航空航天大学 Road network fine particulate matter research method based on traffic running state
CN115017791A (en) * 2021-12-18 2022-09-06 中国铁道科学研究院集团有限公司电子计算技术研究所 Tunnel surrounding rock grade identification method and device

Similar Documents

Publication Publication Date Title
CN107563425A (en) A kind of method for building up of the tunnel operation state sensor model based on random forest
Shangguan et al. An integrated methodology for real-time driving risk status prediction using naturalistic driving data
CN105447504B (en) A kind of travel pattern Activity recognition method and corresponding identification model construction method
CN112085947A (en) Traffic jam prediction method based on deep learning and fuzzy clustering
Sielenou et al. Combining random forests and class-balancing to discriminate between three classes of avalanche activity in the French Alps
CN110737874A (en) watershed water quality monitoring abnormal value detection method based on spatial relationship
Shang et al. A hybrid method for traffic incident detection using random forest-recursive feature elimination and long short-term memory network with Bayesian optimization algorithm
CN110674858B (en) Traffic public opinion detection method based on space-time correlation and big data mining
CN110880369A (en) Gas marker detection method based on radial basis function neural network and application
CN113313145B (en) Expressway traffic event detection method based on mixed kernel correlation vector machine
CN110555565A (en) Decision tree model-based expressway exit ramp accident severity prediction method
CN106935038B (en) Parking detection system and detection method
CN108564110B (en) Air quality prediction method based on clustering algorithm
CN115563546A (en) Intelligent gas smell identification method, system, medium, equipment and terminal
CN110689140A (en) Method for intelligently managing rail transit alarm data through big data
CN116341901A (en) Integrated evaluation method for landslide surface domain-monomer hazard early warning
CN114863170A (en) Deep learning-based new energy vehicle battery spontaneous combustion early warning method and device
CN107992902A (en) A kind of routine bus system based on supervised learning steals individual automatic testing method
Bleu-Laine et al. Predicting adverse events and their precursors in aviation using multi-class multiple-instance learning
CN111708865B (en) Technology forecasting and patent early warning analysis method based on improved XGboost algorithm
CN111985782A (en) Automatic tramcar driving risk assessment method based on environment perception
Fang et al. A deep cycle limit learning machine method for urban expressway traffic incident detection
Tišljarić et al. Fuzzy inference system for congestion index estimation based on speed probability distributions
CN112733903B (en) SVM-RF-DT combination-based air quality monitoring and alarming method, system, device and medium
Jiang et al. Parametric calibration of speed–density relationships in mesoscopic traffic simulator with data mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180109