CN108596409A - Method for improving the accuracy of accident risk prediction for hazardous traffic participants - Google Patents
- Publication number: CN108596409A (application number CN201810783017.7A)
- Authority: CN (China)
- Prior art keywords: data, personnel, model, sampling, accident
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/26: Government or public services
Abstract
The present invention provides a method for improving the accuracy of accident risk prediction for hazardous traffic participants. Samples of traffic violation data and accident data are obtained with an optimized sampling method. A road traffic accident risk prediction model for traffic participants is trained with ensemble learning algorithms, and the model is then optimized with a genetic algorithm. The invention uses ensemble learning algorithms to mine the safety characteristics of travellers from traffic violation data. It applies the optimized sampling method in the sampling stage of model construction to improve on the performance of the initial model, and it optimizes the model parameters with a genetic algorithm. Together these steps effectively improve the accuracy of accident risk prediction for high-risk persons.
Description
Technical field
The present invention relates to a method for improving the accuracy of accident risk prediction for hazardous traffic participants.
Background art
Studies have shown that traffic violations and traffic accidents are correlated. The attributes and behaviour of traffic participants with violation records, such as drivers and pedestrians, can provide data support for analysing human factors in traffic safety. The data can be mined with a classification approach, in which the safety characteristics of traffic violators are extracted from personal attribute variables.
Traditional classification methods search the space of possible functions for the classifier closest to the true classification function. In practice, however, they usually yield only a biased weak model, so the model's reliability is poor.
Ensemble learning algorithms improve the performance of the final model by combining weak models. However, ensemble learning models have a complex parameter composition, which makes their performance difficult to improve. Genetic algorithms are well suited to searching for globally optimal or near-optimal solutions, so they offer a feasible way to improve accuracy.
Summary of the invention
The object of the present invention is to provide a method for improving the accuracy of accident risk prediction for hazardous traffic participants. The method uses ensemble learning algorithms with optimized sampling and performs parameter optimization with a genetic algorithm. It quantitatively assesses the danger level of traffic participants who have traffic violation records. It also fills the current gap in quantitative analysis methods for the traffic-participant factor in traffic safety and effectively improves the accuracy of accident risk prediction for high-risk persons.
The technical solution of the invention is as follows:
The method improves the accuracy of accident risk prediction for hazardous traffic participants. Samples of traffic violation data and accident data are obtained with an optimized sampling method, and a road traffic accident risk prediction model for traffic participants is trained with ensemble learning algorithms. The model is then further optimized with a genetic algorithm to improve the accuracy of the prediction results. The method comprises the following steps:
S1. From the original traffic violation data and accident data, build a violation data set, a serious accident data set and a minor accident data set.
S2. Classify the violation data set into two classes, high-risk persons and general persons. Determine data label values according to classification rules and, based on these labels, divide the violation data set into a high-risk subset D, a general subset N and a subset U to be identified.
S3. Build an initial traffic participant danger-level prediction model P0 with the optimized sampling method and ensemble learning algorithms, and determine the model's sampling count and SMOTE sampling ratio.
S4. Optimize the performance of model P0 with a genetic algorithm. The objective is to maximize prediction accuracy on the test set, with test-set accuracy assessed by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly and does not oscillate without converging. These parameters are the crossover probability, mutation probability, mutation interval, number of generations and initial population size.
S5. From the optimal model parameters output by the genetic algorithm, build the optimally fitted model P for accident risk prediction of hazardous persons. Determine the model's test coverage (recall) and decision threshold.
S6. Input the subset U to be identified from S2 into model P and output the danger level of each target person.
Further, the ensemble learning algorithms in step S3 include the random forest, AdaBoost, XGBoost and GBDT algorithms.
Further, the optimized sampling method in step S3 comprises the following steps:
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size.
S32. Set the sample size n_m = s0 + (m-1)k, where s0 is the lower limit of the sampling interval and m is the loop counter with initial value 1. Randomly draw a sample N_m of size n_m from data set N.
S33. Split G_m, the union of data set D and sample N_m, into a training set and a test set.
S34. Apply SMOTE sampling to the training set, setting the oversampling ratio a_i for the high-risk subset D: a_1 = 1 and, for i > 1, a_i = a_{i-1} + 1. The index i starts at 1 and has a preset upper limit.
S35. For each high-risk oversampling ratio a_i, set the undersampling ratio b_j for the general-person sample N_m: b_1 = 1 and, for j > 1, b_j = b_{j-1} + 1. The index j starts at 1 and has a preset upper limit. With the SMOTE sampling ratio a_i:b_j, oversample and undersample the two labelled classes in the training set to form the classifier's training sample set.
S36. Train the high-risk person classifier with an ensemble learning algorithm, determine the model parameters, and fit the traffic participant accident risk prediction model for the current combination. The model outputs a label value and a risk probability.
S37. Evaluate the model on the test set data to obtain its accuracy at different coverage rates.
S38. Take N_m', the complement of the sample N_m within the general subset N, and group its data by number of violations. Feed each group into the model and count the rate at which the model, at different coverage rates, misjudges the labels of these persons.
S39. Check whether j has reached its upper limit. If so, check whether i has reached its upper limit: if it has, go to S310; otherwise set i = i+1 and go to S34. If j has not reached its upper limit, set j = j+1 and go to S35.
S310. Check whether n_m has reached the upper limit s of the sampling interval. If so, go to S311; otherwise set m = m+1 and return to S32.
S311. Analyse model accuracy and misjudgment rate to select the model with optimal performance. This determines the optimal random sample count M and the SMOTE sampling ratio indices I and J.
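The nested loop of steps S32 to S310 can be sketched as a generator over the candidate combinations it visits. This is a minimal sketch under stated assumptions, not the patent's implementation. The helper name `candidate_schedule` is hypothetical. Training and evaluation (S33, S36 to S38) are left out. The sketch also assumes that j restarts at 1 whenever i is incremented, which the steps do not state explicitly. For example, with s0 = 200, s = 4000, k = 200 and upper limits of 4 for i and j, it yields 20 x 4 x 4 = 320 combinations.

```python
from itertools import product
from typing import Iterator, Tuple


def candidate_schedule(s0: int, s: int, k: int, i_max: int,
                       j_max: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (m, n_m, a_i, b_j) in the order steps S32-S310 visit them.

    Hypothetical helper. Since a_1 = 1 and a_i = a_{i-1} + 1, a_i equals i
    (likewise b_j = j). Assumes j restarts at 1 each time i increases.
    """
    m = 1
    while True:
        n_m = s0 + (m - 1) * k          # S32: sample size of loop m
        if n_m > s:                     # S310: stop past the interval's upper bound
            return
        # S33 (split D and N_m into train/test) would run once here per m.
        for i, j in product(range(1, i_max + 1), range(1, j_max + 1)):
            yield m, n_m, i, j          # S34-S36: train with SMOTE ratio i:j
        m += 1


schedule = list(candidate_schedule(200, 4000, 200, 4, 4))
```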
Further, the method in step S2 for assigning data label values according to the classification rules is as follows:
High-risk persons fall into two categories. The first category is persons with violation records and records of serious traffic accidents in which they bore main or full responsibility. The second category is persons with violation records and only minor accident records, with at least two accident records.
General persons: persons with violation records but no accident records.
Data that meet neither criterion form the subset to be identified.
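A minimal sketch of the S2 labelling rules follows. It assumes each person's records have already been aggregated into the counts shown. The `Person` fields and the `label` function are hypothetical names, and "serious" and "minor" follow whatever accident-severity definition the data set uses.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical aggregated record for one person in the violation data set.
    violations: int          # number of violation records
    serious_total: int       # serious accidents, any responsibility
    serious_at_fault: int    # serious accidents with main or full responsibility
    minor: int               # minor accident records


def label(p: Person) -> str:
    """Return 'D' (high-risk), 'N' (general) or 'U' (to be identified)."""
    if p.violations == 0:
        return "U"                      # rules only cover persons with violations
    if p.serious_at_fault > 0:
        return "D"                      # category 1: serious accident, main/full liability
    if p.serious_total == 0 and p.minor >= 2:
        return "D"                      # category 2: only minor accidents, at least two
    if p.serious_total == 0 and p.minor == 0:
        return "N"                      # violations but no accident record
    return "U"                          # e.g. one minor accident only
```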
Further, the original traffic violation data and accident data in step S1 include the identity document information of the persons involved. Violation records are aggregated and, after classification processing, yield the violation data set, which holds the full sample of violation records. Its fields include the person's identity document number, number of violations, violation types, penalty points and fines, occurrence of accident-related violations, and the time period of each violation.
Further, the occurrence of accident-related violations in step S1 is obtained by correspondence analysis. The violation types with the greatest influence on traffic accidents are extracted as data attributes of the violation data set.
Further, the violation time period in step S1 is obtained by transforming the continuous time variable into a discrete variable, classified according to the temporal characteristics of violations.
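The time-period discretization can be sketched as a lookup of the violation hour in a table of period start times. The boundaries and labels below are hypothetical placeholders, because the patent derives its periods from local traffic-flow and violation patterns that it does not publish.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical period boundaries (start hours) and labels, for illustration only.
STARTS = [0, 7, 9, 17, 19, 23]
NAMES = ["night", "morning_peak", "daytime", "evening_peak", "evening", "night"]


def time_period(ts: datetime) -> str:
    """Map a violation timestamp to a nominal period label."""
    # bisect_right finds the last start hour <= ts.hour
    return NAMES[bisect_right(STARTS, ts.hour) - 1]
```

A lookup table keeps the boundaries in one place, so they can be replaced by period boundaries derived from clustering or other statistical methods without changing the function.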
The beneficial effects of the invention are as follows:
1. The invention uses a genetic algorithm to optimize the parameters of the initially fitted model. This markedly improves the accuracy of accident risk prediction for hazardous traffic participants.
2. The ensemble learning algorithms used by the invention have a significant advantage in predictive performance over conventional classification methods such as decision trees and neural networks. This ensures the accuracy of road traffic accident risk prediction for hazardous persons.
3. The invention mines traffic violation data with optimized ensemble learning algorithms and achieves quantitative assessment of traffic safety risk based on traffic participants' violation records. The model can output each person's traffic danger level.
Description of the drawings
Fig. 1 is a flow diagram of the method for improving the accuracy of accident risk prediction for hazardous traffic participants according to an embodiment of the invention.
Fig. 2 is a detailed flow diagram of the optimized sampling method used in S3 of the embodiment.
Fig. 3 is a schematic diagram of the data sets in the embodiment.
Fig. 4 is a schematic diagram of the genetic algorithm reproduction process used in S5 of the embodiment.
Detailed description of the embodiments
Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiment
The method improves the accuracy of accident risk prediction for hazardous traffic participants. Samples of traffic violation data and accident data are obtained with an optimized sampling method, and a road traffic accident risk prediction model for traffic participants is trained with ensemble learning algorithms. The model is then further optimized with a genetic algorithm to improve the accuracy of the prediction results, as shown in Fig. 1. The method of the embodiment uses ensemble learning algorithms to mine the safety characteristics of travellers from traffic violation data. It applies the optimized sampling method in the sampling stage of model construction to improve on the performance of the initial model, and it optimizes the model parameters with a genetic algorithm. Together these steps effectively improve the accuracy of accident risk prediction for high-risk persons. The method flow is as follows:
S1. From the original traffic violation data and accident data, build a violation data set, a serious accident data set and a minor accident data set.
The original traffic violation data and accident data include the identity document information of the persons involved. Violation records are aggregated and, after classification processing, yield the violation data set, which holds the full sample of violation records. Its fields include the person's identity document number, number of violations, violation types, penalty points and fines, occurrence of accident-related violations, and violation time period. The occurrence of accident-related violations is obtained by correspondence analysis, and the violation types with the greatest influence on traffic accidents are extracted as data attributes of the violation data set. The violation time period is obtained by transforming the continuous time variable into a discrete variable, classified according to the temporal characteristics of violations.
S2. Classify the violation data set into two classes, high-risk persons and general persons. Determine data label values according to classification rules and, based on these labels, divide the violation data set into a high-risk subset D, a general subset N and a subset U to be identified.
The classification rules are as follows. High-risk persons are of two kinds. (1) Traffic participants (including motor vehicle drivers, non-motor vehicle riders and pedestrians) with violation records and records of serious traffic accidents in which they bore main or full responsibility. (2) Traffic participants with violation records and only minor accident records, with at least two accident records. General persons are traffic participants with violation records but no accident records. Data that meet neither criterion form the subset to be identified.
S3. Build an initial traffic participant danger-level prediction model P0 with the optimized sampling method and ensemble learning algorithms, and determine the model's sampling count and SMOTE sampling ratio. The ensemble learning algorithms include the random forest, AdaBoost, XGBoost and GBDT algorithms. As shown in Fig. 2, the detailed procedure is:
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size.
S32. Set the sample size n_m = s0 + (m-1)k, where s0 is the lower limit of the sampling interval and m is the loop counter with initial value 1. Randomly draw a sample N_m of size n_m from data set N.
S33. Split G_m, the union of data set D and sample N_m, into a training set and a test set.
S34. Apply SMOTE sampling to the training set, setting the oversampling ratio a_i for the high-risk subset D: a_1 = 1 and, for i > 1, a_i = a_{i-1} + 1. The upper limit of i is usually 4.
S35. For each high-risk oversampling ratio a_i, set the undersampling ratio b_j for the general-person sample N_m: b_1 = 1 and, for j > 1, b_j = b_{j-1} + 1. The upper limit of j is usually 4. With the SMOTE sampling ratio a_i:b_j, oversample and undersample the two labelled classes in the training set to form the classifier's training sample set.
S36. Train the high-risk person classifier with an ensemble learning algorithm, determine the model parameters, and fit the traffic participant accident risk prediction model for the current combination. The model outputs a label value and a risk probability.
S37. Evaluate the model on the test set data to obtain its accuracy at different coverage rates.
S38. Take N_m', the complement of the sample N_m within the general subset N, and group its data by number of violations. Feed each group into the model and count the rate at which the model, at different coverage rates, misjudges the labels of these persons.
S39. Check whether j has reached its upper limit. If so, check whether i has reached its upper limit: if it has, go to S310; otherwise set i = i+1 and go to S34. If j has not reached its upper limit, set j = j+1 and go to S35.
S310. Check whether n_m has reached the upper limit s of the sampling interval. If so, go to S311; otherwise set m = m+1 and return to S32.
S311. Analyse model accuracy and misjudgment rate to select the model with optimal performance. This determines the optimal random sample count M and the SMOTE sampling ratio indices I and J.
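SMOTE, used in step S34, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The stdlib-only sketch below shows that core step for the high-risk subset D. It is not the imbalanced-learn implementation, and it omits the undersampling side b_j. Reading the ratio a_i as "grow D to a_i times its size", so that n_new = (a_i - 1) * len(D), is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # k nearest minority neighbours of x by squared Euclidean distance
        others = [p for t, p in enumerate(minority) if t != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        nb = rng.choice(others[:k])
        gap = rng.random()              # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Usage, assuming a_i = 2 means "double the high-risk subset D":
D = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
a_i = 2
extra = smote(D, n_new=(a_i - 1) * len(D), k=2)
```

Because each synthetic point lies on a segment between two real high-risk samples, the new points stay inside the region the minority class already occupies, instead of duplicating existing records.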
S4. Optimize the performance of model P0 with a genetic algorithm. The objective is to maximize prediction accuracy on the test set, with test-set accuracy assessed by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly and does not oscillate without converging. These parameters are the crossover probability, mutation probability, mutation interval, number of generations and initial population size.
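The k-fold objective in S4 can be sketched as a function that scores one candidate parameter set by its mean accuracy over k held-out folds. This is a stdlib-only sketch. The `fit` callable is a hypothetical stand-in for training the ensemble model with the candidate parameters, and `majority_fit` is a toy model used only for illustration.

```python
import random
from typing import Callable, List, Sequence


def kfold_accuracy(X: Sequence, y: Sequence, fit: Callable, k: int = 10,
                   seed: int = 0) -> float:
    """Mean accuracy over k test folds, usable as a GA fitness value.

    fit(X_train, y_train) must return a predict(x) callable; it stands in
    for training the ensemble model with one candidate parameter set.
    """
    if len(X) < k:
        raise ValueError("need at least k samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds: List[List[int]] = [idx[f::k] for f in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [t for t in idx if t not in held_out]
        predict = fit([X[t] for t in train], [y[t] for t in train])
        scores.append(sum(predict(X[t]) == y[t] for t in fold) / len(fold))
    return sum(scores) / k


def majority_fit(X_train, y_train):
    """Toy stand-in model: always predicts the most frequent training label."""
    top = max(set(y_train), key=list(y_train).count)
    return lambda x: top
```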
S5. From the optimal model parameters output by the genetic algorithm, build the optimally fitted model P for accident risk prediction of hazardous persons. Determine the model's test coverage (recall) and decision threshold.
S6. Input the subset U to be identified from S2 into model P and output the danger level of each target person.
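The patent does not say how the decision threshold in S5 is derived from the coverage (recall). One plausible reading, shown here as an assumption, is to pick the highest risk-probability threshold whose recall on high-risk test cases reaches a target coverage. The sketch also reports the share of general persons flagged at that threshold. The function name `threshold_for_recall` is hypothetical.

```python
from typing import Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> Tuple[float, float, float]:
    """Highest threshold whose recall on high-risk cases (label 1) reaches
    target_recall.

    Returns (threshold, recall, false_positive_rate), where the false
    positive rate is the share of general persons (label 0) scored at or
    above the threshold.
    """
    pos = sorted((s for s, l in zip(scores, labels) if l == 1), reverse=True)
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos:
        raise ValueError("test set contains no high-risk samples")
    for t in pos:                                   # candidate thresholds, highest first
        recall = sum(p >= t for p in pos) / len(pos)
        if recall >= target_recall:
            fpr = sum(n >= t for n in neg) / len(neg) if neg else 0.0
            return t, recall, fpr
    raise ValueError("target_recall must be at most 1")
```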
Specific example
In this example, the objects of analysis are motor vehicle drivers.
S1. Obtain the traffic violation records and accident records of a region over a two-year period.
Traffic accidents causing death or serious injury, as well as hit-and-run accidents, are treated as serious accidents, and all other accidents are treated as minor accidents. The original accident records are classified accordingly. The accident type and driver identity information are used as attribute features of the serious accident data set and the minor accident data set, which yields the sample data of the two data sets.
Next, the raw violation data are preprocessed and each driver's violation information is aggregated. The aggregated features are the cumulative number of violations, violation types, cumulative penalty points, average penalty points (points per violation), maximum penalty points for a single violation, cumulative fine amount, and average fine amount (yuan per violation).
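The per-driver aggregation can be sketched as a single pass over the violation records. The record layout `(driver_id, violation_type, points, fine_yuan)` and the output field names are hypothetical. The statistics themselves are the ones listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, int, float]   # hypothetical: (driver_id, type, points, fine)


def aggregate(records: Iterable[Record]) -> Dict[str, dict]:
    """Summarise each driver's violations into the features listed above."""
    acc = defaultdict(lambda: {"count": 0, "types": set(), "points": 0,
                               "max_points": 0, "fine": 0.0})
    for driver, vtype, points, fine in records:
        a = acc[driver]
        a["count"] += 1
        a["types"].add(vtype)
        a["points"] += points
        a["max_points"] = max(a["max_points"], points)
        a["fine"] += fine
    return {
        driver: {
            "violations": a["count"],
            "violation_types": sorted(a["types"]),
            "total_points": a["points"],
            "avg_points": a["points"] / a["count"],      # points per violation
            "max_points": a["max_points"],
            "total_fine": a["fine"],
            "avg_fine": a["fine"] / a["count"],          # yuan per violation
        }
        for driver, a in acc.items()
    }
```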
Correspondence analysis is applied to reduce the dimensionality of the traffic accident data and the raw violation data. Violation types are classified according to their correlation with accident types. The five classes with the highest correlation are extracted as the data attributes of the accident-risk violation behaviour field, as shown in Table 1.
Table 1. Classification of accident-related violation types
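One simple way to rank violation types in the spirit of the correspondence analysis above is by each row's contribution to the total inertia of the violation-type by accident-type contingency table. This contribution is the sum of squared standardized residuals that correspondence analysis decomposes by SVD. The sketch is an assumed proxy, not the patent's exact procedure: it stops before the SVD, and the function names are hypothetical.

```python
from typing import List, Sequence


def row_inertia(table: Sequence[Sequence[float]]) -> List[float]:
    """Each row's contribution to the total inertia of a contingency table.

    table[i][j] counts violation type i co-occurring with accident type j.
    The contribution is sum_j s_ij**2 with s_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    Assumes no all-zero rows or columns.
    """
    n = sum(map(sum, table))
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    return [sum((table[i][j] / n - r[i] * c[j]) ** 2 / (r[i] * c[j])
                for j in range(len(c)))
            for i in range(len(r))]


def top_violation_types(names: Sequence[str], table, k: int = 5) -> List[str]:
    """Names of the k violation types contributing the most inertia."""
    inertia = row_inertia(table)
    order = sorted(range(len(names)), key=lambda i: inertia[i], reverse=True)
    return [names[i] for i in order[:k]]
```

A violation type whose accident-type profile matches the overall average contributes almost nothing. Types concentrated in particular accident types contribute the most, which is the kind of association the extraction step looks for.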
Time is aggregated and divided into analysis periods according to the traffic flow on the region's road network and the occurrence patterns of traffic violations. This converts the continuous time variable into a nominal variable. In another embodiment, the periods are divided with other statistical methods such as clustering.
Driver characteristics, namely age, gender and province or city, are extracted from the driver's identity document number and then encoded. The violation data set is generated from the information extracted in the steps above, as shown in Table 2.
Table 2. Partial data of the violation data set
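The extraction of age, gender and region can be sketched for an 18-character Chinese citizen identification number, assuming the driver's number follows the GB 11643-1999 layout. In that layout, digits 1 to 6 are the administrative region code, digits 7 to 14 the birth date (YYYYMMDD), and digit 17 is odd for men and even for women. The helper name is hypothetical, and the sketch does not validate the check character.

```python
from datetime import date


def decode_id_number(number: str, on: date) -> dict:
    """Hypothetical helper: age, gender and region codes from an 18-character
    GB 11643-1999 number. The check character is not validated."""
    if len(number) != 18:
        raise ValueError("expected an 18-character number")
    birth = date(int(number[6:10]), int(number[10:12]), int(number[12:14]))
    # subtract one year if the birthday has not yet occurred on `on`
    age = on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))
    return {
        "province_code": number[:2],   # first two digits: province level
        "city_code": number[:4],       # first four digits: prefecture/city level
        "age": age,
        "gender": "male" if int(number[16]) % 2 else "female",
    }
```

For example, the synthetic number "110105198001011234" decoded on 1 July 2018 gives province code "11", city code "1101", age 38 and gender "male".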
S2. Classify the full sample I of the violation data set into high-risk drivers and general drivers, as shown in Fig. 4. The first category of high-risk drivers is drivers with violation records and records of serious traffic accidents in which they bore main or full responsibility; their data are assigned to data set D1. The second category is drivers with violation records and only minor accident records, with at least two accident records; their data are assigned to data set D2. The high-risk driver data set is D = D1 + D2. The data of drivers with violation records but no accident records form the general driver data set N.
High-risk or general label values are assigned to the violation data that satisfy these rules. The remaining data, to which the rules do not apply, form the data subset to be identified, U = I - N - D.
S3. Build the initial motor vehicle driver danger-level prediction model P0 with the optimized sampling method and the XGBoost algorithm, and determine the model's sampling count and SMOTE sampling ratio.
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size. In this example, the data set contains more than 84,000 samples, the sampling interval is S = [200, 4000] and the loop step k is 200.
S32. Set the sample size n_m = s0 + (m-1)k, where s0 is the lower limit of the sampling interval and m is the loop counter with initial value 1. Randomly draw a sample N_m of size n_m from data set N. In this example, the initial sample size is 200.
S33. Split G_m, the union of data set D and sample N_m, into a training set and a test set. In this example, the ratio of training set to test set is 9:1.
S34. Apply SMOTE sampling to the training set, setting the oversampling ratio a_i for the high-risk driver subset D, where a_1 = 1 and a_i = a_{i-1} + 1. The index i starts at 1 and has a preset upper limit, here 4.
S35. For each high-risk oversampling ratio a_i, set the undersampling ratio b_j for the general-driver sample N_m, where b_1 = 1 and b_j = b_{j-1} + 1. The index j starts at 1 and has a preset upper limit, here 4. With the SMOTE sampling ratio a_i:b_j, oversample and undersample the two labelled classes in the training set to form the classifier's training sample set.
S36. Train the high-risk driver classifier with the XGBoost algorithm, determine the model parameters, and fit the driver accident risk prediction model for the current combination. The model outputs the driver label value and a risk probability. The model parameters are:
- learning rate
- number of weak classifiers
- maximum tree depth
- minimum number of samples to split a node
- minimum number of samples per leaf node
- minimum sum of leaf node weights
- minimum loss reduction
- row sampling rate
- column sampling rate
- regularization term 1
- regularization term 2
- positive/negative weight balance term
- early-stopping condition
S37. Evaluate the model on the test set data to obtain its accuracy at different coverage rates.
S38. Take N_m', the complement of the sample N_m within the general driver subset N, and group its data by number of violations. Feed each group into the model and count the rate at which the model, at different coverage rates, misjudges the driver labels.
S39. Check whether j has reached its preset maximum. If so, check whether i has reached its preset maximum: if it has, go to S310; otherwise set i = i+1 and go to S34. If j has not reached its maximum, set j = j+1 and go to S35.
S310. Check whether n_m has reached the interval upper limit s. If so, go to S311; otherwise set m = m+1 and return to S32.
S311. Analyse model accuracy and misjudgment rate to select the model with optimal performance. This determines the optimal random sample count M and the SMOTE sampling ratio indices I and J.
In this example, misjudgment rate, accuracy and index stability were compared together. The model with optimal performance is the one with a random sample size of 2400 and a SMOTE ratio of 2:2.
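The reported optimum can be located in the search grid of this example. With s0 = 200 and k = 200 the sample sizes are multiples of 200. Since a_i = i and b_j = j, the ratio 2:2 corresponds to i = j = 2:

```latex
\begin{aligned}
n_m &= s_0 + (m-1)k = 200 + 200(m-1) = 200m, \quad m = 1,\dots,20 \quad (n_{20} = 4000 = s),\\
n_m &= 2400 \;\Rightarrow\; m = 12,\\
a_2 : b_2 &= 2 : 2,\\
\#\{(m,i,j)\} &= 20 \times 4 \times 4 = 320 \text{ candidate models}.
\end{aligned}
```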
S4. Optimize the performance of model P0 with a genetic algorithm. The objective is to maximize prediction accuracy on the test set, with test-set accuracy assessed by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly and does not oscillate without converging. These parameters are the crossover probability, mutation probability, mutation interval, number of generations and initial population size.
In this embodiment, the objective function is the test-set accuracy under 10-fold cross-validation. The genetic algorithm parameters are set as follows:
- crossover probability CrossoverProbaiblity = 0.8
- mutation probability MutationProbability = 0.5
- mutation interval Sigma = [[-10, 10], [-2, 2], [-2, 2], [-2, 2], [-2, 2]]
- number of generations Iteration = 500
- initial population size Population = 100
The reproduction process of the genetic algorithm for parameter optimization is shown in Fig. 4.
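A minimal real-coded genetic algorithm can be sketched with the embodiment's settings as defaults: crossover probability 0.8, mutation probability 0.5, 500 generations and a population of 100. Tournament selection, uniform crossover, elitism and bounded additive mutation are choices made for this sketch, not details given in the patent. Reading Sigma as per-gene mutation step ranges is an assumption. In the patent, the fitness would be the 10-fold cross-validated accuracy of the XGBoost model built from the genes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genes = List[float]


def genetic_search(fitness: Callable[[Genes], float],
                   bounds: Sequence[Tuple[float, float]],
                   sigma: Sequence[Tuple[float, float]],
                   pop_size: int = 100, generations: int = 500,
                   p_cross: float = 0.8, p_mut: float = 0.5,
                   seed: int = 0) -> Tuple[Genes, float]:
    """Maximise fitness over a box with a simple real-coded genetic algorithm.

    bounds[g]: allowed range of gene g.
    sigma[g]: range of the additive mutation step for gene g (an assumed
    reading of the "mutation interval").
    """
    rng = random.Random(seed)

    def clip(v: float, g: int) -> float:
        return min(max(v, bounds[g][0]), bounds[g][1])

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        best = max(scored, key=lambda t: t[0])[1]

        def pick() -> Genes:                       # binary tournament selection
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [list(best)]                    # elitism: keep the best as-is
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:             # uniform crossover
                child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for g in range(len(child)):
                if rng.random() < p_mut:           # bounded additive mutation
                    child[g] = clip(child[g] + rng.uniform(*sigma[g]), g)
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Elitism keeps the best individual unchanged from one generation to the next, so the best fitness never decreases. This matches the stated aim of a quickly converging, non-oscillating objective.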
S5. From the optimal model parameters output by the genetic algorithm, build the optimally fitted model P for motor vehicle driver danger-level prediction. Determine the model's test coverage (recall) and decision threshold.
In this embodiment, the parameters of the XGBoost-based initial model after genetic algorithm optimization are:
- learning rate learning_rate_value = 0.09
- number of weak classifiers n_estimators_value = 367
- maximum tree depth max_depth_value = 4
- minimum number of samples to split a node min_samples_split_value = 10
- minimum number of samples per leaf node min_samples_leaf_value = 6
- minimum sum of leaf node weights min_child_weight_value = 3
- minimum loss reduction gamma_value = 0
- row sampling rate subsample_value = 0.45
- column sampling rate colsample_bytree_value = 0.1
- regularization term 1 reg_lambda_value = 11
- regularization term 2 reg_alpha_value = 11
- positive/negative weight balance term scale_pos_weight_value = 1
- early-stopping rounds early_stopping_rounds_value = 37
Model accuracy after parameter optimization reaches 0.76.
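For reproduction, the reported values can be collected into a keyword dictionary. The `_value` suffixes are dropped here on the assumption that they are only variable-naming conventions. The patent does not say how its implementation used the two scikit-learn-style entries, so the sketch sets them aside before the remaining keys would be passed to `xgboost.XGBClassifier`.

```python
# Tuned values reported in the example, keyed by the listed parameter names
# without the "_value" suffix.
TUNED = {
    "learning_rate": 0.09,
    "n_estimators": 367,
    "max_depth": 4,
    "min_samples_split": 10,
    "min_samples_leaf": 6,
    "min_child_weight": 3,
    "gamma": 0,
    "subsample": 0.45,
    "colsample_bytree": 0.1,
    "reg_lambda": 11,
    "reg_alpha": 11,
    "scale_pos_weight": 1,
    "early_stopping_rounds": 37,
}

# min_samples_split and min_samples_leaf are scikit-learn tree parameters with
# no XGBoost counterpart, so they are set aside. The remaining keys are keyword
# arguments of xgboost.XGBClassifier (early_stopping_rounds as a constructor
# argument needs a recent xgboost release and an eval_set when fitting).
SKLEARN_ONLY = {"min_samples_split", "min_samples_leaf"}
xgb_params = {name: v for name, v in TUNED.items() if name not in SKLEARN_ONLY}
```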
S6. Input the subset U to be identified from S2 into model P and output each driver's danger level. Partial results are shown in Table 3.
Table 3. Danger-level analysis results for high-risk drivers obtained with the method of the invention
Claims (7)
1. a kind of method promoting traffic hazard personnel's accident risk prediction precision, it is characterised in that:With the methods of sampling of optimization
Traffic violation data and casualty data sample are obtained, using Ensemble Learning Algorithms training traffic participant street accidents risks prediction
Model further carries out model optimization to promote prediction result accuracy by genetic algorithm, specifically includes following steps:
S1. Based on the original traffic violation data and accident data, build a violation data set, a serious-accident data set, and a minor-accident data set;
S2. Divide the violation data set into two classes, high-risk personnel and general personnel; determine the data label values according to classification rules, and accordingly split the violation data set into a high-risk personnel data subset D, a general personnel data subset N, and a to-be-identified subset U;
S3. Build an initial accident-risk prediction model P0 for hazard personnel using the optimized sampling method and an ensemble learning algorithm, and determine the model's number of random samples and SMOTE sampling ratios;
S4. Optimize the performance of model P0 with a genetic algorithm. The optimization objective is to maximize prediction accuracy on the test set, evaluated by k-fold cross-validation. Set the genetic-algorithm parameters so that the objective function converges quickly and oscillation without convergence is avoided. The genetic-algorithm parameters include the crossover probability, mutation probability, mutation interval, number of generations, and initial population size;
S5. Using the optimal model parameters output by the genetic algorithm, build the optimally fitted accident-risk prediction model P for hazard personnel, and determine the model's test coverage (recall) and decision threshold;
S6. Input the to-be-identified subset U from step S2 into model P and output the risk level of each target object.
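The objective in S4 is test accuracy under k-fold cross-validation. The pure-Python sketch below illustrates it. `fit_predict` is a hypothetical callback standing in for training an ensemble model on the training folds and predicting the held-out fold. Round-robin fold assignment after a seeded shuffle is one choice among several.

```python
import random
from typing import Callable, List, Sequence, Tuple

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and return (train, test) index lists for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        (sorted(j for f in folds[:i] + folds[i + 1:] for j in f), sorted(folds[i]))
        for i in range(k)
    ]

def cv_accuracy(
    fit_predict: Callable[[list, list, list], Sequence[int]],
    X: Sequence, y: Sequence[int], k: int = 5, seed: int = 0,
) -> float:
    """Mean accuracy over the k held-out folds (the fitness the GA would maximise)."""
    accs = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / len(accs)
```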
2. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that the ensemble learning algorithms in step S3 include the random forest, AdaBoost, XgBoost, and GBDT algorithms.
3. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that the optimized sampling method in step S3 specifically comprises:
S31. Set the sampling range S and the loop step size k according to the sample size of data set N;
S32. Let the sample size be n_m = s_0 + (m - 1)k, where s_0 is the lower limit of the sampling range and m is the loop counter with initial value 1. Randomly draw a sample N_m of size n_m from data set N;
S33. Split the union G_m of data set D and N_m into a training set and a test set;
S34. Apply SMOTE sampling to the training set, setting the expansion (oversampling) ratio a_i for the high-risk personnel data subset D, where a_1 = 1 and a_i = a_{i-1} + 1 for i > 1. i starts at 1 and has a preset upper limit;
S35. For the high-risk expansion ratio a_i, set the reduction (undersampling) ratio b_j for the general personnel data subset N_m, where b_1 = 1 and b_j = b_{j-1} + 1 for j > 1. j starts at 1 and has a preset upper limit. According to the SMOTE sampling ratio a_i:b_j, oversample and undersample the two classes of labeled samples in the training set to form the classifier's training sample set;
S36. Train the high-risk personnel classifier with an ensemble learning algorithm, determine the model parameters, and fit the traffic-participant accident-risk prediction model. The model can output a label value and a risk probability;
S37. Evaluate the model with the test set data to obtain the model accuracy at different coverage rates;
S38. Classify the data in the complement N_m' of the sample N_m within the general personnel data subset N by number of violations. Input them into the model category by category, and count the rate at which the model falsely labels people as high-risk at different coverage rates;
S39. Check whether j has reached its upper limit. If it has, check whether i has reached its upper limit: if so, go to S310; otherwise set i = i + 1 and go to S34. If j has not reached its upper limit, set j = j + 1 and go to S35;
S310. Check whether n_m has reached the upper limit s of the sampling range. If so, go to S311; otherwise set m = m + 1 and return to S32;
S311. Through analysis of model accuracy and false-labeling rate, select the model with the best performance, and determine the optimal number of random samples M and the SMOTE sampling ratios I, J.
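Claim 3 combines three pieces: an arithmetic schedule of sample sizes, SMOTE resampling, and a nested search over (n_m, a_i, b_j). The pure-Python sketch below illustrates them under four assumptions:

1. The size loop stops once n_m would exceed s.
2. The ratio a_i:b_j is read as growing the high-risk training class to a_i times its original count with SMOTE, and randomly reducing the general class to b_j times that count.
3. j restarts at 1 whenever i advances, so every combination is evaluated once.
4. `evaluate` is a hypothetical callback that trains and scores one configuration. The claim does not say how accuracy and false-labeling rate are combined into that score, so this is left open.

All function names are illustrative. A production pipeline would more likely use a library SMOTE, such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Vector = List[float]
Config = Tuple[int, int, int]  # (n_m, a_i, b_j)

def sample_sizes(s0: int, s: int, k: int) -> List[int]:
    """S32/S310: sizes n_m = s0 + (m - 1) * k for m = 1, 2, ... while n_m <= s."""
    if k <= 0:
        raise ValueError("step size k must be positive")
    return list(range(s0, s + 1, k))

def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def resample(
    minority: Sequence[Sequence[float]], majority: Sequence[Sequence[float]],
    a: int, b: int, seed: int = 0,
) -> Tuple[List[Vector], List[Vector]]:
    """Assumed reading of a:b: grow D to a*|D| rows, cut N_m to b*|D| rows."""
    d = len(minority)
    grown = [list(p) for p in minority] + smote(minority, (a - 1) * d, seed=seed)
    kept = random.Random(seed).sample([list(p) for p in majority], min(len(majority), b * d))
    return grown, kept

def search_sampling_config(
    sizes: Iterable[int], a_max: int, b_max: int,
    evaluate: Callable[[int, int, int], float],
) -> Tuple[Optional[Config], float]:
    """Evaluate every (n_m, a_i, b_j) and keep the best-scoring one (M, I, J)."""
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for n_m in sizes:                      # outer loop over m (S32, S310)
        for a in range(1, a_max + 1):      # a_i = 1, 2, ..., upper limit (S34)
            for b in range(1, b_max + 1):  # b_j = 1, 2, ..., upper limit (S35)
                score = evaluate(n_m, a, b)
                if score > best_score:     # the first configuration wins ties
                    best_cfg, best_score = (n_m, a, b), score
    return best_cfg, best_score
```

For example, `sample_sizes(100, 500, 100)` gives `[100, 200, 300, 400, 500]`.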
4. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that the method in step S2 for assigning data label values based on the classification rules is specifically:
High-risk personnel: one type is personnel who have violation records and a record of a serious traffic accident in which they bore primary or full responsibility; the other type is personnel who have violation records and only minor accident records, with at least 2 accident records;
General personnel: personnel who have violation records but no accident records;
Data that do not satisfy the above criteria constitute the to-be-identified subset.
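The labeling rules of claim 4 can be written as a small decision function. The record fields below are a hypothetical schema, since the claim does not fix one. Serious accidents with primary or full responsibility are counted separately from other serious accidents, so that "only minor accident records" can be checked.

```python
from dataclasses import dataclass

@dataclass
class PersonRecord:
    violations: int        # number of violation records
    serious_primary: int   # serious accidents with primary or full responsibility
    serious_other: int     # serious accidents with lesser responsibility
    minor: int             # minor accidents

def assign_label(p: PersonRecord) -> str:
    """Return "D" (high-risk), "N" (general) or "U" (to be identified)."""
    if p.violations == 0:
        return "U"  # both labelled classes require a violation record
    if p.serious_primary > 0:
        return "D"  # rule 1: serious accident with primary or full responsibility
    no_serious = p.serious_other == 0
    if no_serious and p.minor >= 2:
        return "D"  # rule 2: only minor accidents, at least two of them
    if no_serious and p.minor == 0:
        return "N"  # violations but no accident record
    return "U"      # everything else goes to the to-be-identified subset
```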
5. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that: in step S1, the original traffic violation data and accident data include the certificate (ID) information of the persons concerned. Violation records are collected, and the violation data set is obtained after classification processing. The violation data set is the full sample of violation records. Its information includes the person's ID number, number of violations, violation types, point deductions and fines, occurrence of accident-related violations, and the time period of the violations.
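A minimal sketch of building one row per ID number from raw violation records follows. The raw-record schema and field names are hypothetical. "Occurrence of accident-related violations" is assumed to be a count of violations whose type appears in a set of accident-related types, for example as extracted by the analysis in claim 6. The raw hour of each violation is kept so it can later be discretized into a period.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Set

@dataclass
class RawViolation:          # one raw violation record (hypothetical schema)
    person_id: str
    violation_type: str
    points: int
    fine: float
    hour: int                # hour of day the violation occurred

@dataclass
class PersonViolations:      # one row of the violation data set
    person_id: str
    count: int = 0
    types: Set[str] = field(default_factory=set)
    total_points: int = 0
    total_fine: float = 0.0
    accident_related: int = 0
    hours: List[int] = field(default_factory=list)

def build_violation_dataset(
    records: Iterable[RawViolation],
    accident_related_types: FrozenSet[str] = frozenset(),
) -> Dict[str, PersonViolations]:
    """Aggregate raw violation records into one row per ID number."""
    rows: Dict[str, PersonViolations] = {}
    for r in records:
        row = rows.setdefault(r.person_id, PersonViolations(r.person_id))
        row.count += 1
        row.types.add(r.violation_type)
        row.total_points += r.points
        row.total_fine += r.fine
        if r.violation_type in accident_related_types:
            row.accident_related += 1
        row.hours.append(r.hour)
    return rows
```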
6. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that: in step S1, the occurrence of accident-related violations is obtained by correspondence analysis, which extracts the violation types with a greater influence on traffic accidents as data attributes of the violation data set.
7. The method for improving the accident-risk prediction accuracy for traffic-hazard personnel according to claim 1, characterized in that: the violation time period in step S1 is obtained by converting the continuous time variable into a discrete variable, classified according to the temporal characteristics of the violations.
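Claim 7 converts the continuous violation time into a discrete period. The claim does not list the periods, so the boundaries below (morning and evening peaks, daytime, evening, night) are purely illustrative assumptions, as is the function name.

```python
from datetime import datetime
from typing import List, Tuple

# Illustrative period boundaries (start hour, label); the patent does not list them.
PERIODS: List[Tuple[int, str]] = [
    (0, "night"),
    (7, "morning_peak"),
    (9, "daytime"),
    (17, "evening_peak"),
    (19, "evening"),
    (23, "night"),
]

def time_period(ts: datetime) -> str:
    """Map a continuous timestamp to the discrete period containing its hour."""
    label = PERIODS[0][1]
    for start, name in PERIODS:  # boundaries are sorted by start hour
        if ts.hour >= start:
            label = name
    return label
```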
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810783017.7A CN108596409B (en) | 2018-07-16 | 2018-07-16 | Method for improving accident risk prediction precision of traffic hazard personnel |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108596409A true CN108596409A (en) | 2018-09-28 |
CN108596409B CN108596409B (en) | 2021-07-20 |
Family
ID=63617732
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810783017.7A Active CN108596409B (en) | 2018-07-16 | 2018-07-16 | Method for improving accident risk prediction precision of traffic hazard personnel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108596409B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5892663B2 (en) * | 2011-06-21 | 2016-03-23 | 国立大学法人 奈良先端科学技術大学院大学 | Self-position estimation device, self-position estimation method, self-position estimation program, and moving object |
CN103246897A (en) * | 2013-05-27 | 2013-08-14 | 南京理工大学 | Internal structure adjusting method of weak classifier based on AdaBoost |
CN103462618A (en) * | 2013-09-04 | 2013-12-25 | 江苏大学 | Automobile driver fatigue detecting method based on steering wheel angle features |
CN107563425A (en) * | 2017-08-24 | 2018-01-09 | 长安大学 | A kind of method for building up of the tunnel operation state sensor model based on random forest |
CN107480839A (en) * | 2017-10-13 | 2017-12-15 | 深圳市博安达信息技术股份有限公司 | The classification Forecasting Methodology of high-risk pollution sources based on principal component analysis and random forest |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408557A (en) * | 2018-09-29 | 2019-03-01 | 东南大学 | A kind of traffic accidents reason analysis method clustered based on multiple correspondence and K-means |
CN109408557B (en) * | 2018-09-29 | 2021-09-28 | 东南大学 | Traffic accident cause analysis method based on multiple correspondences and K-means clustering |
CN109635990A (en) * | 2018-10-12 | 2019-04-16 | 阿里巴巴集团控股有限公司 | A kind of training method, prediction technique, device and electronic equipment |
CN109635990B (en) * | 2018-10-12 | 2022-09-16 | 创新先进技术有限公司 | Training method, prediction method, device, electronic equipment and storage medium |
WO2020083400A1 (en) * | 2018-10-26 | 2020-04-30 | 江苏智通交通科技有限公司 | Traffic accident data intelligent analysis and comprehensive application system |
CN109558969A (en) * | 2018-11-07 | 2019-04-02 | 南京邮电大学 | A kind of VANETs car accident risk forecast model based on AdaBoost-SO |
WO2020093701A1 (en) * | 2018-11-07 | 2020-05-14 | 南京邮电大学 | Vehicle accident risk prediction model based on adaboost-so in vanets |
CN109598931A (en) * | 2018-11-30 | 2019-04-09 | 江苏智通交通科技有限公司 | Group based on traffic safety risk divides and difference analysis method and system |
WO2020108219A1 (en) * | 2018-11-30 | 2020-06-04 | 江苏智通交通科技有限公司 | Traffic safety risk based group division and difference analysis method and system |
CN110379161B (en) * | 2019-07-18 | 2021-02-02 | 中南大学 | Urban road network traffic flow distribution method |
CN110379161A (en) * | 2019-07-18 | 2019-10-25 | 中南大学 | A kind of city road network traffic flow amount distribution method |
CN111080012A (en) * | 2019-12-17 | 2020-04-28 | 北京明略软件***有限公司 | Personnel risk degree prediction method and device, electronic equipment and readable storage medium |
CN111081016B (en) * | 2019-12-18 | 2021-07-06 | 北京航空航天大学 | Urban traffic abnormity identification method based on complex network theory |
CN111081016A (en) * | 2019-12-18 | 2020-04-28 | 北京航空航天大学 | Urban traffic abnormity identification method based on complex network theory |
CN112016735A (en) * | 2020-07-17 | 2020-12-01 | 厦门大学 | Patrol route planning method and system based on traffic violation hotspot prediction and readable storage medium |
CN112016735B (en) * | 2020-07-17 | 2023-03-28 | 厦门大学 | Patrol route planning method and system based on traffic violation hotspot prediction and readable storage medium |
CN111881988B (en) * | 2020-07-31 | 2022-06-14 | 北京航空航天大学 | Heterogeneous unbalanced data fault detection method based on minority class oversampling method |
CN111881988A (en) * | 2020-07-31 | 2020-11-03 | 北京航空航天大学 | Heterogeneous unbalanced data fault detection method based on minority class oversampling method |
CN112667919A (en) * | 2020-12-28 | 2021-04-16 | 山东大学 | Personalized community correction scheme recommendation system based on text data and working method thereof |
CN113076974A (en) * | 2021-03-09 | 2021-07-06 | 麦哲伦科技有限公司 | Multi-task learning method with parallel filling and classification of missing values of multi-layer sensing mechanism |
CN113793502A (en) * | 2021-09-15 | 2021-12-14 | 国网电动汽车服务(天津)有限公司 | Pedestrian crossing prediction method under no-signal-lamp control |
CN115035722A (en) * | 2022-06-20 | 2022-09-09 | 浙江嘉兴数字城市实验室有限公司 | Road safety risk prediction method based on combination of spatio-temporal features and social media |
CN115035722B (en) * | 2022-06-20 | 2024-04-05 | 浙江嘉兴数字城市实验室有限公司 | Road safety risk prediction method based on combination of space-time characteristics and social media |
CN117009767A (en) * | 2023-08-10 | 2023-11-07 | 中国环境科学研究院 | Soil benchmark formulation and risk assessment method based on bioavailability |
CN117009767B (en) * | 2023-08-10 | 2024-04-26 | 中国环境科学研究院 | Soil benchmark formulation and risk assessment method based on bioavailability |
Also Published As
Publication number | Publication date |
---|---|
CN108596409B (en) | 2021-07-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108596409A (en) | The method for promoting traffic hazard personnel's accident risk prediction precision | |
Tang et al. | Crash injury severity analysis using a two-layer Stacking framework | |
CN104268599B (en) | Intelligent unlicensed vehicle finding method based on vehicle track temporal-spatial characteristic analysis | |
CN105303197B (en) | A kind of vehicle follow the bus safety automation appraisal procedure based on machine learning | |
CN109410577B (en) | Self-adaptive traffic control subarea division method based on space data mining | |
CN106778583A (en) | Vehicle attribute recognition methods and device based on convolutional neural networks | |
CN106372571A (en) | Road traffic sign detection and identification method | |
CN109671274B (en) | Highway risk automatic evaluation method based on feature construction and fusion | |
CN106056162A (en) | A traffic safety credit scoring method based on GPS track and traffic law-violation records | |
CN109191828A (en) | Traffic participant accident risk prediction method based on integrated study | |
CN109522876B (en) | Subway station building escalator selection prediction method and system based on BP neural network | |
Mihaita et al. | Arterial incident duration prediction using a bi-level framework of extreme gradient-tree boosting | |
CN112232389A (en) | Dynamic adjustment method and system for traffic emergency plan of large-scale activity emergency | |
CN105809193A (en) | Illegal operation vehicle recognition method based on Kmeans algorithm | |
CN114924556A (en) | Method and system for automatically driving vehicle | |
Mafi et al. | Analysis of gap acceptance behavior for unprotected right and left turning maneuvers at signalized intersections using data mining methods: A driving simulation approach | |
CN111563555A (en) | Driver driving behavior analysis method and system | |
WO2023143000A1 (en) | Auditing system for elderly age-friendly subdistrict built environment on basis of multi-source big data | |
Akomolafe et al. | Using data mining technique to predict cause of accident and accident prone locations on highways | |
Shamsashtiany et al. | Road accidents prediction with multilayer perceptron MLP modelling case study: roads of Qazvin, Zanjan and Hamadan | |
CN109101568A (en) | Traffic high-risk personnel recognition methods based on XgBoost algorithm | |
CN109063751A (en) | The traffic high-risk personnel recognition methods of decision Tree algorithms is promoted based on gradient | |
Mohamad et al. | Using a decision tree to compare rural versus highway motorcycle fatalities in Thailand | |
Murat et al. | An integration of different computing approaches in traffic safety analysis | |
CN112308136A (en) | SVM-Adaboost-based driving distraction detection method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 211100 No. 19 Suyuan Avenue, Jiangning Economic and Technological Development Zone, Nanjing City, Jiangsu Province. Applicant after: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd. Address before: 210006, 3rd Floor, Building E10, Chenguang 1865 Technology Creative Industry Park, No. 388 Yingtian Avenue, Qinhuai District, Nanjing, Jiangsu. Applicant before: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |