CN108596409A - Method for improving the accuracy of accident risk prediction for high-risk traffic participants - Google Patents


Info

Publication number
CN108596409A
Authority
CN
China
Prior art keywords
data
personnel
model
sampling
accident
Legal status (as listed by Google Patents; an assumption, not a legal conclusion)
Granted
Application number
CN201810783017.7A
Other languages
Chinese (zh)
Other versions
CN108596409B (en)
Inventor
刘林
陈凝
吕伟韬
马党生
Current Assignee (as listed by Google Patents; may be inaccurate)
JIANGSU INTELLIGENT TRANSPORTATION SYSTEMS Co Ltd
Original Assignee
JIANGSU INTELLIGENT TRANSPORTATION SYSTEMS Co Ltd
Application filed by JIANGSU INTELLIGENT TRANSPORTATION SYSTEMS Co Ltd
Priority to CN201810783017.7A
Publication of CN108596409A
Application granted
Publication of CN108596409B
Legal status: Active

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q10/00 Administration; Management
                    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
                • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                    • G06Q50/10 Services
                        • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention provides a method for improving the accuracy of accident risk prediction for high-risk traffic participants. Samples of traffic violation data and accident data are drawn with an optimized sampling method. A traffic participant accident risk prediction model is trained with an ensemble learning algorithm, and the model is then optimized with a genetic algorithm. The invention uses ensemble learning to mine the safety characteristics of travelers from traffic violation data. It applies the optimized sampling method in the sampling stage of model construction to improve the performance of the initial model, and tunes the model parameters with a genetic algorithm. Together these steps effectively improve the accuracy of accident risk prediction for high-risk individuals.

Description

Method for improving the accuracy of accident risk prediction for high-risk traffic participants
Technical field
The present invention relates to a method for improving the accuracy of accident risk prediction for high-risk traffic participants.
Background
Studies have shown that traffic violations and traffic accidents are correlated. The attributes and behavior of drivers, pedestrians, and other traffic participants with recorded violations can provide data support for analyzing human factors in traffic safety. Data mining can apply classification to extract the safety characteristics of traffic violators from their personal attribute variables.
Traditional classification methods search a space of candidate functions for the classifier closest to the true classification function. In practice, however, they usually yield only a biased weak learner, so the resulting model is not very reliable. Ensemble learning algorithms improve the performance of the final model by combining multiple weak learners. However, ensemble models have a complex parameter structure, which makes it difficult to improve their performance further. Genetic algorithms are well suited to finding globally optimal or near-optimal solutions and therefore offer a feasible way to improve accuracy.
Summary of the invention
The object of the present invention is to provide a method for improving the accuracy of accident risk prediction for high-risk traffic participants. The method applies an ensemble learning algorithm with optimized sampling and tunes its parameters with a genetic algorithm. It uses these to quantitatively assess the danger level of traffic participants who have traffic violation records. This fills the current gap in quantitative methods for analyzing participant factors in traffic safety and effectively improves the accuracy of accident risk prediction for high-risk individuals.
The technical solution of the invention is as follows:
A method for improving the accuracy of accident risk prediction for high-risk traffic participants. Traffic violation data and accident data samples are obtained with an optimized sampling method, and a traffic participant accident risk prediction model is trained with an ensemble learning algorithm. The model is then further optimized with a genetic algorithm to improve prediction accuracy. The method comprises the following steps:
S1. From the original traffic violation data and accident data, construct a violation data set, a serious accident data set, and a minor accident data set.
S2. Classify the violation data set into two classes, high-risk personnel and general personnel. Determine each record's label according to the classification rules, and accordingly divide the violation data set into a high-risk personnel subset D, a general personnel subset N, and a subset U to be identified.
S3. Build an initial traffic participant danger-level prediction model $P_0$ using the optimized sampling method and an ensemble learning algorithm, and determine the model's sample size and SMOTE sampling ratio.
S4. Optimize the performance of model $P_0$ with a genetic algorithm. The objective function is to maximize prediction accuracy on the test set, evaluated by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly without oscillating or failing to converge. These parameters include the crossover probability, mutation probability, mutation range, number of generations, and initial population size.
S5. Using the optimal model parameters output by the genetic algorithm, build the best-fitting model P for predicting the accident risk of high-risk personnel, and determine the model's test coverage (recall) and classification threshold.
S6. Feed the data in the subset to be identified from S2 into model P and output the danger level of each target individual.
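Step S4 scores each candidate by its test accuracy under k-fold cross-validation. The sketch below shows that objective under stated assumptions. It uses a generic `fit`/`predict` pair, and the one-feature threshold classifier is a hypothetical stand-in for the ensemble model, not the patent's implementation. The helper names `k_fold_indices`, `cv_accuracy`, `fit_threshold` and `predict_threshold` are illustrative.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k near-equal folds (k <= n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(X: Sequence[float], y: Sequence[int], k: int,
                fit: Callable[[List[float], List[int]], object],
                predict: Callable[[object, float], int]) -> float:
    """Mean held-out accuracy over k folds: the quantity S4 maximizes."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical stand-in model: threshold halfway between the two class means.
def fit_threshold(X: List[float], y: List[int]) -> float:
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold: float, x: float) -> int:
    return int(x > threshold)
```

In the patent's setting, `fit` would train the ensemble classifier with the parameter values proposed by one genetic-algorithm individual. The embodiment uses k = 10.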
Further, the ensemble learning algorithms in step S3 include the random forest, AdaBoost, XGBoost, and GBDT algorithms.
Further, the optimized sampling method in step S3 comprises the following specific steps:
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size.
S32. The sample size is $n_m = s_0 + (m-1)k$, where $s_0$ is the lower bound of the sampling interval and m is the iteration count, starting at 1. Randomly draw a sample $N_m$ of size $n_m$ from data set N.
S33. Split the union $G_m$ of data set D and $N_m$ into a training set and a test set.
S34. Apply SMOTE sampling to the training set and set the expansion (oversampling) ratio $a_i$ for the high-risk personnel subset D, where $a_1 = 1$ and $a_i = a_{i-1} + 1$ for $i > 1$. i starts at 1 and has a preset upper limit.
S35. For each high-risk expansion ratio $a_i$, set the reduction (undersampling) ratio $b_j$ for the general personnel sample $N_m$, where $b_1 = 1$ and $b_j = b_{j-1} + 1$ for $j > 1$. j starts at 1 and has a preset upper limit. Apply the SMOTE sampling ratio $a_i:b_j$ to expand and reduce the samples of the two labeled classes in the training set; the result is the classifier's training set.
S36. Train the high-risk personnel classifier with the ensemble learning algorithm and determine the model parameters. This fits the traffic participant accident risk prediction model for the current $(m, i, j)$ configuration; the model can output both the label value and the risk probability.
S37. Evaluate this model on the test set to obtain its accuracy at different coverage levels.
S38. Take the complement $N_m'$ of the sample $N_m$ within the general personnel subset N and group its data by number of violations. Feed each group into this model and count the rate at which personnel are mislabeled at different coverage levels.
S39. Check whether j has reached its upper limit. If so, check whether i has reached its upper limit: if it has, go to S310; otherwise set i = i + 1 and go to S34, restarting j from 1. If j has not reached its upper limit, set j = j + 1 and go to S35.
S310. Check whether $n_m$ has reached the upper bound s of the sampling interval. If so, go to S311; otherwise set m = m + 1 and return to S32.
S311. Based on the accuracy and misclassification-rate analysis, select the model with the best performance. From it, determine the optimal random sample size M and the SMOTE sampling ratio I:J.
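The control flow of S31 to S311 amounts to a grid search over the sample size $n_m$ and the ratio pair $(a_i, b_j)$. The sketch below assumes a hypothetical `evaluate(n_m, a_i, b_j)` callback that performs S33 and S36 to S38 for one candidate and returns (accuracy, misclassification rate). The patent does not specify how S311 combines the two metrics. The ranking rule here is therefore an assumption: highest accuracy first, with lower misclassification rate breaking ties.

```python
from typing import Callable, Dict, Tuple


def optimized_sampling_search(
    s0: int, s: int, k: int, i_max: int, j_max: int,
    evaluate: Callable[[int, int, int], Tuple[float, float]],
) -> Dict[str, float]:
    """Grid search over sample size n_m and SMOTE ratio a_i:b_j (S31 to S311)."""
    best = None
    m = 1
    while True:
        n_m = s0 + (m - 1) * k                    # S32: current sample size
        for i in range(1, i_max + 1):             # S34: a_1 = 1, a_i = a_{i-1} + 1
            a_i = i
            for j in range(1, j_max + 1):         # S35: b_1 = 1, b_j = b_{j-1} + 1
                b_j = j
                acc, err = evaluate(n_m, a_i, b_j)  # S33, S36-S38 (hypothetical)
                key = (acc, -err)                 # assumed ranking rule
                if best is None or key > best[0]:
                    best = (key, n_m, a_i, b_j)
        if n_m >= s:                              # S310: upper bound reached
            break
        m += 1
    (acc, neg_err), M, I, J = best                # S311: best configuration
    return {"M": M, "I": I, "J": J, "accuracy": acc, "misclassification": -neg_err}


# Toy evaluator peaking at n_m = 600 and ratio 2:2 (illustrative only).
result = optimized_sampling_search(
    200, 1000, 200, 4, 4,
    lambda n, a, b: (1 - abs(n - 600) / 1000 - 0.01 * (abs(a - 2) + abs(b - 2)), 0.1),
)
```

If $s - s_0$ is not a multiple of k, this loop stops at the first $n_m \ge s$, so the last sample size can slightly exceed s.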
Further, in step S2 the data labels are assigned according to the following classification rules:
High-risk personnel fall into two groups. (1) Individuals who have violation records and also a record of a serious traffic accident in which they bore primary or full responsibility. (2) Individuals who have violation records and only minor accident records, with at least two accident records.
General personnel: individuals who have violation records but no accident records.
Data that meet none of the above criteria form the subset to be identified.
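The labeling rules of step S2 can be written as a small decision function. The sketch below uses a hypothetical per-person record: the field names are illustrative, and the patent only states the rules, not a schema. It also assumes that a serious accident with less than primary responsibility is covered by neither high-risk rule. A record that fits no rule falls into the subset U to be identified.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    HIGH_RISK = "D"
    GENERAL = "N"
    TO_IDENTIFY = "U"


@dataclass
class PersonRecord:                  # hypothetical aggregated record
    violations: int                  # number of violation records
    serious_main_or_full: int        # serious accidents with primary/full responsibility
    serious_other: int               # serious accidents with lesser responsibility
    minor_accidents: int             # minor accident records


def label(p: PersonRecord) -> Label:
    if p.violations == 0:
        return Label.TO_IDENTIFY     # rules only cover persons with violation records
    if p.serious_main_or_full > 0:
        return Label.HIGH_RISK       # rule (1)
    if p.serious_other == 0 and p.minor_accidents >= 2:
        return Label.HIGH_RISK       # rule (2): only minor accidents, at least two
    if p.serious_other + p.minor_accidents == 0:
        return Label.GENERAL         # violations but no accident record
    return Label.TO_IDENTIFY         # e.g. exactly one minor accident
```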
Further, the original traffic violation data and accident data in step S1 include the identity document information of the persons involved. The violation records are aggregated and classified to obtain the violation data set, which is the full sample of violation records. Its fields include the person's ID number, number of violations, violation types, penalty points and fines, occurrences of accident-related violations, and violation time period.
Further, in step S1 the occurrences of accident-related violations are obtained by correspondence analysis. The violation types with the strongest influence on traffic accidents are extracted as attributes of the violation data set.
Further, the violation time period in step S1 is obtained by converting the continuous time variable into a discrete variable, classified according to the temporal characteristics of the violations.
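Converting the violation time into a discrete period is a binning step. The patent derives its periods from local traffic-flow and violation patterns and does not list them. The boundaries and period names below are therefore hypothetical placeholders, chosen only to show the mechanics.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical period boundaries (hour of day) and labels; not from the patent.
BOUNDARIES = [6, 7, 9, 17, 19, 22]
PERIODS = ["night", "early", "am_peak", "daytime", "pm_peak", "evening", "night"]


def violation_period(ts: datetime) -> str:
    """Map a continuous timestamp to a nominal time-period label."""
    hour = ts.hour + ts.minute / 60
    return PERIODS[bisect_right(BOUNDARIES, hour)]
```

For example, `violation_period(datetime(2018, 5, 1, 8, 30))` falls in the hypothetical morning-peak bin.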
The beneficial effects of the invention are as follows:
1. The invention uses a genetic algorithm to optimize the parameters of the initial fitted model, which markedly improves the accuracy of accident risk prediction for high-risk traffic participants.
2. Compared with conventional classification methods such as decision trees and neural networks, the ensemble learning algorithms used by the invention have a significant advantage in predictive performance. This ensures accurate prediction of the accident risk of high-risk individuals.
3. The invention mines traffic violation data with an optimized ensemble learning algorithm. This enables quantitative assessment of traffic safety risk from traffic participants' violation records, and the model can output each person's traffic danger level.
Description of the drawings
Fig. 1 is a flow chart of the method for improving the accuracy of accident risk prediction for high-risk traffic participants according to the embodiment of the invention.
Fig. 2 is a detailed flow chart of the optimized sampling method used in S3 of the embodiment.
Fig. 3 is a schematic illustration of the data sets in the embodiment.
Fig. 4 is a schematic diagram of the genetic algorithm reproduction process used for parameter optimization in the embodiment.
Detailed description of the embodiments
Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiment
As shown in Fig. 1, the method for improving the accuracy of accident risk prediction for high-risk traffic participants proceeds in three stages. Traffic violation data and accident data samples are obtained with an optimized sampling method. A traffic participant accident risk prediction model is trained with an ensemble learning algorithm. The model is then further optimized with a genetic algorithm to improve prediction accuracy. The embodiment uses ensemble learning to mine the safety characteristics of travelers from traffic violation data. It applies the optimized sampling method in the sampling stage of model construction to improve the performance of the initial model, and tunes the model parameters with a genetic algorithm. Together these steps effectively improve the accuracy of accident risk prediction for high-risk individuals. The specific procedure is as follows:
S1. From the original traffic violation data and accident data, construct a violation data set, a serious accident data set, and a minor accident data set.
The original traffic violation data and accident data include the identity document information of the persons involved. The violation records are aggregated and classified to obtain the violation data set, which is the full sample of violation records. Its fields include the person's ID number, number of violations, violation types, penalty points and fines, occurrences of accident-related violations, and violation time period. The occurrences of accident-related violations are obtained by correspondence analysis, and the violation types with the strongest influence on traffic accidents are extracted as attributes of the violation data set. The violation time period is obtained by converting the continuous time variable into a discrete variable, classified according to the temporal characteristics of the violations.
S2. Classify the violation data set into two classes, high-risk personnel and general personnel. Determine each record's label according to the classification rules, and accordingly divide the violation data set into a high-risk personnel subset D, a general personnel subset N, and a subset U to be identified.
The classification rules are as follows. High-risk personnel fall into two groups. (1) Traffic participants (including motor vehicle drivers, non-motor vehicle riders, and pedestrians) who have violation records and also a record of a serious traffic accident in which they bore primary or full responsibility. (2) Traffic participants who have violation records and only minor accident records, with at least two accident records. General personnel are traffic participants who have violation records but no accident records. Data that meet none of the above criteria form the subset to be identified.
S3. Build an initial traffic participant danger-level prediction model $P_0$ using the optimized sampling method and an ensemble learning algorithm, and determine the model's sample size and SMOTE sampling ratio. The ensemble learning algorithms include the random forest, AdaBoost, XGBoost, and GBDT algorithms. As shown in Fig. 2, the detailed process is:
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size.
S32. The sample size is $n_m = s_0 + (m-1)k$, where $s_0$ is the lower bound of the sampling interval and m is the iteration count, starting at 1. Randomly draw a sample $N_m$ of size $n_m$ from data set N.
S33. Split the union $G_m$ of data set D and $N_m$ into a training set and a test set.
S34. Apply SMOTE sampling to the training set and set the expansion (oversampling) ratio $a_i$ for the high-risk personnel subset D, where $a_1 = 1$ and $a_i = a_{i-1} + 1$ for $i > 1$. The upper limit of i is usually 4.
S35. For each high-risk expansion ratio $a_i$, set the reduction (undersampling) ratio $b_j$ for the general personnel sample $N_m$, where $b_1 = 1$ and $b_j = b_{j-1} + 1$ for $j > 1$. The upper limit of j is usually 4. Apply the SMOTE sampling ratio $a_i:b_j$ to expand and reduce the samples of the two labeled classes in the training set; the result is the classifier's training set.
S36. Train the high-risk personnel classifier with the ensemble learning algorithm and determine the model parameters. This fits the traffic participant accident risk prediction model for the current $(m, i, j)$ configuration; the model can output both the label value and the risk probability.
S37. Evaluate this model on the test set to obtain its accuracy at different coverage levels.
S38. Take the complement $N_m'$ of the sample $N_m$ within the general personnel subset N and group its data by number of violations. Feed each group into this model and count the rate at which personnel are mislabeled at different coverage levels.
S39. Check whether j has reached its upper limit. If so, check whether i has reached its upper limit: if it has, go to S310; otherwise set i = i + 1 and go to S34, restarting j from 1. If j has not reached its upper limit, set j = j + 1 and go to S35.
S310. Check whether $n_m$ has reached the upper bound s of the sampling interval. If so, go to S311; otherwise set m = m + 1 and return to S32.
S311. Based on the accuracy and misclassification-rate analysis, select the model with the best performance. From it, determine the optimal random sample size M and the SMOTE sampling ratio I:J.
S4. Optimize the performance of model $P_0$ with a genetic algorithm. The objective function is to maximize prediction accuracy on the test set, evaluated by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly without oscillating or failing to converge. These parameters include the crossover probability, mutation probability, mutation range, number of generations, and initial population size.
S5. Using the optimal model parameters output by the genetic algorithm, build the best-fitting model P for predicting the accident risk of high-risk personnel, and determine the model's test coverage (recall) and classification threshold.
S6. Feed the data in the subset to be identified from S2 into model P and output the danger level of each target individual.
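Steps S34 and S35 expand the high-risk class with SMOTE and shrink the general class. The sketch below shows the SMOTE interpolation itself: each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbors. The patent does not define how the ratio $a_i:b_j$ maps to class sizes. `resample` therefore encodes an assumed reading: positives grow to $a_i$ times their count, and negatives shrink to $1/b_j$ of theirs by random undersampling. In practice, a library implementation such as imbalanced-learn's `SMOTE` could replace the hand-written function.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples by SMOTE interpolation (needs >= 2 rows)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples; a point is not its own neighbor.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                   # random minority anchors
    nn = neighbors[base, rng.integers(0, k, size=n_new)]    # one random neighbor each
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def resample(X_pos: np.ndarray, X_neg: np.ndarray, a: int, b: int, seed: int = 0):
    """Assumed reading of ratio a:b (hypothetical helper, not from the patent)."""
    rng = np.random.default_rng(seed)
    n_new = (a - 1) * len(X_pos)
    X_pos_out = np.vstack([X_pos, smote(X_pos, n_new, seed=seed)]) if n_new > 0 else X_pos
    keep = rng.choice(len(X_neg), size=max(1, len(X_neg) // b), replace=False)
    return X_pos_out, X_neg[keep]
```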
Specific example
In this example, the objects of analysis are motor vehicle drivers.
S1. Obtain two years of traffic violation records and accident records for the region.
Accidents that cause death or serious injury, and hit-and-run accidents, are classed as serious accidents; all other accidents are classed as minor accidents. The original accident records are classified accordingly. The accident type and the driver's ID information serve as the attribute features of the serious accident data set and the minor accident data set, yielding the sample data of the two data sets.
Next, the raw violation data are preprocessed and each driver's violation information is aggregated. The aggregated fields are the cumulative number of violations, violation types, cumulative penalty points, average penalty points per violation (points/violation), maximum penalty points for a single violation, cumulative fine amount, and average fine per violation (yuan/violation).
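The per-driver aggregation above can be sketched as a single group-by pass. The sketch assumes a hypothetical raw record layout `(driver_id, violation_type, points, fine_yuan)`; the patent lists the aggregated fields but not the raw schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw violation record: (driver_id, violation_type, points, fine_yuan)
Violation = Tuple[str, str, int, float]


def aggregate(records: Iterable[Violation]) -> Dict[str, dict]:
    """Summarize each driver's violations into the fields listed in the example."""
    by_driver: Dict[str, List[Violation]] = defaultdict(list)
    for rec in records:
        by_driver[rec[0]].append(rec)
    out = {}
    for driver, recs in by_driver.items():
        points = [r[2] for r in recs]
        fines = [r[3] for r in recs]
        n = len(recs)
        out[driver] = {
            "violation_count": n,                       # cumulative violations
            "violation_types": sorted({r[1] for r in recs}),
            "total_points": sum(points),                # cumulative penalty points
            "avg_points": sum(points) / n,              # points per violation
            "max_points": max(points),                  # largest single deduction
            "total_fine": sum(fines),                   # cumulative fine (yuan)
            "avg_fine": sum(fines) / n,                 # yuan per violation
        }
    return out
```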
Correspondence analysis is applied to the traffic accident data and the raw violation data for dimensionality reduction. Violation types are classified according to their correlation with accident types. The five classes with the highest correlation are extracted as the accident-risk violation fields of the data set, as shown in Table 1.
Table 1. Classification of accident-related violation types
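Correspondence analysis decomposes a contingency table through the SVD of its standardized residuals. The sketch below is the textbook method, applied to an assumed table layout with violation types as rows and accident types as columns. The patent does not state how it ranks violation types. Ranking them by their contribution to total inertia (`row_inertia`) is one possible choice and is an assumption here.

```python
import numpy as np


def correspondence_analysis(table: np.ndarray):
    """Classical CA of a contingency table (rows: violation types,
    columns: accident types). Returns row and column principal
    coordinates, principal inertias, and each row's share of inertia."""
    P = table / table.sum()
    r = P.sum(axis=1)                                      # row masses
    c = P.sum(axis=0)                                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))     # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]                  # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]               # column principal coordinates
    row_inertia = r * (rows ** 2).sum(axis=1)              # contribution of each row
    return rows, cols, sv ** 2, row_inertia
```

The total inertia equals the table's chi-square statistic divided by its grand total, which makes a convenient sanity check.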
Time is aggregated and divided into analysis periods according to the traffic flow patterns of the example region's road network and the occurrence patterns of traffic violations. This converts the continuous variable into a nominal variable. In another embodiment, the periods are divided with other statistical methods such as clustering.
The driver's age, gender, and province and city of origin are extracted from the driver's ID number and encoded as driver features. The violation data set is generated from the information extracted in the above steps, as shown in Table 2.
Table 2. Partial data from the violation data set
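An 18-digit PRC resident ID number (standard GB 11643-1999) encodes the region of registration in its first six digits and the birth date (YYYYMMDD) in digits 7 to 14. The parity of digit 17 gives gender: odd for male, even for female. A minimal sketch of the feature extraction follows. Checksum validation is omitted, and the ID in the usage line is synthetic.

```python
from datetime import date


def id_features(id_number: str, on: date) -> dict:
    """Extract region codes, age and gender from an 18-digit PRC resident ID."""
    if len(id_number) != 18:
        raise ValueError("expected an 18-digit ID number")
    birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    # Age in whole years on the reference date.
    age = on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))
    return {
        "province_code": id_number[:2],      # e.g. "32" is Jiangsu
        "city_code": id_number[:4],
        "age": age,
        "gender": "male" if int(id_number[16]) % 2 == 1 else "female",
    }


features = id_features("320102199001011234", date(2018, 7, 1))  # synthetic ID
```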
S2. Classify the full sample I of the violation data set into high-risk drivers and general drivers, as shown in Fig. 3. The first case of high-risk driver covers drivers who have violation records and also a record of a serious traffic accident in which they bore primary or full responsibility; their data are placed in data set D1. The second case covers drivers who have violation records and only minor accident records, with at least two accident records; their data are placed in data set D2. The high-risk driver data set is D = D1 + D2. Data for drivers with violation records but no accident records form the general driver data set N.
Records in the violation data set that satisfy these rules are labeled high-risk or general accordingly. The remaining data, to which the classification rules do not apply, form the data subset U = I − N − D to be identified.
S3. Build an initial vehicle driver danger-level prediction model $P_0$ using the optimized sampling method and the XGBoost algorithm, and determine the model's sample size and SMOTE sampling ratio:
S31. Set the sampling interval S and the loop step k according to the sample size of data set N. The upper bound s of the interval usually does not exceed 25% of the total sample size. In this example, the data set contains more than 84,000 samples, the sampling interval is S = [200, 4000], and the loop step k is 200.
S32. The sample size is $n_m = s_0 + (m-1)k$, where $s_0$ is the lower bound of the sampling interval and m is the iteration count, starting at 1. Randomly draw a sample $N_m$ of size $n_m$ from data set N. In this example, the initial sample size is 200.
S33. Split the union $G_m$ of data set D and $N_m$ into a training set and a test set. In this example, the training-to-test split ratio is 9:1.
S34. Apply SMOTE sampling to the training set and set the expansion ratio $a_i$ for the high-risk driver subset D, where $a_1 = 1$ and $a_i = a_{i-1} + 1$. i starts at 1 and has a preset upper limit, here a maximum of 4.
S35. For each expansion ratio $a_i$ for high-risk drivers, set the reduction ratio $b_j$ for the general driver sample $N_m$, where $b_1 = 1$ and $b_j = b_{j-1} + 1$. j starts at 1 and has a preset upper limit, here a maximum of 4. Apply the SMOTE sampling ratio $a_i:b_j$ to expand and reduce the samples of the two labeled classes in the training set; the result is the classifier's training set.
S36. Train the high-risk driver classifier with the XGBoost algorithm and determine the model parameters. This fits the driver accident risk prediction model for the current $(m, i, j)$ configuration; the model can output the driver's label and risk probability. The model parameters are:
- learning rate
- number of weak learners
- maximum tree depth
- minimum samples for a node split
- minimum samples per leaf
- minimum sum of instance weights in a leaf
- minimum loss reduction
- row subsampling rate
- column subsampling rate
- regularization term 1
- regularization term 2
- positive/negative weight balance
- early-stopping condition
S37. Evaluate this model on the test set to obtain its accuracy at different coverage levels.
S38. Take the complement $N_m'$ of the sample $N_m$ within the general driver subset N and group its data by number of violations. Feed each group into this model and count the rate at which drivers are mislabeled at different coverage levels.
S39. Check whether j has reached its preset maximum. If so, check whether i has reached its preset maximum: if it has, go to S310; otherwise set i = i + 1 and go to S34, restarting j from 1. If j has not reached its maximum, set j = j + 1 and go to S35.
S310. Check whether $n_m$ has reached the interval upper bound s. If so, go to S311; otherwise set m = m + 1 and return to S32.
S311. Based on the accuracy and misclassification-rate analysis, select the model with the best performance. From it, determine the optimal random sample size M and the SMOTE sampling ratio I:J.
In this example, the candidates were compared on misclassification rate, accuracy, and metric stability. The model with the best overall performance uses a random sample size of 2,400 and a SMOTE ratio of 2:2.
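With the example's settings, the search in S3 covers a finite grid. The quick count below assumes that every $(n_m, a_i, b_j)$ combination is trained and evaluated once. It confirms that the selected sample size of 2,400 is on the grid (m = 12) and that s = 4,000 stays well under 25% of 84,000.

```python
# Settings from the example: S = [200, 4000], k = 200, i and j at most 4.
s0, s, k = 200, 4000, 200
sizes = [s0 + (m - 1) * k for m in range(1, (s - s0) // k + 2)]   # n_m values
ratios = [(a, b) for a in range(1, 5) for b in range(1, 5)]        # a_i:b_j pairs
n_candidates = len(sizes) * len(ratios)                            # models trained
within_cap = s <= 0.25 * 84000                                     # S31 guideline
```

Under that assumption the grid holds 20 sample sizes and 16 ratio pairs, for 320 candidate models.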
S4. Optimize the performance of model $P_0$ with a genetic algorithm. The objective function is to maximize prediction accuracy on the test set, evaluated by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly without oscillating or failing to converge. These parameters include the crossover probability, mutation probability, mutation range, number of generations, and initial population size.
In this example, the objective function is the test accuracy under 10-fold cross-validation. The genetic algorithm parameters are:
- crossover probability CrossoverProbaiblity = 0.8
- mutation probability MutationProbability = 0.5
- mutation ranges Sigma = [[-10, 10], [-2, 2], [-2, 2], [-2, 2], [-2, 2]]
- number of generations Iteration = 500
- initial population size Population = 100

The reproduction process of the genetic algorithm for parameter optimization is shown in Fig. 4.
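The patent names the genetic algorithm's settings but not its operators. The sketch below is a generic real-coded genetic algorithm that shows how those settings interact. It uses tournament selection, arithmetic crossover applied with probability `crossover_p`, per-gene Gaussian mutation with probability `mutation_p`, and elitism; all of these operator choices are assumptions. Reading the Sigma ranges as per-gene search bounds is also an assumption. Which five model parameters the genes encode is not stated in the source.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def genetic_maximize(fitness: Callable[[List[float]], float], bounds: Bounds,
                     crossover_p: float = 0.8, mutation_p: float = 0.5,
                     population: int = 100, generations: int = 500,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Real-coded GA that maximizes `fitness` within per-gene bounds."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population)]
    scores = [fitness(ind) for ind in pop]

    def pick() -> List[float]:
        # Binary tournament selection on the current generation.
        a, b = rng.randrange(population), rng.randrange(population)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    for _ in range(generations):
        elite = max(range(population), key=scores.__getitem__)
        children = [pop[elite][:]]                # elitism: carry the best over
        while len(children) < population:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_p:        # arithmetic crossover
                w = rng.random()
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_p:     # Gaussian mutation, clipped to bounds
                    child[d] += rng.gauss(0.0, 0.1 * (hi - lo))
                    child[d] = min(max(child[d], lo), hi)
            children.append(child)
        pop = children
        scores = [fitness(ind) for ind in pop]

    best = max(range(population), key=scores.__getitem__)
    return pop[best], scores[best]
```

In the patent's setting, `fitness` would decode an individual into model parameters and return the 10-fold cross-validated accuracy. That evaluation is expensive, which is why population size and generation count matter in practice.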
S5. Using the optimal model parameters output by the genetic algorithm, build the best-fitting model P for predicting vehicle driver danger level, and determine the model's test coverage (recall) and classification threshold.
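S5 fixes a coverage level and the matching decision threshold, but the patent does not say how the threshold is chosen. The sketch below assumes that coverage means recall on the high-risk class. It picks the highest score threshold that still reaches a target recall, which keeps the number of flagged general drivers as small as possible for that recall.

```python
import math
from typing import Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> Tuple[float, float]:
    """Highest threshold t such that flagging score >= t captures at least
    target_recall of the positives; returns (t, achieved recall)."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("no positive samples")
    needed = max(1, math.ceil(target_recall * len(pos)))
    t = pos[needed - 1]
    achieved = sum(s >= t for s in pos) / len(pos)   # ties can push recall higher
    return t, achieved
```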
In this example, the initial XGBoost model has the following parameters after genetic-algorithm optimization:
- learning rate learning_rate_value = 0.09
- number of weak learners n_estimators_value = 367
- maximum tree depth max_depth_value = 4
- minimum samples for a node split min_samples_split_value = 10
- minimum samples per leaf min_samples_leaf_value = 6
- minimum sum of instance weights in a leaf min_child_weight_value = 3
- minimum loss reduction gamma_value = 0
- row subsampling rate subsample_value = 0.45
- column subsampling rate colsample_bytree_value = 0.1
- regularization term 1 reg_lambda_value = 11
- regularization term 2 reg_alpha_value = 11
- positive/negative weight balance scale_pos_weight_value = 1
- early-stopping condition early_stopping_rounds_value = 37
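Most of these tuned values correspond to constructor arguments of `xgboost.XGBClassifier`. Two of them, `min_samples_split` and `min_samples_leaf`, are scikit-learn tree parameters with no direct XGBoost counterpart. The sketch below drops the `_value` suffix and splits the reported values into the two groups. It assumes xgboost 1.6 or later, where `early_stopping_rounds` is accepted by the constructor. The code does not import xgboost itself.

```python
# Tuned values reported in the example, with the "_value" suffix dropped.
tuned = {
    "learning_rate": 0.09, "n_estimators": 367, "max_depth": 4,
    "min_samples_split": 10, "min_samples_leaf": 6, "min_child_weight": 3,
    "gamma": 0, "subsample": 0.45, "colsample_bytree": 0.1,
    "reg_lambda": 11, "reg_alpha": 11, "scale_pos_weight": 1,
    "early_stopping_rounds": 37,
}

# Constructor arguments accepted by xgboost.XGBClassifier (early_stopping_rounds
# in the constructor requires xgboost >= 1.6).
XGB_PARAMS = {"learning_rate", "n_estimators", "max_depth", "min_child_weight",
              "gamma", "subsample", "colsample_bytree", "reg_lambda",
              "reg_alpha", "scale_pos_weight", "early_stopping_rounds"}

xgb_kwargs = {k: v for k, v in tuned.items() if k in XGB_PARAMS}
unmapped = sorted(set(tuned) - XGB_PARAMS)   # scikit-learn-only names
```

With xgboost installed, `XGBClassifier(**xgb_kwargs)` would build the classifier. Early stopping then needs an `eval_set` passed to `fit`.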
After parameter optimization, the model accuracy reaches 0.76.
S6. Feed the data in the subset to be identified from S2 into model P and output each driver's danger level. Partial results are shown in Table 3.
Table 3. Danger-level analysis results for high-risk drivers obtained with the method of the invention

Claims (7)

1. A method for improving the accuracy of accident risk prediction for high-risk traffic participants, characterized in that: traffic violation data and accident data samples are obtained with an optimized sampling method; a traffic participant accident risk prediction model is trained with an ensemble learning algorithm; and the model is further optimized with a genetic algorithm to improve prediction accuracy. The method specifically comprises the following steps:
S1. From the original traffic violation data and accident data, construct a violation data set, a serious accident data set, and a minor accident data set;
S2. Classify the violation data set into two classes, high-risk personnel and general personnel. Determine each record's label according to the classification rules, and accordingly divide the violation data set into a high-risk personnel subset D, a general personnel subset N, and a subset U to be identified;
S3. Build an initial accident risk prediction model $P_0$ for high-risk personnel using the optimized sampling method and an ensemble learning algorithm, and determine the model's sample size and SMOTE sampling ratio;
S4. Optimize the performance of model $P_0$ with a genetic algorithm. The objective function is to maximize prediction accuracy on the test set, evaluated by k-fold cross-validation. Set the genetic algorithm parameters so that the objective function converges quickly without oscillating or failing to converge. These parameters include the crossover probability, mutation probability, mutation range, number of generations, and initial population size;
S5. Using the optimal model parameters output by the genetic algorithm, build the best-fitting model P for predicting the accident risk of high-risk personnel, and determine the model's test coverage (recall) and classification threshold;
S6. Feed the data in the subset to be identified from step S2 into model P and output the danger level of each target individual.
2. the method for promoting traffic hazard personnel's accident risk prediction precision as described in claim 1, which is characterized in that step Ensemble Learning Algorithms described in S3 include random forests algorithm, AdaBoost algorithms, XgBoost algorithms, GBDT algorithms.
3. the method for promoting traffic hazard personnel's accident risk prediction precision as described in claim 1, which is characterized in that step The optimization methods of sampling described in S3 the specific steps are:
S31, set the sampling interval S and the loop step size k according to the sample size of data set N;
S32, set the sample size n_m = s_0 + (m - 1)k, where s_0 is the lower limit of the sampling interval and m is the loop count with initial value 1; randomly draw a sample N_m of size n_m from data set N;
S33, split the union G_m of data set D and N_m into a training set and a test set;
S34, apply SMOTE sampling to the training set, setting the oversampling ratio a_i for the high-risk personnel data subset D, where a_i = 1 when i = 1 and a_i = a_{i-1} + 1 when i > 1; i starts at 1 and has a preset upper limit;
S35, for each high-risk oversampling ratio a_i, set the undersampling ratio b_j for the general personnel data subset N_m, where b_j = 1 when j = 1 and b_j = b_{j-1} + 1 when j > 1; j starts at 1 and has a preset upper limit; according to the SMOTE sampling ratio a_i:b_j, oversample and undersample the two classes of labeled samples in the training set to form the classifier's training sample set;
S36, train the high-risk personnel classifier with an ensemble learning algorithm, determine the model parameters, and fit the traffic participant accident risk prediction model, which can output both a label value and a risk probability;
S37, evaluate the model with the test set data to obtain its accuracy at different coverage rates;
S38, group the data in the complement N_m' of the sample N_m within the general personnel data subset N by number of violations, input each group into the model, and count the rate at which personnel labels are misjudged by the model at different coverage rates;
S39, check whether j has reached its upper limit; if so, check whether i has reached its upper limit, and if it has, go to S310, otherwise set i = i + 1 and go to S34; if j has not reached its upper limit, set j = j + 1 and go to S35;
S310, check whether n_m has reached the upper limit s of the sampling interval; if so, go to S311, otherwise set m = m + 1 and return to S32;
S311, analyze model accuracy and misjudgment rate to select the model with optimal performance, and determine the optimal random sampling count M and the SMOTE sampling ratio indices I and J.
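The loop structure of steps S31 to S311 can be sketched as follows. This is a minimal, dependency-free illustration under several assumptions: numeric feature vectors; a hypothetical `evaluate` callback that stands in for training and scoring the ensemble classifier (S36 to S38); and one reading of the ratio a_i:b_j, in which the high-risk class is grown to a_i times its size by SMOTE and the general class is randomly shrunk to 1/b_j of its size. The function names, the per-class train/test split and `test_frac` are also assumptions. The `smote` helper implements basic SMOTE interpolation between a minority sample and one of its k nearest minority neighbours.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical hook standing in for S36-S38: (train_X, train_y, test_X, test_y) -> score
Evaluate = Callable[[List[Vector], List[int], List[Vector], List[int]], float]


def smote(minority: Sequence[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """Basic SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    out: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        nb = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        out.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return out


def resample(pos, neg, a: int, b: int, rng: random.Random, k: int = 5):
    """Assumed reading of a:b: grow positives to a times their size, shrink negatives to 1/b."""
    pos_out = list(pos) + smote(pos, (a - 1) * len(pos), k, rng)
    neg_out = rng.sample(list(neg), max(1, len(neg) // b))
    return pos_out, neg_out


def split(rows, test_frac: float, rng: random.Random):
    """Shuffle one class and split it into (train, test)."""
    rows = list(rows)
    rng.shuffle(rows)
    cut = max(1, int(len(rows) * test_frac))
    return rows[cut:], rows[:cut]


def sampling_search(D, N, s0: int, s: int, step: int, a_max: int, b_max: int,
                    evaluate: Evaluate, test_frac: float = 0.3, seed: int = 0):
    """Loop over n_m (S32/S310), a_i (S34) and b_j (S35); return (score, n_M, a_I, b_J)."""
    rng = random.Random(seed)
    best: Optional[Tuple[float, int, int, int]] = None
    m = 1
    while True:
        n_m = min(s0 + (m - 1) * step, len(N))          # S32: n_m = s0 + (m - 1)k
        N_m = rng.sample(list(N), n_m)
        d_train, d_test = split(D, test_frac, rng)      # S33, split per class (assumption)
        n_train, n_test = split(N_m, test_frac, rng)
        test_X = d_test + n_test
        test_y = [1] * len(d_test) + [0] * len(n_test)
        for a in range(1, a_max + 1):                   # S34: a_i = 1, 2, ...
            for b in range(1, b_max + 1):               # S35: b_j = 1, 2, ...
                pos, neg = resample(d_train, n_train, a, b, rng)
                y = [1] * len(pos) + [0] * len(neg)
                score = evaluate(pos + neg, y, test_X, test_y)  # S36-S38
                if best is None or score > best[0]:
                    best = (score, n_m, a, b)
        if n_m >= s or n_m >= len(N):                   # S310
            return best
        m += 1
```

The imbalanced-learn package provides `SMOTE` and `RandomUnderSampler` (both configurable through `sampling_strategy` and applied with `fit_resample`), which could replace the two hand-written helpers in a production setting.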
4. The method for improving the accident risk prediction precision of traffic hazard personnel according to claim 1, characterized in that the method in step S2 for assigning data label values based on the classification rules is specifically:
High-risk personnel: one type is personnel who have violation records and also have records of severe traffic accidents in which they bore main or full responsibility; the other type is personnel who have violation records and only minor accident records, with at least 2 accident records;
General personnel: personnel who have violation records but no accident records;
Data that do not satisfy the above criteria form the subset to be identified.
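The labeling rules of claim 4 can be expressed as a small decision function. The sketch below assumes each person's records have already been summarized into counts; the `Person` fields, including `severe_other` for severe accidents in which the person did not bear main or full responsibility, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    person_id: str
    violations: int            # number of violation records
    severe_main_or_full: int   # severe accidents with main or full responsibility
    severe_other: int          # severe accidents with lesser responsibility (hypothetical field)
    minor_accidents: int       # minor accident records


def label(p: Person) -> str:
    """Return 'D' (high-risk), 'N' (general) or 'U' (to be identified)."""
    if p.violations > 0:
        if p.severe_main_or_full > 0:
            return "D"  # high-risk, type 1
        if p.severe_other == 0 and p.minor_accidents >= 2:
            return "D"  # high-risk, type 2: only minor accidents, at least 2 of them
        if p.severe_other == 0 and p.minor_accidents == 0:
            return "N"  # violations but no accident record
    return "U"          # everything else is left for the model to identify


def partition(people: List[Person]) -> Dict[str, List[str]]:
    """Split people into the subsets D, N and U by their labels."""
    subsets: Dict[str, List[str]] = {"D": [], "N": [], "U": []}
    for p in people:
        subsets[label(p)].append(p.person_id)
    return subsets
```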
5. The method for improving the accident risk prediction precision of traffic hazard personnel according to claim 1, characterized in that the original traffic violation data and accident data in step S1 include the identity document information of the persons involved; the violation records are collected, classified, and processed to obtain the violation data set; the violation data set contains the full sample of violation records, and its information includes the person's ID number, number of violations, violation types, point deductions and fines, the occurrence of accident-related violations, and the time period of each violation.
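A minimal sketch of building the per-person violation data set of claim 5 follows. It assumes raw records that carry an ID number, violation type, penalty points and fine; the record layout and output keys are hypothetical, and the accident-related and time-period attributes (claims 6 and 7) are omitted here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Violation(NamedTuple):
    person_id: str   # ID document number
    vtype: str       # violation type code
    points: int      # penalty points deducted
    fine: float      # fine amount


def build_violation_dataset(records: Iterable[Violation]) -> Dict[str, dict]:
    """Aggregate raw violation records into one feature row per person (hypothetical schema)."""
    grouped: Dict[str, List[Violation]] = defaultdict(list)
    for r in records:
        grouped[r.person_id].append(r)
    return {
        pid: {
            "violation_count": len(rs),
            "violation_types": sorted({r.vtype for r in rs}),
            "total_points": sum(r.points for r in rs),
            "total_fine": sum(r.fine for r in rs),
        }
        for pid, rs in grouped.items()
    }
```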
6. The method for improving the accident risk prediction precision of traffic hazard personnel according to claim 1, characterized in that the occurrence of accident-related violations in step S1 is obtained by correspondence analysis, which extracts the violation types that have a higher influence on traffic accidents as data attributes of the violation data set.
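Claim 6 does not say how the correspondence analysis is carried out. One simplified reading, sketched below, builds a contingency table of violation type against accident involvement, computes the standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j) that correspondence analysis decomposes, and ranks types by their share of total inertia, keeping only those over-represented in the accident column. A full analysis would also take the SVD of the residual matrix (for example with `numpy.linalg.svd`) to obtain principal coordinates. The table layout, function names and selection rule are assumptions, and all row and column totals are assumed to be positive.

```python
import math
from typing import Dict, List, Tuple


def row_inertia(table: Dict[str, List[float]], accident_col: int = 0) -> Dict[str, Tuple[float, float]]:
    """For each row: (share of total inertia, standardized residual in the accident column)."""
    n = sum(sum(row) for row in table.values())
    n_cols = len(next(iter(table.values())))
    col_mass = [sum(row[j] for row in table.values()) / n for j in range(n_cols)]
    raw: Dict[str, Tuple[float, float]] = {}
    total = 0.0
    for name, row in table.items():
        r = sum(row) / n  # row mass
        # Standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
        s = [(row[j] / n - r * col_mass[j]) / math.sqrt(r * col_mass[j]) for j in range(n_cols)]
        inertia = sum(v * v for v in s)
        raw[name] = (inertia, s[accident_col])
        total += inertia
    return {name: (i / total, s_acc) for name, (i, s_acc) in raw.items()}


def accident_related_types(table: Dict[str, List[float]], accident_col: int = 0,
                           min_share: float = 0.0) -> List[str]:
    """Types over-represented in accidents, ranked by their share of inertia."""
    stats = row_inertia(table, accident_col)
    keep = [t for t, (share, s_acc) in stats.items() if s_acc > 0 and share > min_share]
    return sorted(keep, key=lambda t: stats[t][0], reverse=True)
```

Here each row's share of inertia equals its share of the table's chi-square statistic, so the ranking can be read as "which violation types deviate most from the average accident profile".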
7. The method for improving the accident risk prediction precision of traffic hazard personnel according to claim 1, characterized in that the violation time period in step S1 is obtained by transforming the continuous time variable into a discrete variable, classified according to the temporal characteristics of violations.
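Claim 7 converts the continuous violation time into a discrete period. The sketch below assumes hour-of-day bins plus a weekend flag; the bin boundaries and labels are hypothetical and chosen only for illustration, since the patent does not state them.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical bins: (start hour inclusive, end hour exclusive, label).
PERIODS: List[Tuple[int, int, str]] = [
    (0, 7, "night"),
    (7, 10, "morning_peak"),
    (10, 17, "daytime"),
    (17, 20, "evening_peak"),
    (20, 24, "evening"),
]


def violation_period(ts: datetime) -> str:
    """Map a continuous timestamp to a discrete time-period label."""
    for start, end, name in PERIODS:
        if start <= ts.hour < end:
            return name
    raise ValueError(f"hour out of range: {ts.hour}")


def period_features(ts: datetime) -> Tuple[str, bool]:
    """Period label plus a weekend flag (weekday() is 5 for Saturday, 6 for Sunday)."""
    return violation_period(ts), ts.weekday() >= 5
```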
CN201810783017.7A 2018-07-16 2018-07-16 Method for improving accident risk prediction precision of traffic hazard personnel Active CN108596409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810783017.7A CN108596409B (en) 2018-07-16 2018-07-16 Method for improving accident risk prediction precision of traffic hazard personnel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810783017.7A CN108596409B (en) 2018-07-16 2018-07-16 Method for improving accident risk prediction precision of traffic hazard personnel

Publications (2)

Publication Number Publication Date
CN108596409A true CN108596409A (en) 2018-09-28
CN108596409B CN108596409B (en) 2021-07-20

Family

ID=63617732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810783017.7A Active CN108596409B (en) 2018-07-16 2018-07-16 Method for improving accident risk prediction precision of traffic hazard personnel

Country Status (1)

Country Link
CN (1) CN108596409B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246897A (en) * 2013-05-27 2013-08-14 南京理工大学 Internal structure adjusting method of weak classifier based on AdaBoost
CN103462618A (en) * 2013-09-04 2013-12-25 江苏大学 Automobile driver fatigue detecting method based on steering wheel angle features
JP5892663B2 (en) * 2011-06-21 2016-03-23 国立大学法人 奈良先端科学技術大学院大学 Self-position estimation device, self-position estimation method, self-position estimation program, and moving object
CN107480839A (en) * 2017-10-13 2017-12-15 深圳市博安达信息技术股份有限公司 Classification and prediction method for high-risk pollution sources based on principal component analysis and random forest
CN107563425A (en) * 2017-08-24 2018-01-09 长安大学 Method for establishing a tunnel operation state perception model based on random forest

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408557A (en) * 2018-09-29 2019-03-01 东南大学 Traffic accident cause analysis method based on multiple correspondence analysis and K-means clustering
CN109408557B (en) * 2018-09-29 2021-09-28 东南大学 Traffic accident cause analysis method based on multiple correspondences and K-means clustering
CN109635990A (en) * 2018-10-12 2019-04-16 阿里巴巴集团控股有限公司 Training method, prediction method, device and electronic equipment
CN109635990B (en) * 2018-10-12 2022-09-16 创新先进技术有限公司 Training method, prediction method, device, electronic equipment and storage medium
WO2020083400A1 (en) * 2018-10-26 2020-04-30 江苏智通交通科技有限公司 Traffic accident data intelligent analysis and comprehensive application system
CN109558969A (en) * 2018-11-07 2019-04-02 南京邮电大学 VANET vehicle accident risk prediction model based on AdaBoost-SO
WO2020093701A1 (en) * 2018-11-07 2020-05-14 南京邮电大学 Vehicle accident risk prediction model based on adaboost-so in vanets
CN109598931A (en) * 2018-11-30 2019-04-09 江苏智通交通科技有限公司 Group division and difference analysis method and system based on traffic safety risk
WO2020108219A1 (en) * 2018-11-30 2020-06-04 江苏智通交通科技有限公司 Traffic safety risk based group division and difference analysis method and system
CN110379161B (en) * 2019-07-18 2021-02-02 中南大学 Urban road network traffic flow distribution method
CN110379161A (en) * 2019-07-18 2019-10-25 中南大学 Urban road network traffic flow distribution method
CN111080012A (en) * 2019-12-17 2020-04-28 北京明略软件系统有限公司 Personnel risk degree prediction method and device, electronic equipment and readable storage medium
CN111081016B (en) * 2019-12-18 2021-07-06 北京航空航天大学 Urban traffic anomaly identification method based on complex network theory
CN111081016A (en) * 2019-12-18 2020-04-28 北京航空航天大学 Urban traffic anomaly identification method based on complex network theory
CN112016735A (en) * 2020-07-17 2020-12-01 厦门大学 Patrol route planning method and system based on traffic violation hotspot prediction and readable storage medium
CN112016735B (en) * 2020-07-17 2023-03-28 厦门大学 Patrol route planning method and system based on traffic violation hotspot prediction and readable storage medium
CN111881988B (en) * 2020-07-31 2022-06-14 北京航空航天大学 Heterogeneous unbalanced data fault detection method based on minority class oversampling method
CN111881988A (en) * 2020-07-31 2020-11-03 北京航空航天大学 Heterogeneous unbalanced data fault detection method based on minority class oversampling method
CN112667919A (en) * 2020-12-28 2021-04-16 山东大学 Personalized community correction scheme recommendation system based on text data and working method thereof
CN113076974A (en) * 2021-03-09 2021-07-06 麦哲伦科技有限公司 Multi-task learning method for parallel missing-value imputation and classification based on a multilayer perceptron
CN113793502A (en) * 2021-09-15 2021-12-14 国网电动汽车服务(天津)有限公司 Pedestrian crossing prediction method for unsignalized crossings
CN115035722A (en) * 2022-06-20 2022-09-09 浙江嘉兴数字城市实验室有限公司 Road safety risk prediction method based on combination of spatio-temporal features and social media
CN115035722B (en) * 2022-06-20 2024-04-05 浙江嘉兴数字城市实验室有限公司 Road safety risk prediction method based on combination of space-time characteristics and social media
CN117009767A (en) * 2023-08-10 2023-11-07 中国环境科学研究院 Soil benchmark formulation and risk assessment method based on bioavailability
CN117009767B (en) * 2023-08-10 2024-04-26 中国环境科学研究院 Soil benchmark formulation and risk assessment method based on bioavailability

Also Published As

Publication number Publication date
CN108596409B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN108596409A (en) The method for promoting traffic hazard personnel's accident risk prediction precision
Tang et al. Crash injury severity analysis using a two-layer Stacking framework
CN104268599B (en) Intelligent unlicensed vehicle finding method based on vehicle track temporal-spatial characteristic analysis
CN105303197B (en) Automated car-following safety assessment method based on machine learning
CN109410577B (en) Self-adaptive traffic control subarea division method based on space data mining
CN106778583A (en) Vehicle attribute recognition methods and device based on convolutional neural networks
CN106372571A (en) Road traffic sign detection and identification method
CN109671274B (en) Highway risk automatic evaluation method based on feature construction and fusion
CN106056162A (en) A traffic safety credit scoring method based on GPS track and traffic law-violation records
CN109191828A (en) Traffic participant accident risk prediction method based on ensemble learning
CN109522876B (en) Subway station building escalator selection prediction method and system based on BP neural network
Mihaita et al. Arterial incident duration prediction using a bi-level framework of extreme gradient-tree boosting
CN112232389A (en) Dynamic adjustment method and system for traffic emergency plan of large-scale activity emergency
CN105809193A (en) Illegal operation vehicle recognition method based on Kmeans algorithm
CN114924556A (en) Method and system for automatically driving vehicle
Mafi et al. Analysis of gap acceptance behavior for unprotected right and left turning maneuvers at signalized intersections using data mining methods: A driving simulation approach
CN111563555A (en) Driver driving behavior analysis method and system
WO2023143000A1 (en) Auditing system for elderly age-friendly subdistrict built environment on basis of multi-source big data
Akomolafe et al. Using data mining technique to predict cause of accident and accident prone locations on highways
Shamsashtiany et al. Road accidents prediction with multilayer perceptron MLP modelling case study: roads of Qazvin, Zanjan and Hamadan
CN109101568A (en) Traffic high-risk personnel identification method based on the XGBoost algorithm
CN109063751A (en) Traffic high-risk personnel identification method based on the gradient boosting decision tree algorithm
Mohamad et al. Using a decision tree to compare rural versus highway motorcycle fatalities in Thailand
Murat et al. An integration of different computing approaches in traffic safety analysis
CN112308136A (en) SVM-Adaboost-based driving distraction detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 211100 No. 19 Suyuan Avenue, Jiangning Economic and Technological Development Zone, Nanjing City, Jiangsu Province

Applicant after: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

Address before: 210006 3rd Floor, Building E10, Chenguang 1865 Technology Creative Industry Park, No. 388 Yingtian Street, Qinhuai District, Nanjing, Jiangsu

Applicant before: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

GR01 Patent grant