CN109191828B - Traffic participant accident risk prediction method based on ensemble learning - Google Patents

Publication number
CN109191828B
Authority
CN
China
Prior art keywords
data
accident
illegal
personnel
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810783019.6A
Other languages
Chinese (zh)
Other versions
CN109191828A (en)
Inventor
刘林
陈凝
吕伟韬
李璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Zhitong Traffic Technology Co ltd
Original Assignee
Jiangsu Zhitong Traffic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Zhitong Traffic Technology Co ltd
Priority to CN201810783019.6A
Publication of CN109191828A
Application granted
Publication of CN109191828B
Legal status: Active

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a traffic participant accident risk prediction method based on ensemble learning. Traffic violation data and accident data samples are obtained by an optimized sampling method, and an ensemble learning algorithm is used to train a personnel traffic accident risk prediction model, so that high-risk personnel are identified automatically and an accident risk prediction index for traffic participants is obtained.

Description

Traffic participant accident risk prediction method based on ensemble learning
Technical Field
The invention relates to a traffic participant accident risk prediction method based on ensemble learning.
Background
Traffic participants are a key factor in road traffic safety, but traditional research and management applications are limited by the available means of information acquisition and perception, so the relationship between personal attributes and traffic safety is hard to mine and targeted traffic safety control is hard to implement. At present, traffic safety and compliance management in China relies mainly on investigating and penalizing violations, which has accumulated a large volume of traffic violation data on vehicles and personnel. Traffic violations are clearly associated with traffic safety, so the necessary safety characteristics of traffic participants can be extracted by mining traffic violation data.
Among data mining methods, ensemble learning performs particularly well. It is a meta-algorithm that combines several machine learning techniques into one prediction model in order to reduce variance (bagging), reduce bias (boosting), or improve predictions (stacking). Compared with a single model, combining several models can substantially improve prediction performance.
The invention constructs the traffic accident risk prediction model for traffic participants with an ensemble learning algorithm. The model is fitted mainly on traffic violation data, an optimized sampling method reduces the influence of the imbalanced data set on model performance, and both model accuracy and misjudgment rate are considered when optimizing model performance, which improves the accuracy of personnel risk prediction.
Disclosure of Invention
The invention aims to provide a traffic participant accident risk prediction method based on ensemble learning, which uses an ensemble learning algorithm with optimized sampling to predict and evaluate the traffic safety risk of traffic participants who have traffic violation records. This fills the current gap in quantitative analysis of participant factors in traffic safety and makes traffic safety management more proactive and better targeted.
According to the invention, a high-risk personnel data set and a general personnel data set are separated by a discrimination rule; using an optimized sampling method, classifiers are trained and corrected with an ensemble learning algorithm; and the ensemble classifier with the best performance is taken as the traffic accident risk prediction model for traffic participants, which outputs the traffic safety attribute and the risk probability of each person.
The technical solution of the invention is as follows:
A traffic participant accident risk prediction method based on ensemble learning comprises the following steps:
S1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic violation data and accident data;
S2, classifying the illegal data set into two categories, namely high-risk personnel and general personnel, determining a data label value (label) according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified;
S3, setting a sampling interval S and a cycle step k according to the sample size of data set N, wherein the upper bound of interval S generally does not exceed 25% of the total sample size;
S4, setting the sample size $n_m = s_0 + (m-1)\cdot k$, where $s_0$ is the lower limit of the sampling interval and m is the cycle number, with initial value 1; randomly drawing a sample $N_m$ of size $n_m$ from data set N;
S5, splitting the union $G_m$ of data set D and $N_m$ into a training set and a test set;
S6, performing SMOTE sampling on the training set and setting the sample expansion ratio $a_i$ of the high-risk personnel data subset D, where $a_i = 1$ when $i = 1$ and $a_i = a_{i-1} + 1$ when $i > 1$; the initial value of i is 1, and i has a set upper limit;
S7, for the sample expansion ratio $a_i$ of the high-risk personnel, setting the sample reduction ratio $b_j$ of the general personnel data subset $N_m$, where $b_j = 1$ when $j = 1$ and $b_j = b_{j-1} + 1$ when $j > 1$; the initial value of j is 1, and j has a set upper limit; for the SMOTE sampling ratio $a_i : b_j$, expanding and reducing the two classes of labeled samples in the training set to form the training sample set of the classifier;
S8, training the high-risk personnel classifier with an ensemble learning algorithm and determining the model parameters, thereby obtaining a traffic accident risk prediction model for traffic participants; the model can output a label value and a risk probability;
S9, evaluating the model with the test set data to obtain the model accuracy at different coverage rates;
S10, classifying the data in the complement $N_m'$ of the sample $N_m$ within the general personnel data subset N by number of violations, inputting each category into the model, counting the misjudgment rate of the personnel labels output by the model at different coverage rates, and drawing a model misjudgment rate curve for each category;
S11, checking whether j has reached its upper limit; if so, checking whether i has reached its upper limit: if so, going to S12, otherwise setting i = i + 1 and going to S6; if j has not reached its upper limit, setting j = j + 1 and going to S7;
S12, checking whether $n_m$ has reached the upper limit of the sampling interval; if so, going to S13, otherwise setting m = m + 1 and returning to S4;
S13, analyzing the model accuracy and misjudgment rate from S9 and S10 to obtain the model with optimal performance, and determining the optimal random sampling number M, the SMOTE sampling ratio I, J, the model coverage rate (recall) and the model discrimination threshold;
S14, inputting the data of the subset to be identified from step S2 into the model, and determining the corresponding data label value and risk probability.
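The loop control of steps S4, S6, S7, S11 and S12 can be sketched as a generator over (m, i, j) configurations. This is a minimal illustration only: the names `Config` and `iter_configs` are hypothetical, the numeric arguments in the usage lines are arbitrary, and it assumes that j restarts at 1 whenever i is incremented, which the text does not state explicitly.

```python
from typing import Iterator, NamedTuple


class Config(NamedTuple):
    m: int    # cycle number of the outer sampling loop (S4)
    n_m: int  # random sample size drawn from N in cycle m
    a_i: int  # expansion ratio for the high-risk subset D (S6)
    b_j: int  # reduction ratio for the general-personnel sample N_m (S7)


def iter_configs(s0: int, s_max: int, k: int,
                 i_max: int, j_max: int) -> Iterator[Config]:
    """Yield every (m, i, j) combination visited by S4 -> S6 -> S7 -> S11 -> S12."""
    m = 1
    while True:
        n_m = s0 + (m - 1) * k              # S4: sample size for cycle m
        if n_m > s_max:                     # S12: stop past the interval's upper limit
            break
        for i in range(1, i_max + 1):       # S6: a_1 = 1, a_i = a_(i-1) + 1, so a_i = i
            for j in range(1, j_max + 1):   # S7: b_1 = 1, b_j = b_(j-1) + 1, so b_j = j
                yield Config(m, n_m, a_i=i, b_j=j)
        m += 1


# Arbitrary illustrative values: three sample sizes and a 2 x 2 ratio grid.
configs = list(iter_configs(s0=100, s_max=300, k=100, i_max=2, j_max=2))
print(len(configs))
print(configs[0], configs[-1])
```

Each yielded configuration corresponds to one pass through S5 to S10, that is, one trained and evaluated classifier.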
Further, the ensemble learning algorithm in step S8 includes a random forest algorithm, an AdaBoost algorithm, an XGBoost algorithm, and a GBDT algorithm.
Further, the method for assigning the data label value (label) based on the classification rule in step S2 is specifically:
high-risk personnel: one category is traffic participants who have illegal records and a serious traffic accident record in which they bore major or full responsibility; the other category is traffic participants who have illegal records and only slight accident records, with no fewer than 2 accident records;
general personnel: traffic participants who have illegal records but no accident records;
data that satisfy neither of the above conditions constitute the subset to be identified.
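The classification rule of step S2 can be written as a small labeling function. The sketch below is illustrative: the record layout `PersonRecord` and its field names are assumptions, and every record is assumed to come from the illegal data set, so each person has at least one illegal record.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    # Hypothetical per-person accident counts joined onto the illegal data set.
    serious_major_fault: int  # serious accidents with major or full responsibility
    serious_other: int        # serious accidents with lesser or no responsibility
    slight: int               # slight accident records


def assign_label(p: PersonRecord) -> str:
    """Return 'D' (high-risk), 'N' (general) or 'U' (to be identified)."""
    if p.serious_major_fault >= 1:
        return "D"  # rule 1: serious accident with major or full responsibility
    only_slight = p.serious_major_fault == 0 and p.serious_other == 0
    if only_slight and p.slight >= 2:
        return "D"  # rule 2: only slight accidents, at least 2 of them
    if p.serious_other == 0 and p.slight == 0:
        return "N"  # illegal records but no accident records
    return "U"      # neither rule applies


print(assign_label(PersonRecord(0, 0, 1)))  # one slight accident only
```

A person with a single slight accident, or with a serious accident without major responsibility, falls into U and is later scored by the trained model in step S14.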
Further, the original traffic violation data and accident data in step S1 include the certificate information of the persons involved; the illegal records are aggregated and classified to obtain the illegal data set; the illegal data set is the full-sample data of personnel illegal records, and its information includes personnel certificate number, number of violations, violation types, penalties, occurrence of accident-related violations, and the time period in which violations occurred.
Further, in step S1, the occurrence of accident-related violations is obtained by correspondence analysis: the violation types that most strongly influence traffic accidents are extracted as data attributes of the illegal data set.
Further, in step S1, the violation time period is obtained by converting the continuous time variable into a discrete variable and classifying it according to the temporal characteristics of violations.
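Converting the violation time into a discrete period can be illustrated as follows. The period boundaries and names in `PERIODS` are purely hypothetical; the patent derives its periods from local traffic flow and violation patterns and does not list them.

```python
from datetime import datetime

# Hypothetical periods as half-open hour ranges [start, end); not from the patent.
PERIODS = [
    (0, 6, "night"),
    (6, 9, "am_peak"),
    (9, 17, "daytime"),
    (17, 20, "pm_peak"),
    (20, 24, "evening"),
]


def violation_period(ts: datetime) -> str:
    """Map a continuous timestamp to a nominal time-period category."""
    for start, end, name in PERIODS:
        if start <= ts.hour < end:
            return name
    raise ValueError(f"hour out of range: {ts.hour}")


print(violation_period(datetime(2018, 7, 16, 7, 30)))
```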
The invention has the beneficial effects that:
First, the traffic violation data are mined with an ensemble learning algorithm, which enables safety risk prediction based on the violation records of traffic participants; the model can output the traffic safety risk probability and attribute of each person.
Second, compared with traditional classification methods such as decision trees and neural networks, the ensemble learning algorithm adopted by the invention has a clear advantage in prediction performance, which supports the accuracy of personnel traffic accident risk prediction.
Third, the invention optimizes the sampling stage by improving both random sampling and SMOTE sampling, which alleviates to a certain extent the loss of model accuracy caused by imbalanced data sets and noticeably improves model performance.
Drawings
FIG. 1 is a flow chart of the traffic participant accident risk prediction method based on ensemble learning according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the data sets in the embodiment.
FIG. 3 is a diagram of the 20 most important attribute variables in the embodiment.
FIG. 4 is a graph of model accuracy and misjudgment rate in the embodiment.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A traffic participant accident risk prediction method based on ensemble learning is shown in FIG. 1; the specific method flow is as follows:
S1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic violation data and accident data;
In the embodiment, the original traffic violation data and accident data in step S1 include the certificate information of the persons involved; the original illegal records are preprocessed, for example aggregated and classified, to obtain the illegal data set; the illegal data set is the full-sample data of personnel illegal records, and its information includes personnel certificate number, number of violations, violation types, penalties, occurrence of accident-related violations, and the time period in which violations occurred.
The occurrence of accident-related violations in step S1 is obtained by correspondence analysis: the violation types that most strongly influence traffic accidents are extracted as data attributes of the illegal data set.
The violation time period in step S1 is obtained by converting the continuous time variable into a discrete variable and classifying it according to the temporal characteristics of violations.
S2, classifying the illegal data set into two categories, namely high-risk personnel and general personnel, determining a data label value (label) according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified;
The classification rule is specifically: high-risk personnel are (1) persons who have illegal records and a serious traffic accident record in which they bore major or full responsibility, or (2) persons who have illegal records and only slight accident records, with no fewer than 2 accident records; general personnel are persons who have illegal records but no accident records; data that satisfy neither of the above conditions constitute the subset to be identified.
S3, setting a sampling interval S and a cycle step k according to the sample size of data set N, wherein the upper bound of interval S generally does not exceed 25% of the total sample size;
S4, setting the sample size $n_m = s_0 + (m-1)\cdot k$, where $s_0$ is the lower limit of the sampling interval and m is the cycle number, with initial value 1; randomly drawing a sample $N_m$ of size $n_m$ from data set N;
S5, splitting the union $G_m$ of data set D and $N_m$ into a training set and a test set;
S6, performing SMOTE sampling on the training set and setting the sample expansion ratio $a_i$ of the high-risk personnel data subset D, where $a_i = 1$ when $i = 1$ and $a_i = a_{i-1} + 1$ when $i > 1$; the initial value of i is 1, and i has a set upper limit, usually 4;
S7, for the sample expansion ratio $a_i$ of the high-risk personnel, setting the sample reduction ratio $b_j$ of the general personnel data subset $N_m$, where $b_j = 1$ when $j = 1$ and $b_j = b_{j-1} + 1$ when $j > 1$; the initial value of j is 1, and j has a set upper limit of 4; for the SMOTE sampling ratio $a_i : b_j$, expanding and reducing the two classes of labeled samples in the training set to form the training sample set of the classifier;
S8, training the high-risk personnel classifier with an ensemble learning algorithm and determining the model parameters, thereby obtaining a traffic accident risk prediction model for traffic participants; the model can output a label value and a risk probability;
S9, evaluating the model with the test set data to obtain the model accuracy at different coverage rates;
S10, classifying the data in the complement $N_m'$ of the sample $N_m$ within the general personnel data subset N by number of violations, inputting each category into the model, counting the misjudgment rate of the personnel labels output by the model at different coverage rates, and drawing a model misjudgment rate curve for each category;
S11, checking whether j has reached its upper limit; if so, checking whether i has reached its upper limit: if so, going to S12, otherwise setting i = i + 1 and going to S6; if j has not reached its upper limit, setting j = j + 1 and going to S7;
S12, checking whether $n_m$ has reached the upper limit of the sampling interval; if so, going to S13, otherwise setting m = m + 1 and returning to S4;
S13, analyzing the model accuracy and misjudgment rate from S9 and S10 to obtain the model with optimal performance, and determining the optimal random sampling number M, the SMOTE sampling ratio I, J, the model coverage rate (recall) and the model discrimination threshold;
S14, inputting the data of the subset to be identified from step S2 into the model, and determining the corresponding data label value and risk probability.
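The sample expansion and reduction of steps S6 and S7 can be sketched with a standard-library version of SMOTE, which creates a synthetic minority point by interpolating between a minority sample and one of its k nearest minority neighbours. The patent does not spell out how the ratio $a_i : b_j$ maps to sample counts, so the function `resample` below rests on an assumption: D is expanded to $a_i$ times its size and $N_m$ is randomly reduced to $1/b_j$ of its size. The function names and the neighbour count k = 5 are likewise illustrative; a production pipeline would more likely rely on an existing library implementation such as the SMOTE class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # k nearest minority neighbours of x (x itself excluded by index)
        others = [p for q, p in enumerate(minority) if q != idx]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def resample(high_risk, general, a_i, b_j, seed=0):
    """Assumed reading of a_i : b_j: grow D to a_i times, shrink N_m to 1/b_j."""
    rng = random.Random(seed)
    expanded = list(high_risk) + smote(high_risk, (a_i - 1) * len(high_risk), seed=seed)
    reduced = rng.sample(list(general), len(general) // b_j)
    return expanded, reduced


# Toy feature vectors, not patent data.
high = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
general = [(float(x), 5.0) for x in range(10)]
expanded, reduced = resample(high, general, a_i=3, b_j=2)
print(len(expanded), len(reduced))
```

With $a_i = 1$ and $b_j = 1$ the training set is left unchanged under this reading, which matches the 1:1 starting point of the ratio grid.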
Specific examples
The present embodiment takes a driver of a motor vehicle as an analysis target.
S1: traffic violation records and accident records covering 2 years in the area are acquired by connecting to the database.
Traffic accidents involving death, serious injury or hit-and-run are treated as serious accidents, and all other accidents as slight accidents. The original accident records are classified accordingly, and the accident type and driver certificate information are taken as the attribute features of the serious accident data set and the slight accident data set, yielding the sample data of the two data sets.
Further, the original violation data are preprocessed, and each driver's violation information is aggregated, including cumulative number of violations, violation types, cumulative penalty points, average penalty points (points per violation), maximum penalty points for a single violation, cumulative fine amount and average fine amount (yuan per violation).
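The per-driver aggregation described above can be sketched as a simple group-by. The raw record layout (driver id, violation type, penalty points, fine in yuan) and the output field names are assumptions for illustration; the sample rows are invented.

```python
from collections import defaultdict


def aggregate_violations(records):
    """records: iterable of (driver_id, violation_type, points, fine_yuan)."""
    grouped = defaultdict(list)
    for driver_id, vtype, points, fine in records:
        grouped[driver_id].append((vtype, points, fine))

    features = {}
    for driver_id, rows in grouped.items():
        n = len(rows)
        total_points = sum(r[1] for r in rows)
        total_fine = sum(r[2] for r in rows)
        features[driver_id] = {
            "violation_count": n,                          # cumulative number of violations
            "violation_types": sorted({r[0] for r in rows}),
            "total_points": total_points,                  # cumulative penalty points
            "avg_points": total_points / n,                # points per violation
            "max_points": max(r[1] for r in rows),         # largest single deduction
            "total_fine": total_fine,                      # cumulative fine, yuan
            "avg_fine": total_fine / n,                    # yuan per violation
        }
    return features


# Invented sample rows.
raw = [
    ("A", "speeding", 3, 200),
    ("A", "red_light", 6, 200),
    ("B", "parking", 0, 100),
]
features = aggregate_violations(raw)
print(features["A"])
```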
Correspondence analysis is used to reduce the dimensionality of the traffic accident data and the original violation data; violation types are classified according to the association between violations and accident types, and the five categories with the highest association are extracted as data attributes of the accident-risk violation field, as shown in Table 1.
Table 1. Classification of accident-related violation types
[Table 1 is provided as an image in the original publication.]
According to the traffic flow operation of the road network in the area of the embodiment and the occurrence pattern of traffic violations, time is aggregated into analysis periods, converting the continuous variable into a nominal variable; in another embodiment, the time periods are divided by other statistical means such as clustering.
Driver characteristic data: the driver's age, gender and province/city code are extracted from the driver's certificate number. The illegal data set is generated from the information extracted in each of the above steps, as shown in Table 2.
Table 2. Partial data of the illegal data set
[Table 2 is provided as an image in the original publication.]
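Extracting age, gender and region code from a certificate number can be sketched as below, assuming the certificate number is an 18-character resident identity number in the usual layout: a 6-digit region code (the first 2 digits identify the province), an 8-digit birth date, a 3-digit sequence code whose last digit is odd for male and even for female, and a check character. The function name is hypothetical, the sample number is made up, and the check character is not validated.

```python
from datetime import date


def parse_certificate(number: str, on: date) -> dict:
    """Derive driver features from an 18-character identity number."""
    if len(number) != 18:
        raise ValueError("expected an 18-character certificate number")
    birth = date(int(number[6:10]), int(number[10:12]), int(number[12:14]))
    # Subtract one year if the birthday has not yet occurred in the reference year.
    age = on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))
    gender = "male" if int(number[16]) % 2 == 1 else "female"
    return {
        "province_code": number[:2],
        "city_code": number[:4],
        "age": age,
        "gender": gender,
    }


# Made-up number for illustration only.
info = parse_certificate("32010019900315123X", on=date(2018, 7, 16))
print(info)
```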
S2: the full sample I in the illegal data set is classified into two categories, namely high-risk drivers and general drivers. Referring to FIG. 4, under the first condition, drivers who have illegal records and a serious traffic accident record in which they bore major or full responsibility are treated as high-risk drivers, and the qualifying data form data set D1; under the second condition, drivers who have illegal records and only slight accident records, with no fewer than 2 accident records, are treated as high-risk drivers, and the qualifying data form data set D2. The high-risk driver data set is D = D1 + D2. The data of drivers who have illegal records but no accident records form the general driver data set N.
Accordingly, a high-risk or general data label value is determined for the data in the illegal data set that satisfy the rules, and the data subset U, to which the classification rules cannot be applied, is the subset to be identified.
S3: a sampling interval S and a cycle step k are set according to the sample size of data set N, where the upper bound of interval S generally does not exceed 25% of the total sample size.
In this embodiment, the sample size of the data set exceeds 84000, the sampling interval S is [200, 4000], and the cycle step k is 200.
S4: the sample size is $n_m = s_0 + (m-1)\cdot k$, where $s_0$ is the lower limit of the sampling interval and m is the cycle number, with initial value 1; a sample $N_m$ of size $n_m$ is randomly drawn from data set N.
In this embodiment, the initial sample size is 200.
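As a worked check of the sampling schedule, substituting the embodiment's values $s_0 = 200$ and $k = 200$ into the sample size formula gives:

```latex
n_m = s_0 + (m-1)\,k = 200 + (m-1)\cdot 200 = 200\,m,
\qquad m = 1, 2, \dots, 20,
\qquad n_1 = 200,\quad n_{20} = 4000 .
```

The upper bound of 4000 is below 5% of a data set of more than 84000 samples, well inside the 25% guideline. Assuming every combination of m with the ratio indices i and j (each limited to 4 in this embodiment) is trained exactly once, the search covers 20 × 4 × 4 = 320 candidate models; a random sample size of 2400, for example, corresponds to m = 12.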
S5: the union $G_m$ of data set D and $N_m$ is split into a training set and a test set.
In this embodiment, the split ratio of training set to test set is 9:1.
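A 9:1 split of $G_m$ can be sketched as follows. Splitting each class separately (a stratified split) is an assumption made here so that both classes appear in the test set; the patent only states the ratio. The function name and the toy data are illustrative.

```python
import random


def split_train_test(high_risk, general, test_fraction=0.1, seed=0):
    """Split the union of D and N_m into train and test, class by class."""
    rng = random.Random(seed)
    train, test = [], []
    for label, group in ((1, high_risk), (0, general)):
        rows = [(x, label) for x in group]   # attach the class label
        rng.shuffle(rows)
        n_test = round(len(rows) * test_fraction)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


# Toy identifiers standing in for feature rows.
train, test = split_train_test(list(range(50)), list(range(100, 300)))
print(len(train), len(test))
```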
S6: SMOTE sampling is performed on the training set, and the sample expansion ratio $a_i$ of the high-risk driver data subset D is set, where $a_1 = 1$, $a_i = a_{i-1} + 1$, and the maximum value of i is 4.
S7: for the sample expansion ratio $a_i$ of the high-risk drivers, the sample reduction ratio $b_j$ of the general driver data subset $N_m$ is set, where $b_1 = 1$, $b_j = b_{j-1} + 1$, and the maximum value of j is 4; for the SMOTE sampling ratio $a_i : b_j$, the two classes of labeled samples in the training set are expanded and reduced to form the training sample set of the classifier.
S8: a high-risk driver classifier is trained with a random forest algorithm and the model parameters are determined, yielding a driver traffic accident risk prediction model; the model can output the driver label value and the risk probability.
S9: the model is evaluated with the test set data to obtain the model accuracy at different coverage rates.
S10: the data in the complement $N_m'$ of the sample $N_m$ within the general driver data subset N are classified by number of violations; the categories of 1, 2, 3, 4, 5, and 6 or more violations are input into the model separately, the misjudgment rate of the driver labels output by the model at different coverage rates is counted, and a model misjudgment rate curve is drawn for each category.
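The per-category misjudgment rate of S10 can be sketched as below. It assumes that a misjudgment means a general driver from $N_m'$ whose predicted risk probability reaches the discrimination threshold, so that the model labels the driver high-risk; the function name, the input layout and the sample values are illustrative.

```python
from collections import defaultdict


def misjudgment_by_group(samples, threshold):
    """samples: iterable of (violation_count, risk_probability) for drivers in N_m'."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for count, prob in samples:
        group = "6+" if count >= 6 else str(count)   # categories 1..5 and 6 or more
        total[group] += 1
        flagged[group] += int(prob >= threshold)     # general driver flagged high-risk
    return {g: flagged[g] / total[g] for g in total}


# Invented probabilities for five general drivers.
rates = misjudgment_by_group(
    [(1, 0.99), (1, 0.20), (2, 0.50), (7, 0.99), (6, 0.10)], threshold=0.98
)
print(rates)
```

Repeating the call over a range of thresholds gives the points of one misjudgment rate curve per category.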
S11: check whether j has reached its set maximum; if so, check whether i has reached its set maximum: if so, go to S12, otherwise set i = i + 1 and go to S6; if j has not reached its maximum, set j = j + 1 and go to S7.
S12: check whether $n_m$ has reached the upper limit of interval S; if so, go to S13, otherwise set m = m + 1 and return to S4.
S13: the model accuracy and misjudgment rate from S9 and S10 are analyzed to obtain the model with optimal performance, and the optimal random sampling number M, the SMOTE sampling ratio I, J, the model coverage rate (recall) and the model discrimination threshold are determined.
In the embodiment, a random forest algorithm is adopted, and the expansion-to-reduction ratio of high-risk to general driver samples in the training set ranges from 1:1 to 4:4. The optimal model is determined by comparing overall misjudgment rate, accuracy and index stability; that is, the random sample size is 2400 and the SMOTE ratio is 2:2. The 20 most important attribute variables in the model are shown in FIG. 3. The model coverage rate (recall) is 0.06, the corresponding model accuracy is 0.889, and the model discrimination threshold is 0.98; the misjudgment rate and accuracy curves of the model are shown in FIG. 4. The model performance shows that the misjudgment rate for data with 1 violation is slightly higher than for the other categories.
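The relationship between discrimination threshold, coverage rate and accuracy can be illustrated with a short function. It assumes that the coverage rate is recall (the share of true high-risk drivers that are flagged) and that the reported accuracy is the share of flagged drivers who are truly high-risk, that is, precision; the patent does not define either term formally. The function name and the toy labels and probabilities are illustrative, not the embodiment's data.

```python
def coverage_and_accuracy(y_true, y_prob, threshold):
    """Return (recall, precision) for the 'high-risk' class at a given threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    positives = sum(y_true)
    recall = tp / positives if positives else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy test-set labels (1 = high-risk) and predicted risk probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.99, 0.70, 0.60, 0.20, 0.985, 0.40, 0.30, 0.10]
print(coverage_and_accuracy(y_true, y_prob, threshold=0.98))
print(coverage_and_accuracy(y_true, y_prob, threshold=0.50))
```

A high threshold such as the embodiment's 0.98 flags only the most confident cases, which is consistent with a low coverage rate paired with a high accuracy.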
S14: the data of the subset to be identified from step S2 are input into the model, and the corresponding data label value and risk probability are determined. Part of the identification results are shown in Table 3.
Table 3. High-risk driver identification results using the method of the present invention
[Table 3 is provided as an image in the original publication.]

Claims (6)

1. A traffic participant accident risk prediction method based on ensemble learning, characterized by comprising the following steps:
S1, constructing an illegal data set, a serious accident data set and a slight accident data set based on the original traffic violation data and accident data;
S2, classifying the illegal data set into two categories, namely high-risk personnel and general personnel, according to the serious traffic accident records of the serious accident data set and the slight accident records of the slight accident data set, determining a data label value (label) according to a classification rule, and accordingly dividing the illegal data set into a high-risk personnel data subset D, a general personnel data subset N and a subset U to be identified;
S3, setting a sampling interval S and a cycle step k according to the sample size of data set N;
S4, setting the sample size $n_m = s_0 + (m-1)\cdot k$, where $s_0$ is the lower limit of the sampling interval and m is the cycle number, with initial value 1; randomly drawing a sample $N_m$ of size $n_m$ from data set N;
S5, splitting the union $G_m$ of data set D and $N_m$ into a training set and a test set;
S6, performing SMOTE sampling on the training set and setting the sample expansion ratio $a_i$ of the high-risk personnel data subset D, where $a_i = 1$ when $i = 1$ and $a_i = a_{i-1} + 1$ when $i > 1$; the initial value of i is 1, and i has a set upper limit;
S7, for the sample expansion ratio $a_i$ of the high-risk personnel, setting the sample reduction ratio $b_j$ of the general personnel data subset $N_m$, where $b_j = 1$ when $j = 1$ and $b_j = b_{j-1} + 1$ when $j > 1$; the initial value of j is 1, and j has a set upper limit; for the SMOTE sampling ratio $a_i : b_j$, expanding and reducing the two classes of labeled samples in the training set to form the training sample set of the classifier;
S8, training the high-risk personnel classifier with an ensemble learning algorithm and determining the model parameters, thereby obtaining a traffic accident risk prediction model for traffic participants; the model can output a label value and a risk probability;
S9, evaluating the model with the test set data to obtain the model accuracy at different coverage rates;
S10, classifying the data in the complement $N_m'$ of the sample $N_m$ within the general personnel data subset N by number of violations, inputting each category into the model, counting the misjudgment rate of the personnel labels output by the model at different coverage rates, and drawing a model misjudgment rate curve for each category;
S11, checking whether j has reached its upper limit; if so, checking whether i has reached its upper limit: if so, going to S12, otherwise setting i = i + 1 and going to S6; if j has not reached its upper limit, setting j = j + 1 and going to S7;
S12, checking whether $n_m$ has reached the upper limit of the sampling interval; if so, going to S13, otherwise setting m = m + 1 and returning to S4;
S13, analyzing the model accuracy and misjudgment rate from S9 and S10 to obtain the model with optimal performance, and determining the optimal random sampling number M, the SMOTE sampling ratio I, J, the model coverage rate (recall) and the model discrimination threshold;
S14, inputting the data of the subset to be identified from step S2 into the model, and determining the corresponding data label value and risk probability.
2. The ensemble learning-based traffic participant accident risk prediction method according to claim 1, wherein the ensemble learning algorithm in step S8 includes a random forest algorithm, an AdaBoost algorithm, an XGBoost algorithm, and a GBDT algorithm.
3. The ensemble learning-based traffic participant accident risk prediction method according to claim 1, wherein the method for assigning the data label value (label) based on the classification rule in step S2 is specifically:
high-risk personnel: one category is traffic participants who have illegal records and a serious traffic accident record in which they bore major or full responsibility; the other category is traffic participants who have illegal records and only slight accident records, with no fewer than 2 accident records;
general personnel: traffic participants who have illegal records but no accident records;
data that satisfy neither the high-risk personnel condition nor the general personnel condition constitute the subset to be identified.
4. The ensemble learning-based traffic participant accident risk prediction method according to claim 1, wherein: the original traffic violation data and accident data in step S1 include personnel certificate information; the illegal records are aggregated and classified to obtain the illegal data set; the illegal data set is the full-sample data of illegal records, and its information includes personnel certificate number, number of violations, violation types, penalties, occurrence of accident-related violations, and the time period in which violations occurred.
5. The ensemble learning-based traffic participant accident risk prediction method according to claim 1, wherein: in step S1, the occurrence of accident-related violations is obtained by correspondence analysis, and the violation types that most strongly influence traffic accidents are extracted as data attributes of the illegal data set.
6. The ensemble learning-based traffic participant accident risk prediction method according to claim 4, wherein: in step S1, the violation time period is obtained by converting the continuous time variable into a discrete variable and classifying it according to the temporal characteristics of violations.
CN201810783019.6A 2018-07-16 2018-07-16 Traffic participant accident risk prediction method based on ensemble learning Active CN109191828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810783019.6A CN109191828B (en) 2018-07-16 2018-07-16 Traffic participant accident risk prediction method based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810783019.6A CN109191828B (en) 2018-07-16 2018-07-16 Traffic participant accident risk prediction method based on ensemble learning

Publications (2)

Publication Number Publication Date
CN109191828A CN109191828A (en) 2019-01-11
CN109191828B CN109191828B (en) 2021-05-28

Family

ID=64936778

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201810783019.6A Active CN109191828B (en) 2018-07-16 2018-07-16 Traffic participant accident risk prediction method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN109191828B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555989B (en) * 2019-08-16 2021-10-26 华南理工大学 Xgboost algorithm-based traffic prediction method
CN111126868B (en) * 2019-12-30 2023-07-04 中南大学 Road traffic accident occurrence risk determination method and system
CN111222784A (en) * 2020-01-03 2020-06-02 重庆特斯联智慧科技股份有限公司 Security monitoring method and system based on population big data
CN112016735B (en) * 2020-07-17 2023-03-28 厦门大学 Patrol route planning method and system based on traffic violation hotspot prediction and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104182805A (en) * 2014-08-22 2014-12-03 杭州华亭科技有限公司 Dangerous tendency prediction method based on prisoner behavior characteristic ensemble learning model
CN104637246A (en) * 2015-02-02 2015-05-20 合肥工业大学 Driver multi-behavior early warning system and danger evaluation method
CN105303197A (en) * 2015-11-11 2016-02-03 江苏省邮电规划设计院有限责任公司 Vehicle following safety automatic assessment method based on machine learning
JP5896263B2 (en) * 2011-01-31 2016-03-30 矢崎エナジーシステム株式会社 Image recording control method and in-vehicle image recording apparatus
CN106780263A (en) * 2017-01-13 2017-05-31 中电科新型智慧城市研究院有限公司 High-risk personnel analysis and recognition methods based on big data platform
CN107169600A (en) * 2017-05-12 2017-09-15 广州中国科学院工业技术研究院 Recognize method, system, storage medium and the computer equipment of major hazard source

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
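Research on Data Mining of Road Traffic Accidents and Its Application (道路交通事故数据挖掘及应用研究); Cheng Tan (程坦); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2011-12-15 (No. S2); I138-955 *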

Also Published As

Publication number Publication date
CN109191828A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
CN108596409B (en) Method for improving accident risk prediction precision of traffic hazard personnel
CN109191828B (en) Traffic participant accident risk prediction method based on ensemble learning
CN110866677B (en) Driver relative risk evaluation method based on benchmark analysis
CN104268599B (en) Intelligent unlicensed vehicle finding method based on vehicle track temporal-spatial characteristic analysis
CN111461185A (en) Driving behavior analysis method based on improved K-means
CN108717786B (en) Traffic accident cause mining method based on universality meta-rule
CN109086808B (en) Traffic high-risk personnel identification method based on random forest algorithm
CN108682149B (en) Highway accident black point road section line shape cause analysis method based on binomial Logistic regression
CN109671274B (en) Highway risk automatic evaluation method based on feature construction and fusion
CN111462488A (en) Intersection safety risk assessment method based on deep convolutional neural network and intersection behavior characteristic model
CN110588658B (en) Method for detecting risk level of driver based on comprehensive model
CN111242484A (en) Vehicle risk comprehensive evaluation method based on transition probability
CN110562261B (en) Method for detecting risk level of driver based on Markov model
CN108847022B (en) Abnormal value detection method of microwave traffic data acquisition equipment
CN111563555A (en) Driver driving behavior analysis method and system
CN107766983B (en) Method for setting emergency rescue parking point of urban rail transit station
CN105809193A (en) Illegal operation vehicle recognition method based on Kmeans algorithm
CN115689040B (en) Traffic accident severity prediction method and system based on convolutional neural network
CN109598931A (en) Group based on traffic safety risk divides and difference analysis method and system
CN113673304B (en) Vehicle-mounted expected functional safety hazard analysis and evaluation method based on scene semantic driving
CN109101568A (en) Traffic high-risk personnel recognition methods based on XgBoost algorithm
CN110263074B (en) Method for mining illegal accident corresponding relation based on LLE and K mean value method
CN113392885A (en) Traffic accident space-time hot spot distinguishing method based on random forest theory
CN110705628B (en) Method for detecting risk level of driver based on hidden Markov model
CN112270114A (en) Vehicle personalized risk behavior identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 211100 No. 19 Suyuan Avenue, Jiangning Economic and Technological Development Zone, Nanjing City, Jiangsu Province

Applicant after: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

Address before: 210006 3rd Floor, Building E10, Chenguang 1865 Technology Creative Industry Park, No. 388 Yingtian Street, Qinhuai District, Nanjing City, Jiangsu Province

Applicant before: JIANGSU ZHITONG TRAFFIC TECHNOLOGY Co.,Ltd.

GR01 Patent grant