CN105046957A - Balanced sampling method for accident analysis and safety assessment - Google Patents

Balanced sampling method for accident analysis and safety assessment

Publication number
CN105046957A
CN105046957A CN201510382446.XA CN201510382446A CN105046957A CN 105046957 A CN105046957 A CN 105046957A CN 201510382446 A CN201510382446 A CN 201510382446A CN 105046957 A CN105046957 A CN 105046957A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510382446.XA
Other languages
Chinese (zh)
Other versions
CN105046957B (en)
Inventor
裴欣
李力
李兴山
张佐
姚丹亚
张毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201510382446.XA
Publication of CN105046957A
Application granted
Publication of CN105046957B
Legal status: Active

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a balanced sampling method for accident analysis and safety assessment, belonging to the technical field of safety assessment. The method comprises the following steps: an accident type is selected, related data are collected and imported into a processing system for preprocessing, and an accident analysis and safety assessment data set is built; an analysis model is built on that data set, including balanced sampling of accident samples, solving of a count model, and repeated resampling with parameter estimation and a convergence check; finally, the model results are output, the significant factors are analyzed, and improvement suggestions are proposed. The invention relates to the prediction and assessment of incidents such as traffic accidents. Balanced sampling addresses the zero-inflation problem that arises when a count model is fitted to an imbalanced data set; it keeps the parameter estimates stable and valid, improves model precision, reduces parameter estimation error, uncovers more significant factors related to accident occurrence, and can effectively guide practical application.

Description

A balanced sampling method for accident analysis and safety assessment
Technical field
The invention belongs to the technical field of safety assessment. It relates to a balanced sampling method for accident analysis and safety assessment, and specifically to the prediction and assessment of incidents such as traffic accidents: balanced sampling is applied to an imbalanced data set in order to estimate the parameters of a probability regression model and to analyze influencing factors.
Background
Traffic safety is a worldwide problem closely tied to human health and development. Current practice is mostly based on data from accidents that have occurred: count models are used to build accident-risk and casualty-scale evaluation models, to identify the significant factors that influence accident occurrence and severity, to carry out safety assessment, and then to propose traffic safety improvements. A count model is a probability regression model; examples include the Poisson model and the negative binomial model. Modeling requires a large amount of accident data as observation samples, from which the unknown model parameters are solved. According to existing traffic theory, traffic flow, traffic control and management schemes, and weather are all important factors in accident occurrence. Studying their effect requires data at a fine time granularity, in practice usually one hour, so a time-discretized accident analysis data set is built. However, because accident data are limited and accidents are sporadic, such a data set usually contains a large number of samples whose accident count is zero. This is the zero-inflation problem of accident data (also called the excess-zeros problem: under the existing space-time partition, the data contain too many zeros). It leaves the data set severely imbalanced and makes model parameter estimates insufficiently accurate and reliable (Shankar et al., 1997; Washington et al., 2011), so they cannot effectively guide traffic safety engineering practice. To address zero inflation, Miaou (1994), Lee and Mannering (2002), Shankar (2003), Huang and Chin (2010) and others proposed zero-inflated count data models. These models assume that a road with an accident count of zero can be in one of two safety states, absolutely safe or relatively safe, and they fit better than traditional count models. However, Lord et al. pointed out in studies from 2005 and 2007 that absolutely safe roads do not exist and that excess zeros should instead be handled through a reasonable space-time partition; they did not, however, give a method for partitioning time and space reasonably. Although later researchers proposed further methods for the zero-inflation problem, none of them removes the limitation imposed by an imbalanced data set.
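For reference, the zero-inflated Poisson model, the simplest member of the zero-inflated count model family mentioned above, is commonly written as follows. The mixing probability \pi (the share of observations in the "absolutely safe" state) and the rate \lambda are notation introduced here for illustration; they do not appear in the patent text.

```latex
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots
```

Setting \pi = 0 recovers the ordinary Poisson model, which is the sense in which the "absolutely safe" state is an extra assumption about the road.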
Summary of the invention
The object of the invention is to provide a balanced sampling method for accident analysis and safety assessment, characterized in that it comprises the following steps:
Step 1: select an accident type to be analyzed, collect the historical accident data related to that type, import the data into a processing system for preprocessing, and build an accident analysis and safety assessment data set;
Step 2: build an analysis model on the data set obtained in step 1, and carry out parameter estimation and a convergence check; this includes balanced sampling of accident samples, solving the count model, and resampling with parameter estimation and a convergence check;
Step 3: output the model results, analyze the significant factors, and propose improvement suggestions.
In step 1, the accident type to be analyzed includes traffic accidents and accidents and disasters occurring in industrial and agricultural production. Collecting and preprocessing the related data comprises:
Step 101: collect data of all kinds and build a traffic accident data set covering traffic flow, road design parameters, traffic control and management elements, and weather conditions;
Step 102: for each road entity and each unit of time, take the accident count Y as the dependent variable of the regression model and the factors that may affect accident occurrence as the independent variables X, and build a traffic accident data set covering influencing factors such as traffic flow, road design parameters, traffic control and management elements, and weather conditions, giving M records in total;
Step 103: check the influencing factors X for multicollinearity and delete redundant collinear variables.
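The patent does not say how the multicollinearity check in step 103 is carried out. One simple possibility, shown here only as a sketch, is a pairwise Pearson correlation screen that drops any variable too strongly correlated with one already kept. The function names, the 0.9 threshold, and the column names in the usage note are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Assumes neither column is constant (otherwise the denominator is zero).
    return cov / math.sqrt(var_a * var_b)

def drop_collinear(columns, threshold=0.9):
    """Keep columns in insertion order; skip any column whose absolute
    correlation with an already-kept column reaches the threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept
```

For example, `drop_collinear({"flow": [1, 2, 3, 4, 5], "flow_x2": [2, 4, 6, 8, 10.5], "rain": [1, 0, 1, 0, 1]})` keeps `flow` and `rain` and drops `flow_x2`. A pairwise screen cannot detect collinearity that involves three or more variables jointly, so a regression-based diagnostic may be preferable in practice.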
In step 2, the balanced sampling method repeatedly draws balanced data sets, a probability regression model is solved on each to obtain multiple sets of model parameters, and parameter estimation and the convergence check are then carried out. Specifically:
Step 201: according to whether the accident count of each road in each unit of time is zero, divide all data into two classes, records with a non-zero accident count and records with a zero accident count; of the M records in total, the number of non-zero records is denoted K;
Step 202: from the (M-K) zero-accident records, randomly draw K records, so that the K zero-accident records and the K non-zero records, in a 1:1 ratio, form a new data set B; B contains 2K records, and its zero-accident and non-zero samples are balanced in number;
Step 203: on data set B, use a count model to build the traffic safety assessment equation Y = f(βX), where Y is the accident count, and solve for one set of model parameters β;
Step 204: resample by repeating steps 202 and 203 many times until the result converges;
Step 205: from the multiple sets of parameter estimates, compute the standard deviation (also called the estimation error), confidence interval, and significance level of the parameter β.
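Steps 201 to 204 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the patented implementation: the count model is taken to be a Poisson regression with log link fitted by Newton-Raphson, each record is an `(x, y)` pair whose `x` already includes a constant 1.0 for the intercept, and the stopping rule (the running mean of β changes by less than `tol`) is a choice made here because the patent does not define convergence.

```python
import math
import random

def balanced_sample(records, rng):
    """Steps 201-202: keep all K non-zero records and draw K zero records."""
    nonzero = [r for r in records if r[1] > 0]
    zero = [r for r in records if r[1] == 0]
    return nonzero + rng.sample(zero, len(nonzero))  # data set B, 2K records

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson(data, iters=25):
    """Step 203: Newton-Raphson for the model lambda = exp(beta . x)."""
    p = len(data[0][0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y in data:
            lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
            for a in range(p):
                grad[a] += (y - lam) * x[a]          # score of the log-likelihood
                for c in range(p):
                    hess[a][c] += lam * x[a] * x[c]  # negative Hessian
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

def balanced_estimates(records, seed=0, min_rounds=20, max_rounds=500, tol=1e-3):
    """Step 204: repeat sampling and fitting until the running mean settles."""
    rng = random.Random(seed)
    draws, prev_mean = [], None
    for n in range(1, max_rounds + 1):
        draws.append(fit_poisson(balanced_sample(records, rng)))
        mean = [sum(col) / n for col in zip(*draws)]
        if (prev_mean is not None and n >= min_rounds
                and max(abs(m - q) for m, q in zip(mean, prev_mean)) < tol):
            break
        prev_mean = mean
    return draws  # one beta vector per balanced data set
```

`balanced_estimates(records)` returns the list of β vectors that step 205 then summarizes. `rng.sample` raises `ValueError` if there are fewer zero records than non-zero ones, which is the opposite of the situation the method targets.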
Outputting the model results in step 3 comprises:
Step 301: according to the standard deviation (estimation error), confidence interval, and significance level computed for the parameter β, select the independent variables X that have a significant effect on accidents; these are called significant factors;
Step 302: analyze the effect of the significant factors X on the accident count Y;
Step 303: based on the analysis in step 302, derive safety improvement measures and suggestions.
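Steps 205 and 301 can be sketched as follows. The patent names the quantities (standard deviation, confidence interval, significance level) but not the formulas, so this example assumes a normal approximation: the standard deviation of the repeated estimates is used as the estimation error, and a two-sided z-test against zero decides significance. The function name and the dictionary layout are illustrative.

```python
from statistics import NormalDist, mean, stdev

def summarize_estimates(draws, names, level=0.95):
    """Summarize repeated estimates of beta (one list of coefficients per draw)."""
    normal = NormalDist()
    z = normal.inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% interval
    summary = {}
    for name, col in zip(names, zip(*draws)):  # col = all draws of one coefficient
        m, s = mean(col), stdev(col)
        if s > 0:
            p_value = 2 * (1 - normal.cdf(abs(m) / s))
        else:
            p_value = 0.0 if m != 0 else 1.0
        summary[name] = {
            "mean": m,
            "std": s,
            "ci": (m - z * s, m + z * s),
            "p_value": p_value,
            "significant": p_value < 1 - level,  # step 301 screening rule
        }
    return summary
```

A coefficient whose interval excludes zero is flagged as a significant factor; the sign of its mean then indicates whether the factor raises or lowers the expected accident count, which is the input to steps 302 and 303.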
The beneficial effects of the invention are:
1. Balanced samples: balanced sampling effectively resolves the zero-inflation problem that a count model faces on an imbalanced data set. It relies on no assumption about the object of study, and so avoids the unreasonable basic assumption of existing zero-inflated count models.
2. Converged model parameters: sampling and solving are repeated until the model parameters converge well, which ensures the stability and validity of the parameter estimates.
3. Higher model precision: the precision of the model is improved and the parameter estimation error is smaller.
4. More significant factors: more significant factors related to accident occurrence can be uncovered, effectively guiding practical application.
Brief description of the drawings
Fig. 1 is a schematic workflow of the balanced sampling method for accident analysis and safety assessment.
Detailed description
Step 1: select an accident type to be analyzed, collect the historical accident data related to that type, import the data into a processing system for preprocessing, and build an accident analysis and safety assessment data set;
Step 2: build an analysis model on the data set obtained in step 1, and carry out parameter estimation and a convergence check; this includes balanced sampling of accident samples, solving the count model, and resampling with parameter estimation and a convergence check;
Step 3: output the model results, analyze the significant factors, and propose improvement suggestions.
Example
The invention is explained below, following the three steps above, using the analysis of road traffic accidents in a large city as an example.
In step 1, collecting and preprocessing the safety analysis data comprises:
Step 101: collect data of all kinds and build a traffic accident data set covering traffic flow, road design parameters, traffic control and management elements, and weather conditions;
Step 102: for each road entity and each unit of time, take the accident count Y as the dependent variable of the regression model and the factors that may affect accident occurrence as the independent variables X, and build a traffic accident data set covering traffic flow, road design parameters, traffic control and management elements, and weather conditions, giving 2,230,314 records in total, whose characteristics are shown in Table 1;
Step 103: check the influencing factors X for multicollinearity and delete redundant collinear variables.
Table 1. Summary of model data
In step 2, the balanced sampling method repeatedly draws balanced data sets, a probability regression model is solved on each to obtain multiple sets of model parameters, and parameter estimation and the convergence check are then carried out. Specifically:
Step 201: according to whether the accident count of each road in each unit of time is zero, divide all data into two classes, records with a non-zero accident count and records with a zero accident count; of the 2,230,314 records in total, 2,534 are non-zero;
Step 202: from the 2,227,780 zero-accident records, randomly draw 2,534 records, so that the 2,534 zero-accident records and the 2,534 non-zero records, in a 1:1 ratio, form a new data set B; B contains 5,068 records, and its zero-accident and non-zero samples are balanced in number;
Step 203: on data set B, use a count model to build the traffic safety assessment equation Y = f(βX) and solve for one set of model parameters β, specifically:
The number of accidents Y occurring on a given road section within a given time follows a Poisson distribution, with probability
$$P(y_{it}) = \frac{\exp(-\lambda_{it})\,\lambda_{it}^{y_{it}}}{y_{it}!}$$
where y_{it} is the number of accidents that actually occurred on road i in time t, \lambda_{it} = \exp(\beta X_{it}) is the predicted accident count, X_{it} is the vector of influencing factors, and the model parameter β holds the corresponding influence coefficients. The model assesses the effect of each influencing factor on the accident count by solving for these parameters. The Poisson model is the basis of count models; other count models handle particular data requirements by introducing various error terms or random variables.
Step 204: resample by repeating steps 202 and 203 many times until the result converges;
Step 205: from the multiple sets of parameter estimates, compute the standard deviation (also called the estimation error), confidence interval, and significance level of the parameter β.
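The record counts quoted in steps 201 and 202 can be checked with a few lines of arithmetic, and the Poisson probability in step 203 can be written as a small function. The variable and function names below are illustrative; only the two input counts come from the example.

```python
import math

M = 2_230_314            # total records in the example
K = 2_534                # records with a non-zero accident count
zeros = M - K            # zero-accident records available for sampling
balanced_size = 2 * K    # size of each balanced data set B
zero_share = zeros / M   # fraction of zero records before balancing

def poisson_pmf(y, lam):
    """Probability of observing y accidents when the expected count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

print(zeros, balanced_size, round(zero_share, 4))
```

The printed values are 2227780, 5068 and 0.9989: before balancing, fewer than 0.12% of the records carry any accident, which is the imbalance the sampling step removes.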
Outputting the model results in step 3 comprises:
Step 301: according to the standard deviation (estimation error), confidence interval, and significance level computed for the parameter β, select the independent variables X that have a significant effect on accidents; these are called significant factors;
Step 302: analyze the effect of the significant factors X on the accident count Y;
Step 303: based on the analysis in step 302, propose measures and suggestions for improving traffic safety.
Step 304: output the model results. The count model parameters were estimated with both the traditional parameter estimation method and the balanced sampling method; the results are shown in Table 2. The data in Table 2 show that the parameter estimates obtained with balanced sampling have smaller estimation errors than those of the traditional method, which helps uncover more significant factors and supports fuller suggestions for improving traffic safety and guiding practical application. Take central separation (the median barrier) as an example: according to the traditional method, central separation has no significant relationship with accident occurrence, whereas according to the balanced sampling results, installing central separation significantly reduces accidents. It is therefore suggested that road design pay attention to the provision of central separation in order to reduce accidents.
Table 2. Comparison of parameter estimation results
* indicates significance at the 95% level.

Claims (4)

1. A balanced sampling method for accident analysis and safety assessment, characterized in that it comprises the following steps:
Step 1: select an accident type to be analyzed, collect the historical accident data related to that type, import the data into a processing system for preprocessing, and build an accident analysis and safety assessment data set;
Step 2: build an analysis model on the data set obtained in step 1, and carry out parameter estimation and a convergence check; this includes balanced sampling of accident samples, solving the count model, and resampling with parameter estimation and a convergence check;
Step 3: output the model results, analyze the significant factors, and propose improvement suggestions.
2. The balanced sampling method for accident analysis and safety assessment according to claim 1, characterized in that, in step 1, the accident type to be analyzed includes traffic accidents and accidents and disasters occurring in industrial and agricultural production, and that collecting and preprocessing the related data comprises:
Step 101: collect data of all kinds and build a traffic accident data set covering traffic flow, road design parameters, traffic control and management elements, and weather conditions;
Step 102: for each road entity and each unit of time, take the accident count as the dependent variable Y of the regression model and the factors that may affect accident occurrence as the independent variables X, and build a traffic accident data set covering influencing factors such as traffic flow, road design parameters, traffic control and management elements, and weather conditions, giving M records in total;
Step 103: check the influencing factors X for multicollinearity and delete redundant collinear variables.
3. The balanced sampling method for accident analysis and safety assessment according to claim 1, characterized in that, in step 2, the balanced sampling method repeatedly draws balanced data sets, a probability regression model is solved on each to obtain multiple sets of model parameters, and parameter estimation and the convergence check are then carried out, specifically:
Step 201: according to whether the accident count of each road in each unit of time is zero, divide all data into two classes, records with a non-zero accident count and records with a zero accident count; of the M records in total, the number of non-zero records is denoted K;
Step 202: from the (M-K) zero-accident records, randomly draw K records, so that the K zero-accident records and the K non-zero records, in a 1:1 ratio, form a new data set B; B contains 2K records, and its zero-accident and non-zero samples are balanced in number;
Step 203: on data set B, use a count model to build the traffic safety assessment equation Y = f(βX), where Y is the accident count, and solve for one set of model parameters β;
Step 204: resample by repeating steps 202 and 203 many times until the result converges;
Step 205: from the multiple sets of parameter estimates, compute the standard deviation (also called the estimation error), confidence interval, and significance level of the parameter β.
4. The balanced sampling method for accident analysis and safety assessment according to claim 1, characterized in that outputting the model results in step 3 comprises:
Step 301: according to the standard deviation (estimation error), confidence interval, and significance level computed for the parameter β, select the independent variables X that have a significant effect on accidents; these are called significant factors;
Step 302: analyze the effect of the significant factors X on the accident count Y;
Step 303: based on the analysis in step 302, derive safety improvement measures and suggestions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510382446.XA CN105046957B (en) 2015-07-02 2015-07-02 Balanced sampling method for accident analysis and safety assessment

Publications (2)

Publication Number Publication Date
CN105046957A true CN105046957A (en) 2015-11-11
CN105046957B CN105046957B (en) 2017-06-30

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11242755A (en) * 1998-02-26 1999-09-07 Fujitsu Ltd Analysis model generating device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
刘凯: "数据挖掘中类不平衡数据集分类模型研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
施梦圜等: "基于平衡采样的轻量级广告点击率预估方法", 《计算机应用研究》 *
李勇等: "不平衡数据的集成分类算法综述", 《计算机应用研究》 *
翟云等: "不平衡类数据挖掘研究综述", 《计算机科学》 *
郭强等: "Role of street patterns in zone-based traffic safety analysis", 《JOURNAL OF CENTRAL SOUTH UNIVERSITY》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485919A (en) * 2016-10-11 2017-03-08 东南大学 A kind of method for judging that through street fixed point tachymeter is affected on traffic accident quantity
CN107014635A (en) * 2017-04-10 2017-08-04 武汉轻工大学 Grain uniform sampling method and device
CN107014635B (en) * 2017-04-10 2019-09-27 武汉轻工大学 Grain uniform sampling method and device
CN107025382A (en) * 2017-05-02 2017-08-08 清华大学 A kind of engineering system health analysis system and method based on critical phase transformation theory
CN107025382B (en) * 2017-05-02 2019-11-26 清华大学 A kind of engineering system health analysis system and method based on critical phase transformation theory
CN107731007A (en) * 2017-11-16 2018-02-23 东南大学 The crossing accident Forecasting Methodology to be developed based on traffic conflict random process
CN111145535A (en) * 2019-11-28 2020-05-12 银江股份有限公司 Travel time reliability distribution prediction method under complex scene
CN111145535B (en) * 2019-11-28 2020-12-15 银江股份有限公司 Travel time reliability distribution prediction method under complex scene
CN111680022A (en) * 2020-05-15 2020-09-18 河海大学 Beach tourist safety accident database establishing and predicting method
CN113762364A (en) * 2021-08-23 2021-12-07 东南大学 Unbalanced traffic accident data synthesis sampling method
CN113762364B (en) * 2021-08-23 2022-11-04 东南大学 Unbalanced traffic accident data synthesis sampling method
CN113808392A (en) * 2021-08-24 2021-12-17 东南大学 Method for optimizing traffic accident data under multi-source data structure
CN113808392B (en) * 2021-08-24 2022-04-01 东南大学 Method for optimizing traffic accident data under multi-source data structure
CN116842018A (en) * 2023-07-06 2023-10-03 江西桔贝科技有限公司 Big data screening method and system
CN116842018B (en) * 2023-07-06 2024-02-23 上海比滋特信息技术有限公司 Big data screening method and system

Also Published As

Publication number Publication date
CN105046957B (en) 2017-06-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant