WO2020083400A1 - Traffic accident data intelligent analysis and comprehensive application system

Info

Publication number: WO2020083400A1
Application number: PCT/CN2019/113477
Authority: WO
Inventors: 刘林; 饶欢; 陈凝; 吕伟韬
Original assignee: 江苏智通交通科技有限公司
Priority date: 2018-10-26
Filing date: 2019-10-26
Publication date: 2020-04-30
Also published as: CN109409430A; CN109409430B

Abstract

A traffic accident data intelligent analysis and comprehensive application system, comprising a data docking module, a mining processing module, an interaction module, a map module, and a data analysis module. The mining processing module processes the traffic accident data extracted by the data docking module by means of a traffic accident data factor importance analysis model, so as to obtain the importance of each element of the attribute factor set. The data analysis module receives the attribute factor selection result from the interaction module and, taking the attribute factors as the analysis perspective, provides targeted data analysis results by means of accident data analysis methods. The system analyzes attribute factor importance on the basis of the original traffic accident data and is configured with a missing-data estimation strategy, so it can cope effectively with inconsistencies among the attribute factors provided by different traffic accident data sources. Attribute factors carrying important traffic accident information are thereby extracted from the sample data selected by the user, and quantitative importance indicators are output.

Description

Intelligent analysis and comprehensive application system for traffic accident data

Technical field

The invention relates to an intelligent analysis and comprehensive application system for traffic accident data.

Background art

Traffic accident data has traditionally been applied in a fairly limited way, mostly through periodic data reports. The statistical dimensions of these reports are also relatively fixed and are chosen mainly on the basis of management experience.

In fact, traffic accident records contain many attributes relating to the people, vehicles, roads, and environment involved in an accident. Beyond the commonly used statistical dimensions, more valuable information can be extracted from large volumes of such data. One example is analyzing the correlation between attribute features and accidents, that is, mining the amount of accident-related information that the attribute features carry. In current research and practice, accident attributes are mostly analyzed when constructing safety evaluation systems. For example, Chinese patent CN201610529822.8, "A freight safety evaluation model based on multiple human-vehicle-road-cargo risk sources", screens important attributes with the accident tree (fault tree) method, but this method does not calibrate the attributes quantitatively. Chinese patent CN201410129672.2, "A road traffic safety assessment method and system", likewise does not specify how the factor loadings of accident attributes are determined.

The importance of attribute factors is the basis for targeted analysis and application of accident data, yet a method for quantitatively analyzing attribute factor importance is still lacking. In addition, the means used in managing and applying traffic accident data are limited, and most of them neglect the mining of deep data features and association conclusions.

Summary of the invention

In view of the problems described above, the object of the present invention is to provide an intelligent analysis and comprehensive application system for traffic accident data that solves two problems of the prior art: the lack of a method for quantitatively analyzing the importance of attribute factors, and the limited means of managing and applying traffic accident data, most of which neglect the mining of deep data features and association conclusions.

The system extracts the importance of traffic accident attribute factors in a data-driven manner. It guides users, when analyzing accident data, to focus on the attribute factors that are closely related to accident outcomes and to carry out in-depth analysis from there. This replaces the traditional application model of fixed statistical reports and provides more targeted information and conclusions for traffic safety management.

The technical solution of the present invention is:

An intelligent analysis and comprehensive application system for traffic accident data, comprising a data docking module, a mining processing module, an interaction module, a map module, and a data analysis module:

Data docking module: extracts traffic accident data that meets specified conditions from the traffic accident database and sends the extracted data to the mining processing module;

Mining processing module: processes the traffic accident data extracted by the data docking module, driven by the traffic accident data factor importance analysis model, to obtain the importance of each element of the attribute factor set;

Interaction module: receives the attribute factor set and importance values obtained by the mining processing module and displays them visually according to the dimension each attribute factor belongs to and the magnitude of its importance value; also includes a date-time selection control and an attribute factor selection control; transmits the configured time range to the data docking module and the selected attribute factors to the data analysis module, receives the analysis results of the data analysis module, and displays them with dedicated controls;

Data analysis module: receives the attribute factor selection from the interaction module and, taking the attribute factors as the analysis perspective, provides targeted analysis results to the interaction module and the map module through accident data analysis methods;

Map module: contains geographic information data and supports map operations; works with the interaction module to implement front-end interaction and visualizes the results output by the data analysis module; also includes an area customization tool with which the target area is set by drawing, and transmits the spatial coordinate range of the delineated area to the data docking module.

Further, in the mining processing module, the traffic accident data factor importance analysis model drives data processing to obtain the importance of each element of the attribute factor set. Specifically, the model constructs the set of traffic accident data attribute factors, configures a missing-information completion strategy, estimates missing information for the attribute factors according to their level and the extent of missing data, and on this basis quantitatively analyzes and outputs the importance of each element of the attribute factor set.

Further, the data processing and analysis performed by the traffic accident data factor importance analysis model in the mining processing module is specifically:

S1. Determine the first-level attribute dimensions of the traffic accident data;

S2. Determine the set of second-level attributes under each first-level attribute from the specific fields of the traffic accident data. This set is the complete set of second-level attribute factors in the traffic accident sample data and has NL(2) elements. Decompose the second-level attribute factors to the third level to obtain the complete set of third-level attribute factors, which has NL(3) elements. The third-level decomposition depends on the values a second-level attribute factor takes: for a discrete attribute variable, the third-level attribute factors are determined from the variable's value range; a continuous attribute is first discretized, and its third-level attribute factors are then determined;

S3. Group and merge the sample data separately over the complete set of second-level attribute factors and the complete set of third-level attribute factors, obtaining the number of second-level groups GN(2) with the sample size samplesize(2) of each group, and the number of third-level groups GN(3) with the sample size samplesize(3) of each group. For any group G(level)_i, its attribute factors comprise all attributes of that level from step S2 together with the sample size samplesize(level)_i, where level denotes the attribute factor level;

S4. Check whether the attribute factor data of group G(level)_i has missing values. If not, go to step S5; otherwise, configure a missing-information completion strategy and estimate the missing information for the attribute factors according to their level and the extent of missing data;

S5. Build a random forest regression model on the grouped and merged data and compute the importance of the attribute factors.

Further, step S4 is specifically:

S41. Set the index j = 1;

S42. Check whether attribute factor a_j has missing data. If so, compute its missing rate r_j = m_j/GN(level), where m_j is the number of groups in which the attribute factor is missing and level is determined by the attribute level of a_j, then go to step S43; otherwise go to step S44;

S43. If r_j ∈ [th_l, th_u], fill in the missing information with the random forest method, where th_l and th_u are the lower and upper thresholds respectively;

If r_j ∈ [0, th_l), discard the attribute factor in subsequent analysis;

If r_j ∈ (th_u, 1], estimate the missing values with a statistic M, chosen from the mode and the mean;

S44. Check whether j < NL(level) holds. If it does, set j = j + 1 and return to step S42. If it does not, check whether any attribute factor still has missing values; if so, return to step S41, otherwise end the missing-value estimation procedure.
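
The missing-rate check in S42 and the three ranges in S43 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the group records, and the thresholds 0.1 and 0.3 are assumptions, and the ranges are mapped exactly as stated in the text above. Only the mode variant of statistic M is shown.

```python
from collections import Counter

def missing_rate(groups, factor):
    """r_j = m_j / GN(level): share of groups in which `factor` is missing (None)."""
    missing = sum(1 for g in groups if g.get(factor) is None)
    return missing / len(groups)

def choose_strategy(rate, th_l, th_u):
    """Map a missing rate to the handling named in S43 (ranges as stated in the text)."""
    if rate == 0:
        return "complete"          # nothing missing, S42 goes straight to S44
    if rate < th_l:
        return "discard"           # r_j in [0, th_l)
    if rate <= th_u:
        return "random_forest"     # r_j in [th_l, th_u]
    return "statistic"             # r_j in (th_u, 1]

def fill_with_mode(groups, factor):
    """Fill missing values with statistic M, here chosen as the mode."""
    observed = [g[factor] for g in groups if g.get(factor) is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(g, **{factor: mode}) if g.get(factor) is None else g
            for g in groups]

# Hypothetical second-level groups of the personnel dimension.
groups = [
    {"gender": "male", "occupation": "driver"},
    {"gender": "female", "occupation": None},
    {"gender": "male", "occupation": "driver"},
    {"gender": "male", "occupation": "farmer"},
    {"gender": "female", "occupation": None},
]
rate = missing_rate(groups, "occupation")
strategy = choose_strategy(rate, th_l=0.1, th_u=0.3)  # thresholds are assumed values
filled = fill_with_mode(groups, "occupation") if strategy == "statistic" else groups
```

The loop over j in S41 to S44 would call `missing_rate` and `choose_strategy` once per attribute factor and repeat until no factor has missing values.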

Further, in step S43, filling in missing information with the random forest method is specifically: the sample groups without missing values are split into a training set and a test set; the training set is used to fit the relationship between the attribute factor with missing values and the other attribute factors; the random forest that passes testing on the test set is then used to complete the missing values in the grouped data, that is, the other attribute factors are input into the random forest and the output classification result is taken as the estimate of the attribute factor.

Further, step S5 is specifically: let the number of decision trees in the random forest regression model be NT. For each tree, compute the out-of-bag error error1 using the out-of-bag data; then randomly add noise to attribute factor a_t in all samples of the out-of-bag data and compute the out-of-bag error error2. The importance of attribute factor a_t is D(a_t) = Σ|error1-error2|/NT, where the sum runs over the NT trees.
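
The importance formula D(a_t) = Σ|error1-error2|/NT can be illustrated with a small sketch. Several things here are assumptions rather than part of the source: the "trees" are hand-written stand-in functions instead of a fitted random forest, the error measure is mean squared error, the noise is applied by shuffling the factor's values among the out-of-bag samples, and the sample rows and targets are invented.

```python
import random

def mse(tree, rows, targets):
    """Mean squared error of one tree on the given rows."""
    return sum((tree(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def factor_importance(trees, oob_indices, rows, targets, factor, seed=0):
    """D(a_t): average over the NT trees of |error1 - error2|."""
    rng = random.Random(seed)
    total = 0.0
    for tree, idx in zip(trees, oob_indices):
        oob_rows = [rows[i] for i in idx]
        oob_y = [targets[i] for i in idx]
        error1 = mse(tree, oob_rows, oob_y)
        # Perturb the factor by shuffling its values among the out-of-bag samples.
        shuffled = [r[factor] for r in oob_rows]
        rng.shuffle(shuffled)
        noisy = [dict(r, **{factor: v}) for r, v in zip(oob_rows, shuffled)]
        error2 = mse(tree, noisy, oob_y)
        total += abs(error1 - error2)
    return total / len(trees)

# Invented samples: attribute factors and a numeric accident outcome.
rows = [
    {"age": 25, "gender": "male"}, {"age": 67, "gender": "female"},
    {"age": 45, "gender": "male"}, {"age": 72, "gender": "male"},
    {"age": 33, "gender": "female"}, {"age": 58, "gender": "female"},
]
targets = [1.0, 2.0, 1.0, 2.0, 1.0, 1.0]

# Stand-in trees that split on age only (NT = 2), with their out-of-bag sample indices.
trees = [
    lambda r: 2.0 if r["age"] >= 60 else 1.0,
    lambda r: 2.0 if r["age"] >= 50 else 1.0,
]
oob_indices = [[0, 1, 2, 3], [2, 3, 4, 5]]

importance_age = factor_importance(trees, oob_indices, rows, targets, "age")
importance_gender = factor_importance(trees, oob_indices, rows, targets, "gender")
```

Because neither stand-in tree reads `gender`, perturbing it leaves the out-of-bag error unchanged and its importance is zero, which is the behavior the formula is meant to capture for uninformative factors.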

Further, in the data docking module, the specified conditions are the time range set by the user through the interaction module and the spatial range set through the map module.

Further, the data analysis methods used by the data analysis module include factor analysis, correspondence analysis, association analysis, and customized reports.

Further, in the data analysis module:

Factor analysis: extracts all data samples containing the selected attribute factors and computes indicator statistics in the time and space dimensions;

Correspondence analysis: runs correspondence analysis on all samples to generate a set of correspondence analysis conclusions, comprising a two-dimensional correspondence analysis scatter plot and attribute factor correspondence conclusions, and extracts from it all conclusions containing the selected attribute factors;

Association analysis: runs association analysis on all samples to generate a set of association analysis conclusions, comprising association relationships and lift; according to the received condition attribute factors and result attribute factors, extracts the association analysis conclusions whose condition and result contain the corresponding attribute factors;

Customized reports: computes data statistics for the received attribute factors.

The beneficial effects of the present invention are:

1. The system analyzes the importance of attribute factors on the basis of the original traffic accident data and is configured with a missing-data estimation strategy, so it can cope effectively with inconsistencies among the attribute factors provided by different traffic accident data sources. It thereby extracts, from the sample data selected by the user, the attribute factors that carry important traffic accident information, and outputs quantitative importance indicators.

2. The system processes the original traffic accident records, extracts the attribute factors that are strongly correlated with accident outcomes, and provides a quantifiable importance indicator for each attribute factor, thereby pointing traffic safety managers to the attribute factors that deserve the most attention. On this basis it supports targeted and diversified analysis of traffic accident data, which offers practical guidance for proactive traffic safety management.

3. The results of the importance analysis are applied directly in the system, providing flexible screening of accident attribute factors. Users can select analysis dimensions and statistical attributes on the basis of attribute factor importance, which makes accident data analysis more targeted.

4. The data analysis module of the system provides multiple data analysis methods and can deliver a variety of analysis conclusions.

Brief description of the drawings

FIG. 1 is a schematic diagram of the intelligent analysis and comprehensive application system for traffic accident data according to an embodiment of the present invention.

FIG. 2 is a flowchart of the data processing and analysis performed by the traffic accident data factor importance analysis model of the embodiment.

Detailed description

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

Embodiment

An intelligent analysis and comprehensive application system for traffic accident data, as shown in FIG. 1, comprises a data docking module, a mining processing module, an interaction module, a map module, and a data analysis module.

Data docking module: extracts traffic accident data that meets specified conditions from the database. The specified conditions are the time range set by the user through the interaction module and the spatial range set through the map module.

Mining processing module: data processing is driven by the traffic accident data factor importance analysis model, and the data processed comes from the data docking module. The model constructs the set of traffic accident data attribute factors, configures a missing-information completion strategy, estimates missing information for the attribute factors according to their level and the extent of missing data, and on this basis quantitatively analyzes and outputs the importance of each element of the attribute factor set.

Interaction module: receives the attribute factor set and importance values from the mining processing module and displays them visually according to the dimension each attribute factor belongs to and the magnitude of its importance value. It also includes a date-time selection control and an attribute factor selection control. The module transmits the configured time range to the data docking module and the selected attribute factors to the data analysis module, receives the analysis results of the data analysis module, and displays them with dedicated controls.

Data analysis module: receives the attribute factor selection and, taking the attribute factors as the analysis perspective, provides targeted analysis results through several accident data analysis methods, including factor analysis, correspondence analysis, association analysis, and customized reports.

Map module: contains geographic information data and supports map operations. It works with the interaction module to implement the front-end interaction of the system and visualizes the results output by the data analysis module. It also includes an area customization tool with which the target area is set by drawing, and it transmits the spatial coordinate range of the delineated area to the data docking module.

In the mining processing module of the embodiment, the data processing and analysis performed by the traffic accident data factor importance analysis model is specifically:

S1. Determine the first-level attribute dimensions of the traffic accident data: the personnel dimension, vehicle dimension, road dimension, and environment dimension.

S2. Determine the set of second-level attributes under each first-level attribute from the specific fields of the traffic accident data. This set is the complete set of second-level attribute factors in the traffic accident sample data and has NL(2) elements. Decompose the second-level attribute factors to the third level to obtain the complete set of third-level attribute factors, which has NL(3) elements. The third-level decomposition depends on the values a second-level attribute factor takes: for a discrete attribute variable, the third-level attribute factors are determined from the variable's value range; a continuous attribute is first discretized, and its third-level attribute factors are then determined.

In the embodiment, the second-level attributes of the personnel dimension that can be extracted from the original records of general accidents are gender, age, nationality, household registration (hukou) type, occupation, and years of driving experience; those that can be extracted from the original records of simple accidents are gender and age. The second-level attribute set of the personnel dimension is therefore gender, age, nationality, household registration type, occupation, and years of driving experience. For the second-level attribute gender, the third-level attribute factors are male and female. The second-level attribute age is a continuous variable and is decomposed into third-level variables by aggregating it into segments.
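
The third-level decomposition described for gender and age can be sketched as below. The age cut points and the label strings are hypothetical, since the source does not state how the age segments are chosen; the sketch only shows a discrete attribute mapping directly to factors and a continuous one being binned first.

```python
from bisect import bisect_right

# Hypothetical segment boundaries for the continuous attribute "age".
AGE_EDGES = [18, 30, 45, 60]
AGE_LABELS = ["age<18", "age18-29", "age30-44", "age45-59", "age>=60"]

def age_factor(age):
    """Discretize a continuous age into one third-level attribute factor."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

def third_level_factors(record):
    """Expand the second-level values of one record into third-level factors."""
    return {
        "gender": "gender=" + record["gender"],  # discrete: one factor per value
        "age": age_factor(record["age"]),        # continuous: binned first
    }

example = third_level_factors({"gender": "male", "age": 35})
```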

S3. Group and merge the sample data separately over the complete set of second-level attribute factors and the complete set of third-level attribute factors, obtaining the number of second-level groups GN(2) with the sample size samplesize(2) of each group, and the number of third-level groups GN(3) with the sample size samplesize(3) of each group.

For any group G(level)_i, its attribute factors comprise all attributes of that level from S2 together with the sample size samplesize(level)_i, where level denotes the attribute factor level and takes the value 2 or 3.
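
One plausible reading of the grouping and merging in S3 is that records with identical attribute values at a given level are merged into a single group that carries a sample size. The sketch below follows that reading; the function name and the sample records are assumptions.

```python
from collections import Counter

def group_and_merge(records, factors):
    """Merge records with identical attribute values into groups with sample sizes."""
    counts = Counter(tuple(r.get(f) for f in factors) for r in records)
    return [dict(zip(factors, key), samplesize=n) for key, n in counts.items()]

# Invented records; None marks an attribute missing from the source record.
records = [
    {"gender": "male", "age": "age30-44"},
    {"gender": "male", "age": "age30-44"},
    {"gender": "female", "age": None},
]
groups = group_and_merge(records, ["gender", "age"])
group_count = len(groups)  # plays the role of GN(level)
```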

In the embodiment, the result of second-level grouping and merging under the personnel dimension is shown in the following table:

Groups G(4), G(6), G(8), G(9), G(11), G(12), G(13), G(14), G(15), G(16), G(17), G(18), and G(20) all contain attribute factors with missing data (marked △).

S4. Check whether the attribute factor data of G(level)_i has missing values. If not, go to step S5; otherwise, configure a missing-information completion strategy and estimate the missing information for the attribute factors according to their level and the extent of missing data. Specifically:

S41. Set the index j = 1;

S42. Check whether attribute factor a_j has missing data. If so, compute its missing rate r_j = m_j/GN(level), where m_j is the number of groups in which the attribute factor is missing and level is determined by the attribute level of a_j, then go to step S43; otherwise go to step S44;

S43. If r_j ∈ [th_l, th_u], fill in the missing information with the random forest method, where th_l and th_u are the lower and upper thresholds respectively. Specifically, the sample groups without missing values are split into a training set and a test set; the training set is used to fit the relationship between the attribute factor with missing values and the other attribute factors; the random forest that passes testing on the test set is then used to complete the missing values in the grouped data, that is, the other attribute factors are input into the random forest and the output classification result is taken as the estimate of the attribute factor;

S44. Check whether j < NL(level) holds. If it does, set j = j + 1 and return to S42. If it does not, check whether any attribute factor still has missing values; if so, return to S41, otherwise end the missing-value estimation procedure.

S5. Build a random forest regression model on the grouped and merged data and compute the importance of the attribute factors. Specifically, let the number of decision trees in the random forest regression model be NT. For each tree, compute the out-of-bag error error1 using the out-of-bag data; then randomly add noise to attribute factor a_t in all samples of the out-of-bag data and compute the out-of-bag error error2. The importance of attribute factor a_t is D(a_t) = Σ|error1-error2|/NT, where the sum runs over the NT trees.

In the embodiment, the importance analysis results for the second-level attribute factors of the personnel dimension are shown in the following table:

Factor analysis in the data analysis module extracts all data samples containing the selected attribute factors and computes indicator statistics in the time and space dimensions. The indicators include the total number of accidents, the number of persons involved, the number of vehicles involved, the number of injuries, the number of deaths, and property loss.

Time-dimension statistics compute the indicators over time intervals of different lengths (day, week, month, quarter, year), and the interaction module presents the results as charts and statistical reports. Space-dimension statistics aggregate the indicator data spatially by the coordinates of the accident location; the results are driven by the map module and presented as overlay layers, including scatter points, heat maps, cluster maps, and statistical charts.
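
The time-dimension statistics can be sketched as a bucketed aggregation. The record fields, the period label formats, and the sample accidents are assumptions, and only three of the listed indicators are aggregated to keep the example short.

```python
from collections import defaultdict
from datetime import date

INDICATORS = ("accidents", "injured", "deaths")

def period_key(day, interval):
    """Label the day, ISO week, month, quarter, or year that a date falls in."""
    if interval == "day":
        return day.isoformat()
    if interval == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if interval == "month":
        return f"{day.year}-{day.month:02d}"
    if interval == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if interval == "year":
        return str(day.year)
    raise ValueError(f"unknown interval: {interval}")

def time_statistics(records, interval):
    """Sum the indicators of all accident records falling in each period."""
    stats = defaultdict(lambda: dict.fromkeys(INDICATORS, 0))
    for r in records:
        bucket = stats[period_key(r["date"], interval)]
        bucket["accidents"] += 1
        bucket["injured"] += r["injured"]
        bucket["deaths"] += r["deaths"]
    return dict(stats)

# Invented accident records already filtered to the selected attribute factors.
records = [
    {"date": date(2018, 1, 5), "injured": 2, "deaths": 0},
    {"date": date(2018, 2, 20), "injured": 1, "deaths": 1},
    {"date": date(2018, 4, 2), "injured": 0, "deaths": 0},
]
quarterly = time_statistics(records, "quarter")
```

Space-dimension statistics would follow the same pattern with a spatial cell or road segment as the bucket key instead of a period label.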

Correspondence analysis in the data analysis module runs correspondence analysis on all samples to generate a set of correspondence analysis conclusions, comprising a two-dimensional correspondence analysis scatter plot and attribute factor correspondence conclusions, and extracts from it all conclusions containing the selected attribute factors. The interaction module displays the output with dedicated controls.

Association analysis in the data analysis module runs association analysis on all samples to generate a set of association analysis conclusions, comprising association relationships and lift. According to the received condition attribute factors and result attribute factors, it extracts the association analysis conclusions whose condition and result contain the corresponding attribute factors. The interaction module displays the output with dedicated controls.
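
The source names lift as an output of association analysis but does not specify the mining algorithm. The sketch below only shows the standard definitions of support, confidence, and lift for a single rule, and how conclusions could be filtered by the selected condition and result factors. The transactions, factor labels, and function names are invented for illustration.

```python
def rule_metrics(transactions, condition, result):
    """Support, confidence, and lift of the rule condition -> result."""
    n = len(transactions)
    n_cond = sum(1 for t in transactions if condition <= t)
    n_res = sum(1 for t in transactions if result <= t)
    n_both = sum(1 for t in transactions if (condition | result) <= t)
    if n_cond == 0 or n_res == 0:
        return None  # the rule cannot be evaluated on these samples
    support = n_both / n
    confidence = n_both / n_cond
    lift = confidence / (n_res / n)
    return support, confidence, lift

def select_rules(rules, condition_factor, result_factor):
    """Keep rules whose condition and result contain the selected attribute factors."""
    return [r for r in rules
            if condition_factor in r["condition"] and result_factor in r["result"]]

# Each accident is represented as a set of third-level attribute factors.
transactions = [
    {"weather=rain", "severity=injury"},
    {"weather=rain", "severity=injury"},
    {"weather=clear", "severity=property"},
    {"weather=clear", "severity=injury"},
]
support, confidence, lift = rule_metrics(
    transactions, {"weather=rain"}, {"severity=injury"})

rules = [
    {"condition": {"weather=rain"}, "result": {"severity=injury"}},
    {"condition": {"weather=clear"}, "result": {"severity=property"}},
]
selected = select_rules(rules, "weather=rain", "severity=injury")
```

A lift above 1 indicates that the result occurs more often together with the condition than it does across all samples.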

When the data analysis module performs factor analysis, it carries out single-factor analysis if a single attribute factor is received and multi-factor analysis if several are received.

The customized report function of the data analysis module computes data statistics for the received attribute factors.

In the interaction module of the embodiment, correspondence analysis results are displayed with dedicated graphic and text controls showing the two-dimensional correspondence analysis scatter plot and the correspondence analysis conclusions. Association analysis results are displayed with dedicated graphic and text controls showing the association relationships among attribute factors, the lift data, and so on. Customized report results are displayed in a table control showing the accident data statistics, with the selected attribute factors as the table header.

The data processing flow of the intelligent analysis and comprehensive application system for traffic accident data is as follows:

In the front-end interface of the interaction module, the user sets the start date and end date in the time selection control. In the front-end electronic map driven by the map module, the user can select the target area with the drawing tool, or choose a specific administrative division, road name, or road section name through the query and filter plug-in. The data docking module then retrieves from the database the original traffic accident data within the time interval and spatial range. The module also defines a default time and space range; if the user sets none, data is retrieved according to the default range.
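
The retrieval step with its fallback to a default range can be sketched as a simple filter. The default period, the default bounding box, and the record fields are hypothetical, and the drawn target area is simplified to a rectangular coordinate range; a real deployment would query the accident database rather than an in-memory list.

```python
from datetime import date

# Hypothetical defaults used when the user sets no time or space range.
DEFAULT_PERIOD = (date(2018, 1, 1), date(2018, 12, 31))
DEFAULT_BBOX = (118.0, 31.0, 120.0, 33.0)  # (min_lon, min_lat, max_lon, max_lat)

def select_accidents(records, period=None, bbox=None):
    """Return the records inside the time interval and the spatial coordinate range."""
    start, end = period or DEFAULT_PERIOD
    min_lon, min_lat, max_lon, max_lat = bbox or DEFAULT_BBOX
    return [
        r for r in records
        if start <= r["date"] <= end
        and min_lon <= r["lon"] <= max_lon
        and min_lat <= r["lat"] <= max_lat
    ]

# Invented accident records.
records = [
    {"id": 1, "date": date(2018, 3, 1), "lon": 118.8, "lat": 32.0},
    {"id": 2, "date": date(2017, 6, 9), "lon": 118.8, "lat": 32.0},
    {"id": 3, "date": date(2018, 7, 4), "lon": 121.5, "lat": 31.2},
]
default_selection = select_accidents(records)
custom_selection = select_accidents(
    records, period=(date(2017, 1, 1), date(2017, 12, 31)))
```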

The raw data retrieved by the data docking module is processed by the mining processing module, which outputs the attribute factors for analysis and their importance for the traffic accident data within that spatio-temporal range.

In the front-end interface provided by the interaction module, the filter query plug-in is populated with the attribute factors and importance indexes output by the mining processing module. The attribute factors are classified by person, vehicle, road, and environment dimension, and within each category they are sorted by the value of their importance index.
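The classification and ordering described above could look like the following sketch. The factor names and importance values are invented for illustration, and descending order is an assumption, since the text does not state the sort direction.

```python
from collections import defaultdict

def group_and_rank(factors):
    """Group (name, dimension, importance) triples by dimension and
    sort each group by importance, largest first (assumed direction)."""
    grouped = defaultdict(list)
    for name, dimension, importance in factors:
        grouped[dimension].append((name, importance))
    return {
        dim: sorted(items, key=lambda item: item[1], reverse=True)
        for dim, items in grouped.items()
    }

# Hypothetical output of the mining processing module.
factors = [
    ("driver_age", "person", 0.12),
    ("weather", "environment", 0.30),
    ("license_years", "person", 0.25),
    ("visibility", "environment", 0.08),
]
ranked = group_and_rank(factors)
```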

In the front-end interface, the user can view the attribute factors and importance indexes under each dimension and select one or more attribute factors for factor analysis, correspondence analysis, association analysis, or customized report statistics. The user can then view the analysis results and conclusions in the front-end interface, and export them from the system to local storage in file formats such as .pdf and .doc.

The traffic accident data intelligent analysis and comprehensive application system of the embodiment analyzes the importance of attribute factors on the basis of raw traffic accident data and is configured with a missing data estimation strategy, so it can effectively handle cases where different traffic accident data sources provide inconsistent attribute factors. It thereby extracts, from the sample data selected by the user, the attribute factors that carry important traffic accident information, and outputs quantitative importance indexes.

In this traffic accident data intelligent analysis and comprehensive application system, the results of the importance analysis are applied directly within the system to provide flexible screening of accident attribute factors. Users can filter analysis dimensions and statistical attributes on the basis of attribute factor importance, which makes accident data analysis more targeted.

In the traffic accident data intelligent analysis and comprehensive application system of the embodiment, the accident data analysis module provides multiple data analysis methods and can deliver multiple kinds of analysis conclusions. The system of the embodiment provides quantifiable importance indexes for traffic accident attribute factors, offers varied ways of analyzing accident data and presenting results, and allows the headers of statistical reports to be freely customized.

Claims

A traffic accident data intelligent analysis and comprehensive application system, characterized by comprising a data docking module, a mining processing module, an interaction module, a map module, and a data analysis module, wherein:

Data docking module: extracts traffic accident data that meets specified conditions from a traffic accident database, and sends the extracted traffic accident data to the mining processing module;

Mining processing module: on the basis of the traffic accident data extracted by the data docking module, performs data processing driven by a traffic accident data factor importance analysis model to obtain the importance of each element of the attribute factor set;

Interaction module: receives the attribute factor set and the importance values obtained by the mining processing module, and displays them visually according to the dimension each attribute factor belongs to and the magnitude of its importance value; further comprises a date-time selection control and an attribute factor selection control; transmits the configured time to the data docking module and the selected attribute factors to the data analysis module, receives the analysis results of the data analysis module, and presents the content with dedicated controls;

Data analysis module: receives the attribute factor selection result from the interaction module and, taking the attribute factors as the angle of analysis, provides targeted data analysis results to the interaction module and the map module by means of the accident data analysis methods;

Map module: contains geographic information data and supports map operations; cooperates with the interaction module to implement front-end interactive operations, and presents the results output by the data analysis module visually; further comprises an area customization tool with which a target area is set by drawing, the spatial coordinate range of the delineated area being transmitted to the data docking module.
The traffic accident data intelligent analysis and comprehensive application system according to claim 1, characterized in that: in the mining processing module, the traffic accident data factor importance analysis model drives data processing to obtain the importance of each element of the attribute factor set, specifically by constructing the attribute factor set of the traffic accident data, configuring a missing information completion strategy, estimating the missing information of attribute factors according to the attribute factor level and the extent of missing data, and on that basis quantitatively analyzing and outputting the importance of each element of the attribute factor set.
The traffic accident data intelligent analysis and comprehensive application system according to claim 2, characterized in that the data processing and analysis performed by the traffic accident data factor importance analysis model in the mining processing module is specifically:

S1. Determine the first-level attribute dimensions of the traffic accident data;

S2. Determine the set of second-level attributes under the first-level attributes according to the specific fields of the traffic accident data; this set is the complete set of second-level attribute factors in the traffic accident sample data, and its number of elements is NL(2). Decompose the second-level attribute factors to the third level to obtain the complete set of third-level attribute factors, whose number of elements is NL(3). The third-level decomposition is determined by the specific values of each second-level attribute factor: for a discrete attribute variable, the third-level attribute factors are determined from the value range of the variable; a continuous attribute is first discretized, and its third-level attribute factors are then determined;

S3. Group and merge the sample data over the complete set of second-level attribute factors and over the complete set of third-level attribute factors, respectively, to obtain the number of second-level groups GN(2) and the sample size samplesize(2) of each group, and the number of third-level groups GN(3) and the sample size samplesize(3) of each group. For any group G(level)_i, its attribute factors comprise all attributes of that level from step S2 together with the sample size samplesize(level)_i, where level denotes the attribute factor level;

S4. Check whether the attribute factor data of group G(level)_i has missing values. If not, proceed to step S5; otherwise, configure a missing information completion strategy and estimate the missing information of the attribute factors according to the attribute factor level and the extent of missing data;

S5. Build a random forest regression model on the grouped and merged data and compute the importance of the attribute factors.
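Steps S2 and S3 can be illustrated with a simplified sketch: a continuous attribute is discretized, and identical samples are then merged into groups that carry a sample size. The speed bins, field names, and helper names below are assumptions made for illustration; the claim does not specify them, and the sketch does not reproduce the full three-level decomposition.

```python
from collections import Counter

def discretize_speed(speed_kmh):
    """Map a continuous speed to a band (hypothetical cut points)."""
    if speed_kmh < 40:
        return "low"
    if speed_kmh < 80:
        return "medium"
    return "high"

def group_samples(samples, attributes):
    """Merge samples with identical attribute values into one group
    and record each group's sample size."""
    counts = Counter(tuple(s[a] for a in attributes) for s in samples)
    return [dict(zip(attributes, key), samplesize=n) for key, n in counts.items()]

# Hypothetical accident samples with one discrete and one continuous attribute.
samples = [
    {"weather": "rain", "speed": 35},
    {"weather": "rain", "speed": 30},
    {"weather": "clear", "speed": 95},
]
discretized = [
    {"weather": s["weather"], "speed_band": discretize_speed(s["speed"])}
    for s in samples
]
groups = group_samples(discretized, ["weather", "speed_band"])
group_count = len(groups)  # plays the role of GN(level)
```

The grouped rows, each with its `samplesize`, are the kind of merged data on which the regression model of step S5 would be built.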
The traffic accident data intelligent analysis and comprehensive application system according to claim 3, characterized in that step S4 is specifically:

S41. Set the index j = 1;

S42. Check whether attribute factor a_j has missing data. If so, compute its missing rate r_j = m_j / GN(level), where m_j is the number of groups in which the attribute factor is missing and the value of level is determined by the attribute level of a_j, then proceed to step S43; otherwise proceed to step S44;

S43. If r_j ∈ [th_l, th_u], supplement the missing information by the random forest method, where th_l and th_u are the lower and upper thresholds, respectively;

If r_j ∈ [0, th_l), discard the attribute factor in subsequent analysis;

If r_j ∈ (th_u, 1], estimate the missing values with a statistic M, where M is chosen from the mode and the mean;

S44. Check whether j < NL(level) holds. If it does, set j = j + 1 and return to step S42; if it does not, check whether any attribute factor with missing values remains, and if so return to step S41, otherwise end the missing value estimation procedure.
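The branching in steps S42 and S43 can be sketched as follows. The sketch follows the three intervals exactly as the claim words them; the threshold values, the use of `None` to mark a missing value, and the function names are assumptions for illustration.

```python
def missing_rate(groups, factor):
    """r_j = m_j / GN(level): share of groups in which the factor is missing."""
    missing = sum(1 for g in groups if g.get(factor) is None)
    return missing / len(groups)

def choose_strategy(rate, th_l, th_u):
    """Pick the handling for one attribute factor from its missing rate."""
    if rate == 0:
        return "none"           # nothing is missing, go straight to S44
    if rate < th_l:
        return "discard"        # r_j in [0, th_l)
    if rate <= th_u:
        return "random_forest"  # r_j in [th_l, th_u]
    return "statistic"          # r_j in (th_u, 1]: fill with mode or mean

# Hypothetical groups; None marks a missing attribute value.
groups = [
    {"weather": "rain"},
    {"weather": None},
    {"weather": "clear"},
    {"weather": "rain"},
]
rate = missing_rate(groups, "weather")
strategy = choose_strategy(rate, th_l=0.1, th_u=0.5)  # assumed thresholds
```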
The traffic accident data intelligent analysis and comprehensive application system according to claim 4, characterized in that: in step S43, supplementing the missing information by the random forest method is specifically: the sample groups without missing values are divided into a training set and a test set; the relationship between the attribute factor with missing values and the other attribute factors is fitted on the training set; and the random forest that passes testing on the test set is used to complete the missing values of the grouped data, that is, the other attribute factors are input to the random forest and the classification result it outputs is taken as the estimated value of the attribute factor.
The traffic accident data intelligent analysis and comprehensive application system according to claim 3, characterized in that step S5 is specifically: the number of decision trees in the random forest regression model is NT; for each tree, the out-of-bag error error1 is computed on its out-of-bag data; random noise is then applied to attribute factor a_t in all samples of the out-of-bag data and the out-of-bag error error2 is computed; the importance of attribute factor a_t is D(a_t) = Σ|error1 - error2| / NT, where the sum runs over the NT decision trees.
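The importance measure in step S5 can be sketched as below. Two points are assumptions rather than statements of the claim: the "random noise" is realized by shuffling the values of the attribute factor across the out-of-bag samples, and the out-of-bag error is taken to be mean squared error. Each tree is represented by a hypothetical triple (predict function, out-of-bag rows, out-of-bag targets), and the stub tree stands in for a fitted decision tree.

```python
import random

def oob_mse(predict, rows, targets):
    """Mean squared error of one tree on its out-of-bag samples."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def factor_importance(trees, factor, rng):
    """D(a_t) = sum over trees of |error1 - error2|, divided by NT."""
    total = 0.0
    for predict, rows, targets in trees:
        error1 = oob_mse(predict, rows, targets)
        # Perturb the factor by shuffling its values across the OOB samples.
        values = [r[factor] for r in rows]
        rng.shuffle(values)
        noisy_rows = [dict(r, **{factor: v}) for r, v in zip(rows, values)]
        error2 = oob_mse(predict, noisy_rows, targets)
        total += abs(error1 - error2)
    return total / len(trees)

# Stub tree that only uses factor "x"; "z" has no effect on its prediction.
rows = [{"x": 1.0, "z": 5.0}, {"x": 2.0, "z": 6.0}, {"x": 3.0, "z": 7.0}]
targets = [1.0, 2.0, 3.0]
trees = [(lambda r: r["x"], rows, targets)]
rng = random.Random(0)
importance_z = factor_importance(trees, "z", rng)
importance_x = factor_importance(trees, "x", rng)
```

A factor the trees never use, such as `z` here, gets an importance of zero because perturbing it leaves every prediction unchanged.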
The traffic accident data intelligent analysis and comprehensive application system according to any one of claims 1-5, characterized in that: in the data docking module, the specified conditions are the time range set by the user through the interaction module and the spatial range set through the map module.
The traffic accident data intelligent analysis and comprehensive application system according to any one of claims 1-5, characterized in that: the data analysis methods used by the data analysis module comprise factor analysis, correspondence analysis, association analysis, and customized reports.
The traffic accident data intelligent analysis and comprehensive application system according to claim 8, characterized in that, in the data analysis module:

Factor analysis: extracts all data samples that contain the selected attribute factors and compiles index statistics along the time and space dimensions;

Correspondence analysis: performs correspondence analysis on all samples to generate a set of correspondence analysis conclusions, comprising the two-dimensional correspondence analysis scatter plot and the correspondence analysis conclusions for the attribute factors, and extracts from it all conclusions that involve the selected attribute factors;

Association analysis: performs association analysis on all samples to generate a set of association analysis conclusions, comprising the association relationships and their lift; according to the received condition attribute factors and result attribute factors, extracts the association analysis conclusions whose condition and result contain the corresponding attribute factors;

Customized report: compiles data statistics over the received attribute factors.
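For the lift reported by the association analysis, the following sketch applies the standard definition, lift = confidence(condition → result) / support(result). The record fields, the sample data, and the function name `rule_lift` are illustrative assumptions; the patent does not state how the system computes lift.

```python
def rule_lift(records, condition, result):
    """Lift of the rule condition -> result over a list of records.

    condition and result are dicts of attribute factor -> required value.
    Returns None when the lift is undefined.
    """
    def matches(rec, items):
        return all(rec.get(k) == v for k, v in items.items())

    n = len(records)
    n_cond = sum(matches(r, condition) for r in records)
    n_res = sum(matches(r, result) for r in records)
    n_both = sum(matches(r, condition) and matches(r, result) for r in records)
    if n == 0 or n_cond == 0 or n_res == 0:
        return None
    confidence = n_both / n_cond
    return confidence / (n_res / n)

# Hypothetical accident records.
records = [
    {"weather": "rain", "type": "rear_end"},
    {"weather": "rain", "type": "rear_end"},
    {"weather": "clear", "type": "rear_end"},
    {"weather": "clear", "type": "side"},
]
lift = rule_lift(records, {"weather": "rain"}, {"type": "rear_end"})
```

A lift above 1 indicates that the result occurs more often when the condition holds than it does across all samples.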