CN114596702B - Traffic state prediction model construction method and traffic state prediction method - Google Patents

Info

Publication number
CN114596702B
Authority
CN
China
Prior art keywords
data
traffic state
prediction model
state prediction
value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210170462.2A
Other languages
Chinese (zh)
Other versions
CN114596702A (en)
Inventor
杨丽丽
孟繁宇
曾益萍
袁狄平
王倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Application filed by Southern University of Science and Technology
Publication of CN114596702A
Application granted
Publication of CN114596702B

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application is applicable to the technical field of intelligent traffic and provides a traffic state prediction model construction method and a traffic state prediction method. The model construction method comprises the following steps: acquiring first data, second data and third data, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics; fusing the first data, the second data and the third data to obtain training sample data; constructing a traffic state prediction model based on training sample data, and screening out associated features, wherein the associated features are features with importance degrees larger than preset importance degrees; and a final traffic state prediction model is established based on the associated features, so that the prediction precision can be improved, and the prediction result is ensured to be consistent with the actual traffic situation, thereby accurately predicting the situation of the bidirectional traffic flow.

Description

Traffic state prediction model construction method and traffic state prediction method
Technical Field
The application belongs to the technical field of intelligent traffic, and particularly relates to a traffic state prediction model construction method and a traffic state prediction method.
Background
At present, modern traffic management and emergency resource scheduling require up-to-date information on the running state of road traffic, so that the traffic state of the whole network can be understood from a global perspective and decision makers can be helped to formulate schemes such as congestion relief, accident handling and rescue route planning. In general, acquiring and visualizing the running state of the road network relies on accurate prediction of the traffic state of road sections and intersections, including their speed, flow and transit time.
However, traditional traffic state prediction methods generally capture the spatial association of traffic information only through an association matrix that is estimated or learned from the road network topology and historical traffic data. This requires extensive simplification of, and strong assumptions about, the actual road network structure. Because actual traffic conditions and secondary or unknown factors are not considered, the prediction accuracy of the running state of the road network is reduced, which in turn affects the schemes formulated by decision makers.
Disclosure of Invention
The embodiment of the application provides a traffic state prediction model construction method and a traffic state prediction method, which can solve the problem of low prediction precision.
In a first aspect, an embodiment of the present application provides a traffic state prediction model construction method, including:
acquiring first data, second data and third data, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream and downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain training sample data;
constructing a traffic state prediction model based on the training sample data, and screening out associated features, wherein the associated features are features with importance degrees larger than a preset importance degree;
and establishing a final traffic state prediction model based on the associated features.
In a possible implementation manner of the first aspect, training a prediction model and screening out associated features specifically includes:
dividing the training sample data into a first test set and a first verification set according to a preset dividing proportion;
setting the characteristics corresponding to the first verification set as decision attributes, setting the characteristics corresponding to the first test set as conditional attributes, and establishing and training the traffic state prediction model based on a preset loss function;
In the training process, calculating the importance of the features;
and if the importance degree is larger than the preset importance degree, selecting the feature as an associated feature.
Further, calculating the importance of the feature specifically includes:
obtaining the score by which the feature improves the traffic state prediction model at each split;
calculating a squared weighting of the score.
In a possible implementation manner of the first aspect, the establishing a final traffic state prediction model based on the associated features specifically includes:
establishing an initial traffic state prediction model using the associated features;
dividing the training sample data into a second test set and a second verification set according to a preset dividing proportion;
and training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.
Further, the preset loss function is a square loss function;
the objective function of the traffic state prediction model is as follows:
$$\mathrm{obj} = \sum_{t} \left( y_t - \left( \hat{y}_{t-1} + f_t(x_t) \right) \right)^2 + \sum_{i} \Omega(f_i)$$
wherein $y_t$ is the actual traffic state value of the road section to be predicted in the t-th step, $\hat{y}_{t-1}$ is the predicted value obtained by the traffic state prediction model in the (t-1)-th step, $f_t(x_t)$ is the transformation function, $x_t$ is the attribute, and $\Omega(f_i)$ is the regularization term of the i-th tree:
$$\Omega(f_i) = \gamma M + \frac{1}{2} \lambda \sum_{j=1}^{M} \omega_j^2$$
wherein $\gamma$ is the threshold for controlling node splitting, $\lambda$ is the L2 regularization weight, $\omega$ is the leaf score, and $M$ is the number of leaves.
Further, fusing the first data, the second data and the third data to obtain training sample data, including:
the first data and the second data are complemented, and the complemented first data and second data are obtained;
and fusing the first data and the second data after the completion with the third data to obtain training sample data.
In a possible implementation manner of the first aspect, the complementing the first data and the second data specifically includes:
classifying the attribute of each feature as a continuous variable or a discrete variable;
sorting the features according to the total amount of missing data and the attribute of each feature;
if the feature is a continuous variable, initializing the value of the missing data with the median of the adjacent time periods or of all time periods;
and/or if the feature is a discrete variable, initializing the value of the missing data with the mode of the adjacent time periods or of all time periods.
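The initialization step can be illustrated with a minimal sketch. It assumes missing entries are marked with `None` and uses the "all time periods" variant; the function name `initialize_missing` and the sample values are hypothetical and not taken from the application.

```python
from statistics import median, mode

def initialize_missing(values, is_continuous):
    """Fill None entries with the median (continuous) or the mode (discrete)
    of the observed entries across all time periods."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if is_continuous else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical series with gaps: a continuous speed and a discrete period label.
speeds = [32.0, None, 35.0, 31.0, None]
period_labels = ["peak", "flat", None, "peak", "peak"]

filled_speeds = initialize_missing(speeds, is_continuous=True)
filled_labels = initialize_missing(period_labels, is_continuous=False)
```

Restricting `observed` to a window around each gap would give the "adjacent time periods" variant instead.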
Further, if the feature is a continuous variable, initializing the value of the missing data with the median of the adjacent time periods or of all time periods specifically includes:
obtaining a first new data set each time the missing data of each continuous variable is initialized with the median;
calculating the difference between each first new data set and the corresponding first old data set, and summing the differences to obtain a first sum value;
if the first sum value is smaller than a preset difference value, stopping the completion;
and/or if the feature is a discrete variable, initializing the value of the missing data with the mode of the adjacent time periods or of all time periods specifically includes:
obtaining a second new data set each time the missing data of each discrete variable is initialized with the mode;
calculating the difference between each second new data set and the corresponding second old data set, and summing the differences to obtain a second sum value;
and if the second sum value is smaller than the preset difference value, stopping the completion.
For example, calculating the difference between each of the first new data sets and the corresponding first old data set, and summing the difference to obtain a first sum value specifically includes:
the calculation is performed according to the following formula:
$$\Delta N = \frac{\sum_{j} \left( D_n - D_o \right)^2}{\sum_{j} D_n^2}$$
wherein $\Delta N$ is the first sum value, $j$ is the index of the sorted continuous variable, $D_n$ is the missing continuous variable values of the first new data set, and $D_o$ is the missing continuous variable values of the first old data set;
and/or, calculating the difference between each second new data set and the corresponding second old data set, and summing the differences to obtain a second sum value, which specifically includes:
the calculation is performed according to the following formula:
$$\Delta F = \frac{\sum_{j} \sum_{i} I\left( x_n \neq x_o \right)}{N_{mis}}$$
wherein $\Delta F$ is the second sum value, $j$ is the index of the sorted continuous variable, $i$ is the index of the sorted discrete variable, $x_n$ is the missing discrete variable values of the second new data set, $x_o$ is the missing discrete variable values of the second old data set, $I$ is a decision function that takes 1 if $x_n \neq x_o$ and 0 otherwise, and $N_{mis}$ is the total number of missing items in the discrete variables.
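As an illustration of the stopping test, the sketch below computes both sum values for a single variable. It assumes the normalized form used by missForest-style imputation: squared change divided by the squared new values for continuous data, and the share of changed entries for discrete data, both taken over the positions that were originally missing. The function names, the sample data and the threshold are hypothetical.

```python
def delta_continuous(new, old, missing_idx):
    """Normalized squared change of the imputed continuous values."""
    num = sum((new[i] - old[i]) ** 2 for i in missing_idx)
    den = sum(new[i] ** 2 for i in missing_idx)
    return num / den if den else 0.0

def delta_discrete(new, old, missing_idx):
    """Share of imputed discrete values that changed between iterations."""
    changed = sum(1 for i in missing_idx if new[i] != old[i])
    return changed / len(missing_idx) if missing_idx else 0.0

# Position 1 was originally missing in both hypothetical series.
delta_n = delta_continuous([32.0, 33.0, 35.0], [32.0, 32.0, 35.0], [1])
delta_f = delta_discrete(["peak", "peak"], ["peak", "flat"], [1])

PRESET_DIFFERENCE = 0.01  # assumed threshold, not specified in the text
stop_continuous = delta_n < PRESET_DIFFERENCE
stop_discrete = delta_f < PRESET_DIFFERENCE
```

In this example the continuous variable has converged (the change is about 0.0009) while the discrete one has not, so completion would continue for the discrete variable only.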
In a possible implementation manner of the first aspect, acquiring the first data specifically includes:
acquiring traffic state data of historical time periods of the road section to be predicted according to a preset selection value, wherein the traffic state data of each time period comprises traffic data and corresponding spatial characteristics and time characteristics, and the first data comprises traffic state data of all time periods;
the traffic data are data acquired by a sensor, the spatial characteristics are traffic state indexes of the road section to be predicted, and the temporal characteristics are time states of the road section to be predicted.
In a possible implementation manner of the first aspect, the acquiring the second data specifically includes:
acquiring, according to a preset selection value, traffic state data of historical time periods for all turning movements in each direction of the upstream and downstream intersections, wherein the traffic state data of each time period comprises traffic data and corresponding spatial features, and the second data comprises the traffic state data of all time periods;
the traffic data are data acquired by sensors, and the spatial characteristics are traffic state indexes of the upstream and downstream intersections.
In a second aspect, an embodiment of the present application provides a traffic state prediction method, including:
acquiring first data, second data and third data, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream and downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and spatial characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain fused data;
obtaining a traffic state prediction result of the road section to be predicted according to the fusion data by using a traffic state prediction model;
Wherein the traffic state prediction model is a final traffic state prediction model trained by the method of any one of the first aspects.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of the first aspect or the second aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of the first aspect or the second aspect.
In a fifth aspect, embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the method of any one of the first aspect or the second aspect described above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
According to the embodiments of the application, first data, second data and third data are acquired, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of the upstream and downstream intersections of the road section to be predicted, the third data comprises features of the first data and features of the second data, and the features comprise spatial features and/or temporal features; the first data, the second data and the third data are fused to obtain training sample data; a traffic state prediction model is constructed based on the training sample data and associated features are screened out, the associated features being features with importance degrees larger than a preset importance degree, so that features of high importance are taken into account; and a final traffic state prediction model is established based on the associated features. The actual road network structure therefore does not need to be simplified or subjected to strong assumptions, the prediction accuracy, operation efficiency and robustness can be improved, the prediction result is kept consistent with actual traffic conditions, and bidirectional traffic flow can be predicted accurately.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a traffic state prediction model construction method according to an embodiment of the present application;
FIG. 2 is a flow chart of a traffic state prediction model construction method according to another embodiment of the present application;
FIG. 3 is a flow chart of a traffic state prediction model construction method according to another embodiment of the present application;
FIG. 4 is a flow chart of a traffic state prediction method according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determination" or "in response to determination" or "upon detection of a [described condition or event]" or "in response to detection of a [described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Fig. 1 is a flow chart of a traffic state prediction model construction method according to an embodiment of the present application. By way of example and not limitation, as shown in fig. 1, the method includes:
S101: acquiring first data, second data and third data;
The first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics.
In one possible implementation manner, traffic state data of historical time periods of a road section to be predicted is obtained according to a preset selection value, the traffic state data of each time period comprises traffic data and corresponding spatial characteristics and time characteristics, and the first data comprises traffic state data of all time periods.
Specifically, the preset selection value is used for selecting traffic state data corresponding to a time period as historical traffic state data. For example, if the preset selection value is 13, the traffic state data of 13 time periods is selected as the historical traffic state data.
The traffic data are data acquired by sensors. By way of example, the sensors may include cameras and detectors.
The spatial characteristics are traffic state indexes of the road section to be predicted. Specifically, the spatial feature is a short-term traffic state index for each time period, that is, the traffic state of the preceding time periods selected according to the time step. For example, if the time step is 3 and the traffic state in time period t is to be predicted, the traffic states in time periods t-1, t-2 and t-3 are selected to form 3 features, which are used as the short-term traffic state indexes of the road section to be predicted. The traffic state may be the road section transit time, the traffic flow density, or the average travel speed on the road section.
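The time-step example can be sketched as a simple lag-feature builder. This is an illustrative sketch only: the function name `make_lag_features` and the transit-time values are hypothetical.

```python
def make_lag_features(series, time_step):
    """For each period t, pair the states at t-1 ... t-time_step with the state at t."""
    samples = []
    for t in range(time_step, len(series)):
        lags = [series[t - k] for k in range(1, time_step + 1)]  # t-1, t-2, ...
        samples.append((lags, series[t]))
    return samples

# Hypothetical road section transit times (seconds) for five consecutive periods.
transit_times = [60, 62, 65, 70, 68]
samples = make_lag_features(transit_times, time_step=3)
```

Each sample holds the three short-term indexes and the value to be predicted; the first three periods produce no sample because they lack a full history.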
The time characteristic is the time state of the road section to be predicted. In particular, the time characteristic is a long-term time state of each time period. The time state may include one or more of the following: month, week, hour, working day/non-working day, peak/off-peak hours. By way of example, the time state may be described as February, working day, peak hours. Because identical or similar temporal features tend to be accompanied by similar traffic states, taking the temporal features into account enables the prediction model to learn this nonlinear relationship, which improves the prediction accuracy.
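A minimal sketch of deriving such a time state from a timestamp follows. The peak-hour set and the rule that Monday to Friday are working days (ignoring public holidays) are assumptions made for illustration, as is the reading of "week" as the day of the week.

```python
from datetime import datetime

PEAK_HOURS = {7, 8, 17, 18}  # assumed peak hours, not specified in the text

def time_features(ts):
    """Derive long-term time-state features from a timestamp."""
    return {
        "month": ts.month,
        "weekday": ts.isoweekday(),          # 1 = Monday ... 7 = Sunday
        "hour": ts.hour,
        "is_workday": ts.isoweekday() <= 5,  # public holidays are ignored here
        "is_peak": ts.hour in PEAK_HOURS,
    }

features = time_features(datetime(2022, 2, 23, 8, 15))
```

For this timestamp the result corresponds to the "February, working day, peak hours" description.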
In one possible implementation manner, the traffic state data of historical time periods of all turns in each direction of the upstream and downstream intersections are obtained according to preset selection values, the traffic state data of each time period comprises traffic data and corresponding spatial features, and the second data comprises the traffic state data of all time periods.
Specifically, the preset selection value is used for selecting traffic state data corresponding to a time period as historical traffic state data. This preset selection value may be the same as or different from the one used for the first data. In this embodiment, the preset selection value is also 13.
Each upstream and downstream intersection is set as a node, and traffic state data is acquired for the east, west, south and north directions of the node; that is, for each direction, the traffic state data of the preceding 13 time periods is selected for each of the three turning movements (straight, left turn and right turn) as historical traffic state data. If a direction or a turning movement does not exist, the corresponding data are removed.
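The enumeration of directions and turning movements for one intersection node can be sketched as follows. The key naming scheme and the `unavailable` argument are hypothetical conventions chosen for this illustration.

```python
from itertools import product

APPROACHES = ("east", "west", "south", "north")
MOVEMENTS = ("straight", "left", "right")

def intersection_feature_keys(unavailable=()):
    """List approach/movement pairs for a node, skipping missing approaches or movements."""
    missing = set(unavailable)
    keys = []
    for approach, movement in product(APPROACHES, MOVEMENTS):
        if approach in missing or (approach, movement) in missing:
            continue
        keys.append(f"{approach}_{movement}")
    return keys

# A hypothetical T-junction with no north approach and no left turn from the east.
keys = intersection_feature_keys(unavailable=["north", ("east", "left")])
```

A full four-way intersection yields 12 keys; each key would then carry the preceding 13 time periods of data.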
The traffic data are data acquired by sensors. By way of example, they may include data collected by cameras, by loop detectors deployed in the pavement at intersections, and by floating car GPS. The traffic data may be fused data.
The spatial features are traffic state indexes of the upstream and downstream intersections. By way of example, the traffic state indexes include average queue length, number of stops, stop delay and average travel speed. The average queue length, number of stops and stop delay can be extracted from camera data by a convolutional neural network, and the average travel speed can be calculated from floating car GPS data and the corresponding speed measurements.
In this embodiment, the original data sources corresponding to the first data, the second data and the third data are stored in a classified manner, so as to provide a basis for preprocessing and extraction analysis of the corresponding data sources.
S102: and fusing the first data, the second data and the third data to obtain training sample data.
Optionally, the first data, the second data and the third data are aligned by time and then fused.
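One way to picture time-based fusion is an inner join on the time period. The sketch below assumes each data source is a mapping from a period label to a feature dictionary and keeps only periods present in all three; the field names are hypothetical.

```python
def fuse_by_time(first, second, third):
    """Merge three period-keyed feature mappings on the periods they share."""
    common = sorted(set(first) & set(second) & set(third))
    return [{"time": t, **first[t], **second[t], **third[t]} for t in common]

first = {"08:00": {"segment_speed": 31.0}, "08:05": {"segment_speed": 29.5}}
second = {"08:00": {"up_queue_len": 12.0}, "08:05": {"up_queue_len": 15.0},
          "08:10": {"up_queue_len": 9.0}}
third = {"08:00": {"is_peak": True}, "08:05": {"is_peak": True}}

fused = fuse_by_time(first, second, third)
```

The 08:10 period is dropped because the first and third sources have no record for it; completing missing data before fusion would avoid such losses.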
S103: constructing a traffic state prediction model based on the training sample data, and screening out associated features;
The associated features are features with an importance greater than a preset importance.
Based on the loss function, a traffic state prediction model is constructed using the features in the training sample data as split points. In this embodiment, the traffic state prediction model is an XGBoost (eXtreme Gradient Boosting) prediction model. The preset importance is set according to the required model accuracy and/or number of variables.
Extracting, from the plurality of features, those that contribute most to the traffic state of the road section to be predicted, and learning from them, improves the efficiency and accuracy of feature learning, removes redundant features and improves the robustness of the model.
S104: a final traffic state prediction model is established based on the associated features.
A final traffic state prediction model, namely the final XGBoost prediction model, is established based on the loss function using the associated features as split points.
According to the method, first data, second data and third data are acquired, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of the upstream and downstream intersections of the road section to be predicted, the third data comprises features of the first data and features of the second data, and the features comprise spatial features and/or temporal features; the first data, the second data and the third data are fused to obtain training sample data; a traffic state prediction model is constructed based on the training sample data and associated features are screened out, the associated features being features with importance degrees larger than a preset importance degree, so that features of high importance are taken into account; and a final traffic state prediction model is established based on the associated features. The actual road network structure therefore does not need to be simplified or subjected to strong assumptions, the prediction accuracy, operation efficiency and robustness can be improved, the prediction result is kept consistent with actual traffic conditions, and bidirectional traffic flow can be predicted accurately.
Fig. 2 is a flow chart of a traffic state prediction model construction method according to another embodiment of the present application. By way of example and not limitation, as shown in fig. 2, the method includes:
S201: dividing the training sample data into a first test set and a first verification set according to a preset dividing ratio.
For example, with a preset dividing ratio of 4:1, the training sample data is divided into four parts for the first test set and one part for the first verification set.
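A 4:1 division can be sketched as below. The sketch assumes the samples are already in chronological order and are split without shuffling, which the text does not specify; the function name is hypothetical.

```python
def split_by_ratio(samples, ratio=(4, 1)):
    """Split an ordered list of samples into two parts according to ratio."""
    cut = len(samples) * ratio[0] // sum(ratio)
    return samples[:cut], samples[cut:]

# Ten hypothetical samples: eight go to the first part, two to the second.
first_set, verification_set = split_by_ratio(list(range(10)))
```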
S202: setting the characteristics corresponding to the first verification set as decision attributes, setting the characteristics corresponding to the first test set as conditional attributes, and establishing and training a traffic state prediction model based on a preset loss function.
In one possible implementation, the preset loss function is a square loss function;
the objective function of the traffic state prediction model is as follows:
$$\mathrm{obj} = \sum_{t} \left( y_t - \left( \hat{y}_{t-1} + f_t(x_t) \right) \right)^2 + \sum_{i} \Omega(f_i)$$
wherein $y_t$ is the actual traffic state value of the road section to be predicted in the t-th step, $\hat{y}_{t-1}$ is the predicted value obtained by the traffic state prediction model in the (t-1)-th step, $f_t(x_t)$ is the transformation function, and $x_t$ is the attribute. The transformation function may include: XGBoost, random forest.
$\Omega(f_i)$ is the regularization term of the i-th tree:
$$\Omega(f_i) = \gamma M + \frac{1}{2} \lambda \sum_{j=1}^{M} \omega_j^2$$
wherein $\gamma$ is the threshold for controlling node splitting, $\lambda$ is the L2 regularization weight, $\omega$ is the leaf score, and $M$ is the number of leaves.
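To make the roles of gamma, lambda, the leaf scores and the number of leaves concrete, the sketch below evaluates a squared-loss objective numerically. It assumes the standard XGBoost form of the regularization term, gamma times the number of leaves plus half of lambda times the sum of squared leaf scores; all function names and numbers are hypothetical.

```python
def regularization(leaf_scores, gamma, lam):
    """Penalty of one tree: gamma * M + 0.5 * lambda * sum of squared leaf scores."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)

def objective(y_true, y_prev, f_t, trees, gamma, lam):
    """Squared loss of the updated prediction plus the penalties of all trees."""
    loss = sum((y - (p + f)) ** 2 for y, p, f in zip(y_true, y_prev, f_t))
    return loss + sum(regularization(leaves, gamma, lam) for leaves in trees)

obj = objective(
    y_true=[10.0, 12.0],      # actual traffic state values
    y_prev=[9.0, 12.5],       # predictions from the previous step
    f_t=[0.5, -0.5],          # outputs of the newly added tree
    trees=[[0.5, -0.5]],      # leaf scores of that tree (two leaves)
    gamma=1.0,
    lam=2.0,
)
```

Here the loss is 0.25 and the penalty is 2.5, so a larger gamma makes every additional leaf more expensive while a larger lambda shrinks the leaf scores.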
The objective function of the traffic state prediction model is established from the features using a gradient descent algorithm, based on the idea of minimizing the loss function (that is, minimizing the objective function obj). The performance of the model is characterized by the mean absolute percentage error (MAPE).
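MAPE can be computed as in the short sketch below, which assumes no actual value is zero (the metric is undefined in that case). The sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical average speeds (km/h): both predictions are off by 10 percent.
error = mape([50.0, 40.0], [45.0, 44.0])
```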
During training, the model parameters are optimized with the GridSearchCV algorithm (grid search) to obtain the tuned parameters. The parameters to be optimized include: maximum depth (max_depth), learning rate (learning_rate), regularization parameters (alpha, gamma, lambda), total number of trees (n_estimators), etc.
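The idea behind grid search can be sketched without any machine learning library: every combination of candidate values is scored and the best one is kept. In the sketch below the candidate values are hypothetical and `toy_evaluate` is a stand-in for the cross-validated error that scikit-learn's GridSearchCV would compute for a real model.

```python
from itertools import product

param_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
}

def grid_search(param_grid, evaluate):
    """Score every parameter combination and return the one with the lowest error."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    # Stand-in for a cross-validated MAPE; lowest at depth 5, rate 0.1, 200 trees.
    return (abs(params["max_depth"] - 5)
            + abs(params["learning_rate"] - 0.1)
            + abs(params["n_estimators"] - 200) / 100)

best_params, best_score = grid_search(param_grid, toy_evaluate)
```

The number of evaluations grows multiplicatively with each added parameter (8 here), which is why fixing some parameters in advance speeds up tuning.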
S203: During the training process, the importance of each feature is calculated.
Specifically, the score by which the feature improves the prediction model at each split is obtained, and the squared weighting of the score is calculated. The squared weighting of the score is the importance of the corresponding feature.
The calculation is performed according to the following formula:
Figure BDA0003517427680000112
where S_i is the importance of the i-th feature, K is the number of splits on the corresponding feature during training,
Figure BDA0003517427680000113
is the prediction model score of the i-th feature at the t-th split, and
Figure BDA0003517427680000114
is the prediction model score of the i-th feature at the (t-1)-th split.
S204: If the importance is greater than the preset importance, the feature is selected as an associated feature.
For example, the preset importance is set to 10%, and if the importance of the feature is greater than 10%, the feature is selected as the associated feature.
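Steps S203 and S204 can be sketched as follows. Since the formula image is not reproduced here, the sketch assumes that the importance is the sum, over the K splits, of the squared score improvement, normalized so that the importances sum to 1 and can be compared against a threshold such as 10%. The data layout, function names, and feature names are all hypothetical.

```python
def feature_importance(scores_by_feature):
    """Sum of squared score improvements per feature, normalized to sum to 1.

    scores_by_feature maps a feature name to the sequence of model scores
    recorded before its first split and after each of its K splits
    (an assumed data layout, for illustration only).
    """
    raw = {}
    for name, scores in scores_by_feature.items():
        raw[name] = sum((cur - prev) ** 2 for prev, cur in zip(scores, scores[1:]))
    total = sum(raw.values())
    return {name: (value / total if total else 0.0) for name, value in raw.items()}

def select_associated(importance, preset=0.10):
    """Keep features whose importance is greater than the preset importance."""
    return [name for name, value in importance.items() if value > preset]

scores = {
    "upstream_left_turn_speed": [0.0, 3.0, 4.0],  # improvements of 3 and 1
    "hour_of_day": [0.0, 2.0],                    # improvement of 2
    "downstream_stops": [0.0, 1.0],               # improvement of 1
}
importance = feature_importance(scores)
print(select_associated(importance))
```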
In another embodiment, parameters such as a total sample ratio (subsamples), a sample ratio (samples_byte) in each tree, and a boost method (tree_boost) used for modeling can be set or optimized to obtain a more optimized parameter tuning result.
In another embodiment, the parameter optimization process and feature screening process can be accelerated by setting alpha and lambda to 0 in the regularization parameters, modeling with a total sample ratio of 0.5, and a sample ratio of 0.8 in each tree.
In another embodiment, when the prediction model is required to learn from the temporal features, the temporal features are not screened. Only the spatial features in each direction of the upstream and downstream intersections are screened, and the spatial features whose importance is greater than the preset importance are selected as associated spatial features.
In another embodiment, features of the training sample data may be screened using feature importance indicators of the XGBoost prediction model. Specifically, the contribution degree of the features of the training sample data is quantitatively evaluated by using the gain (gain), the coverage (cover) or the total gain (total_gain) as an evaluation index, and the features with the contribution degree larger than a preset index threshold are taken as associated features.
S205: an initial traffic state prediction model is established using the associated features.
And establishing an initial traffic state prediction model by using the optimized parameter adjusting result and the associated characteristics.
S206: Divide the training sample data into a second test set and a second verification set according to a preset division ratio.
For example, with a preset division ratio of 4:1, the training sample data is randomly divided into 4 parts for the second test set and 1 part for the second verification set. The training sample data of the first test set may be the same as or different from that of the second test set, and the training sample data of the first verification set may be the same as or different from that of the second verification set.
S207: Train a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.
During training, the GridSearchCV algorithm continues to be used, based on the second test set and the second verification set, to optimize the parameters of the initial traffic state prediction model. The parameters to be optimized include: maximum depth (max_depth), learning rate (learning_rate), regularization parameters (alpha, gamma, lambda), total number of trees (n_estimators), etc.
The optimized model is then trained further on the training sample data, and model performance is verified by 10-fold cross-validation to obtain the associated features and the final traffic state prediction model.
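The 10-fold verification can be sketched as below. This is a minimal illustration assuming contiguous, unshuffled folds; `k_fold_indices`, `cross_validate`, and the `fit_and_score` callable are hypothetical names, and a real run would train the model on the training indices and return an error such as MAPE on the validation indices.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

def cross_validate(n_samples, fit_and_score, k=10):
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(25, k=10))
print(len(folds), [len(val) for _, val in folds])
```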
In another embodiment, parameters such as total sample ratio (subsamples), sample ratio (samples_byte) in each tree, and boost method (tree_boost) used for modeling can be optimized through the GridSearchCV algorithm to obtain a more optimized parameter tuning result.
In another embodiment, the parameter optimization process is accelerated by setting the total sample ratio used for modeling to 0.5, and the sample ratio within each tree to 0.8.
In another embodiment, spatial features of the upstream and downstream intersections in each direction may be screened again to screen out spatial features with importance greater than a preset importance as associated spatial features.
Fig. 3 is a flowchart of a traffic state prediction model construction method according to another embodiment of the present application. By way of example and not limitation, as shown in fig. 3, fusing the first data, the second data, and the third data to obtain training sample data includes:
S301: Complete the first data and the second data to obtain the completed first data and second data, which includes the following steps:
Specifically, each feature is classified, according to its attribute, as a continuous variable or a discrete variable.
For example, the average passing speed is a continuous variable and the number of stops is a discrete variable; the features are classified according to these attributes.
The features are then sorted according to the total amount of missing data and the attribute of each feature.
Illustratively, all continuous variables are ordered and numbered according to the total amount of missing data, and all discrete variables are ordered and numbered according to the total amount of missing data.
If the variable is a continuous variable, the value of the missing data is initialized with the median of the adjacent time periods or all time periods.
Wherein the value of the missing data is initialized with the median of the adjacent time period, specifically with the median of the time period adjacent to the time period of the missing data.
And/or,
if the variable is a discrete variable, the value of the missing data is initialized with the mode of the adjacent time periods or all time periods.
Wherein the value of the missing data is initialized with the mode of the adjacent time period, specifically with the mode of the time period adjacent to the time period of the missing data.
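The initialization described above can be sketched as follows. The sketch assumes each feature is a list ordered by time period with `None` marking missing data, that "adjacent" means within `window` periods on either side, and that the statistic over all observed periods is used only when no adjacent value is observed. These choices, the function name, and the sample values are illustrative assumptions.

```python
from statistics import median, mode

def initialize_missing(series, discrete=False, window=1):
    """Fill None entries using the median (continuous) or mode (discrete).

    Neighbours within `window` time periods are used when any are observed;
    otherwise the statistic over all observed time periods is used.
    Assumes the series contains at least one observed value.
    """
    stat = mode if discrete else median
    observed = [v for v in series if v is not None]
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            lo, hi = max(0, i - window), min(len(series), i + window + 1)
            nearby = [v for v in series[lo:hi] if v is not None]
            filled[i] = stat(nearby or observed)
    return filled

speeds = [30.0, None, 34.0, 36.0, None]      # continuous: average passing speed
stops = [1, 1, None, 1, 2, None, None, 2]    # discrete: number of stops
filled_speeds = initialize_missing(speeds)
filled_stops = initialize_missing(stops, discrete=True)
print(filled_speeds)
print(filled_stops)
```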
In another embodiment, if the variable is continuous, initializing the value of the missing data with the median of the adjacent time periods or all time periods specifically includes:
Each time the values of the missing data of each continuous variable are initialized with a median, a first new data set is obtained for each variable. The first old data set of each continuous variable, that is, the first new data set obtained by the previous initialization, is also obtained.
Illustratively, each time initialized, the value of the partially missing data in each continuous variable is initialized with a median.
After each initialization, calculating the difference between each first new data set and the corresponding first old data set, and summing to obtain a first sum value;
The calculation is performed according to the following formula:
Figure BDA0003517427680000141
where ΔN is the first sum, j is the serial number of the sorted continuous variable, D_n is the missing continuous variable value of the first new data set, and D_o is the missing continuous variable value of the first old data set.
If the first sum is smaller than the preset difference, the completion stops, and the completed first new data set is used as the final modeling data set.
And/or if the value is a discrete variable, initializing the value of the missing data with the mode of the adjacent time period or all time periods, specifically including:
Each time the values of the missing data of each discrete variable are initialized with a mode, a second new data set is obtained for each variable. The second old data set of each discrete variable, that is, the second new data set obtained by the previous initialization, is also obtained.
Illustratively, each time initialized, the values of the partially missing data in each discrete variable are initialized with a mode.
After each initialization, the difference between each second new data set and the corresponding second old data set is calculated and summed to obtain a second sum.
The calculation is performed according to the following formula:
Figure BDA0003517427680000142
where ΔF is the second sum, j is the serial number of the sorted continuous variable, i is the serial number of the sorted discrete variable, x_n is the missing discrete variable value of the second new data set, x_o is the missing discrete variable value of the second old data set, I is an indicator function that takes 1 if x_n ≠ x_o and 0 otherwise, and N_mis is the total number of missing items in the discrete variables.
If the second sum is smaller than the preset difference, the completion stops, and the completed second new data set is used as the final modeling data set.
The preset difference value can be selected according to actual conditions. For example, the preset difference is 1%.
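The two stopping checks can be sketched as follows. The discrete check follows the definitions given for ΔF (the count of changed entries divided by N_mis). For ΔN the formula image is not reproduced, so the sketch assumes a squared change normalized by the squared new values, a form commonly used in iterative imputation. All function names and sample values are illustrative.

```python
def continuous_change(new_values, old_values):
    """Assumed form of the first sum: squared change normalized by the new values."""
    numerator = sum((n - o) ** 2 for n, o in zip(new_values, old_values))
    denominator = sum(n ** 2 for n in new_values)
    return numerator / denominator if denominator else 0.0

def discrete_change(new_values, old_values):
    """Second sum: fraction of missing discrete entries whose filled value changed."""
    n_mis = len(new_values)
    changed = sum(1 for n, o in zip(new_values, old_values) if n != o)
    return changed / n_mis if n_mis else 0.0

def should_stop(change, preset_difference=0.01):
    """Stop the completion once the change falls below the preset difference."""
    return change < preset_difference

# Filled-in values at the missing positions, previous round vs. current round.
old_speeds, new_speeds = [32.0, 36.0], [32.5, 36.0]
old_stops, new_stops = [1, 2, 2, 2], [1, 2, 1, 2]
delta_n = continuous_change(new_speeds, old_speeds)
delta_f = discrete_change(new_stops, old_stops)
print(delta_n, should_stop(delta_n))
print(delta_f, should_stop(delta_f))
```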
S302: and fusing the first data and the second data after the completion with the third data to obtain training sample data.
Fig. 4 is a flow chart of a traffic state prediction method according to another embodiment of the present application. By way of example and not limitation, the method can be applied to traffic guidance systems, emergency real-time guidance and dispatch systems, emergency auxiliary decision-making systems and mobile phone map traffic state visualization systems. As shown in fig. 4, the method includes:
S401: acquiring first data, second data and third data;
the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics.
S402: and fusing the first data, the second data and the third data to obtain fused data.
S403: and obtaining a traffic state prediction result of the road section to be predicted according to the fusion data by using the traffic state prediction model.
The traffic state prediction model is a final traffic state prediction model obtained through training by the method of any one of the above.
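Steps S401 to S403 can be sketched as a small pipeline. The sketch assumes each of the three data sources is a mapping from time period to a dictionary of values and that fusion is a join on the shared time period; the patent text here does not specify the fusion mechanics. `fuse`, `predict_traffic_state`, the field names, and the stand-in model are hypothetical.

```python
def fuse(first_data, second_data, third_data):
    """Join the three sources on their shared time-period key into feature rows."""
    fused = {}
    for period in sorted(set(first_data) & set(second_data) & set(third_data)):
        row = {}
        row.update(first_data[period])
        row.update(second_data[period])
        row.update(third_data[period])
        fused[period] = row
    return fused

def predict_traffic_state(model, fused):
    """Apply a trained model (any callable taking a feature row) to each period."""
    return {period: model(row) for period, row in fused.items()}

first = {"08:00": {"speed": 32.0}, "08:05": {"speed": 28.0}}
second = {"08:00": {"upstream_speed": 35.0}, "08:05": {"upstream_speed": 30.0}}
third = {"08:00": {"is_peak": 1}, "08:05": {"is_peak": 1}}

# Stand-in for the final traffic state prediction model, for demonstration only.
def toy_model(row):
    return 0.5 * row["speed"] + 0.5 * row["upstream_speed"]

result = predict_traffic_state(toy_model, fuse(first, second, third))
print(result)
```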
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: at least one processor 50 (only one is shown in fig. 5), a memory 51 and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the processor 50 implementing the steps in any of the various method embodiments described above when executing the computer program 52.
The electronic device 5 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The electronic device may include, but is not limited to, a processor 50, a memory 51. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and is not meant to be limiting of the electronic device 5, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 50 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 51 may in some embodiments be an internal storage unit of the electronic device 5, such as a hard disk or a memory of the electronic device 5. The memory 51 may in other embodiments also be an external storage device of the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform the steps of the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a camera device/electronic apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (13)

1. The traffic state prediction model construction method is characterized by comprising the following steps of:
acquiring first data, second data and third data, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream and downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain training sample data;
constructing a traffic state prediction model based on the training sample data, and screening out associated features, wherein the associated features are features whose importance is greater than a preset importance;
and establishing a final traffic state prediction model based on the associated features.
2. The method of claim 1, wherein constructing a traffic state prediction model based on the training sample data and screening out associated features comprises:
dividing the training sample data into a first test set and a first verification set according to a preset dividing proportion;
setting the characteristics corresponding to the first verification set as decision attributes, setting the characteristics corresponding to the first test set as conditional attributes, and establishing and training the traffic state prediction model based on a preset loss function;
in the training process, calculating the importance of the features;
and if the importance is greater than the preset importance, selecting the feature as an associated feature.
3. The method according to claim 2, wherein calculating the importance of the feature comprises:
obtaining the score by which the feature improves the traffic state prediction model at each split;
calculating a squared weighting of the score.
4. The method of claim 1, wherein establishing a final traffic state prediction model based on the associated features comprises:
establishing an initial traffic state prediction model using the associated features;
dividing the training sample data into a second test set and a second verification set according to a preset dividing proportion;
and training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.
5. The method of claim 2, wherein the predetermined loss function is a square loss function;
the objective function of the traffic state prediction model is as follows:
Figure FDA0003517427670000021
wherein y_t is the actual traffic state value corresponding to the road section to be predicted at step t,
Figure FDA0003517427670000022
is the predicted value obtained by the traffic state prediction model at step t-1, f_t(x_t) is the transformation function, x_t is the attribute, and Ω(f_i) is the regularization term of the i-th tree,
Figure FDA0003517427670000023
γ is the threshold for controlling node splitting, λ is the L2 regularization weight, ω is the leaf score, and M is the number of leaves.
6. The method of claim 1, wherein fusing the first data, the second data, and the third data to obtain training sample data comprises:
completing the first data and the second data to obtain the completed first data and second data;
and fusing the first data and the second data after the completion with the third data to obtain training sample data.
7. The method of claim 6, wherein completing the first data and the second data specifically comprises:
classifying each feature, according to its attribute, as a continuous variable or a discrete variable;
sorting the features according to the total amount of missing data and the attribute of each feature;
if the variable is a continuous variable, initializing the value of the missing data with the median of the adjacent time periods or all time periods;
and/or, if the variable is a discrete variable, initializing the value of the missing data with the mode of the adjacent time periods or all time periods.
8. The method of claim 7, wherein initializing the value of the missing data with the median of the adjacent time period or all time periods if a continuous variable, specifically comprises:
each time the value of the missing data of each continuous variable is initialized with a median, obtaining a first new data set for each variable;
calculating the difference between each first new data set and the corresponding first old data set, and summing to obtain a first sum;
if the first sum is smaller than the preset difference, stopping the completion;
and/or, if the variable is a discrete variable, initializing the value of the missing data with the mode of the adjacent time periods or all time periods specifically comprises:
each time the value of the missing data of each discrete variable is initialized with a mode, obtaining a second new data set for each variable;
calculating the difference between each second new data set and the corresponding second old data set, and summing to obtain a second sum;
and if the second sum is smaller than the preset difference, stopping the completion.
9. The method of claim 8, wherein calculating the difference between each of the first new data sets and the corresponding first old data set and summing to obtain a first sum value, comprises:
the calculation is performed according to the following formula:
Figure FDA0003517427670000031
wherein ΔN is the first sum, j is the serial number of the sorted continuous variable, D_n is the missing continuous variable value of the first new data set, and D_o is the missing continuous variable value of the first old data set;
and/or, calculating the difference between each second new data set and the corresponding second old data set, and summing to obtain a second sum, specifically comprises:
the calculation is performed according to the following formula:
Figure FDA0003517427670000032
wherein ΔF is the second sum, j is the serial number of the sorted continuous variable, i is the serial number of the sorted discrete variable, x_n is the missing discrete variable value of the second new data set, x_o is the missing discrete variable value of the second old data set, I is an indicator function that takes 1 if x_n ≠ x_o and 0 otherwise, and N_mis is the total number of missing items in the discrete variables.
10. The method of claim 1, wherein obtaining the first data comprises:
acquiring traffic state data of historical time periods of the road section to be predicted according to a preset selection value, wherein the traffic state data of each time period comprises traffic data and corresponding spatial characteristics and time characteristics, and the first data comprises traffic state data of all time periods;
the traffic data are data acquired by a sensor, the spatial characteristics are traffic state indexes of the road section to be predicted, and the temporal characteristics are time states of the road section to be predicted.
11. The method of claim 1, wherein obtaining the second data comprises:
acquiring, according to a preset selection value, traffic state data of historical time periods for all turning movements in each direction of the upstream and downstream intersections, wherein the traffic state data of each time period comprises traffic data and corresponding spatial features, and the second data comprises traffic state data of all time periods;
the traffic data are data acquired by sensors, and the spatial features are traffic state indexes of the upstream and downstream intersections.
12. A traffic state prediction method, comprising:
acquiring first data, second data and third data, wherein the first data comprises historical traffic state data of a road section to be predicted, the second data comprises historical traffic state data of an upstream and downstream intersection of the road section to be predicted, the third data comprises characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain fused data;
obtaining a traffic state prediction result of the road section to be predicted according to the fusion data by using a traffic state prediction model;
wherein the traffic state prediction model is a final traffic state prediction model trained by the method of any one of claims 1-11.
13. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method of any one of claims 1 to 11 or the method of claim 12.
CN202210170462.2A 2021-11-12 2022-02-23 Traffic state prediction model construction method and traffic state prediction method Active CN114596702B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021113423540 2021-11-12
CN202111342354 2021-11-12

Publications (2)

Publication Number Publication Date
CN114596702A CN114596702A (en) 2022-06-07
CN114596702B true CN114596702B (en) 2023-07-04

Family

ID=81804490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210170462.2A Active CN114596702B (en) 2021-11-12 2022-02-23 Traffic state prediction model construction method and traffic state prediction method

Country Status (1)

Country Link
CN (1) CN114596702B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115600022A (en) * 2022-10-17 2023-01-13 京东城市(北京)数字科技有限公司(Cn) Training and processing method, device and medium of spatio-temporal data processing model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096404A (en) * 2021-04-23 2021-07-09 中南大学 Road blockade oriented quantitative calculation method for change of traffic flow of road network
CN113379156A (en) * 2021-06-30 2021-09-10 南方科技大学 Speed prediction method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8755991B2 (en) * 2007-01-24 2014-06-17 Tomtom Global Assets B.V. Method and structure for vehicular traffic prediction with link interactions and missing real-time data
CN109300310B (en) * 2018-11-26 2021-09-17 平安科技(深圳)有限公司 Traffic flow prediction method and device
CN111738474A (en) * 2019-03-25 2020-10-02 京东数字科技控股有限公司 Traffic state prediction method and device
CN110853347A (en) * 2019-10-14 2020-02-28 深圳市综合交通运行指挥中心 Short-time traffic road condition prediction method and device and terminal equipment
CN110826774B (en) * 2019-10-18 2022-03-22 广东电网有限责任公司广州供电局 Bus load prediction method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Short-term traffic flow prediction model based on XGBoost; 钟颖; 邵毅明; 吴文文; 胡广雪; Science Technology and Engineering (No. 30); full text *

Also Published As

Publication number Publication date
CN114596702A (en) 2022-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant