CN114842645B

CN114842645B - Road network speed situation rule extraction method based on k-means

Info

Publication number: CN114842645B
Application number: CN202210455678.3A
Authority: CN
Inventors: 于海涛; 肖冉东; 朱佳佳; 杜勇
Original assignee: Beijing Intelligent Transportation Development Center Beijing Motor Vehicle Regulation And Management Center
Current assignee: Beijing Intelligent Transportation Development Center Beijing Motor Vehicle Regulation And Management Center
Priority date: 2022-04-28
Filing date: 2022-04-28
Publication date: 2023-04-07
Anticipated expiration: 2042-04-28
Also published as: CN114842645A

Abstract

The invention discloses a road network speed situation rule extraction method based on k-means. An index system based on the geometric characteristics of the peak-period traffic time series curve is proposed according to the peak-shape deviation characteristics and the geometric characteristics of the time series. The traffic speed time series is then divided by day into several traffic state clusters using a k-means-based clustering method, so that traffic speed patterns are discovered automatically; the number of clusters k is selected with the Gap statistic. Finally, the cluster center of each cluster is calculated from the clustering result, giving the k traffic speed state rules of each road section, and a similarity-based pattern matching method is applied to the real-time traffic speed time series to obtain the pattern most similar to it. The method reveals the internal relations of the traffic speed flow and thereby exposes the time-varying characteristics and regularities of the urban road network traffic state more clearly and intuitively.

Description

Road network speed situation rule extraction method based on k-means

Technical Field

Effective extraction of road network speed situation rules is an important means of supporting trip scheduling, identifying urban functional areas, and revealing the spatio-temporal characteristics of urban traffic at a fine-grained level, and it provides a decision reference for managing urban traffic congestion. Based on the massive real-time traffic speed data and link information continuously published by a traffic monitoring system, the invention analyzes the traffic speed characteristics, traffic rules and traffic patterns of different links, extracts traffic speed curve features, and establishes a traffic pattern cluster analysis model to obtain traffic speed patterns of different shapes. The method plays an important role in discovering abnormal speeds on abnormal road sections and belongs to the field of urban public traffic systems.

Background

To relieve urban road congestion, intelligent transportation systems have emerged; they ensure the safe and effective operation of traffic by reasonably regulating the traffic load. In an intelligent transportation system, knowledge of traffic speed situation rules is a prerequisite for key applications such as traffic signal control and traffic guidance.

Road traffic data is characterized by large volume, strong time variation and poor detection precision. How to efficiently extract traffic speed operation rules from such complex data in a multi-dimensional and fine-grained manner, so that patterns and behaviors can be detected for traffic prediction and decision guidance, is a key topic of current research in the intelligent transportation field.

Model development based on deep learning relies mainly on empirical intuition and repeated experiments, and the model properties and parameters lack interpretability, so such algorithms are difficult to transfer to other applications. Clustering is an important means of pattern recognition and can be applied effectively to the spatio-temporal pattern recognition of macroscopic and microscopic traffic flows: it clusters the daily traffic state speed sequences of expressways, extracts road network traffic state patterns, and explains the particular traffic behavior in each state. Analyzing the traffic speed time series by cluster analysis reveals the internal relations of the traffic speed flow, so that the time-varying characteristics and regularities of the urban road network traffic state are exposed more clearly and intuitively.

Disclosure of Invention

To address the time-varying characteristics of road network traffic speed and the heterogeneity of its spatial distribution over the road network, the invention provides a road network speed situation rule extraction method based on k-means that finds homogeneous spatio-temporal traffic speed state patterns. The method analyzes the traffic speed time series on a daily basis and establishes a geometric characteristic index system based on the peak-period traffic speed time series, which compresses the data effectively. Based on the extracted time series features, a k-means clustering method then performs self-discovery of traffic operation patterns on the historical traffic speed time series data, dividing the traffic speed time series into several traffic clusters by day. Finally, the historical traffic speed patterns are matched with the real-time traffic speed time series using a specified similarity distance index.

The technical scheme of the invention is as follows: a road network speed situation rule extraction method based on k-means is realized by the following steps:

(1) Merging road sections shorter than 500 meters based on the road section information; obtaining the speed of each merged road section as the ratio of the merged road section length to the average vehicle travel time, and thereby obtaining speed time series data with a time granularity of 1 minute for each merged road section; then, for the speed time series data of each road section, filling the missing values by interpolation, aggregating the series to 5-minute granularity, and smoothing and denoising it with a moving average method;

(2) Proposing an index system based on the geometric characteristics of the peak-period traffic time series curve according to the peak-shape deviation characteristics and the geometric characteristics of the time series, and constructing a clustering model of the traffic pattern by adopting a weight equalization principle;

(3) Dividing the speed time series by day into several traffic state clusters using a k-means-based clustering method, thereby performing self-discovery of traffic speed patterns, wherein the number of clusters k is selected with the Gap statistic method;

(4) Calculating the cluster center of each cluster from the clustering result, giving the k traffic speed state rules of each road section, and applying a similarity-based pattern matching method to the real-time traffic speed time series to obtain the pattern most similar to it.

By adopting the technical scheme, the invention has the following beneficial effects:

1. Clustering is performed on the geometric characteristics of the time series, so the historical patterns of the traffic flow are identified effectively, data redundancy is reduced, computation is efficient, and traffic operation patterns are discovered automatically without supervision.

2. The method gives traffic managers a scientific and effective way to understand the traffic state and performance of the road network, so that the potential supply capacity of expressways can be exploited under limited traffic resources.

3. The method provides traffic managers with timely and reliable information on road network traffic states and trends, supports reasonable estimation and prediction of the duration, spreading range and degree of influence of traffic congestion caused by incidents, facilitates effective traffic information services for the public, and enables timely incident response and emergency measures.

Drawings

FIG. 1 is a schematic diagram of a road network speed situation rule extraction method based on k-means;

FIG. 2 is a diagram of an original time series before a certain road segment is clustered;

FIG. 3 is a graph of the silhouette coefficient as a function of the k value;

FIG. 4 is a schematic diagram of a clustering result of a certain road section;

FIG. 5 is a diagram showing the matching result between the real-time data stream and the historical patterns.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The invention provides a road network speed situation rule extraction method based on k-means, which is realized by the following steps:

(1) Screening the target road sections by length based on the road section information; because extracting the speed of a very short road section is not meaningful, road sections shorter than 500 meters are merged and the speed data of the merged road section is calculated;

(2) Handling missing data: a time series with more than 40 percent of its data missing has little value and is discarded directly; for a time series with only a small amount of missing data, the missing values are filled by an interpolation-based method, generating a speed time series with a granularity of 1 minute;

(3) Aggregating the speed time series with a granularity of 1 minute into a speed time series with a granularity of 5 minutes, and smoothing the time series data with a traffic flow filtering method based on the moving average to remove the random components of the traffic speed data;

(4) Analyzing the road network speed time series obtained in step (3) day by day, proposing an index system based on the geometric characteristics of the peak-period traffic time series curve according to the peak-shape deviation characteristics and the geometric characteristics of the time series, and constructing a clustering model of the traffic pattern by adopting a weight equalization principle;

(5) Dividing the traffic speed time series by day into several traffic state clusters using a k-means-based clustering method, thereby performing self-discovery of traffic speed patterns, wherein the number of clusters k is selected with the Gap statistic;

(6) Calculating the cluster center of each cluster from the clustering result obtained in step (5), giving the k traffic speed state rules of each road section, and applying a similarity-based pattern matching method to the real-time traffic speed time series to obtain the pattern most similar to it.
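Steps (2) and (3) of the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `fill_missing` and `aggregate` are hypothetical, linear interpolation is assumed (the text only says "interpolation"), and each 5-minute value is assumed to be the mean of five 1-minute values.

```python
from typing import List, Optional

MAX_MISSING_RATIO = 0.4  # series with more than 40% missing values are discarded


def fill_missing(series: List[Optional[float]]) -> Optional[List[float]]:
    """Return the series with gaps linearly interpolated, or None if it is discarded."""
    n = len(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if n == 0 or not known or (n - len(known)) / n > MAX_MISSING_RATIO:
        return None
    filled = list(series)
    # Assumption: gaps before the first / after the last known value copy the nearest one.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, n):
        filled[i] = series[known[-1]]
    # Interior gaps are linearly interpolated between their known neighbours.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def aggregate(series: List[float], width: int = 5) -> List[float]:
    """Average consecutive blocks of `width` samples (1-minute to 5-minute granularity)."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


# Ten made-up 1-minute speeds with three missing values (30% missing, so the series is kept).
one_minute = [60.0, None, 50.0, 40.0, None, None, 10.0, 20.0, 30.0, 40.0]
filled = fill_missing(one_minute)
five_minute = aggregate(filled)
```

A trailing block shorter than `width` is dropped by `aggregate`; whether partial blocks should be kept is a design choice the text does not address.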

Further, in the road network speed situation rule extraction method based on k-means, road sections shorter than 500 meters are merged in step (1) and the speed of the merged road section is calculated as the ratio of the merged road section length to the average vehicle travel time:

$$v = \frac{\sum_{k=i}^{j} l_k}{\sum_{k=i}^{j} t_k}$$

where $l_i$ is the length of road section $i$, $t_i$ is the average time vehicles take to pass through road section $i$, and $v$ is the average running speed from road section $i$ to road section $j$.
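The text describes the speed of a merged road section as the ratio of the merged section length to the average vehicle travel time. A minimal sketch of that ratio follows; the function name `merged_speed` and the units (metres and seconds) are assumptions made for illustration.

```python
from typing import Sequence


def merged_speed(lengths_m: Sequence[float], times_s: Sequence[float]) -> float:
    """Speed of a merged road section: total length divided by total average travel time."""
    if len(lengths_m) != len(times_s) or not lengths_m:
        raise ValueError("need one average travel time per road section")
    total_time = sum(times_s)
    if total_time <= 0:
        raise ValueError("total travel time must be positive")
    return sum(lengths_m) / total_time  # metres per second


# Two short sections (200 m and 300 m) merged into one 500 m section.
v = merged_speed([200.0, 300.0], [20.0, 30.0])  # 500 m / 50 s
```

Dividing the summed length by the summed time weights each section by how long vehicles spend on it, which differs from averaging the per-section speeds.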

Further, in the road network speed situation rule extraction method based on k-means, the time series aggregation and data smoothing in step (3) aim to remove the random components of the traffic flow data and eliminate the influence of random interference in the measured data on the road network speed data mining model. Let the time series be $x_1, x_2, \ldots, x_n$. Following the sliding-window idea, the window is shifted point by point and the $N$ data points inside it ($N$ being the size of the sliding window) are averaged, which gives the first-order moving average:

$$M_t = \frac{x_t + x_{t-1} + \cdots + x_{t-N+1}}{N}, \quad t = N, N+1, \ldots, n$$

The moving average has low computational complexity, high efficiency and a good smoothing effect, and it removes the influence of irregular variation so that the long-term trend becomes visible; it is therefore used for data smoothing.
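The sliding-window averaging described above can be sketched as below. The function name `moving_average` is hypothetical, and a trailing window (the current point and the preceding ones) is assumed; the text does not state whether the window is trailing or centered.

```python
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the output has len(series) - window + 1 points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])
    out = [running / window]
    for t in range(window, len(series)):
        running += series[t] - series[t - window]  # slide the window by one point
        out.append(running / window)
    return out


# The isolated spike of 60.0 is spread over three smoothed points instead of one.
smoothed = moving_average([40.0, 42.0, 38.0, 60.0, 41.0, 39.0], window=3)
```

Updating a running sum keeps the cost linear in the series length regardless of the window size, which matches the low computational complexity noted in the text.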

Further, in the road network speed situation rule extraction method based on k-means, the daily traffic speed time series characteristic index system of step (4) is motivated by a study of the geometric shape of the daily traffic speed curve. At the daily scale, the curve shows morphological features such as morning and evening peaks, off-peak periods, head-and-shoulders shapes and M shapes, as well as peak drift before and after holidays. An index system based on the geometric characteristics of the peak-period traffic speed time series is therefore proposed.

The indexes consider both the overall characteristics of the traffic speed time series and the relationship between the peak and off-peak states, reflect the peak-shape deviation characteristics of the curve, and describe the geometric characteristics of the curve in the key time periods of the peak period, which makes the index system compact and efficient. During cluster analysis, indexes with large orders of magnitude would otherwise dominate the result, so the data is made dimensionless using zero-mean normalization.
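Zero-mean normalization (z-score standardization) rescales every index to zero mean and unit standard deviation, so that no index dominates the distance computation in clustering. A minimal sketch follows; the function name `zscore_columns` and the two example features are hypothetical, since the actual index list is not reproduced in this text.

```python
from statistics import mean, pstdev
from typing import List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    # A constant column carries no information and is mapped to 0.0.
    return [[(x - mu) / sigma if sigma > 0 else 0.0
             for x, mu, sigma in zip(row, mus, sigmas)]
            for row in rows]


# Each row is one day; the two made-up features differ by orders of magnitude.
features = [[30.0, 0.2], [50.0, 0.4], [70.0, 0.6]]
scaled = zscore_columns(features)
```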

Further, in the road network speed situation rule extraction method based on k-means, in the k-means-based traffic speed pattern self-discovery process of step (5), the number of clusters k is selected with the Gap statistic method. The Gap statistic compares the within-cluster dispersion of the data with that of randomly distributed data and favors the k for which the two differ most. It is defined as follows:

$$\mathrm{Gap}(k) = E(\log D_k) - \log D_k$$

where $E(\log D_k)$ is the expectation of $\log D_k$ for randomly distributed samples, typically estimated by Monte Carlo simulation, and $D_k$ is the loss function, i.e., the sum of the squared distances from the clustered speed time series data to their cluster centers. $\mathrm{Gap}(k)$ can be regarded as the difference between the loss of random samples and the loss of the actual samples.
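The selection of k can be sketched with a small, dependency-free k-means. Everything below is an illustration under assumptions rather than the patented procedure: the random samples are drawn uniformly from the bounding box of the data, the loss is taken as the best of a few random restarts, and the names `kmeans_loss` and `gap_statistic` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_loss(points: List[Point], k: int, rng: random.Random,
                iters: int = 50, restarts: int = 5) -> float:
    """Run Lloyd's k-means several times and return the smallest loss D_k."""
    best = math.inf
    for _ in range(restarts):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(iters):
            groups: List[List[Point]] = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # An empty cluster keeps its previous center.
            new_centers = [
                [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        loss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, loss)
    return best


def gap_statistic(points: List[Point], k: int, n_refs: int = 10, seed: int = 0) -> float:
    """Gap(k) = mean of log D_k over random reference sets - log D_k of the real data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    ref_logs = []
    for _ in range(n_refs):
        ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
        ref_logs.append(math.log(max(kmeans_loss(ref, k, rng), 1e-12)))
    actual = math.log(max(kmeans_loss(points, k, rng), 1e-12))
    return sum(ref_logs) / n_refs - actual


# Synthetic feature vectors: two well-separated groups of days.
_rng = random.Random(42)
data = [[_rng.gauss(0, 0.3), _rng.gauss(0, 0.3)] for _ in range(30)] + \
       [[_rng.gauss(5, 0.3), _rng.gauss(5, 0.3)] for _ in range(30)]
gaps = {k: gap_statistic(data, k) for k in range(1, 5)}
best_k = max(gaps, key=gaps.get)  # the k with the largest Gap(k)
```

For data like this, Gap(2) should clearly exceed Gap(1), because one cluster cannot explain two separated groups. In practice a library k-means would normally replace the hand-written loop.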

Step (5) clusters on the features extracted in step (4) rather than feeding the whole time series into the model. This reduces the data dimension and makes the computation efficient, extracts time series with homogeneous states, reduces the redundancy of the massive traffic flow data, and compresses the data effectively. Through cluster analysis, traffic operation patterns are discovered automatically from the historical time series data of traffic operation.

Further, in the road network speed situation rule extraction method based on k-means, the similar-pattern time series matching of step (6) matches the real-time sequence against the historical patterns to obtain a similar traffic speed pattern. Suppose a road section has N historical patterns, the historical pattern data being $X_i$, where $X_i = \{x_1, x_2, x_3, \ldots, x_n\}$, and the real-time data stream being $Y = \{y_1, y_2, y_3, \ldots, y_m\}$, where n and m are the numbers of timestamps. The similarity is defined as follows:
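The similarity formula itself is not reproduced in this text, so the following sketch substitutes an assumed measure: the Euclidean distance between the m points of the real-time stream and the first m points of each historical pattern, with the smallest distance taken as the best match. The names `prefix_distance` and `match_pattern` and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def prefix_distance(pattern: Sequence[float], stream: Sequence[float]) -> float:
    """ASSUMED similarity: Euclidean distance between the real-time stream (m points)
    and the first m points of a historical pattern (n points, m <= n)."""
    m = len(stream)
    if m == 0 or m > len(pattern):
        raise ValueError("stream must be non-empty and no longer than the pattern")
    # zip stops after the m stream points, so only the pattern prefix is compared.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pattern, stream)))


def match_pattern(patterns: List[Sequence[float]], stream: Sequence[float]) -> int:
    """Index of the historical pattern with the smallest distance to the stream."""
    return min(range(len(patterns)), key=lambda i: prefix_distance(patterns[i], stream))


# Two made-up cluster centers (speeds in km/h) and a partial real-time sequence.
patterns = [
    [60.0, 45.0, 30.0, 35.0, 55.0, 60.0],  # speed dips during the morning peak
    [60.0, 60.0, 58.0, 59.0, 60.0, 60.0],  # no morning dip
]
stream = [59.0, 47.0, 33.0]
best = match_pattern(patterns, stream)  # index of the closest pattern
```

Comparing only the observed prefix lets the match be updated as each new real-time sample arrives. Other measures, such as dynamic time warping, could be swapped in without changing `match_pattern`.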

In the embodiment, a historical data set covering the most recent 3 months is first selected; the target road sections are then merged, missing data is processed, and the time series data is denoised and smoothed. The embodiment is explained for one selected road section. With time as the abscissa, the extracted time series before clustering are plotted in FIG. 2; by observation, the time series of this road section fall into two patterns. This data is used as input, feature extraction yields 26 geometric feature indexes of the time series data, and clustering is then performed on these indexes.

For a given value of k, random samples are generated and clustered with k-means, and $\log D_k$ is calculated for them. This is repeated multiple times, and $E(\log D_k)$ is the average of the resulting $\log D_k$ values. The k for which $\mathrm{Gap}(k)$ reaches its maximum is the optimal k. The variation of the Gap statistic of the road section with the number of clusters k is shown in FIG. 3. The maximum is reached at k = 2, so k = 2 is selected as the optimal number of clusters for this road section.

The data is clustered into two categories, as shown in FIG. 4, and the two patterns are clearly separated. In pattern one, the morning-peak traffic speed between 6:00 and 10:00 first falls and then rises, possibly because the larger number of vehicles on working days causes frequent speed drops in this period. Pattern two shows no morning-peak drop and may be associated with lighter holiday traffic.

After clustering, the cluster center of each pattern represents a traffic speed pattern of a distinct shape. Matching against the real-time sequence then yields the pattern most similar to the real-time speed time series; the final result is shown in FIG. 5.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A road network speed situation law extraction method based on k-means is characterized by comprising the following steps:

step (1), merging road section lengths smaller than 500 meters based on road section information, obtaining merged road section speeds according to the ratio of the merged road section lengths to the average vehicle running time, further obtaining speed time sequence data with the time granularity of 1 minute of each merged road section, then processing missing speed time sequence data by using an interpolation method according to the speed time sequence data of each road section, carrying out 5-minute aggregation on the speed time sequence data, and carrying out smooth noise reduction on the speed time sequence data by using a moving average method;

step (2), extracting the peak-shape deviation characteristics and the geometric characteristics of the speed time series data, and constructing a clustering model of the traffic pattern by adopting a weight equalization principle, based on an index system of the geometric characteristics of the curve of the peak-period traffic speed time series data;

step (3), dividing the speed time series data by day into several traffic state clusters using a k-means-based clustering method, and automatically discovering traffic speed patterns, wherein the number of clusters k is selected with the Gap statistic method;

step (4), calculating a clustering center of each cluster according to the obtained clustering result, namely k traffic speed state rules of each road section, and matching by using a pattern matching method based on similarity and real-time traffic speed time sequence data to obtain a pattern similar to the real-time traffic speed time sequence;

the index system in the step (2) comprises the overall characteristics of traffic congestion and the relationship characteristics of congestion state and non-congestion state, reflects the peak-shaped deflection characteristics of a congestion curve, and correspondingly describes the curve geometric characteristics of important periods in a peak period;

the self-discovery of the speed pattern in step (3) uses the Gap statistic method to select the number of clusters k, specifically by calculating the Gap statistic Gap(k) for different numbers of clusters k:

$$\mathrm{Gap}(k) = E(\log D_k) - \log D_k$$

wherein $E(\log D_k)$ is the expectation of $\log D_k$, and $D_k$ is the loss function, i.e., the sum of the squared distances from the clustered speed time series data to their cluster centers;

in step (4), for the real-time data stream, a similarity-based matching algorithm matches the real-time data with the historical patterns; specifically, the similarity between each of the N extracted historical patterns and the real-time data is calculated, and the pattern with the minimum similarity value is the matched pattern; a road section is set to have N historical patterns, the historical pattern data being $X_i$, wherein $X_i = \{x_1, x_2, x_3, \ldots, x_n\}$, and the real-time data stream being $Y = \{y_1, y_2, y_3, \ldots, y_m\}$, wherein n and m are the numbers of timestamps, and the similarity is defined as follows:

the historical pattern with the minimum similarity value is selected as the historical pattern most similar to the current real-time sequence.