CN113420925B - Traffic health prediction method and system based on naive Bayes - Google Patents

Traffic health prediction method and system based on naive Bayes

Publication number
CN113420925B
Authority
CN
China
Prior art keywords
congestion
sub
cluster
subgroup
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110721673.6A
Other languages
Chinese (zh)
Other versions
CN113420925A (en)
Inventor
李大庆
睢少博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110721673.6A priority Critical patent/CN113420925B/en
Publication of CN113420925A publication Critical patent/CN113420925A/en
Application granted granted Critical
Publication of CN113420925B publication Critical patent/CN113420925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Chemical & Material Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Analytical Chemistry (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)

Abstract

The invention relates to a traffic health prediction method based on naive Bayes. Starting from congestion sub-clusters, which capture the space-time characteristics of congestion, the method provides a new congestion clustering approach in the space-time dimension to describe the spatio-temporal evolution of congestion. It uses the naive Bayes method to identify the distribution of congestion size growth under different conditions, predicts the growth of congestion sub-clusters through a Markov chain and Monte Carlo simulation, and can therefore predict changes in the scale of congestion aggregation within an area. Combined with a cybernetics method, this finally realizes naive Bayes based traffic health prediction.

Description

Traffic health prediction method and system based on naive Bayes
Technical Field
The invention relates to the technical field of traffic supervision, in particular to a traffic health prediction method and system based on naive Bayes.
Background
Rapid urbanization and the rapid growth in private car ownership have led to serious urban traffic congestion. The service capacity of the traffic system cannot be fully used, which affects urban operating efficiency and residents' quality of life. The expansion of cities and the diversity of travel demand make urban traffic systems increasingly complex, and congestion leads to high transportation costs, pollution and economic loss. To reduce these losses, a reliable traffic health prediction method is urgently needed, and full-cycle growth prediction of urban space-time congestion has become an important component of intelligent traffic systems.
Congestion arises and propagates through the organizational structure of the traffic system. Traditional congestion prediction research focuses on how congestion propagates among different connecting edges (road sections) in a traffic network, and predicts urban congestion from a microscopic view based on road-level traffic dynamics. Existing work includes: constructing a generalized link travel time function to capture the influence of adverse weather on the road network; evaluating changes in peak-hour origin-destination (OD) traffic demand in terms of the mean and the covariance matrix; constructing a multivariate probabilistic principal component analysis model to predict future road section travel times; quantifying the influence of severe weather events on future road network performance and identifying the weakest link; the Adaptive Rolling Smoothing (ARS) method, which dynamically predicts the traffic state of urban highways using space-time conditions; a microscopic visual model that evaluates the congestion of each road link with a dynamic congestion estimation strategy; deep learning methods for predicting congestion evolution in large-scale traffic networks, or for capturing the nonlinear space-time effects on traffic flow caused by congestion and recovery; and a fast learning method based on the symmetric extreme learning machine cluster (S-ELM-Cluster), which trains sub-prediction models for different road sections for urban congestion estimation and prediction. However, urban congestion exhibits dynamic propagation, cascading failure and multiple faults, and existing methods do not take these characteristics into account, so their congestion estimates and predictions are inaccurate.
Disclosure of Invention
The invention aims to provide a traffic health prediction method and system based on naive Bayes so as to improve accuracy of traffic system regional congestion estimation and prediction.
In order to achieve the purpose, the invention provides the following scheme:
the invention provides a traffic health prediction method based on naive Bayes, which comprises the following steps:
acquiring speed data of each road at each sampling moment in an area to be detected;
extracting congestion subgroups of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment;
determining a congestion sub-cluster time relay relation according to the congestion sub-cluster of the area to be detected at each sampling moment;
determining probability distribution of the increase speed of the congestion subgroups based on a naive Bayes theory according to the time relay relation of the congestion subgroups in the region to be detected;
and predicting the scale increase of the congestion sub-group by adopting a Markov chain and Monte Carlo simulation mode according to the probability distribution of the congestion sub-group increase speed to predict the traffic health.
Optionally, the extracting the congestion sub-cluster of the area to be detected at each sampling time according to the speed data of each road at each sampling time specifically includes:
according to the speed data of each road at each sampling moment, taking the intersection of the area to be detected as a node, the road of the area to be detected as a connecting edge and the speed data of each road as a weight, and constructing a traffic operation network of the area to be detected at each sampling moment;
removing the roads of which the speed data are greater than the congestion threshold value in the traffic operation network at each sampling moment respectively to obtain a congestion road network at each sampling moment;
and respectively extracting the mutually communicated roads in the congested road network at each sampling moment to form a communicated sub-cluster, obtaining a plurality of communicated sub-clusters at each sampling moment, and forming the congested sub-cluster at each sampling moment.
Optionally, the determining a congestion sub-cluster time relay relationship according to the congestion sub-cluster of the area to be detected at each sampling time specifically includes:
according to the congestion sub-clusters of the area to be detected at each sampling moment, using the formula for $O_{ij}$
[Formula image BDA0003137047400000031: definition of the road overlap ratio $O_{ij}$]
to determine the road overlap ratio between different congestion sub-clusters at adjacent sampling moments;
wherein $CC_j(t_k)$ and $CC_i(t_{k-1})$ respectively represent congestion sub-cluster $j$ at sampling moment $t_k$ and congestion sub-cluster $i$ at sampling moment $t_{k-1}$, and $N(\cdot)$ represents the number of roads within a congestion sub-cluster;
when the road overlap ratio between two congestion sub-clusters at adjacent sampling moments is greater than an overlap ratio threshold, determining that a temporal inheritance relationship exists between congestion sub-clusters $CC_j(t_k)$ and $CC_i(t_{k-1})$, which then belong to the same space-time congestion sub-cluster.
Optionally, the determining, according to the congestion subgroup time relay relationship in the region to be detected, probability distribution of the congestion subgroup growth speed based on a naive bayes theory specifically includes:
according to the congestion sub-clusters of the area to be detected at each sampling moment, based on naive Bayes theory, determining the probability distribution of the congestion sub-cluster growth speed as:
$$P(d = k \mid T = t_i, S = s_j) = \frac{P(T = t_i, S = s_j \mid d = k)\, P(d = k)}{P(T = t_i, S = s_j)}$$
wherein $P(d = k \mid T = t_i, S = s_j)$ represents the probability distribution of the congestion sub-cluster growth speed, i.e., the posterior distribution of the change $d$ of the congestion sub-cluster size over time; $P(T = t_i, S = s_j \mid d = k)$ denotes a first conditional probability distribution (the likelihood); $P(d = k)$ denotes a second probability distribution (the prior of $d$); and $P(T = t_i, S = s_j)$ denotes a third probability distribution (the evidence). $T$ represents the sub-period to which the congestion sub-cluster belongs, $S$ represents the congestion sub-cluster type to which the size of the congestion sub-cluster belongs, and $d$ represents the change of the congestion sub-cluster size over time, $d = N(CC(t_m)) - N(CC(t_{m-1}))$, where $N(CC(t_m))$ and $N(CC(t_{m-1}))$ respectively represent the congestion sub-cluster sizes at sampling moments $t_m$ and $t_{m-1}$. $k$ represents a set value of the size change, $t_i$ denotes the $i$-th sub-period, and $s_j$ denotes the $j$-th congestion sub-cluster type.
Optionally, the predicting the scale increase of the congestion sub-cluster by using a markov chain and monte carlo simulation mode according to the probability distribution of the congestion sub-cluster increase speed to perform health prediction specifically includes:
according to the probability distribution of the congestion subgroup growth speed, constructing a congestion subgroup state transition matrix based on a Markov chain;
and simulating the congestion sub-cluster size growth process by Monte Carlo simulation, based on the congestion sub-cluster state transition matrix and the size of a given congestion sub-cluster at the current moment, to obtain an expected growth curve of the congestion sub-cluster size and the probability distribution of the congestion sub-cluster size over time.
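As a rough illustration of the state transition matrix mentioned above, the following Python sketch estimates a row-stochastic matrix over states (T, S) by counting consecutive states in tracked sub-cluster histories. The counting approach, the function name `transition_matrix` and the input layout are assumptions made for illustration; the patent text does not specify how the matrix entries are computed beyond their dependence on the current size and the distribution of d.

```python
def transition_matrix(histories, states):
    """Estimate transition probabilities between (T, S) states by counting.

    histories: list of state sequences, one per tracked sub-cluster
               (hypothetical input layout).
    states:    ordered list of all (T, S) states; fixes row/column order.
    Returns a list of rows; rows with observations sum to 1.
    """
    idx = {st: n for n, st in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for seq in histories:
        # Count each observed step from one state to the next.
        for a, b in zip(seq, seq[1:]):
            counts[idx[a]][idx[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never observed as a source keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

With L sub-periods and three size types, `states` would hold the 3 × L combinations described in the text.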
Optionally, after simulating the growth process by Monte Carlo simulation, based on the congestion sub-cluster state transition matrix and the congestion sub-cluster size at the current moment, to obtain the expected growth curve of the congestion sub-cluster size and the probability distribution of the congestion sub-cluster size over time, the method further includes:
calculating the numerical value of the actually observed congestion sub-cluster scale deviating from the congestion sub-cluster scale growth expectation curve as a deviation value according to the congestion sub-cluster scale growth expectation curve and the probability distribution of the congestion sub-cluster scale growing along with the time;
judging whether the deviation value is larger than a deviation threshold value or not, and obtaining a judgment result;
if the judgment result is yes, the congestion sub-cluster at the current moment will develop into an unhealthy congestion area in the future, that is, the congestion sub-cluster will spread into wide-range congestion;
if the judgment result is no, the congestion sub-cluster at the current moment will not develop into an unhealthy congestion area in the future, and the congestion sub-cluster remains confined to a small range.
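The deviation check above can be sketched in a few lines of Python. This is only an illustration under assumptions: the deviation value is taken here as the largest amount by which the observed size exceeds the expected curve at matching time steps, and the names `deviation_value` and `is_unhealthy` are hypothetical. The text does not fix the exact deviation measure or the threshold.

```python
def deviation_value(observed, expected):
    """Largest excess of the observed size over the expected growth curve.

    observed, expected: sizes (number of roads) at the same time steps.
    Assumed measure; the source does not define the deviation precisely.
    """
    return max(o - e for o, e in zip(observed, expected))


def is_unhealthy(observed, expected, threshold):
    """True if the deviation value exceeds the deviation threshold."""
    return deviation_value(observed, expected) > threshold
```

A sub-cluster flagged by `is_unhealthy` corresponds to the "yes" branch above, i.e. one expected to spread into wide-range congestion.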
A naive bayes based traffic health prediction system, the prediction system comprising:
the speed data acquisition module is used for acquiring the speed data of each road at each sampling moment in the area to be detected;
the congestion sub-cluster extraction module is used for extracting the congestion sub-cluster of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment;
the congestion sub-cluster relay relation analysis module is used for determining a congestion sub-cluster time relay relation according to the congestion sub-clusters of the area to be detected at each sampling moment;
the congestion subgroup growth analysis module is used for determining probability distribution of the congestion subgroup growth speed based on a naive Bayes theory according to the congestion subgroup time relay relation in the region to be detected;
and the traffic health prediction module is used for predicting the scale growth of the congestion sub-group by adopting a Markov chain and Monte Carlo simulation mode according to the probability distribution of the growth speed of the congestion sub-group so as to predict the traffic health.
Optionally, the congestion sub-cluster extraction module specifically includes:
the traffic operation network construction module is used for constructing a traffic operation network of the area to be detected at each sampling moment by taking the intersection of the area to be detected as a node, the road of the area to be detected as a connecting edge and the speed data of each road as a weight according to the speed data of each road at each sampling moment;
the congested road network acquisition module is used for respectively removing roads of which the speed data are greater than a congestion threshold value in the traffic operation network at each sampling moment to obtain a congested road network at each sampling moment;
and the congestion sub-cluster acquisition module is used for respectively extracting the mutually communicated roads in the congestion road network at each sampling moment to form a communicated sub-cluster, acquiring a plurality of communicated sub-clusters at each sampling moment and forming the congestion sub-cluster at each sampling moment.
Optionally, the congestion sub-cluster relay relation analysis module specifically includes:
a road overlap ratio calculation submodule, used for determining, according to the congestion sub-clusters of the area to be detected at each sampling moment, the road overlap ratio between different congestion sub-clusters at adjacent sampling moments using the formula for $O_{ij}$
[Formula image BDA0003137047400000051: definition of the road overlap ratio $O_{ij}$]
wherein $CC_j(t_k)$ and $CC_i(t_{k-1})$ respectively represent congestion sub-cluster $j$ at sampling moment $t_k$ and congestion sub-cluster $i$ at sampling moment $t_{k-1}$, and $N(\cdot)$ represents the number of roads within a congestion sub-cluster;
a temporal inheritance relationship determining submodule, used for determining, when the road overlap ratio between two congestion sub-clusters at adjacent sampling moments is greater than an overlap ratio threshold, that a temporal inheritance relationship exists between congestion sub-clusters $CC_j(t_k)$ and $CC_i(t_{k-1})$, which then belong to the same space-time congestion sub-cluster.
Optionally, the congestion sub-cluster growth analysis module specifically includes:
a congestion sub-cluster growth analysis submodule, used for determining, according to the congestion sub-clusters of the area to be detected at each sampling moment and based on naive Bayes theory, the probability distribution of the congestion sub-cluster growth speed as:
$$P(d = k \mid T = t_i, S = s_j) = \frac{P(T = t_i, S = s_j \mid d = k)\, P(d = k)}{P(T = t_i, S = s_j)}$$
wherein $P(d = k \mid T = t_i, S = s_j)$ represents the probability distribution of the congestion sub-cluster growth speed, i.e., the posterior distribution of the change $d$ of the congestion sub-cluster size over time; $P(T = t_i, S = s_j \mid d = k)$ denotes a first conditional probability distribution (the likelihood); $P(d = k)$ denotes a second probability distribution (the prior of $d$); and $P(T = t_i, S = s_j)$ denotes a third probability distribution (the evidence). $T$ represents the sub-period to which the congestion sub-cluster belongs, $S$ represents the congestion sub-cluster type to which the size of the congestion sub-cluster belongs, and $d$ represents the change of the congestion sub-cluster size over time, $d = N(CC(t_m)) - N(CC(t_{m-1}))$, where $N(CC(t_m))$ and $N(CC(t_{m-1}))$ respectively represent the congestion sub-cluster sizes at sampling moments $t_m$ and $t_{m-1}$. $k$ represents a set value of the size change, $t_i$ denotes the $i$-th sub-period, and $s_j$ denotes the $j$-th congestion sub-cluster type.
Optionally, the traffic health prediction module specifically includes:
the congestion subgroup state transition matrix construction submodule is used for constructing a congestion subgroup state transition matrix based on a Markov chain according to probability distribution of the congestion subgroup increasing speed;
and the Monte Carlo simulation submodule is used for simulating the congestion sub-cluster size growth process by Monte Carlo simulation, based on the congestion sub-cluster state transition matrix and the size of a given congestion sub-cluster at the current moment, to obtain an expected growth curve of the congestion sub-cluster size and the probability distribution of the congestion sub-cluster size over time.
Optionally, the traffic health prediction module further includes:
the deviation value calculation submodule is used for calculating, as a deviation value, the amount by which the actually observed congestion sub-cluster size deviates from the expected growth curve, according to the expected growth curve of the congestion sub-cluster size and the probability distribution of the congestion sub-cluster size over time;
the judgment submodule is used for judging whether the deviation value is greater than a deviation threshold value and obtaining a judgment result;
the first judgment result output submodule is used for determining, if the judgment result is yes, that the congestion sub-cluster at the current moment will develop into an unhealthy congestion area in the future, that is, will spread into wide-range congestion;
and the second judgment result output submodule is used for determining, if the judgment result is no, that the congestion sub-cluster at the current moment will not develop into an unhealthy congestion area in the future and remains confined to a small range.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a traffic health prediction method based on naive Bayes, which comprises the following steps: acquiring speed data of each road at each sampling moment in an area to be detected; extracting congestion subgroups of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment; determining a congestion sub-cluster time relay relation according to the congestion sub-cluster of the area to be detected at each sampling moment; determining probability distribution of the increase speed of the congestion subgroups based on a naive Bayes theory according to the time relay relation of the congestion subgroups in the region to be detected; and predicting the scale increase of the congestion sub-group by adopting a Markov chain and Monte Carlo simulation mode according to the probability distribution of the congestion sub-group increase speed, and predicting the traffic health. The invention starts from a congested space-time characteristic congestion sub-group, provides a novel space-time dimension congestion clustering method, describes the spatio-temporal evolution of congestion, adopts a naive Bayes method to identify congestion scale increase distribution under different conditions, and realizes the increase prediction of the congestion sub-group through a Markov chain and Monte Carlo simulation.
The traffic health prediction method based on the naive Bayes method can be finally realized by combining the complex network modeling, the naive Bayes technology and the cybernetics method, and theoretical and technical guidance is provided for urban traffic health prediction, so that support is provided for urban traffic jam treatment, intelligent traffic construction and other contents.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a traffic health prediction method based on naive bayes provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a traffic health prediction method and system based on naive Bayes so as to improve the accuracy of road congestion estimation and prediction.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the present invention provides a traffic health prediction method based on naive bayes, which comprises the following steps:
Step 1: obtaining speed data of each road at each sampling moment in the area to be detected.
For a selected time period, road speed data are collected, namely the traffic flow speed of each road section within the period. Missing data are filled in by data compensation.
Step 2: extracting the congestion sub-clusters of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment.
Step 2 specifically includes:
(1) According to the speed data of each road at each sampling moment, constructing a traffic operation network of the area to be detected at each sampling moment, with intersections as nodes, roads as connecting edges and the speed data of each road as weights. That is, the speed data acquired in step 1 are used to construct a traffic operation network at each moment in the time period: for the traffic system under study, a directed network is built with intersections as nodes, roads as connecting edges and speed as weight. The speed in the traffic operation network is a relative speed, obtained by normalizing by the 95% quantile of the road's all-day speed: $r_{ij} = v_{ij} / v_i^{95\%}$, where $r_{ij}$ is the relative speed of road $i$ at time $j$, $v_{ij}$ is the actual speed observed on road $i$ at time $j$, and $v_i^{95\%}$ is the speed at the 95% quantile of the all-day speed distribution of road $i$.
(2) Removing, at each sampling moment, the roads whose speed data are greater than the congestion threshold from the traffic operation network, to obtain the congested road network at each sampling moment. That is, the congested network is extracted from the directed network constructed in (1) according to a congestion threshold: a congestion threshold $q$ is set, the roads whose relative speed $r_{ij}$ is greater than $q$ are removed from the directed network at each moment, the connectivity of the remaining network is analysed, and the roads contained in each connected congestion sub-cluster are recorded.
Here, "removing" a road means setting the road to a non-connectable state. Two adjacent roads are "connected" when neither of them has been removed. A road is "adjacent" to a given road when its start point or end point coincides with a start point or end point of that road. A "connected sub-cluster" is a set of connected roads in which each road is connected to the other roads in the set, either directly or indirectly through transitive connection, and no road in the set has any direct or indirect connection to a road outside the set. For example, if road A is connected to road B at a coinciding end point, and road B is connected to road C at a coinciding start point, then road A is indirectly connected to road C; if roads A, B and C have no further direct or indirect connection to other roads, the three roads form a connected sub-cluster.
(3) Extracting, at each sampling moment, the mutually connected roads in the congested road network to form connected sub-clusters; the connected sub-clusters obtained at each sampling moment are the congestion sub-clusters at that moment.
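Sub-steps (1) to (3) can be sketched as follows in Python. This is a minimal illustration, not the patented implementation: the function names, the dictionary-based road representation and the use of a union-find structure are assumptions, and edge direction is ignored when grouping roads (two congested roads are treated as connected whenever they share an intersection).

```python
from collections import defaultdict


def relative_speeds(speeds, v95):
    """r_ij = v_ij / v_i^95% for every road i at one sampling moment."""
    return {road: speeds[road] / v95[road] for road in speeds}


def congestion_subclusters(roads, rel_speed, q):
    """Group congested roads into connected sub-clusters.

    roads:     {road_id: (start_node, end_node)} (hypothetical layout)
    rel_speed: {road_id: relative speed at this sampling moment}
    q:         congestion threshold; roads with rel_speed > q are removed
    """
    congested = [r for r in roads if rel_speed[r] <= q]
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each congested road merges the components of its two end nodes.
    for r in congested:
        a, b = roads[r]
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in congested:
        groups[find(roads[r][0])].add(r)
    # Sort for a reproducible order of sub-clusters.
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Calling `congestion_subclusters` once per sampling moment yields the list of congestion sub-clusters that step 3 then links over time.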
Step 3: determining the temporal inheritance relationship of congestion sub-clusters according to the congestion sub-clusters of the area to be detected at each sampling moment.
Step 3 specifically includes:
Based on the congested network extracted in step 2, the inheritance relationship of congestion sub-clusters over time is defined. Congestion sub-clusters describe the spatial extent of congestion at a given time, and the succession relationship between congestion sub-clusters at consecutive times is determined by the road overlap ratio $O_{ij}$ between two congestion sub-clusters at adjacent times, defined as follows:
[Formula image BDA0003137047400000091: definition of the road overlap ratio $O_{ij}$]
where $CC_j(t_k)$ and $CC_i(t_{k-1})$ respectively represent congestion sub-cluster $j$ at sampling moment $t_k$ and congestion sub-cluster $i$ at sampling moment $t_{k-1}$, and $N(\cdot)$ represents the number of roads within a congestion sub-cluster. When the road overlap ratio between two congestion sub-clusters at adjacent times is greater than a threshold $\delta$, i.e. $O_{ij} > \delta$, the congestion sub-clusters $CC_j(t_k)$ and $CC_i(t_{k-1})$ are considered to belong to the same space-time congestion sub-cluster.
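The linking rule of step 3 might look like the Python sketch below. Because the formula image is not reproduced in this text, the exact form of the overlap ratio is an assumption: here it is taken as the number of shared roads divided by the number of roads in either sub-cluster. The function names are hypothetical.

```python
def overlap_ratio(cc_prev, cc_curr):
    """Assumed overlap ratio: shared roads / roads in either sub-cluster.

    cc_prev, cc_curr: sets of road ids at t_{k-1} and t_k.
    The true denominator in the patent formula may differ.
    """
    union = cc_prev | cc_curr
    return len(cc_prev & cc_curr) / len(union) if union else 0.0


def link_subclusters(prev_clusters, curr_clusters, delta):
    """Return (i, j) index pairs whose overlap ratio exceeds delta."""
    return [
        (i, j)
        for i, cp in enumerate(prev_clusters)
        for j, cc in enumerate(curr_clusters)
        if overlap_ratio(cp, cc) > delta
    ]
```

Each returned pair (i, j) marks sub-cluster i at the earlier moment and sub-cluster j at the later moment as parts of the same space-time congestion sub-cluster.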
And 4, determining probability distribution of the increase speed of the congestion subgroups based on a naive Bayes theory according to the congestion subgroups of the area to be detected at each sampling moment.
And (3) deducing the probability distribution of the congestion subgroup growth speed for defining the relay relation on the congestion subgroup time in the step 3: the variation d of the congestion subgroup scale at the near moment is N (CC) j (t k ))-N(CC j (t k-1 ) Wherein N (CC) j (t k ) ) represents t k Constantly congested clique CC j Number of inner roads, N (CC) j (t k-1 ) Represents t) k-1 Constantly congested clique CC j The number of roads in. And then, equally dividing the target time interval into L sub-time intervals, dividing the congestion sub-groups into three types, namely a large congestion sub-group, a medium congestion sub-group and a small congestion sub-group according to the congestion sub-group scale, counting the probability distribution of the scale change d of the congestion sub-groups of different scales in different sub-time periods, and deducing the posterior distribution of the time change d of the congestion sub-group scale through a naive Bayes theory.
Figure BDA0003137047400000092
Wherein T represents the sub-time period to which the sub-cluster currently belongs, and S represents the congestion sub-cluster type to which the congestion sub-cluster currently belongs.
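The counting step can be illustrated with a minimal sketch. It assumes the posterior is obtained by Bayes' rule from a joint likelihood P(T, S | d = k), the prior P(d = k) and the evidence P(T, S), all estimated as relative frequencies from observed (T, S, d) triples. The function name `posterior_of_change`, the data layout and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Obs = Tuple[int, str, int]  # (sub-period index T, size class S, size change d)


def posterior_of_change(
    observations: Iterable[Obs], period: int, size_class: str
) -> Dict[int, float]:
    """Estimate P(d = k | T, S) by Bayes' rule from observed frequencies."""
    obs = list(observations)
    n = len(obs)
    count_d = Counter(d for _, _, d in obs)
    count_joint = Counter(obs)
    count_state = sum(1 for t, s, _ in obs if (t, s) == (period, size_class))
    if count_state == 0:
        return {}  # no data for this state
    p_state = count_state / n  # P(T, S)
    posterior = {}
    for d, n_d in count_d.items():
        p_d = n_d / n  # P(d = k)
        p_state_given_d = count_joint[(period, size_class, d)] / n_d  # P(T, S | d = k)
        p = p_state_given_d * p_d / p_state
        if p > 0:
            posterior[d] = p
    return posterior


# Illustrative observations, not real data.
observations = [
    (0, "small", 1), (0, "small", 1), (0, "small", -1),
    (0, "large", 3), (1, "small", 0),
]
post = posterior_of_change(observations, period=0, size_class="small")
```

With frequency estimates the result equals the direct conditional frequency of d within the state (T, S). A factorized likelihood P(T | d) P(S | d) would be the more strictly "naive" variant and may suit sparse data better.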
Step 5: predicting the scale growth of the congestion sub-group by means of a Markov chain and Monte Carlo simulation, according to the probability distribution of the congestion sub-group growth speed, so as to predict traffic health.
Step 5 specifically comprises:
(1) Constructing a congestion subgroup state transition matrix based on a Markov chain according to the probability distribution of the congestion subgroup growth speed. That is, a Markov chain is used to describe the scale growth process of a spatio-temporal congestion subgroup: a congestion subgroup state transition matrix is constructed to determine the direction and probability of state transitions. The rows and columns of the transition matrix represent congestion subgroup states; a state (T, S) is determined by the sub-period T and the congestion subgroup type S in step 3, giving 3 × L states in total. The matrix elements represent the transition probabilities between states, which are determined by the current scale of the congestion subgroup and the distribution of its scale change d at the next moment. Once the state of the congestion subgroup is determined, the probability distribution of the scale change d is inferred as in step 3, which determines how much the congestion subgroup scale changes.
(2) Simulating the growth process by Monte Carlo simulation, based on the congestion sub-cluster state transition matrix and the scale of a given congestion sub-cluster at the current moment, to obtain a congestion sub-cluster scale growth expectation curve and the probability distribution of the congestion sub-cluster scale over time. That is, Monte Carlo simulation is used to predict the growth process of the congestion sub-cluster: given an initial value of the congestion sub-cluster scale in the target time period, the Monte Carlo method simulates possible congestion growth processes in the spatio-temporal dimension. A single Monte Carlo run proceeds as follows: the distribution of the scale change d at the next moment is inferred by naive Bayes from the current state (T, S) of the sub-cluster; a value of d is randomly sampled from this distribution; and the state (T, S) of the congestion sub-cluster at the next moment is determined. This process is repeated until the congestion sub-cluster disappears or the simulation time ends, yielding one possible congestion sub-cluster scale growth curve. Repeating the Monte Carlo run many times yields the probability distribution of the congestion sub-cluster scale over time, and the expectation curve of this distribution is the congestion sub-cluster growth process predicted by the Monte Carlo simulation. The specific process of Monte Carlo simulation is supported by established techniques and literature in computational science and is not described further here.
(3) Calculating, according to the congestion sub-cluster scale growth expectation curve and the probability distribution of the congestion sub-cluster scale over time, the amount by which the actually observed congestion sub-cluster scale deviates from the expectation curve, and taking this amount as the deviation value; judging whether the deviation value is greater than a deviation threshold to obtain a judgment result; if the judgment result is yes, the congestion sub-cluster at the current moment will develop into an unhealthy congestion area in the future, i.e., it will spread into large-scale congestion; if the judgment result is no, the congestion sub-cluster at the current moment will not develop into an unhealthy congestion area in the future and will remain confined to a small range. That is, based on the probability distribution of the congestion sub-cluster scale over time and the expectation curve obtained in (2), the deviation of the actually observed value from the expectation curve is analyzed and a deviation threshold τ is selected; if (s(t) - E(s(t))) / E(s(t)) > τ, the congestion sub-cluster is judged to become a large congestion sub-cluster in the future, where s(t) is the scale of the congestion sub-cluster at time t and E(s(t)) is the expected scale of the congestion sub-cluster at time t obtained by Monte Carlo simulation.
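Sub-steps (1) to (3) can be sketched together as below. This is an illustration under stated assumptions, not the patented implementation: the chain is sampled directly from a per-state distribution of d instead of materializing the 3 × L transition matrix, the size-class thresholds are hypothetical, a state with no data is assumed to leave the size unchanged, and a zero expected size is handled by an arbitrary convention. All function and parameter names are invented for the example.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[int, str]                    # (sub-period T, size class S)
Posterior = Dict[State, Dict[int, float]]  # state -> {size change d: probability}


def size_class(size: int, small_max: int = 5, medium_max: int = 20) -> str:
    """Hypothetical thresholds for the small / medium / large classes."""
    if size <= small_max:
        return "small"
    return "medium" if size <= medium_max else "large"


def simulate_once(posterior: Posterior, start_size: int, steps: int,
                  steps_per_period: int, rng: random.Random) -> List[int]:
    """One Monte Carlo run: sample d each step until the sub-cluster vanishes or time ends."""
    size = start_size
    curve = [size]
    for step in range(steps):
        if size > 0:
            state = (step // steps_per_period, size_class(size))
            dist = posterior.get(state, {0: 1.0})  # assumption: no data means no change
            d = rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]
            size = max(size + d, 0)  # size 0 means the sub-cluster has disappeared
        curve.append(size)
    return curve


def expected_curve(posterior: Posterior, start_size: int, steps: int,
                   steps_per_period: int, runs: int = 1000, seed: int = 0) -> List[float]:
    """Average many runs to approximate E(s(t)) at each step."""
    rng = random.Random(seed)
    totals = [0.0] * (steps + 1)
    for _ in range(runs):
        run = simulate_once(posterior, start_size, steps, steps_per_period, rng)
        for t, s in enumerate(run):
            totals[t] += s
    return [total / runs for total in totals]


def is_unhealthy(observed: float, expected: float, tau: float) -> bool:
    """Relative deviation test (s(t) - E(s(t))) / E(s(t)) > tau."""
    if expected <= 0:
        return observed > 0  # assumption: any remaining congestion counts as a deviation
    return (observed - expected) / expected > tau


# Deterministic toy posterior: small sub-clusters in sub-period 0 always grow by one road.
demo_posterior = {(0, "small"): {1: 1.0}}
curve = expected_curve(demo_posterior, start_size=3, steps=2, steps_per_period=10, runs=10)
```

In practice `posterior` would hold the distributions inferred in step 4, and the spread of the individual runs around `curve` gives the probability distribution of the scale over time.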
The invention also provides a traffic health prediction system based on naive Bayes, which comprises:
the speed data acquisition module is used for acquiring the speed data of each road at each sampling moment in the area to be detected;
and the congestion sub-cluster extraction module is used for extracting the congestion sub-cluster of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment.
The congestion subgroup extracting module specifically comprises: the traffic operation network construction module is used for constructing a traffic operation network of the area to be detected at each sampling moment by taking the intersection of the area to be detected as a node, the road of the area to be detected as a connecting edge and the speed data of each road as a weight according to the speed data of each road at each sampling moment; the congested road network acquisition module is used for respectively removing roads of which the speed data are greater than a congestion threshold value in the traffic operation network at each sampling moment to obtain a congested road network at each sampling moment; and the congestion sub-cluster acquisition module is used for respectively extracting the mutually communicated roads in the congestion road network at each sampling moment to form a communicated sub-cluster, acquiring a plurality of communicated sub-clusters at each sampling moment and forming the congestion sub-cluster at each sampling moment.
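The extraction described above can be sketched with a union-find over intersections. Roads whose speed exceeds the congestion threshold are dropped, and the remaining roads are grouped by connected component. The function name, the road representation and the threshold value are hypothetical, and the speed may equally be a relative speed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Road = Tuple[str, str]  # a road as the pair of intersections it connects


def congestion_subclusters(speeds: Dict[Road, float], threshold: float) -> List[Set[Road]]:
    """Keep roads with speed <= threshold and group them into connected sub-clusters."""
    congested = [road for road, v in speeds.items() if v <= threshold]
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Two congested roads are connected when they share an intersection.
    for u, v in congested:
        parent[find(u)] = find(v)

    groups: Dict[str, Set[Road]] = defaultdict(set)
    for road in congested:
        groups[find(road[0])].add(road)
    return list(groups.values())


# Illustrative speeds; the threshold of 20 is arbitrary.
speeds = {("A", "B"): 10.0, ("B", "C"): 12.0, ("C", "D"): 50.0, ("E", "F"): 8.0}
clusters = congestion_subclusters(speeds, threshold=20.0)
```

Running this once per sampling moment gives the congestion sub-clusters that the succession analysis then links over time.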
And the congestion subgroup relay relation analysis module is used for determining the congestion subgroup time relay relation according to the congestion subgroups of the area to be detected at each sampling moment.
The congestion subgroup relay relation analysis module specifically comprises:
a road overlapping proportion calculation submodule for utilizing a formula according to the congestion subgroups of the area to be detected at each sampling moment
Figure BDA0003137047400000111
to determine the road overlap ratio between different congestion subgroups at adjacent sampling moments;
where CC_j(t_k) and CC_i(t_{k-1}) denote congestion subgroup j at sampling time t_k and congestion subgroup i at sampling time t_{k-1}, respectively, and N(·) denotes the number of roads within a congestion subgroup;
a time inheritance relation determination submodule, used for determining, when the road overlap ratio between two congestion subgroups at adjacent sampling moments is larger than an overlap ratio threshold, that a time inheritance relationship exists between the congestion subgroups CC_j(t_k) and CC_i(t_{k-1}) and that they belong to the same spatio-temporal congestion subgroup.
And the congestion subgroup growth analysis module is used for determining the probability distribution of the congestion subgroup growth speed based on the naive Bayes theory according to the congestion subgroups of the region to be detected at each sampling moment.
The congestion subgroup growth analysis module specifically comprises: and the congestion subgroup growth analysis submodule is used for determining the probability distribution of the congestion subgroup growth speed as follows based on the naive Bayes theory according to the congestion subgroup of the area to be detected at each sampling moment:
Figure BDA0003137047400000121
where P(d = k | T = t_i, S = s_j) represents the probability distribution of the congestion subgroup growth speed, i.e., the posterior distribution of the change d of the congestion subgroup scale over time; P(T = t_i, S = s_j | d = k) represents a first conditional probability distribution; P(d = k) represents a second conditional probability distribution; P(T = t_i, S = s_j) represents a third conditional probability distribution; T represents the sub-period to which the congestion subgroup belongs; S represents the congestion subgroup type to which the congestion subgroup scale belongs; d represents the change of the congestion subgroup scale over time, d = N(CC(t_m)) - N(CC(t_{m-1})), where N(CC(t_m)) and N(CC(t_{m-1})) represent the number of roads in the congestion subgroup at sampling time t_m and sampling time t_{m-1}, respectively; k represents a set value of the congestion subgroup scale change; t_i represents the i-th sub-period; and s_j represents the j-th congestion subgroup type.
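The formula itself appears only as an image in the source. Judging from the three terms named in the definitions (a likelihood, a prior on d, and the evidence for the state), it presumably has the form of Bayes' rule; the following is a reconstruction under that assumption, not a transcription of the figure:

```latex
P(d = k \mid T = t_i,\, S = s_j)
  = \frac{P(T = t_i,\, S = s_j \mid d = k)\; P(d = k)}{P(T = t_i,\, S = s_j)}
```

If the likelihood is further factorized under the naive independence assumption, the numerator would read P(T = t_i | d = k) P(S = s_j | d = k) P(d = k); the text does not say which form the figure uses.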
And the congestion prediction module is used for predicting the scale growth of the congestion sub-group by means of a Markov chain and Monte Carlo simulation according to the probability distribution of the congestion sub-group growth speed, so as to predict traffic health.
The congestion prediction module specifically includes:
and the congestion subgroup state transition matrix constructing submodule is used for constructing a congestion subgroup state transition matrix based on the Markov chain according to the probability distribution of the congestion subgroup increasing speed.
And the Monte Carlo simulation submodule is used for simulating the growth process by adopting a Monte Carlo simulation mode based on the congestion sub-cluster state transition matrix and the congestion sub-cluster scale at the current moment to obtain a congestion sub-cluster scale growth expectation curve and probability distribution of congestion sub-cluster scale growth along with time.
And the deviation value calculation submodule is used for calculating, according to the congestion sub-cluster scale growth expectation curve and the probability distribution of the congestion sub-cluster scale over time, the amount by which the actually observed congestion sub-cluster scale deviates from the congestion sub-cluster scale growth expectation curve, and taking this amount as the deviation value.
And the judgment submodule is used for judging whether the deviation value is greater than the deviation threshold value or not and obtaining a judgment result.
And the first judgment result output submodule is used for determining, if the judgment result is yes, that the congestion sub-cluster at the current moment will develop into an unhealthy congestion area in the future, i.e., the congestion sub-cluster will spread into large-scale congestion.
And the second judgment result output submodule is used for determining, if the judgment result is no, that the congestion sub-cluster at the current moment will not develop into an unhealthy congestion area in the future and will remain confined to a small range.
The invention also discloses the following technical effects:
The method is based on complex network modeling and the naive Bayes method: a congestion network is constructed by complex network modeling to identify congestion subgroups, the succession relation of congestion subgroups in the spatio-temporal dimension is defined, the distribution of the congestion subgroup growth speed is inferred by the naive Bayes method, and the growth of congestion subgroups is predicted by Markov chain and Monte Carlo simulation, thereby realizing traffic health prediction and addressing a key difficulty in health prediction for city-level traffic systems.
The invention has the following advantages. First, classical system health prediction is mainly based on congestion propagation between road sections; it addresses the influence between road sections from a microscopic perspective, requires high-quality, long-term data to obtain relatively accurate results, and has difficulty accounting for the effect on health of the structural coupling, regional propagation and congestion-block association of traffic congestion, whereas the complex network theory and naive Bayes method need only road network topology information and speed data over a certain period to compute an accurate result. Second, the complex network analysis method can model and describe the structural coupling, regional propagation and congestion-block association of traffic congestion. Finally, the invention adopts a naive Bayes method, which has outstanding learning and reasoning ability, can adapt to a complex and dynamic external environment, and can mine different congestion growth patterns from prior experience and observations rather than from simple propagation along congested roads, making city-level traffic health diagnosis possible.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A traffic health prediction method based on naive Bayes is characterized by comprising the following steps:
acquiring speed data of each road at each sampling moment in a region to be detected;
extracting congestion subgroups of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment;
determining a congestion sub-cluster time relay relation according to the congestion sub-cluster of the area to be detected at each sampling moment;
the method for determining the congestion subgroup time relay relationship according to the congestion subgroups of the area to be detected at each sampling moment specifically comprises the following steps:
according to the congestion subgroups of the area to be detected at each sampling moment, a formula is utilized
Figure FDA0003714857060000011
Figure FDA0003714857060000012
Determining road overlapping proportion between different congestion subgroups at adjacent sampling moments;
where CC_j(t_k) and CC_i(t_{k-1}) denote congestion subgroup j at sampling time t_k and congestion subgroup i at sampling time t_{k-1}, respectively, and N(·) denotes the number of roads within a congestion subgroup;
when the road overlap ratio between two congestion subgroups at adjacent sampling moments is larger than an overlap ratio threshold, determining that a time inheritance relationship exists between the congestion subgroups CC_j(t_k) and CC_i(t_{k-1}) and that they belong to the same spatio-temporal congestion subgroup;
determining probability distribution of the increase speed of the congestion subgroups based on a naive Bayes theory according to the time relay relation of the congestion subgroups in the region to be detected;
and predicting the scale increase of the congestion sub-group by adopting a Markov chain and Monte Carlo simulation mode according to the probability distribution of the congestion sub-group increase speed to predict the traffic health.
2. The naive bayes-based traffic health prediction method according to claim 1, wherein the extracting the congestion subgroups of the area to be detected at each sampling time according to the speed data of each road at each sampling time specifically comprises:
according to the speed data of each road at each sampling moment, taking the intersection of the area to be detected as a node, the road of the area to be detected as a connecting edge and the speed data of each road as a weight, and constructing a traffic operation network of the area to be detected at each sampling moment;
removing roads of which the speed data are larger than a congestion threshold value in the traffic operation network at each sampling moment respectively to obtain a congestion road network at each sampling moment;
and respectively extracting the mutually communicated roads in the congested road network at each sampling moment to form a communicated sub-cluster, obtaining a plurality of communicated sub-clusters at each sampling moment, and forming the congested sub-cluster at each sampling moment.
3. The naive Bayes-based traffic health prediction method according to claim 1, wherein the determining, based on a naive Bayes theory, a probability distribution of a congestion subgroup growth speed according to a congestion subgroup time relay relationship in an area to be measured specifically comprises:
according to the congestion subgroup of the area to be detected at each sampling moment, based on a naive Bayes theory, determining the probability distribution of the increase speed of the congestion subgroup as follows:
Figure FDA0003714857060000021
where P(d = k | T = t_i, S = s_j) represents the probability distribution of the congestion subgroup growth speed, i.e., the posterior distribution of the change d of the congestion subgroup scale over time; P(T = t_i, S = s_j | d = k) represents a first conditional probability distribution; P(d = k) represents a second conditional probability distribution; P(T = t_i, S = s_j) represents a third conditional probability distribution; T represents the sub-period to which the congestion subgroup belongs; S represents the congestion subgroup type to which the congestion subgroup scale belongs; d represents the change of the congestion subgroup scale over time, d = N(CC(t_m)) - N(CC(t_{m-1})), where N(CC(t_m)) and N(CC(t_{m-1})) represent the number of roads in the congestion subgroup at sampling time t_m and sampling time t_{m-1}, respectively; k represents a set value of the congestion subgroup scale change; t_i represents the i-th sub-period; and s_j represents the j-th congestion subgroup type.
4. The naive bayes-based traffic health prediction method according to claim 1, wherein the predicting the health by predicting the scale growth of the congestion subgroup according to the probability distribution of the congestion subgroup growth speed by using a markov chain and monte carlo simulation method specifically comprises:
according to the probability distribution of the congestion subgroup growth speed, constructing a congestion subgroup state transition matrix based on a Markov chain;
and simulating the growth process by adopting a Monte Carlo simulation mode based on the congestion sub-cluster state transition matrix and the congestion sub-cluster scale at the current moment to obtain a congestion sub-cluster scale growth expectation curve and probability distribution of congestion sub-cluster scale growth along with time.
5. The naive Bayes based traffic health prediction method as recited in claim 4, wherein a simulation of a growth process is performed by adopting a Monte Carlo simulation mode based on the congestion sub-cluster state transition matrix and the congestion sub-cluster scale at the current moment, so as to obtain a congestion sub-cluster scale growth expectation curve and a probability distribution of the congestion sub-cluster scale growth with time, and then further comprising:
calculating the numerical value of the congestion sub-cluster scale deviating from the congestion sub-cluster scale growth expectation curve obtained by actual observation as a deviation value according to the congestion sub-cluster scale growth expectation curve and the probability distribution of the congestion sub-cluster scale growing along with time;
judging whether the deviation value is larger than a deviation threshold value or not, and obtaining a judgment result;
if the judgment result is yes, the congestion sub-cluster at the current moment will develop into an unhealthy congestion area in the future, i.e., the congestion sub-cluster will spread into large-scale congestion;
if the judgment result is no, the congestion sub-cluster at the current moment will not develop into an unhealthy congestion area in the future and will remain confined to a small range.
6. A naive bayes based traffic health prediction system, the prediction system comprising:
the speed data acquisition module is used for acquiring the speed data of each road at each sampling moment in the area to be detected;
the congestion sub-cluster extraction module is used for extracting the congestion sub-cluster of the area to be detected at each sampling moment according to the speed data of each road at each sampling moment;
the congestion sub-cluster relay relation analysis module is used for determining the congestion sub-cluster time relay relation according to the congestion sub-cluster of the area to be detected at each sampling moment;
the congestion sub-cluster relay relation analysis module specifically comprises:
a road overlapping proportion calculation submodule for utilizing a formula according to the congestion subgroup of the area to be detected at each sampling moment
Figure FDA0003714857060000031
Determining road overlapping proportion between different congestion subgroups at adjacent sampling moments;
where CC_j(t_k) and CC_i(t_{k-1}) denote congestion subgroup j at sampling time t_k and congestion subgroup i at sampling time t_{k-1}, respectively, and N(·) denotes the number of roads within a congestion subgroup;
a time inheritance relationship determination submodule, used for determining, when the road overlap ratio between two congestion subgroups at adjacent sampling moments is larger than an overlap ratio threshold, that a time inheritance relationship exists between the congestion subgroups CC_j(t_k) and CC_i(t_{k-1}) and that they belong to the same spatio-temporal congestion subgroup;
the congestion subgroup growth analysis module is used for determining probability distribution of the congestion subgroup growth speed according to the congestion subgroups of the area to be detected at each sampling moment based on a naive Bayes theory;
and the congestion prediction module is used for predicting the scale increase of the congestion sub-group by adopting a Markov chain and Monte Carlo simulation mode according to the probability distribution of the increase speed of the congestion sub-group so as to predict the traffic health.
7. The naive bayes-based traffic health prediction system of claim 6, wherein the congestion subgroup extraction module specifically comprises:
the traffic operation network construction module is used for constructing a traffic operation network of the area to be detected at each sampling moment by taking the intersection of the area to be detected as a node, the road of the area to be detected as a connecting edge and the speed data of each road as a weight according to the speed data of each road at each sampling moment;
the congested road network acquisition module is used for respectively removing roads of which the speed data are greater than a congestion threshold value in the traffic operation network at each sampling moment to obtain a congested road network at each sampling moment;
and the congestion sub-cluster acquisition module is used for respectively extracting the mutually communicated roads in the congestion road network at each sampling moment to form a communicated sub-cluster, acquiring a plurality of communicated sub-clusters at each sampling moment and forming the congestion sub-cluster at each sampling moment.
8. The naive bayes-based traffic health prediction system of claim 6, wherein the congestion subgroup growth analysis module specifically comprises:
and the congestion subgroup growth analysis submodule is used for determining the probability distribution of the congestion subgroup growth speed as follows based on the naive Bayes theory according to the congestion subgroup of the area to be detected at each sampling moment:
Figure FDA0003714857060000041
where P(d = k | T = t_i, S = s_j) represents the probability distribution of the congestion subgroup growth speed, i.e., the posterior distribution of the change d of the congestion subgroup scale over time; P(T = t_i, S = s_j | d = k) represents a first conditional probability distribution; P(d = k) represents a second conditional probability distribution; P(T = t_i, S = s_j) represents a third conditional probability distribution; T represents the sub-period to which the congestion subgroup belongs; S represents the congestion subgroup type to which the congestion subgroup scale belongs; d represents the change of the congestion subgroup scale over time, d = N(CC(t_m)) - N(CC(t_{m-1})), where N(CC(t_m)) and N(CC(t_{m-1})) represent the number of roads in the congestion subgroup at sampling time t_m and sampling time t_{m-1}, respectively; k represents a set value of the congestion subgroup scale change; t_i represents the i-th sub-period; and s_j represents the j-th congestion subgroup type.
CN202110721673.6A 2021-06-28 2021-06-28 Traffic health prediction method and system based on naive Bayes Active CN113420925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110721673.6A CN113420925B (en) 2021-06-28 2021-06-28 Traffic health prediction method and system based on naive Bayes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110721673.6A CN113420925B (en) 2021-06-28 2021-06-28 Traffic health prediction method and system based on naive Bayes

Publications (2)

Publication Number Publication Date
CN113420925A CN113420925A (en) 2021-09-21
CN113420925B true CN113420925B (en) 2022-08-19

Family

ID=77717813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110721673.6A Active CN113420925B (en) 2021-06-28 2021-06-28 Traffic health prediction method and system based on naive Bayes

Country Status (1)

Country Link
CN (1) CN113420925B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870591B (en) * 2021-10-22 2023-08-01 上海应用技术大学 Traffic prediction-based signal control period dividing method, device and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903436A (en) * 2012-12-28 2014-07-02 上海优途信息科技有限公司 Expressway traffic jam detecting method and system based on floating car
CN103391185B (en) * 2013-08-12 2017-06-16 北京泰乐德信息技术有限公司 A kind of cloud security storage of track traffic Monitoring Data and processing method and system
CN110222950B (en) * 2019-05-16 2021-05-11 北京航空航天大学 Health index system and evaluation method for urban traffic
CN110111576B (en) * 2019-05-16 2020-10-09 北京航空航天大学 Method for realizing urban traffic elasticity index based on space-time congestion sub-cluster
CN112382082B (en) * 2020-09-30 2022-06-14 银江技术股份有限公司 Method and system for predicting traffic running state in congested area
CN112215365A (en) * 2020-10-28 2021-01-12 天津大学 Method for providing feature prediction capability based on naive Bayes model

Also Published As

Publication number Publication date
CN113420925A (en) 2021-09-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant