CN114463684A - Urban highway network-oriented blockage detection method - Google Patents

Urban highway network-oriented blockage detection method

Info

Publication number
CN114463684A
CN114463684A (application CN202210132700.0A)
Authority
CN
China
Prior art keywords
vehicle
image
value
frame
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210132700.0A
Other languages
Chinese (zh)
Inventor
孙鹏飞
史煜
李雷孝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202210132700.0A
Publication of CN114463684A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a congestion point detection method for urban highway networks, which comprises the following steps: S1, acquiring video data of real-time road conditions through an intelligent terminal and establishing a highway network real-time video image information table; S2, performing filtering, graying and image enhancement on the collected video image data; S3, extracting key indices from the processed video image data with a target detection algorithm based on a deep neural network, namely the vehicle speed, traffic flow, lane speed difference and maximum vehicle-carrying number in the video images; and S4, establishing an urban expressway network congestion detection classification model from the extracted key index information and outputting the identification result.

Description

Urban highway network-oriented blockage detection method
Technical Field
The invention belongs to the technical field of congestion detection for urban highway networks, and particularly relates to a congestion point detection method oriented to urban highway networks.
Background
With the rapid development of the social economy and the continuous growth of motor vehicle ownership, urban road congestion has become a "common ailment" of urban road traffic. In the urban highway network, i.e. the urban expressway system, traffic accidents occur frequently during morning and evening rush hours owing to heavy traffic flow and high vehicle speeds, which in turn causes road congestion. The harm of urban road congestion has two main aspects. First, the time delay and energy waste caused by road congestion bring huge economic losses to society. Second, when vehicle speed is too low, exhaust pollution increases greatly and a large amount of noise is generated, so that air quality and the urban environment deteriorate sharply and the living standard of citizens declines.
An Intelligent Transportation System (ITS) applies science and technology to transportation, service control and related fields, and strengthens the connection among vehicles, roads and users, thereby forming a comprehensive transportation system that ensures safety, improves efficiency and saves resources. Accurately predicting and detecting urban road congestion provides decision support for traffic management departments, enabling reasonable traffic scheduling and the timely diversion of vehicles in the road network, and is currently the main approach to relieving traffic congestion. Existing road congestion prediction studies can be roughly divided into three categories: those based on traditional machine learning algorithms; those based on deep learning algorithms; and deep models combining machine learning and deep learning algorithms. Traditional machine learning algorithms such as SVM, KNN and XGBoost are theoretically feasible for identifying and judging road congestion and for fine-grained identification of congestion points. However, traditional machine learning algorithms lack a deep network structure, their extracted features depend on expert experience, they lack feature extraction capability for time-series data, and their prediction accuracy is not ideal. Deep learning algorithms such as the recurrent neural network and its variants, the deep belief network and the convolutional neural network (CNN) can automatically extract data features and fully characterize the data, but their prediction accuracy suffers from the strong randomness, temporal dependency and spatial correlation of traffic flow data. A deep model takes a deep learning algorithm as the bottom layer and a traditional machine learning algorithm as the top layer: it extracts features from traffic flow data to form a new feature data set, which is then used as the input to machine learning to predict road congestion and identify the type of congestion point.
Traffic data is typically spatiotemporal data that exhibits both temporal and spatial correlations and heterogeneity. Most of the existing technologies can only capture partial attributes of traffic data, cannot fully utilize the space-time characteristics of traffic flow or fully consider the relationship among the characteristics, and further cannot well represent the internal rules of traffic flow sequences, so that the prediction effect is not ideal.
Disclosure of Invention
To overcome the drawbacks of the prior art and solve the problems noted in the background, the present invention provides a method for detecting congestion points of an urban highway network.
To solve the above technical problems, the invention adopts the following technical scheme: a congestion point detection method oriented to urban highway networks, comprising the following steps:
s1, acquiring video data of real-time road conditions through an intelligent terminal, separating the video data into video image data according to specified frame number, performing formatting association, and establishing a highway network real-time video image information table;
s2, filtering, graying and image enhancing the collected video image data, reducing the data size while maximally retaining the texture information, and accelerating the processing speed;
s3, extracting key indices from the processed video image data with a target detection algorithm based on a deep neural network, namely the vehicle speed, traffic flow, lane speed difference and maximum vehicle-carrying number in the video images;
and S4, establishing a city expressway network congestion detection classification model by using the extracted key index information, and outputting an identification result.
Further, in S1, the highway network real-time video image information table includes intelligent terminal number information, frame start time information, frame number information, shooting position information, and frame image information.
Further, in S2, the road video image data is denoised with bilateral filtering, a nonlinear filtering method; the image is then grayed, converting the gray value of each pixel into the range [0,255]. Gray normalization uses the following formula:
p(i,j) = R(i,j)×0.299 + G(i,j)×0.587 + B(i,j)×0.114
where p(i,j) is the gray value of the image pixel after gray normalization, with value range [0,255];
R(i,j) is the red channel pixel value of the original image;
G(i,j) is the green channel pixel value of the original image;
B(i,j) is the blue channel pixel value of the original image;
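As an illustration, a minimal sketch of this gray normalization in Python (NumPy assumed; the function name and the uint8 input convention are illustrative, not from the patent):

```python
import numpy as np

def gray_normalize(rgb: np.ndarray) -> np.ndarray:
    """Weighted gray conversion p = 0.299 R + 0.587 G + 0.114 B, output in [0, 255]."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    p = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(p, 0, 255).astype(np.uint8)
```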
then, the video image is enhanced with a contrast-limited local histogram equalization algorithm;
then the image is segmented with Niblack's local threshold segmentation method, which classifies image pixel points into several classes by setting different feature thresholds;
in an R×R neighborhood, binarization is performed by computing the mean and standard deviation of the pixel points and deriving a threshold from them, with the following formula:
T(x,y)=m(x,y)+k×s(x,y)
wherein T (x, y) is the threshold value of the point;
k is a correction coefficient;
m (x, y) is the mean of the pixels in the R × R neighborhood;
s (x, y) is the standard deviation of pixels in the R × R neighborhood, as represented by:
$$m(x,y)=\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}f(i,j)$$
$$s(x,y)=\sqrt{\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}\left(f(i,j)-m(x,y)\right)^{2}}$$
where $N_{R\times R}(x,y)$ denotes the R×R neighborhood centered at (x,y).
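A compact sketch of Niblack binarization under these definitions, using SciPy's uniform_filter for the local mean (R = 15 and k = -0.2 are illustrative values, not taken from the patent):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray: np.ndarray, R: int = 15, k: float = -0.2) -> np.ndarray:
    g = gray.astype(np.float64)
    m = uniform_filter(g, size=R)              # local mean m(x, y) over R x R
    m2 = uniform_filter(g * g, size=R)         # local mean of squared values
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))   # local standard deviation s(x, y)
    T = m + k * s                              # per-pixel threshold T(x, y)
    return ((g > T) * 255).astype(np.uint8)    # binarized image
```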
further, denoising the road video image with bilateral filtering in the nonlinear filtering manner is specifically as follows:
bilateral filtering has two filtering kernels, a spatial-domain kernel and a value-domain (range) kernel; it removes noise while keeping the original form of target edge information unchanged, and the weight template is defined from the Euclidean distance between coordinates and the difference between pixel values:
$$w(i,j,k,l)=\exp\left(-\frac{(i-k)^{2}+(j-l)^{2}}{2\sigma_{d}^{2}}\right)\cdot\exp\left(-\frac{\left(f(i,j)-f(k,l)\right)^{2}}{2\sigma_{r}^{2}}\right)$$
in the formula, the left factor is the spatial-domain kernel and the right factor is the value-domain kernel;
(i,j) denotes the coordinates of the other positions in the template window;
(k,l) denotes the coordinates of the template center;
f(i,j) is the pixel value at coordinate (i,j);
f(k,l) is the pixel value at the template center;
$\sigma_d$ is the standard deviation of the spatial domain;
$\sigma_r$ is the standard deviation of the value domain;
the weights are convolved with the pixels, and the final output pixel value is:
$$g(k,l)=\frac{\sum_{(i,j)}f(i,j)\,w(i,j,k,l)}{\sum_{(i,j)}w(i,j,k,l)}$$
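In practice this filter is available off the shelf; a minimal sketch with OpenCV (the file name, neighborhood diameter and the two sigma values are illustrative choices, not specified in the patent):

```python
import cv2

# sigmaSpace plays the role of sigma_d, sigmaColor the role of sigma_r.
frame = cv2.imread("road_frame.png")
denoised = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
```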
further, the specific processing procedure in S3 is as follows:
integrating candidate box extraction, classification and localization into one neural network with a YOLO network, extracting candidate boxes directly from each preprocessed frame image, and predicting the traffic flow index from image features;
then, according to the vehicle counts obtained in the previous stage, a fixed period length is set, the video is divided into several statistical periods, and the number of vehicles passing the road within each period is counted; finally, the maximum value among all statistical results is taken as the maximum traffic flow $V_{\max}$ of the current road;
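A minimal sketch of this periodized counting (the list-of-timestamps input format and the 60-second period are illustrative assumptions):

```python
def max_traffic_flow(pass_times_s, period_s=60.0):
    """Count vehicle-passing events per fixed-length period and return V_max."""
    counts = {}
    for t in pass_times_s:
        k = int(t // period_s)            # index of the statistical period
        counts[k] = counts.get(k, 0) + 1
    return max(counts.values()) if counts else 0
```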
Virtually setting two coils in a video stream, recording the initial time and the end time of each vehicle, calculating the time difference between the two, and obtaining the vehicle speed by taking the quotient of the displacement and the time difference; the lane speed difference indicator is then calculated using the average difference in vehicle speed for adjacent lanes.
Further, a YOLO network is adopted to integrate the extraction of candidate frames, classification and positioning into a neural network, the candidate frames are directly extracted from each preprocessed frame image, and the specific process of predicting the traffic flow index by utilizing the image characteristics is as follows:
firstly, candidate boxes are extracted: each preprocessed frame image is divided into L×N grid cells, and each cell predicts S candidate boxes of different sizes; then the targets in the candidate boxes are detected, and the confidence that a target object is present in each candidate box is predicted with the formula:
$$Conf=\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
where Pr(Object) denotes the probability that an object is present in the current candidate box;
Pr(Object) = 1 indicates that an object is in the candidate box, and 0 otherwise;
$IOU_{\text{pred}}^{\text{truth}}$
denotes the overlap ratio between the predicted box and the ground-truth box, defined as:
$$IOU_{\text{pred}}^{\text{truth}}=\frac{\operatorname{area}\left(box_{\text{pred}}\cap box_{\text{truth}}\right)}{\operatorname{area}\left(box_{\text{pred}}\cup box_{\text{truth}}\right)}$$
then the vehicle is detected and located according to the confidence that a candidate box contains a vehicle, defined as:
$$ConL=\Pr(\text{car}\mid\text{Object})\times\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
candidate boxes with ConL > 0 are labeled; these boxes contain the predicted target vehicles;
and repeatedly counting the number of vehicles in each frame of image to obtain the traffic flow index.
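A sketch of the IOU term used in the confidence formulas above, for axis-aligned boxes given as (x1, y1, x2, y2) (this helper is illustrative; the patent does not specify an implementation):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```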
Further, two coils are virtually arranged in the video stream, the initial time and the ending time of each vehicle are recorded, the time difference between the two is calculated, and the vehicle speed can be obtained by taking the quotient of the displacement and the time difference; then, the lane speed difference index is calculated by using the vehicle speed average difference of the adjacent lanes in the following specific process:
firstly, arranging two virtual coils on each lane in a video image stream after gray processing, wherein the distance between the two virtual coils is required to be L;
calculating the average gray value of two virtual coils on each lane, and recording the average gray value of the coils in the driving direction of the vehicle as VinThe average value of the coil gray levels in the vehicle-out direction is denoted as VoutInitializing the grey value V of the coil without the presence of a vehiclein,Vout
Detecting the sudden change of pixels in the virtual coil area, and recording the average gray value in the first coil as VfirstWhen V isfirstAnd VinWhen the absolute value difference exceeds a set threshold, the number of frames Frame at that time is recordedi(ii) a The mean gray value in the second coil is denoted as VsecWhen V issecAnd VoutWhen the absolute value difference exceeds the threshold, the Frame number Frame at that time is recordedjAnd the frame number of the video is 30, the time t taken for the vehicle to pass through the coil is shown as the following formula:
Figure BDA0003503432040000055
in the formula, Gap is the video frame number;
and calculating the speed of the vehicle according to the distance L between the two virtual coils and the time t taken by the vehicle to pass through the coils
Figure BDA0003503432040000061
The formula is as follows:
Figure BDA0003503432040000062
in the formula, ClNumbering lanes;
repeatedly performing statistics on the average speed of each vehicle, and calculating the average speed of each lane vehicle
Figure BDA0003503432040000063
Calculating lane speed difference
Figure BDA0003503432040000064
Being adjacent lanes.
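A minimal sketch of the per-lane averaging and adjacent-lane difference (the input format, one (lane_id, speed) pair per detected vehicle, is an assumption for illustration):

```python
from collections import defaultdict

def lane_speed_difference(per_vehicle, lane_a, lane_b):
    """Average vehicle speed per lane, then the absolute difference
    between two adjacent lanes lane_a and lane_b."""
    lanes = defaultdict(list)
    for lane, speed in per_vehicle:
        lanes[lane].append(speed)
    mean_a = sum(lanes[lane_a]) / len(lanes[lane_a])
    mean_b = sum(lanes[lane_b]) / len(lanes[lane_b])
    return abs(mean_a - mean_b)
```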
Further, according to a key indicator, i.e. average vehicle speed
Figure BDA0003503432040000065
Vehicle flow VmaxSpeed difference of lane
Figure BDA0003503432040000066
Maximum number of vehicles N for adjacent lanesmaxAnd constructing a city highway network congestion detection classification model:
Figure BDA0003503432040000067
in the formula, K0,K1,K2,K3,K4In order to be the weight coefficient,
and CI represents a congestion index, CI represents road smoothness when being 0-0.5, CI represents light congestion when being 0.5-0.7, CI represents medium congestion when being 0.7-0.9, CI represents heavy congestion when being 0.9-1, and finally the road traffic condition is output according to the index evaluation result.
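A sketch of evaluating this classifier under the weighted linear form reconstructed above (the original formula appears only as an image, and the trained weights K are unknown, so they are placeholders here):

```python
def congestion_index(v_avg, v_max, lane_diff, n_max, K):
    """CI = K0 + K1*v_avg + K2*V_max + K3*delta_v + K4*N_max, clipped to [0, 1]."""
    ci = K[0] + K[1] * v_avg + K[2] * v_max + K[3] * lane_diff + K[4] * n_max
    return min(max(ci, 0.0), 1.0)

def classify(ci):
    if ci <= 0.5:
        return "unobstructed"
    if ci <= 0.7:
        return "light congestion"
    if ci <= 0.9:
        return "medium congestion"
    return "heavy congestion"
```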
Compared with the prior art, the invention has the following advantages:
the method is based on the spatial correlation characteristics of urban highway network data and proceeds through the stages of real-time road condition video data acquisition, road condition video data preprocessing, deep-neural-network-based target detection, key index extraction, and congestion detection classification; considering the large volume of traffic video stream data and the high real-time requirements, a distributed architecture is adopted that pushes the real-time video acquisition and data preprocessing stages down to the intelligent terminal, relieving the data processing pressure on server nodes;
the scheme of the invention is practical, safe and reliable; it is mainly applied to traffic planning, road management, driving safety and road maintenance, and helps traffic management departments schedule traffic reasonably and promptly divert vehicles and clear obstructions in the road network.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a video frame image in an embodiment of the invention;
FIG. 3 is an image after bilateral filtering according to an embodiment of the present invention;
FIG. 4 is a gray scale processed image in an embodiment of the invention;
FIG. 5 is an image after target recognition is completed in an embodiment of the present invention;
fig. 6 is a diagram of a virtual coil arrangement in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in fig. 1, the present invention provides a technical solution: a method for detecting a blockage point facing an urban highway network comprises the following steps:
s1, video data of real-time road conditions are acquired through an intelligent terminal, which is a camera equipped with a high-efficiency infrared LED array and an AI image processing chip; the video data is separated into video image data according to a specified frame number and formatted and associated, and a highway network real-time video image information table is established, comprising intelligent terminal number information, frame start time information, frame number information, shooting position information and frame image information;
TABLE 1 Highway network real-time video image information table
Intelligent terminal number | Frame start time | Frame number | Shooting position | Frame image
S2, filtering, graying and image enhancing the collected video image data, reducing the data size while maximally retaining the texture information, and accelerating the processing speed;
in S2, denoising the road video image with bilateral filtering in the nonlinear filtering manner is specifically as follows:
bilateral filtering has two filtering kernels, a spatial-domain kernel and a value-domain (range) kernel; it removes noise while keeping the original form of target edge information unchanged, and the weight template is defined from the Euclidean distance between coordinates and the difference between pixel values:
$$w(i,j,k,l)=\exp\left(-\frac{(i-k)^{2}+(j-l)^{2}}{2\sigma_{d}^{2}}\right)\cdot\exp\left(-\frac{\left(f(i,j)-f(k,l)\right)^{2}}{2\sigma_{r}^{2}}\right)$$
in the formula, the left factor is the spatial-domain kernel and the right factor is the value-domain kernel;
(i,j) denotes the coordinates of the other positions in the template window;
(k,l) denotes the coordinates of the template center;
f(i,j) is the pixel value at coordinate (i,j);
f(k,l) is the pixel value at the template center;
$\sigma_d$ is the standard deviation of the spatial domain;
$\sigma_r$ is the standard deviation of the value domain;
the weights are convolved with the pixels, and the final output pixel value is:
$$g(k,l)=\frac{\sum_{(i,j)}f(i,j)\,w(i,j,k,l)}{\sum_{(i,j)}w(i,j,k,l)}$$
the image is then grayed, converting the gray value of each pixel into the range [0,255]; gray normalization uses the following formula:
p(i,j) = R(i,j)×0.299 + G(i,j)×0.587 + B(i,j)×0.114
where p(i,j) is the gray value of the image pixel after gray normalization, with value range [0,255];
R(i,j) is the red channel pixel value of the original image;
G(i,j) is the green channel pixel value of the original image;
B(i,j) is the blue channel pixel value of the original image;
then, the video image is enhanced with a contrast-limited local histogram equalization algorithm;
then the image is segmented with Niblack's local threshold segmentation method, which classifies image pixel points into several classes by setting different feature thresholds;
in an R×R neighborhood, binarization is performed by computing the mean and standard deviation of the pixel points and deriving a threshold from them, with the following formula:
T(x,y)=m(x,y)+k×s(x,y)
wherein T(x,y) is the threshold at that point;
k is a correction coefficient;
m(x,y) is the mean of the pixels in the R×R neighborhood;
s(x,y) is the standard deviation of the pixels in the R×R neighborhood, given by:
$$m(x,y)=\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}f(i,j)$$
$$s(x,y)=\sqrt{\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}\left(f(i,j)-m(x,y)\right)^{2}}$$
s3, extracting key indices from the processed video image data with a target detection algorithm based on a deep neural network, namely the vehicle speed, traffic flow, lane speed difference and maximum vehicle-carrying number in the video images;
the specific processing procedure is as follows: candidate box extraction, classification and localization are integrated into one neural network with a YOLO network, candidate boxes are extracted directly from each preprocessed frame image, and the traffic flow index is predicted from image features;
firstly, candidate boxes are extracted: each preprocessed frame image is divided into L×N grid cells, and each cell predicts S candidate boxes of different sizes; then the targets in the candidate boxes are detected, and the confidence that a target object is present in each candidate box is predicted with the formula:
$$Conf=\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
where Pr(Object) denotes the probability that an object is present in the current candidate box;
Pr(Object) = 1 indicates that an object is in the candidate box, and 0 otherwise;
$IOU_{\text{pred}}^{\text{truth}}$
denotes the overlap ratio between the predicted box and the ground-truth box, defined as:
$$IOU_{\text{pred}}^{\text{truth}}=\frac{\operatorname{area}\left(box_{\text{pred}}\cap box_{\text{truth}}\right)}{\operatorname{area}\left(box_{\text{pred}}\cup box_{\text{truth}}\right)}$$
then the vehicle is detected and located according to the confidence that a candidate box contains a vehicle, defined as:
$$ConL=\Pr(\text{car}\mid\text{Object})\times\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
candidate boxes with ConL > 0 are labeled; these boxes contain the predicted target vehicles;
the number of vehicles in each frame image is counted repeatedly to obtain the traffic flow index;
then, according to the vehicle counts obtained in the previous stage, a fixed period length is set, the video is divided into several statistical periods, and the number of vehicles passing the road within each period is counted; finally, the maximum value among all statistical results is taken as the maximum traffic flow $V_{\max}$ of the current road;
Virtually setting two coils in a video stream, recording the initial time and the end time of each vehicle, calculating the time difference between the two, and obtaining the vehicle speed by quoting the displacement and the time difference; then, calculating a lane speed difference index by using the vehicle speed average difference of adjacent lanes;
virtually setting two coils in a video stream, recording the initial time and the end time of each vehicle, calculating the time difference between the two, and obtaining the vehicle speed by taking the quotient of the displacement and the time difference; then, the lane speed difference index is calculated by using the vehicle speed average difference of the adjacent lanes in the following specific process:
firstly, arranging two virtual coils on each lane in a video image stream after gray processing, wherein the distance between the two virtual coils is required to be L;
calculating the average gray value of two virtual coils on each lane, and recording the average gray value of the coils in the driving direction of the vehicle as VinThe average value of the coil gray levels in the vehicle-out direction is denoted as VoutInitializing the grey value V of the coil without the presence of a vehiclein,Vout
Detecting pixel sudden change in the virtual coil area, and recording the average gray value in the first coil as VfirstWhen V isfirstAnd VinWhen the absolute value difference exceeds a set threshold, the number of frames Frame at that time is recordedi(ii) a The mean gray value in the second coil is denoted as VsecWhen V issecAnd VoutWhen the absolute value difference exceeds the threshold, the Frame number Frame at that time is recordedjAnd the frame number of the video is 30, the time t taken for the vehicle to pass through the coil is shown as the following formula:
Figure BDA0003503432040000101
in the formula, Gap is the video frame number;
and calculating the speed of the vehicle according to the distance L between the two virtual coils and the time t taken by the vehicle to pass through the coils
Figure BDA0003503432040000102
The formula is as follows:
Figure BDA0003503432040000103
in the formula, ClNumbering lanes;
repeatedly performing statistics on the average speed of each vehicle, and calculating the average speed of each lane vehicle
Figure BDA0003503432040000111
Calculating lane speed difference
Figure BDA0003503432040000112
Being adjacent lanes.
S4, establishing a city expressway network congestion detection classification model by using the extracted key index information, outputting an identification result,
in particular according to a key indicator, i.e. average vehicle speed
Figure BDA0003503432040000113
Vehicle flow VmaxSpeed difference of lane
Figure BDA0003503432040000114
Maximum number of vehicles N for adjacent lanesmaxAnd constructing a city highway network congestion detection classification model:
Figure BDA0003503432040000115
in the formula, K0,K1,K2,K3,K4In order to be the weight coefficient,
and CI represents a congestion index, CI represents road smoothness when being 0-0.5, CI represents light congestion when being 0.5-0.7, CI represents medium congestion when being 0.7-0.9, CI represents heavy congestion when being 0.9-1, and finally the road traffic condition is output according to the index evaluation result.
In Experimental Example 1, video data of real-time road conditions is acquired through an intelligent terminal; the video is partitioned by seconds and separated into video image data according to a specified frame number Gap, with Gap set to 30, i.e. the video is divided into 30 frame images per second; a highway network real-time video image information table is established by formatting and association, as shown in the following table:
Real-time video image information table of the highway network (the populated table appears as an image in the original document)
The image information is shown in fig. 2;
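A minimal sketch of this S1 table construction with OpenCV (the video path, terminal id and position values are illustrative; Gap = 30 as set above):

```python
import cv2

def build_frame_table(video_path, terminal_id, position, gap=30, start_time=0.0):
    """Separate a road-condition video into frames and build the S1 info table."""
    cap = cv2.VideoCapture(video_path)
    rows, idx = [], 0
    ok, frame = cap.read()
    while ok:
        rows.append({
            "terminal_number": terminal_id,
            "frame_start_time": start_time + idx / gap,  # seconds
            "frame_number": idx,
            "shooting_position": position,
            "frame_image": frame,
        })
        idx += 1
        ok, frame = cap.read()
    cap.release()
    return rows
```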
because the acquired video images contain various kinds of noise, image preprocessing techniques such as filtering, graying and image enhancement are used to retain texture information as much as possible while reducing the data volume, accelerating processing and laying a foundation for subsequent feature point extraction; key indices are then extracted from the preprocessed video image information with a target detection algorithm based on a deep neural network, namely the vehicle speed, traffic flow, lane speed difference and maximum vehicle-carrying number in the video images;
an urban expressway network congestion detection classification model is established from the extracted key index information and the identification result is output;
since the collected video images contain various kinds of noise, the collected video data is filtered: the road video images are denoised with bilateral filtering in the nonlinear filtering manner, and the filtered image is shown in fig. 3.
Since this experimental example is affected by different lighting conditions and acquisition times, and only indices such as vehicle count and speed are collected, extraction of the relevant indices does not involve the color composition of the image and can be realized on grayscale images. The images are therefore grayed, converting the gray value of each pixel into the range [0,255], with the following gray normalization formula:
p(i,j) = R(i,j)×0.299 + G(i,j)×0.587 + B(i,j)×0.114
The weights are based on the fact that human eyes are most sensitive to green and least sensitive to blue; p(i,j) is the gray value of the image pixel after gray normalization, with value range [0,255]; R(i,j) is the red channel pixel value of the original image; G(i,j) is the green channel pixel value; B(i,j) is the blue channel pixel value. The image after gray processing is shown in fig. 4.
In order to extract more accurate road traffic index features, enhancement processing needs to be performed on the image after the gray level processing. The video image is enhanced by adopting a local histogram equalization algorithm with limited contrast.
In order to find the characteristic region more conveniently and extract the interested region, an image segmentation technology is adopted to distinguish the enhanced image from the texture characteristic region and the background region. In the experimental example, an effective local threshold segmentation method, such as Niblack, is adopted to segment the image, and the method mainly classifies image pixels into a plurality of classes by setting different characteristic thresholds.
In an R×R neighborhood, binarization is performed by computing the mean and standard deviation of the pixel points and deriving a threshold from them. The specific formula is as follows:
T(x,y)=m(x,y)+k×s(x,y)
wherein T(x,y) is the threshold at that point, k is a correction coefficient (as k increases, the noise almost completely disappears), m(x,y) is the pixel mean in the R×R neighborhood, and s(x,y) is the standard deviation of the pixels in the R×R neighborhood, given by:
$$m(x,y)=\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}f(i,j)$$
$$s(x,y)=\sqrt{\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}\left(f(i,j)-m(x,y)\right)^{2}}$$
Bilateral filtering has two filtering kernels, a spatial-domain kernel and a value-domain (range) kernel, and can keep the original form of target edge information unchanged while removing noise; the weight template is defined from the Euclidean distance between coordinates and the difference between pixel values:
$$w(i,j,k,l)=\exp\left(-\frac{(i-k)^{2}+(j-l)^{2}}{2\sigma_{d}^{2}}\right)\cdot\exp\left(-\frac{\left(f(i,j)-f(k,l)\right)^{2}}{2\sigma_{r}^{2}}\right)$$
where the left factor of the formula is the spatial-domain kernel and the right factor is the value-domain kernel; (i,j) denotes the coordinates of the other positions in the template window, (k,l) denotes the coordinates of the template center, f(i,j) is the pixel value at coordinate (i,j), f(k,l) is the pixel value at the template center, $\sigma_d$ is the standard deviation of the spatial domain, and $\sigma_r$ is the standard deviation of the value domain.
The weights are then convolved with the pixels, and the final output pixel value is:
$$g(k,l)=\frac{\sum_{(i,j)}f(i,j)\,w(i,j,k,l)}{\sum_{(i,j)}w(i,j,k,l)}$$
The video image is enhanced with a contrast-limited local histogram equalization algorithm: for any point in the image, the relevant region is determined by a window of size W centered on it;
the histogram within the rectangular window is calculated, with the specific formula:
$$h_L(r)=\alpha\,h_w(r)+(1-\alpha)\,h_B(r)$$
where $h_w(r)$ denotes the normalized histogram within the window and $h_B(r)$ the normalized histogram outside the window, the weight defaulting to the area ratio $\alpha=\frac{A_w}{A_w+A_B}$, with $A_w$, $A_B$ the areas of regions w and B respectively;
if $h_L(r)=h(r)$, the local histogram equals the global histogram;
if $\alpha>\frac{A_w}{A_w+A_B}$, the local histogram emphasizes local information;
the histogram $h_L(r)$ is then equalized and the result is applied at the central pixel of the window;
the rectangular window is moved and the above steps repeated until the whole image has been processed;
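OpenCV ships a contrast-limited adaptive histogram equalization; a minimal usage sketch (the file name, clip limit and tile size are illustrative values, not taken from the patent):

```python
import cv2

gray_frame = cv2.imread("road_frame_gray.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray_frame)  # contrast-limited equalized image
```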
A deep YOLO network is adopted to integrate candidate box extraction, classification and localization into one neural network; candidate boxes are extracted directly from each preprocessed frame image, and the traffic flow index is predicted from image features.
A fixed period length is set according to the vehicle counts obtained in the previous stage, the video is divided into several statistical periods, the number of vehicles passing the road within each period is counted, and the maximum value among all statistical results is taken as the maximum traffic flow $V_{\max}$ of the current road. From the result data, the maximum traffic flow is $V_{\max}=117$. Taking the general vehicle length as 5 meters and using the relationship between the road surface dimensions and vehicle length, the maximum vehicle-carrying number is $N_{\max}=135$.
Virtually setting two coils in a video stream, recording the initial time and the end time of each vehicle, calculating the time difference between the two, and obtaining the vehicle speed by taking the quotient of the displacement and the time difference; then, calculating a lane speed difference index by using the vehicle speed average difference of adjacent lanes;
firstly, extracting candidate frames, dividing each preprocessed frame image into L multiplied by N unit cells, and predicting S candidate frames with different specifications by each unit cell;
then, starting to detect the targets in the candidate boxes, and predicting the confidence level of the target object in each candidate box, wherein the specific formula is as follows:
Figure BDA0003503432040000143
wherein pr (object) indicates the possibility of the object in the current candidate box, and pr (object) 1 indicates the object in the candidate box, otherwise 0;
Figure BDA0003503432040000144
the ratio of the predicted result to the real value is expressed, and the specific formula is defined as follows:
Figure BDA0003503432040000145
and detecting and positioning the vehicle according to a confidence coefficient formula of the vehicle contained in the candidate frame, wherein the specific formula is defined as follows:
Figure BDA0003503432040000151
labeling candidate boxes of ConL >0, wherein the boxes contain the predicted target car;
the steps are executed to count the number of vehicles in each frame of image to obtain a traffic flow index, and the final counting result is marked on the image, as shown in fig. 5;
firstly, two virtual coils are set on each lane in the gray-processed video image stream; the converted actual distance between the two virtual coils is required to be 0.3 km, and three groups of coils are set on the three left lanes respectively, as shown in fig. 6;
the average gray value of the two virtual coils on each lane is calculated; the average coil gray value in the vehicle-entry direction is denoted $V_{in}$ and that in the vehicle-exit direction is denoted $V_{out}$; the coil gray values are initialized with no vehicle present, here $V_{in}=55$, $V_{out}=118$;
pixel changes within the virtual coil regions are then detected; the average gray value in the first coil is denoted $V_{first}$, and when the absolute difference between $V_{first}$ and $V_{in}$ exceeds the set threshold of 20, the frame number $Frame_i$ at that moment is recorded; the average gray value in the second coil is denoted $V_{sec}$, and when the absolute difference between $V_{sec}$ and $V_{out}$ exceeds the threshold, the frame number $Frame_j$ at that moment is recorded; with a video frame rate of 30, the time t taken by the vehicle to pass between the coils is:
$$t=\frac{Frame_j-Frame_i}{Gap}$$
where Gap is the video frame rate;
the speed $v_{C_l}$ of the vehicle is calculated from the distance L between the two virtual coils and the time t taken to pass between them:
$$v_{C_l}=\frac{L}{t}$$
where $C_l$ is the lane number;
these steps are repeated, the average speed of each vehicle is counted, and the average speed $\bar{v}_{C_l}$ of the vehicles in each lane is calculated; the lane speed difference is then computed as
$$\Delta\bar{v}=\left|\bar{v}_{C_l}-\bar{v}_{C_{l+1}}\right|$$
where $C_l$ and $C_{l+1}$ are adjacent lanes;
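A worked sketch of this coil logic with the embodiment's values ($V_{in}=55$, threshold 20, Gap = 30 fps, L = 0.3 km); the frame indices in the example are illustrative, not from the patent:

```python
def trigger_frame(coil_means, baseline, threshold=20):
    """Index of the first frame whose coil mean gray deviates from the baseline."""
    for idx, mean in enumerate(coil_means):
        if abs(mean - baseline) > threshold:
            return idx
    return None

def speed_kmh(frame_i, frame_j, L_km=0.3, gap_fps=30):
    """v = L / t with t = (Frame_j - Frame_i) / Gap, converted to km/h."""
    t_hours = (frame_j - frame_i) / gap_fps / 3600.0
    return L_km / t_hours

# e.g. triggers at frames 12 and 500: t = 488/30 s, v = 0.3 km / t, about 66 km/h
```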
From the characteristic indices output by the above steps, namely the average vehicle speed $\bar{v}$, the traffic flow $V_{\max}=117$, the lane speed difference $\Delta\bar{v}$ of adjacent lanes, and the maximum vehicle-carrying number $N_{\max}$, the urban highway network congestion detection classification model is constructed as:
$$CI=K_0+K_1\bar{v}+K_2V_{\max}+K_3\Delta\bar{v}+K_4N_{\max}$$
Training yields CI = 0.958.
Congestion evaluation index:
CI | Road traffic condition
0-0.5 | Unobstructed
0.5-0.7 | Light congestion
0.7-0.9 | Medium congestion
0.9-1 | Heavy congestion
According to the congestion index, the road traffic condition is known to be heavy congestion.
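A self-contained check of the 0.958 result against the evaluation table above (plain Python; the band encoding simply mirrors the table):

```python
bands = [(0.5, "unobstructed"), (0.7, "light congestion"),
         (0.9, "medium congestion"), (1.0, "heavy congestion")]
ci = 0.958
print(next(label for upper, label in bands if ci <= upper))  # heavy congestion
```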
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A congestion point detection method oriented to urban highway networks, characterized by comprising the following steps:
s1, acquiring video data of real-time road conditions through an intelligent terminal, separating the video data into video image data according to specified frame number, performing formatting association, and establishing a highway network real-time video image information table;
s2, filtering, graying and image enhancing the collected video image data, reducing the data size while maximally retaining the texture information, and accelerating the processing speed;
s3, extracting key indexes of the processed video image data information by adopting a target detection algorithm based on a deep neural network, and extracting key index information of the vehicle speed, the vehicle flow, the lane speed difference and the maximum vehicle carrying number in the video image;
and S4, establishing a city expressway network congestion detection classification model by using the extracted key index information, and outputting an identification result.
2. The urban highway network-oriented traffic jam detection method according to claim 1, wherein in S1, the highway network real-time video image information table comprises intelligent terminal number information, frame start time information, frame number information, shooting position information and frame image information.
3. The method for detecting congestion points oriented to urban highway networks as claimed in claim 1, wherein in S2, the road video image data is denoised with bilateral filtering in the nonlinear filtering manner, the image is then grayed, converting the gray value of each pixel into the range [0,255], and gray normalization is performed with the following formula:
p(i,j) = R(i,j)×0.299 + G(i,j)×0.587 + B(i,j)×0.114
where p(i,j) is the gray value of the image pixel after gray normalization, with value range [0,255];
R(i,j) is the red channel pixel value of the original image;
G(i,j) is the green channel pixel value of the original image;
B(i,j) is the blue channel pixel value of the original image;
then, the video image is enhanced with a contrast-limited local histogram equalization algorithm;
then the image is segmented with Niblack's local threshold segmentation method, which classifies image pixel points into several classes by setting different feature thresholds;
in an R×R neighborhood, binarization is performed by computing the mean and standard deviation of the pixel points and deriving a threshold from them, with the following formula:
T(x,y)=m(x,y)+k×s(x,y)
wherein T (x, y) is the threshold value of the point;
k is a correction coefficient;
m (x, y) is the mean of the pixels in the R × R neighborhood;
s (x, y) is the standard deviation of pixels in the R × R neighborhood, as represented by:
$$m(x,y)=\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}f(i,j)$$
$$s(x,y)=\sqrt{\frac{1}{R^{2}}\sum_{(i,j)\in N_{R\times R}(x,y)}\left(f(i,j)-m(x,y)\right)^{2}}$$
4. The congestion point detection method oriented to urban highway networks according to claim 3, wherein denoising the road video image with bilateral filtering in the nonlinear filtering manner is specifically as follows:
bilateral filtering has two filtering kernels, a spatial-domain kernel and a value-domain (range) kernel; it removes noise while keeping the original form of target edge information unchanged, and the weight template is defined from the Euclidean distance between coordinates and the difference between pixel values:
$$w(i,j,k,l)=\exp\left(-\frac{(i-k)^{2}+(j-l)^{2}}{2\sigma_{d}^{2}}\right)\cdot\exp\left(-\frac{\left(f(i,j)-f(k,l)\right)^{2}}{2\sigma_{r}^{2}}\right)$$
in the formula, the left factor is the spatial-domain kernel and the right factor is the value-domain kernel;
(i,j) denotes the coordinates of the other positions in the template window;
(k,l) denotes the coordinates of the template center;
f(i,j) is the pixel value at coordinate (i,j);
f(k,l) is the pixel value at the template center;
$\sigma_d$ is the standard deviation of the spatial domain;
$\sigma_r$ is the standard deviation of the value domain;
the weights are convolved with the pixels, and the final output pixel value is:
$$g(k,l)=\frac{\sum_{(i,j)}f(i,j)\,w(i,j,k,l)}{\sum_{(i,j)}w(i,j,k,l)}$$
5. the method for detecting the urban highway network-oriented blockage points according to claim 4, wherein the specific processing procedure in S3 is as follows:
integrating candidate box extraction, classification and localization into one neural network with a YOLO network, extracting candidate boxes directly from each preprocessed frame image, and predicting the traffic flow index from image features;
then, according to the vehicle counts obtained in the previous stage, a fixed period length is set, the video is divided into several statistical periods, and the number of vehicles passing the road within each period is counted; finally, the maximum value among all statistical results is taken as the maximum traffic flow $V_{\max}$ of the current road;
Virtually setting two coils in a video stream, recording the initial time and the end time of each vehicle, calculating the time difference between the two, and obtaining the vehicle speed by taking the quotient of the displacement and the time difference; the lane speed difference indicator is then calculated using the average difference in vehicle speed for adjacent lanes.
6. The congestion point detection method oriented to urban highway networks according to claim 5, wherein candidate box extraction, classification and localization are integrated into one neural network with a YOLO network, candidate boxes are extracted directly from each preprocessed frame image, and the specific process of predicting the traffic flow index from image features is as follows:
firstly, candidate boxes are extracted: each preprocessed frame image is divided into L×N grid cells, and each cell predicts S candidate boxes of different sizes; then the targets in the candidate boxes are detected, and the confidence that a target object is present in each candidate box is predicted with the formula:
$$Conf=\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
where Pr(Object) denotes the probability that an object is present in the current candidate box;
Pr(Object) = 1 indicates that an object is in the candidate box, and 0 otherwise;
$IOU_{\text{pred}}^{\text{truth}}$
denotes the overlap ratio between the predicted box and the ground-truth box, defined as:
$$IOU_{\text{pred}}^{\text{truth}}=\frac{\operatorname{area}\left(box_{\text{pred}}\cap box_{\text{truth}}\right)}{\operatorname{area}\left(box_{\text{pred}}\cup box_{\text{truth}}\right)}$$
then the vehicle is detected and located according to the confidence that a candidate box contains a vehicle, defined as:
$$ConL=\Pr(\text{car}\mid\text{Object})\times\Pr(\text{Object})\times IOU_{\text{pred}}^{\text{truth}}$$
candidate boxes with ConL > 0 are labeled; these boxes contain the predicted target vehicles;
and repeatedly counting the number of vehicles in each frame of image to obtain the traffic flow index.
7. The congestion point detection method oriented to urban highway networks according to claim 5, wherein two virtual coils are set in the video stream, the entry time and exit time of each vehicle are recorded, the time difference between the two is calculated, and the vehicle speed is obtained as the quotient of the displacement and the time difference; the lane speed difference index is then calculated from the average speed difference of adjacent lanes, with the following specific process:
firstly, arranging two virtual coils on each lane in a video image stream after gray processing, wherein the distance between the two virtual coils is required to be L;
the average gray value of the two virtual coils on each lane is calculated; the average coil gray value in the vehicle-entry direction is denoted $V_{in}$ and that in the vehicle-exit direction is denoted $V_{out}$, and the coil gray values $V_{in}$, $V_{out}$ are initialized with no vehicle present;
pixel changes within the virtual coil regions are then detected; the average gray value in the first coil is denoted $V_{first}$, and when the absolute difference between $V_{first}$ and $V_{in}$ exceeds the set threshold, the frame number $Frame_i$ at that moment is recorded; the average gray value in the second coil is denoted $V_{sec}$, and when the absolute difference between $V_{sec}$ and $V_{out}$ exceeds the threshold, the frame number $Frame_j$ at that moment is recorded; with a video frame rate of 30, the time t taken by the vehicle to pass between the coils is:
$$t=\frac{Frame_j-Frame_i}{Gap}$$
where Gap is the video frame rate;
the speed $v_{C_l}$ of the vehicle is then calculated from the distance L between the two virtual coils and the time t taken to pass between them:
$$v_{C_l}=\frac{L}{t}$$
where $C_l$ is the lane number;
this is repeated for every vehicle, the average speed $\bar{v}_{C_l}$ of the vehicles in each lane is calculated, and the lane speed difference is computed as
$$\Delta\bar{v}=\left|\bar{v}_{C_l}-\bar{v}_{C_{l+1}}\right|$$
where $C_l$ and $C_{l+1}$ are adjacent lanes.
8. The method as claimed in claim 7, wherein an urban highway network congestion detection classification model is constructed from the key indices, namely the average vehicle speed $\bar{v}$, the traffic flow $V_{\max}$, the lane speed difference $\Delta\bar{v}$ of adjacent lanes, and the maximum vehicle-carrying number $N_{\max}$:
$$CI=K_0+K_1\bar{v}+K_2V_{\max}+K_3\Delta\bar{v}+K_4N_{\max}$$
where $K_0$, $K_1$, $K_2$, $K_3$, $K_4$ are weight coefficients,
and CI denotes the congestion index: CI in 0-0.5 indicates an unobstructed road, 0.5-0.7 light congestion, 0.7-0.9 medium congestion, and 0.9-1 heavy congestion; finally, the road traffic condition is output according to the index evaluation result.
CN202210132700.0A 2022-02-14 2022-02-14 Urban highway network-oriented blockage detection method Pending CN114463684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210132700.0A CN114463684A (en) 2022-02-14 2022-02-14 Urban highway network-oriented blockage detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210132700.0A CN114463684A (en) 2022-02-14 2022-02-14 Urban highway network-oriented blockage detection method

Publications (1)

Publication Number | Publication Date
CN114463684A | 2022-05-10

Family

ID=81413736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210132700.0A Pending CN114463684A (en) 2022-02-14 2022-02-14 Urban highway network-oriented blockage detection method

Country Status (1)

Country Link
CN (1) CN114463684A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998770A (en) * 2022-07-06 2022-09-02 中国科学院地理科学与资源研究所 Highway identifier extraction method and system
CN116913108A (en) * 2023-09-13 2023-10-20 深圳市新城市规划建筑设计股份有限公司 Urban traffic monitoring and scheduling method and system
CN116913108B (en) * 2023-09-13 2023-12-19 深圳市新城市规划建筑设计股份有限公司 Urban traffic monitoring and scheduling method and system
CN117596755A (en) * 2023-12-15 2024-02-23 广东瑞峰光电科技有限公司 Intelligent control method and system for street lamp of Internet of things
CN117596755B (en) * 2023-12-15 2024-04-16 广东瑞峰光电科技有限公司 Intelligent control method and system for street lamp of Internet of things

Similar Documents

Publication Publication Date Title
CN109829403B (en) Vehicle anti-collision early warning method and system based on deep learning
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN114463684A (en) Urban highway network-oriented blockage detection method
CN110210475B (en) License plate character image segmentation method based on non-binarization and edge detection
CN109670404A (en) A kind of road ponding image detection method for early warning based on mixed model
CN106682586A (en) Method for real-time lane line detection based on vision under complex lighting conditions
CN102044073B (en) Method and system for judging crowd density in image
CN110598560A (en) Night monitoring and identifying method and system based on neural network enhancement
CN112651293B (en) Video detection method for road illegal spreading event
CN104978567A (en) Vehicle detection method based on scenario classification
CN113160575A (en) Traffic violation detection method and system for non-motor vehicles and drivers
CN107578048B (en) Vehicle type rough classification-based far-view scene vehicle detection method
CN113034378B (en) Method for distinguishing electric automobile from fuel automobile
CN111832461A (en) Non-motor vehicle riding personnel helmet wearing detection method based on video stream
Abidin et al. A systematic review of machine-vision-based smart parking systems
CN116030396B (en) Accurate segmentation method for video structured extraction
CN110569755A (en) Intelligent accumulated water detection method based on video
CN108520528B (en) Mobile vehicle tracking method based on improved difference threshold and displacement matching model
CN112435276A (en) Vehicle tracking method and device, intelligent terminal and storage medium
Tariq et al. Real time vehicle detection and colour recognition using tuned features of Faster-RCNN
CN112330158A (en) Method for identifying traffic index time sequence based on autoregressive differential moving average-convolution neural network
CN107452212B (en) Crossing signal lamp control method and system
CN107862341A (en) A kind of vehicle checking method
CN116597432B (en) License plate recognition system based on improved yolov5 algorithm
Zhang et al. A front vehicle detection algorithm for intelligent vehicle based on improved gabor filter and SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination