CN114758319A - Near-field vehicle jamming behavior prediction method based on image input - Google Patents

Near-field vehicle jamming behavior prediction method based on image input

Info

Publication number
CN114758319A
CN114758319A
Authority
CN
China
Prior art keywords
vehicle
target
lane
lane line
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210289381.4A
Other languages
Chinese (zh)
Inventor
陈广 (Chen Guang)
边疆 (Bian Jiang)
瞿三清 (Qu Sanqing)
钟志华 (Zhong Zhihua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202210289381.4A
Publication of CN114758319A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30248: Vehicle exterior or interior
    • G06T 2207/30252: Vehicle exterior; Vicinity of vehicle
    • G06T 2207/30256: Lane; Road marking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a near-field vehicle cut-in ("jamming") behavior prediction method based on image input, which comprises the following steps: (1) acquiring image sequence information from forward panoramic images in a real structured road scene, and manually annotating the position and behavior information of the vehicle targets in the image sequence; (2) constructing a near-field vehicle detection and tracking model suited to detecting and tracking near-field vehicles on a structured road; (3) constructing a lane line detection network suited to detecting lane lines on the structured road, together with a corresponding loss function; (4) based on the vehicle IDs and corresponding target bounding-box position data obtained by the detection and tracking model of step (2) and the lane lines obtained by the detection network of step (3), obtaining the relative position deviation between each target and the lane lines, and obtaining the near-field vehicle cut-in prediction result according to the formulated prior rules. Compared with the prior art, the method offers high prediction accuracy and high efficiency.

Description

Near-field vehicle jamming behavior prediction method based on image input
Technical Field
The invention relates to the technical field of intelligent driving, and in particular to a near-field vehicle cut-in ("jamming") behavior prediction method based on image input.
Background
Behavior prediction is a further development of behavior recognition, one of the basic tasks in computer vision. In recent years, with the rapid development of deep learning, behavior prediction algorithms have included both approaches built on hand-crafted prior rules and end-to-end prediction based on deep neural networks. Behavior recognition and prediction methods have evolved from early approaches based on physical motion features to the SlowFast network for visual video input, the dual-modality action recognition network TSN, and 3D convolutional neural networks based on inflated 3D convolution (I3D). Many strong algorithms have emerged, and they perform excellently on public human behavior recognition datasets; for the near-field vehicle cut-in prediction task, however, they exhibit the following shortcomings in practical application:
First, existing public datasets lack an ego-vehicle (Ego Vehicle) perspective dataset for predicting near-field vehicle cut-in behavior; compared with object detection datasets, such data are difficult to collect and annotate, which limits further development of the technology;
Second, in dual-modality input approaches, optical flow is a hand-crafted feature trained separately from the RGB input, so end-to-end training is impossible, system accuracy needs improvement, and the algorithmic complexity degrades real-time performance;
Third, LiDAR-based methods carry high hardware, operating, and maintenance costs, and there is currently no ego-vehicle (Ego Vehicle) perspective behavior prediction method that takes forward panoramic video as input;
Fourth, end-to-end deep learning methods suffer from poor generalization, opaque trained models, and high demands on mobile hardware, making them difficult to deploy quickly in real scenarios.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a near-field vehicle cut-in behavior prediction method based on image input. The method takes as input forward panoramic video captured over a period of time by a vehicle-mounted high-definition camera, uses an image-based target detection and tracking algorithm to perceive and track regions of interest around forward target vehicles from the ego-vehicle (Ego Vehicle) perspective, and predicts behavior within those regions using an algorithm built on formulated prior rules. It greatly reduces the software and hardware cost of practical deployment while preserving inference speed, ultimately yields accurate predictions of neighboring-vehicle cut-in behavior, provides the intelligent driving system sufficient time to avoid risk, and improves the overall safety of the intelligent driving system.
The object of the invention is achieved by the following technical solution:
A near-field vehicle cut-in behavior prediction method based on image input comprises the following steps:
(1) acquiring image sequence information from forward panoramic images in a real structured road scene, and manually annotating the position and behavior information of the vehicle targets in the image sequence;
(2) constructing a near-field vehicle detection and tracking model suited to detecting and tracking near-field vehicles on a structured road;
(3) constructing a lane line detection network suited to detecting lane lines on the structured road, together with a corresponding loss function;
(4) based on the vehicle IDs and corresponding target bounding-box position data obtained by the near-field vehicle detection and tracking model established in step (2) and the lane lines obtained by the lane line detection network established in step (3), obtaining the relative position deviation between each target and the lane lines, and obtaining the near-field vehicle cut-in prediction result according to the formulated prior rules.
Preferably, step (1) specifically comprises:
(11) calibrating the intrinsic and extrinsic parameters of the camera, wherein the extrinsic parameters comprise a rotation matrix R and a translation vector T, and the intrinsic parameters comprise an intrinsic parameter matrix K and the camera distortion coefficients;
(12) acquiring video data in a real road scene by using a data acquisition vehicle provided with a camera, and recording the category of a vehicle target in an image during acquisition;
(13) annotating the collected video data with a labeling tool, wherein the annotations include vehicle target tracking IDs, vehicle target categories, target bounding boxes, key frames of cut-in start, of the vehicle crossing the lane line midpoint, and of cut-in completion, and cut-in behavior categories; the annotation content must at least include the near-field vehicle positions, the key frames, and the cut-in behavior category information. An illustrative record layout is sketched below.
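For concreteness, one tracked target's annotation under this scheme might look as follows. This is a minimal sketch; the field names, frame numbers, and category labels are illustrative assumptions, not the format actually used in the embodiment:

```python
# Hypothetical annotation record for one tracked vehicle in one clip.
# All field names and values are illustrative assumptions.
annotation = {
    "track_id": 17,                 # unique vehicle target tracking ID
    "category": "car",              # vehicle target category
    "boxes": {                      # per-frame bounding boxes: frame -> [x1, y1, x2, y2]
        120: [512, 330, 640, 420],
        121: [518, 331, 646, 422],
    },
    "key_frames": {
        "cut_in_start": 120,        # vehicle starts to cut in
        "lane_midpoint": 133,       # vehicle crosses the lane line midpoint
        "cut_in_end": 147,          # cut-in behavior completed
    },
    "behavior": "cut_in_left",      # cut-in behavior category
}
```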
Preferably, the step (2) specifically comprises:
(21) constructing a near-field vehicle target detection network based on an improved YOLOv5: the input video slice is fed into the network as an image time sequence, and feature extraction and feature encoding are performed on the input image information through multi-layer convolution and downsampling, obtaining a multi-dimensional feature tensor over the divided picture;
(22) constructing a classification network and applying non-maximum suppression to finally obtain the position information and classification confidence of each target, including the object's classification probability and localization probability (a minimal sketch of this operation follows this list);
(23) constructing a near-field vehicle target tracking network based on an improved Deep-SORT: taking the target bounding box and classification information obtained from detection as input, it localizes and tracks multiple objects in the video simultaneously, records ID and trajectory information, reduces ID switches especially under occlusion, and outputs each target vehicle's tracking ID, target category, and bounding box information.
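The non-maximum suppression of step (22) keeps a single high-confidence box per object. A minimal greedy IoU-based sketch is given below; it is independent of the YOLOv5 implementation actually used, and the threshold value is an assumption:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.45):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # discard remaining boxes that overlap the kept box too strongly
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thr]
    return keep
```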
Preferably, step (3) specifically comprises:
(31) constructing a lane line feature extraction backbone based on a convolutional neural network whose output features use shallow residual connections; the larger receptive field preserves detection quality while improving the model's inference speed;
(32) constructing a lane line semantic segmentation network: during training, multi-scale features are sampled to the same scale and the semantic segmentation loss is computed through transposed convolution, strengthening the backbone's visual feature extraction and finally yielding an enhanced residual-connected lane line detection backbone;
(33) computing candidate points with a classifier over the global scope, from the features extracted by the backbone and the a priori specified longitudinal candidate anchors of the picture, finally obtaining the lane line position nodes of the lane in which the ego vehicle travels (a decoding sketch follows this list);
(34) constructing the loss function of the lane line detection network, comprising a multi-classification loss, a segmentation loss, and a lane structuring loss.
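Step (33) turns the row-wise classifier output into position nodes. The sketch below assumes an Ultra-Fast-Lane-Detection-style convention consistent with the loss definitions that follow: each of the h longitudinal anchors is split into w cells plus one trailing "no lane" cell, and the horizontal position is the expectation over the cell probabilities. The tensor layout and names are assumptions:

```python
import numpy as np

def decode_lane_nodes(logits, img_w, row_anchors):
    """Decode lane line position nodes from row-wise classification logits.

    logits: (C, h, w + 1) scores for C lane lines, h longitudinal anchors,
            and w cells per row plus one trailing 'no lane' cell.
    Returns one list of (x, y) nodes per lane line.
    """
    C, h, w_plus_1 = logits.shape
    w = w_plus_1 - 1
    # softmax over the w real cells, used for the location expectation
    z = logits[:, :, :w] - logits[:, :, :w].max(axis=2, keepdims=True)
    prob = np.exp(z) / np.exp(z).sum(axis=2, keepdims=True)
    # expected cell index, mapped to a horizontal pixel coordinate
    loc = (prob * np.arange(1, w + 1)).sum(axis=2) * img_w / w
    lanes = []
    for i in range(C):
        nodes = [(loc[i, j], row_anchors[j]) for j in range(h)
                 # an anchor holds a lane point only if 'no lane' is not argmax
                 if logits[i, j].argmax() != w]
        lanes.append(nodes)
    return lanes
```

Under the values given later in this embodiment (C = 4 lane lines, h = 18 anchors), logits would have shape (4, 18, w + 1).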
Preferably, the loss function of the lane line detection network is expressed as $L_{total}$:

$$L_{total} = L_{cls} + L_{seg} + \eta L_{lane}$$

where $L_{cls}$ is the multi-classification loss, $L_{seg}$ the segmentation loss, $L_{lane}$ the lane structuring loss, and $\eta$ a hyperparameter.
Preferably, the multi-classification loss $L_{cls}$ is expressed as:

$$L_{cls} = \sum_{i=1}^{C} \sum_{j=1}^{h} \alpha \left(1 - c_{i,j}\right)^{\gamma} L_{CE}\!\left(P_{i,j,:},\ T_{i,j,:}\right)$$

where $L_{CE}(\cdot)$ denotes the cross-entropy loss function, $P_{i,j,:}$ denotes the prediction over all $(w+1)$ lane line cells for the $i$-th lane line and the $j$-th transverse anchor, $T_{i,j,:}$ denotes the corresponding true distribution, and $c_{i,j}$ denotes the similarity between $P_{i,j,:}$ and $T_{i,j,:}$; $C$ and $h$ respectively denote the number of lane lines and the number of longitudinal lane anchors, and $\gamma$ and $\alpha$ are hyperparameters.
Preferably, the lane structuring loss $L_{lane}$ is expressed as:

$$L_{lane} = L_{sim} + \lambda L_{shp}$$

$$L_{sim} = \sum_{i=1}^{C} \sum_{j=1}^{h-1} \left\| P_{i,j,:} - P_{i,j+1,:} \right\|_1$$

$$L_{shp} = \sum_{i=1}^{C} \sum_{j=1}^{h-2} \left\| \left( Loc_{i,j} - Loc_{i,j+1} \right) - \left( Loc_{i,j+1} - Loc_{i,j+2} \right) \right\|_1, \qquad Loc_{i,j} = \sum_{k=1}^{w} k \cdot P_{i,j,k}$$

where $L_{sim}$ is the similarity loss, $L_{shp}$ the shape loss, $\lambda$ a hyperparameter weighting the loss, $P_{i,j,k}$ the predicted probability of the $i$-th lane line at position $(j,k)$, and $w$ the number of division cells per row.
Preferably, training the network and formulating the prior rules in step (4) comprises the following steps:
(41) preprocessing the acquired image sequence, comprising: randomly horizontally flipping, cropping, and uniformly scaling the images to a fixed size, applying the corresponding flip, crop, and scaling to the annotation data, and on this basis normalizing the resulting images per channel (a preprocessing sketch follows this list);
(42) detecting the lane lines to obtain the lane line position nodes of the ego lane, and obtaining a lane line fitting model of the ego lane using a high-robustness regression model;
(43) establishing a cut-in region of interest from the lane line model of the ego lane, computing the position deviation between each target vehicle's bounding box information and the cut-in region of interest, and establishing, keyed by target vehicle tracking ID, a dictionary of each target's expected cut-in count and vehicle state symbol;
(44) judging the specific behavior of each target vehicle by setting a threshold on the expected cut-in count in combination with the vehicle state symbol, updating the parameters of the established cut-in region of interest, and iterating to obtain ideal network parameters.
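A minimal sketch of the preprocessing of step (41), using OpenCV and NumPy; the output size, flip probability, and normalization constants (ImageNet statistics) are assumptions, and the random crop is omitted for brevity:

```python
import cv2
import numpy as np

# assumed per-channel statistics; the embodiment's constants are not specified
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, boxes, out_size=(448, 448), flip_p=0.5):
    """Random horizontal flip + uniform scaling to a fixed size +
    per-channel normalization; bounding boxes are transformed to match."""
    h, w = image.shape[:2]
    boxes = boxes.astype(np.float32).copy()
    if np.random.rand() < flip_p:
        image = cv2.flip(image, 1)               # horizontal flip
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # mirror x1/x2, keep order
    sx, sy = out_size[0] / w, out_size[1] / h
    image = cv2.resize(image, out_size)
    boxes[:, [0, 2]] *= sx                       # scale boxes with the image
    boxes[:, [1, 3]] *= sy
    image = image.astype(np.float32) / 255.0
    image = (image - MEAN) / STD                 # per-channel normalization
    return image, boxes
```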
Preferably, the lane line fitting model of the ego lane is a linear model, and the left and right lane lines are fitted separately from the predicted lane line position nodes.
Preferably, in the established dictionary of each target's expected cut-in count and vehicle state symbol, the keys are the target tracking IDs and the values are the expected count and the vehicle state symbol, as illustrated below.
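For illustration, the dictionary might be organized as follows; the IDs and values are hypothetical:

```python
# Hypothetical per-target state dictionary: tracking ID -> [count, state symbol].
target_states = {
    17: [0, 'keep'],    # lane keeping, no cut-in evidence yet
    23: [3, 'cut_in'],  # count reached the threshold: cut-in warning active
    31: [3, 'follow'],  # cut-in completed, now following in the ego lane
}
```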
Compared with the prior art, the invention has the following advantages:
(1) addressing the poor performance of existing neural networks in predicting vehicle cut-in behavior, the invention provides a brand-new network structure that greatly improves real-time near-field behavior prediction under the limited computing power of a mobile terminal; it can be conveniently deployed in existing intelligent driving systems, allowing the intelligent vehicle to react to near-field cut-ins in time and improving driving safety;
(2) the invention first extracts video clips of lane-change behaviors and the targets' bounding boxes (Bounding Boxes), screens out the data conforming to the neighboring-vehicle cut-in behavior defined above, and finally establishes a database of typical neighboring-object cut-in behaviors containing labels and video data;
(3) the method obtains forward-view images with a large field of view and high resolution, covering the targets' appearance features and the dependencies among targets, and develops a near-field vehicle detection and tracking model together with a cut-in prediction algorithm. The target detection module is a deep improvement of the current one-stage detector YOLOv5, achieving higher detection speed while retaining detection accuracy;
(4) after the targets' bounding boxes and category information are output, a Deep-SORT multi-target tracking algorithm produces the frame sequence of interest for each target ID; considering the heavy computational demands that traditional dual-modality input networks place on the system, and to preserve real-time operation, optical flow is not used for temporal feature extraction; instead, the target sequence itself serves as the spatio-temporal input;
(5) starting from the characteristic information of cut-in behavior, the cut-in prediction module applies the formulated prior rules to the target sequence output of the previous step, providing an interpretable, highly robust, real-time method for predicting neighboring-object cut-in behavior in vehicle driving scenes.
Drawings
FIG. 1 is a schematic flow chart of the near-field vehicle cut-in behavior prediction method based on image input according to the present invention;
FIG. 2 is a schematic diagram of the near-field vehicle target detection network according to the present invention;
FIG. 3 is the first part of the algorithm flow for predicting cut-in behavior from the prior rules;
FIG. 4 is the second part of the algorithm flow for predicting cut-in behavior from the prior rules;
FIG. 5 is the third part of the algorithm flow for predicting cut-in behavior from the prior rules.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following description of the embodiments is merely an illustrative example; the present invention is not intended to be limited to the applications or uses described, and is not limited to the following embodiments.
Examples
The behavior prediction method mainly comprises: collecting image sequence information from forward panoramic images in a real structured road scene, and manually annotating the position and behavior information of the vehicle targets in the image sequence; constructing a deep convolutional neural network suited to detecting and tracking near-field vehicles on a structured road; constructing a deep neural network suited to detecting lane lines on a structured road, together with a corresponding loss function; and, from the vehicle IDs and corresponding target bounding-box position data produced by the established near-field vehicle detection and tracking model and the lane line model produced by the established lane line detection network, obtaining the relative position deviation between each target and the lane lines, and obtaining the near-field vehicle cut-in prediction result according to the formulated prior rules. Addressing the poor performance of existing neural networks in predicting vehicle cut-in behavior, the invention provides a brand-new cut-in prediction algorithm that greatly improves real-time near-field behavior prediction under the limited computing power of a mobile terminal, can be conveniently deployed in existing intelligent driving systems, allows the intelligent vehicle to react to near-field cut-ins in time, and improves driving safety.
As shown in FIG. 1, the present embodiment provides a near-field vehicle cut-in behavior prediction method based on image input, comprising:
(1) acquiring image sequence information from forward panoramic images in a real structured road scene, and manually annotating the position and behavior information of the vehicle targets in the image sequence;
(2) constructing a near-field vehicle detection and tracking model suited to detecting and tracking near-field vehicles on a structured road;
(3) constructing a lane line detection network suited to detecting lane lines on the structured road, together with a corresponding loss function;
(4) based on the vehicle IDs and corresponding target bounding-box position data obtained by the near-field vehicle detection and tracking model established in step (2) and the lane lines obtained by the lane line detection network established in step (3), obtaining the relative position deviation between each target and the lane lines, and obtaining the near-field vehicle cut-in prediction result according to the formulated prior rules.
Specifically, step (1) comprises:
(11) calibrating the intrinsic and extrinsic parameters of the camera, wherein the extrinsic parameters comprise a rotation matrix R and a translation vector T, and the intrinsic parameters comprise an intrinsic parameter matrix K and the camera distortion coefficients;
(12) acquiring video data in a real road scene using a data acquisition vehicle equipped with the camera, and recording the categories of the vehicle targets in the images during acquisition;
(13) annotating the collected video data with a labeling tool, wherein the annotations include vehicle target tracking IDs, vehicle target categories, target bounding boxes, key frames of cut-in start, of the vehicle crossing the lane line midpoint, and of cut-in completion, and cut-in behavior categories; the annotation content must at least include the near-field vehicle positions, the key frames, and the cut-in behavior category information, and each vehicle target ID recorded in this process is unique.
Step (2) specifically comprises:
(21) constructing a near-field vehicle target detection network based on an improved YOLOv5: the input video slice is fed into the network as an image time sequence, and feature extraction and feature encoding are performed on the input image information through multi-layer convolution and downsampling, obtaining a multi-dimensional feature tensor over the divided picture; the network structure of this part is shown in FIG. 2 and comprises a Backbone, an FPN (Feature Pyramid Network), a PAN (Path Aggregation Network), and other structures;
(22) constructing a classification network and applying non-maximum suppression to finally obtain the position information and classification confidence of each target, including the object's classification probability and localization probability;
(23) constructing a near-field vehicle target tracking network based on an improved Deep-SORT: taking the target bounding box and classification information obtained from detection as input, it localizes and tracks multiple objects in the video simultaneously, records ID and trajectory information, reduces ID switches especially under occlusion, and outputs each target vehicle's tracking ID, target category, and bounding box information; the ReID module in the Deep-SORT network is trained on the vehicle re-identification dataset CompCars after reclassification processing.
Step (3) specifically comprises:
(31) constructing a lane line feature extraction backbone based on a convolutional neural network whose output features use shallow residual connections; the larger receptive field preserves detection quality while improving the model's inference speed; the shallow residual-connected convolutional network outputs features at four scales, namely 112×3, 56×64, 28×128, and 14×256;
(32) constructing a lane line semantic segmentation network: during training, multi-scale features are sampled to the same scale and the semantic segmentation loss is computed through transposed convolution, strengthening the backbone's visual feature extraction and finally yielding an enhanced residual-connected lane line detection backbone;
(33) computing candidate points with a classifier over the global scope, from the features extracted by the backbone and the a priori specified longitudinal candidate anchors of the picture, finally obtaining the lane line position nodes of the ego lane, where the number of lane lines C is 4 and the number of longitudinal candidate anchors h is 18;
(34) constructing the loss function of the lane line detection network, comprising the multi-classification loss, segmentation loss, and lane structuring loss; the loss function of the lane line detection network is expressed as $L_{total}$:

$$L_{total} = L_{cls} + L_{seg} + \eta L_{lane}$$

where $L_{cls}$ is the multi-classification loss, $L_{seg}$ the segmentation loss, $L_{lane}$ the lane structuring loss, and $\eta$ a hyperparameter.
Both the multi-classification loss $L_{cls}$ and the segmentation loss $L_{seg}$ adopt the cross-entropy loss function; the segmentation loss $L_{seg}$ is a multi-classification loss with 2 classes.
In particular, the multi-classification loss $L_{cls}$ is expressed as:

$$L_{cls} = \sum_{i=1}^{C} \sum_{j=1}^{h} \alpha \left(1 - c_{i,j}\right)^{\gamma} L_{CE}\!\left(P_{i,j,:},\ T_{i,j,:}\right)$$

where $L_{CE}(\cdot)$ denotes the cross-entropy loss function, $P_{i,j,:}$ denotes the prediction over all $(w+1)$ lane line cells for the $i$-th lane line and the $j$-th transverse anchor, $T_{i,j,:}$ denotes the corresponding true distribution, and $c_{i,j}$ denotes the similarity between $P_{i,j,:}$ and $T_{i,j,:}$; $C$ and $h$ respectively denote the number of lane lines and the number of longitudinal lane anchors, and $\gamma$ and $\alpha$ are hyperparameters.
The lane structuring loss $L_{lane}$ is expressed as:

$$L_{lane} = L_{sim} + \lambda L_{shp}$$

$$L_{sim} = \sum_{i=1}^{C} \sum_{j=1}^{h-1} \left\| P_{i,j,:} - P_{i,j+1,:} \right\|_1$$

$$L_{shp} = \sum_{i=1}^{C} \sum_{j=1}^{h-2} \left\| \left( Loc_{i,j} - Loc_{i,j+1} \right) - \left( Loc_{i,j+1} - Loc_{i,j+2} \right) \right\|_1, \qquad Loc_{i,j} = \sum_{k=1}^{w} k \cdot P_{i,j,k}$$

where $L_{sim}$ is the similarity loss, $L_{shp}$ the shape loss, $\lambda$ a hyperparameter weighting the loss, $P_{i,j,k}$ the predicted probability of the $i$-th lane line at position $(j,k)$, and $w$ the number of lane line cells per row.
In this embodiment, η is set to 1, γ to 2, α to 0.25, and λ to 1.25.
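A minimal PyTorch sketch of this total loss under the values above is given below. It assumes the focal-style weighting and the similarity and shape terms written out earlier, with $c_{i,j}$ taken as the predicted probability of the true cell; the tensor layouts are assumptions:

```python
import torch
import torch.nn.functional as F

ETA, GAMMA, ALPHA, LAM = 1.0, 2.0, 0.25, 1.25  # values from this embodiment

def lane_detection_loss(logits, target, seg_logits, seg_target):
    """logits: (C, h, w+1) row-wise class scores; target: (C, h) long cell
    indices; seg_logits: (2, H, W) scores; seg_target: (H, W) long labels."""
    C, h, w1 = logits.shape
    log_p = F.log_softmax(logits, dim=2)
    p = log_p.exp()
    # focal-style multi-classification loss: down-weight confident anchors
    pt = p.gather(2, target.unsqueeze(2)).squeeze(2)       # prob of true cell
    ce = -log_p.gather(2, target.unsqueeze(2)).squeeze(2)  # cross entropy
    l_cls = (ALPHA * (1.0 - pt) ** GAMMA * ce).sum()
    # similarity loss: adjacent row anchors predict similar distributions
    l_sim = (p[:, 1:, :] - p[:, :-1, :]).abs().sum()
    # shape loss: second-order difference of the expected lane positions
    loc = (p[:, :, : w1 - 1] * torch.arange(1, w1, dtype=p.dtype)).sum(dim=2)
    l_shp = ((loc[:, :-2] - loc[:, 1:-1]) - (loc[:, 1:-1] - loc[:, 2:])).abs().sum()
    # binary (2-class) segmentation loss, as stated above
    l_seg = F.cross_entropy(seg_logits.unsqueeze(0), seg_target.unsqueeze(0))
    return l_cls + l_seg + ETA * (l_sim + LAM * l_shp)
```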
In step (4), the network is trained and the prior rules are formulated as follows:
(41) preprocessing the acquired image sequence, comprising: randomly horizontally flipping, cropping, and uniformly scaling the images to a fixed size, applying the corresponding flip, crop, and scaling to the annotation data, and on this basis normalizing the resulting images per channel;
(42) detecting the lane lines to obtain the lane line position nodes of the ego lane, and obtaining a lane line fitting model of the ego lane using a high-robustness regression model (see the fitting sketch after this list); the lane line fitting model of the ego lane is a linear model, and the left and right lane lines are fitted separately from the predicted lane line position nodes;
(43) establishing a cut-in region of interest from the lane line model of the ego lane, computing the position deviation between each target vehicle's bounding box information and the cut-in region of interest, and establishing, keyed by target vehicle tracking ID, a dictionary of each target's expected cut-in count and vehicle state symbol, where the keys are the target tracking IDs and the values are the expected count and the vehicle state symbol;
(44) judging the specific behavior of each target vehicle by setting a threshold on the expected cut-in count in combination with the vehicle state symbol, updating the parameters of the established cut-in region of interest, and iterating to obtain ideal network parameters.
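The embodiment does not name the exact high-robustness regression estimator of step (42); a RANSAC linear fit is one common choice, and a minimal sketch (the iteration count and inlier threshold are assumptions) is:

```python
import numpy as np

def ransac_line(points, n_iter=100, inlier_thr=5.0, seed=0):
    """Robustly fit x = a*y + b to lane line nodes (x, y) with RANSAC.
    Parameterizing x as a function of y suits near-vertical lane lines."""
    pts = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if abs(y2 - y1) < 1e-6:
            continue                          # degenerate sample, skip
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        residuals = np.abs(pts[:, 0] - (a * pts[:, 1] + b))
        inliers = residuals < inlier_thr
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the best inlier set
    a, b = np.polyfit(pts[best_inliers, 1], pts[best_inliers, 0], deg=1)
    return a, b
```

The left and right lane lines of the ego lane are fitted separately with such a routine, and the fitted lines, widened on both sides, bound the cut-in region of interest described below.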
The specific flow of the algorithm that produces the near-field vehicle cut-in prediction from the formulated prior rules is shown in FIGS. 3, 4 and 5. First, after each neighboring vehicle has been detected and tracked, a target state descriptor is constructed with the target tracking ID as key and the initial state [0, 'keep'] as value, where 0 is a count and 'keep' is the history state descriptor. Whether the initial lane of a neighboring vehicle is the left or the right lane is judged from the region in which its detection Bounding Box lies; meanwhile, the lane line model obtained from the lane line detection module is widened appropriately on both sides to set the vehicle cut-in prediction region of interest (Cut-In RoI). According to the relative position of the neighboring target and the Cut-In RoI, two position states are distinguished: first, the neighboring target crosses the edge of the region; second, the neighboring target lies within the region. The latter is the simpler case, and the target state descriptor is then set to [2, 'follow']. The former state occurs both when the neighboring vehicle cuts in and when it cuts out; in that case the two can be distinguished by the history state descriptor, one of {'keep', 'cut_in', 'follow'}.
If the target state descriptor is [count < Threshold, 'keep'], the target neighboring vehicle is considered to have been lane keeping at the previous moment, with a potential trend from lane keeping toward cut-in. At the same time, to reduce the false detection rate of the neighboring-vehicle cut-in warning logic, the warning may be delayed appropriately so that only targets tracked cutting in for long enough are considered. Specifically, only the end point of the neighboring vehicle's detection Bounding Box farther from the Cut-In RoI lies outside the region; each time the Bounding Box key point (located at a fixed ratio α of the distance from that far end point) is detected inside the Cut-In RoI, the descriptor is updated to [count + 1, 'keep'], until count = Threshold, when it is updated to [count, 'cut_in']. In this embodiment, α is set to 0.4 and the cut-in detection count Threshold to 3. At that moment the neighboring vehicle's behavior in the current frame changes from action = 'keep' to action = 'cut_in', and the detection Bounding Box and the Cut-In RoI turn red as a warning.
If the current frame's target state descriptor is [count = Threshold, 'cut_in'], the target is considered to be continuing the cut-in of the previous frame, so the descriptor is kept at [count = Threshold, 'cut_in']; the action stays 'cut_in', the detection Bounding Box and the Cut-In RoI remain red, and the current frame indicates that a neighboring-vehicle cut-in is in progress.
If the current frame's target state descriptor is [count = Threshold, 'cut_in'] but the detection Bounding Box has now entered the Cut-In RoI completely, the target is considered to have completed the cut-in, so the descriptor is updated to [count = Threshold, 'follow']; the neighboring vehicle's action changes to 'follow', and the detection Bounding Box and the Cut-In RoI turn green, indicating that the cut-in has been completed and the current frame is in a safe state.
If the current frame's target state descriptor is [count = Threshold, 'follow'] and the detection Bounding Box straddles the edge of the Cut-In RoI, the target is considered to have completed its cut-in and to be leaving the current lane, so the descriptor is kept at [count = Threshold, 'follow']; the action remains 'follow', the detection Bounding Box and the Cut-In RoI stay green, and the current frame is in a safe state.
In any case, once the detection Bounding Box leaves the Cut-In RoI completely, the target state descriptor is immediately reset to [0, 'keep'], and the above steps are repeated on the next frame. With this algorithm flow there is an extra 3-frame delay in the cut-in case, which at the input stream's 30 FPS equals 0.1 s; under real conditions, a human driver confronted with an unforeseen cut-in usually needs at least 0.7 s to brake. Therefore, even with this delay between detection and warning, no serious problem is expected, and the practical technical requirements are met.
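The per-frame update described above can be summarized as a small state machine keyed by tracking ID. A minimal sketch under the thresholds of this embodiment is given below (Threshold = 3; the key-point test at ratio α = 0.4 and the geometric predicates are assumed to be computed by the caller):

```python
THRESHOLD = 3  # consecutive key-point detections required before warning

def update_state(state, key_point_in_roi, fully_inside, fully_outside):
    """Advance one target's [count, status] descriptor by one frame.
    status is one of {'keep', 'cut_in', 'follow'}."""
    count, status = state
    if fully_outside:                # Bounding Box left the Cut-In RoI entirely
        return [0, 'keep']
    if fully_inside:                 # cut-in completed, now lane following
        return [count, 'follow']
    # from here on, the Bounding Box straddles the RoI edge
    if status == 'follow':
        return [count, 'follow']     # completed cut-in, now leaving the lane
    if status == 'cut_in':
        return [count, 'cut_in']     # cut-in still in progress, keep warning
    if key_point_in_roi:             # potential cut-in from lane keeping
        count += 1
        return [count, 'cut_in' if count >= THRESHOLD else 'keep']
    return [count, 'keep']
```

This sketch compresses the rules of the preceding paragraphs; the rendering of red or green warnings from the returned status is omitted.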
Tests show that the proposed network successfully predicts neighboring-vehicle cut-ins and issues warnings, removes the warning once a cut-in has completed, and correctly distinguishes the two similar scenarios.
In summary, the invention provides a near-field vehicle cut-in behavior prediction method based on image input that greatly improves real-time near-field behavior prediction under the limited computing power of a mobile terminal, can be conveniently deployed in existing intelligent driving systems, allows the intelligent vehicle to react to near-field cut-ins in time, and improves driving safety.
The above embodiments are merely examples and do not limit the scope of the present invention. They may be implemented in various other manners, and various omissions, substitutions, and changes may be made without departing from the scope of the technical idea of the present invention.

Claims (10)

1. A near-field vehicle cut-in ("jamming") behavior prediction method based on image input, characterized by comprising the following steps:
(1) acquiring image sequence information from forward panoramic images in a real structured road scene, and manually annotating the position and behavior information of the vehicle targets in the image sequence;
(2) constructing a near-field vehicle detection and tracking model suited to detecting and tracking near-field vehicles on a structured road;
(3) constructing a lane line detection network suited to detecting lane lines on the structured road, together with a corresponding loss function;
(4) based on the vehicle IDs and corresponding target bounding-box position data obtained by the near-field vehicle detection and tracking model established in step (2) and the lane lines obtained by the lane line detection network established in step (3), obtaining the relative position deviation between each target and the lane lines, and obtaining the near-field vehicle cut-in prediction result according to the formulated prior rules.
2. The image input-based near-field vehicle cut-in behavior prediction method according to claim 1, wherein step (1) specifically comprises:
(11) calibrating the intrinsic and extrinsic parameters of the camera, wherein the extrinsic parameters comprise a rotation matrix R and a translation vector T, and the intrinsic parameters comprise an intrinsic parameter matrix K and the camera distortion coefficients;
(12) acquiring video data in a real road scene using a data acquisition vehicle equipped with the camera, and recording the categories of the vehicle targets in the images during acquisition;
(13) annotating the collected video data with a labeling tool, wherein the annotations include vehicle target tracking IDs, vehicle target categories, target bounding boxes, key frames of cut-in start, of the vehicle crossing the lane line midpoint, and of cut-in completion, and cut-in behavior categories; the annotation content includes at least the near-field vehicle positions, the key frames, and the cut-in behavior category information.
3. The image input-based near-field vehicle cut-in behavior prediction method according to claim 1, wherein step (2) specifically comprises:
(21) constructing a near-field vehicle target detection network based on an improved YOLOv5: the input video slice is fed into the network as an image time sequence, and feature extraction and feature encoding are performed on the input image information through multi-layer convolution and downsampling, obtaining a multi-dimensional feature tensor over the divided picture;
(22) constructing a classification network and applying non-maximum suppression to finally obtain the position information and classification confidence of each target, including the object's classification probability and localization probability;
(23) constructing a near-field vehicle target tracking network based on an improved Deep-SORT: taking the target bounding box and classification information obtained from detection as input, it localizes and tracks multiple objects in the video simultaneously, records ID and trajectory information, reduces ID switches especially under occlusion, and outputs each target vehicle's tracking ID, target category, and bounding box information.
4. The image input-based near-field vehicle cut-in behavior prediction method according to claim 1, wherein step (3) specifically comprises:
(31) constructing a lane line feature extraction backbone based on a convolutional neural network whose output features use shallow residual connections; the larger receptive field preserves detection quality while improving the model's inference speed;
(32) constructing a lane line semantic segmentation network: during training, multi-scale features are sampled to the same scale and the semantic segmentation loss is computed through transposed convolution, strengthening the backbone's visual feature extraction and finally yielding an enhanced residual-connected lane line detection backbone;
(33) computing candidate points with a classifier over the global scope, from the features extracted by the backbone and the a priori specified longitudinal candidate anchors of the picture, finally obtaining the lane line position nodes of the lane in which the ego vehicle travels;
(34) constructing the loss function of the lane line detection network, comprising a multi-classification loss, a segmentation loss, and a lane structuring loss.
5. The image input-based near-field vehicle cut-in behavior prediction method according to claim 4, wherein the loss function of the lane line detection network is expressed as $L_{total}$:

$$L_{total} = L_{cls} + L_{seg} + \eta L_{lane}$$

where $L_{cls}$ is the multi-classification loss, $L_{seg}$ the segmentation loss, $L_{lane}$ the lane structuring loss, and $\eta$ a hyperparameter.
6. The image input-based near-field vehicle cut-in behavior prediction method according to claim 5, wherein the multi-classification loss $L_{cls}$ is expressed as:

$$L_{cls} = \sum_{i=1}^{C} \sum_{j=1}^{h} \alpha \left(1 - c_{i,j}\right)^{\gamma} L_{CE}\!\left(P_{i,j,:},\ T_{i,j,:}\right)$$

where $L_{CE}(\cdot)$ denotes the cross-entropy loss function, $P_{i,j,:}$ denotes the prediction over all $(w+1)$ lane line cells for the $i$-th lane line and the $j$-th transverse anchor, $T_{i,j,:}$ denotes the corresponding true distribution, and $c_{i,j}$ denotes the similarity between $P_{i,j,:}$ and $T_{i,j,:}$; $C$ and $h$ respectively denote the number of lane lines and the number of longitudinal lane anchors, and $\gamma$ and $\alpha$ are hyperparameters.
7. The image input-based near-field vehicle cut-in behavior prediction method according to claim 5, wherein the lane structuring loss $L_{lane}$ is expressed as:

$$L_{lane} = L_{sim} + \lambda L_{shp}$$

$$L_{sim} = \sum_{i=1}^{C} \sum_{j=1}^{h-1} \left\| P_{i,j,:} - P_{i,j+1,:} \right\|_1$$

$$L_{shp} = \sum_{i=1}^{C} \sum_{j=1}^{h-2} \left\| \left( Loc_{i,j} - Loc_{i,j+1} \right) - \left( Loc_{i,j+1} - Loc_{i,j+2} \right) \right\|_1, \qquad Loc_{i,j} = \sum_{k=1}^{w} k \cdot P_{i,j,k}$$

where $L_{sim}$ is the similarity loss, $L_{shp}$ the shape loss, $\lambda$ a hyperparameter weighting the loss, $P_{i,j,k}$ the predicted probability of the $i$-th lane line at position $(j,k)$, and $w$ the number of division cells per row.
8. The image input-based near-field vehicle cut-in behavior prediction method according to claim 1, wherein training the network and formulating the prior rules in step (4) comprises the following steps:
(41) preprocessing the acquired image sequence, comprising: randomly horizontally flipping, cropping, and uniformly scaling the images to a fixed size, applying the corresponding flip, crop, and scaling to the annotation data, and on this basis normalizing the resulting images per channel;
(42) detecting the lane lines to obtain the lane line position nodes of the ego lane, and obtaining a lane line fitting model of the ego lane using a high-robustness regression model;
(43) establishing a cut-in region of interest from the lane line model of the ego lane, computing the position deviation between each target vehicle's bounding box information and the cut-in region of interest, and establishing, keyed by target vehicle tracking ID, a dictionary of each target's expected cut-in count and vehicle state symbol;
(44) judging the specific behavior of each target vehicle by setting a threshold on the expected cut-in count in combination with the vehicle state symbol, updating the parameters of the established cut-in region of interest, and iterating to obtain ideal network parameters.
9. The image input-based near-field vehicle cut-in behavior prediction method according to claim 8, wherein the lane line fitting model of the ego lane is a linear model, and the left and right lane lines are fitted separately from the predicted lane line position nodes.
10. The image input-based near-field vehicle cut-in behavior prediction method according to claim 8, wherein in the established dictionary of each target's expected cut-in count and vehicle state symbol, the keys are the target tracking IDs and the values are the expected count and the vehicle state symbol.
CN202210289381.4A 2022-03-22 2022-03-22 Near-field vehicle jamming behavior prediction method based on image input Pending CN114758319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210289381.4A CN114758319A (en) 2022-03-22 2022-03-22 Near-field vehicle jamming behavior prediction method based on image input

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210289381.4A CN114758319A (en) 2022-03-22 2022-03-22 Near-field vehicle jamming behavior prediction method based on image input

Publications (1)

Publication Number Publication Date
CN114758319A true CN114758319A (en) 2022-07-15

Family

ID=82327924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210289381.4A Pending CN114758319A (en) 2022-03-22 2022-03-22 Near-field vehicle jamming behavior prediction method based on image input

Country Status (1)

Country Link
CN (1) CN114758319A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination