CN114119896A - Driving path planning method - Google Patents

Driving path planning method

Info

Publication number
CN114119896A
CN114119896A (application CN202210092210.2A)
Authority
CN
China
Prior art keywords
vehicle
road
convolution
result
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210092210.2A
Other languages
Chinese (zh)
Other versions
CN114119896B (en)
Inventor
崔志强
曹广喜
单慧琳
王兴涛
孙佳琪
张银胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202210092210.2A
Publication of CN114119896A
Application granted
Publication of CN114119896B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a driving path planning method, which comprises: collecting video stream data of the roads in front of and behind a vehicle with monocular cameras and performing frame extraction; constructing an improved depth residual network for the depth estimation operation; iteratively matching and integrating the front-road and rear-road depth maps to form a 3D cloud map; performing scene semantic segmentation simultaneously with the depth estimation operation; calculating the distances between the own vehicle and the other vehicles, obstacles, signs and road lines captured in the front and rear road frame images; tracking the other vehicles in the frame images, calculating the minimum distance between the own vehicle and the vehicle closest to it in each lane, and estimating the running speeds of those vehicles; judging the driving angle of the vehicle from the road lines, judging whether an emergency has occurred from the signs and road-surface information, and adjusting the driving route in time so that the vehicle drives as planned; and performing path planning and vehicle control from the position of the own vehicle in the current frame to the position of the vehicle directly ahead in the current frame.

Description

Driving path planning method
Technical Field
The invention relates to a driving path planning method, and belongs to the technical field of driving path planning in the vehicle driving process.
Background
In recent years deep learning has developed rapidly and achieved remarkable results in many application fields, and many researchers have actively studied monocular depth estimation with various encoder-decoder architectures; depth estimation from a monocular image is a key task in real scenes. For example, the horizon or the locations of vanishing points can be estimated effectively from the statistics of the depth information, which is very useful for quickly understanding a given scene. Such cues have significant advantages for explaining the three-dimensional geometric layout, so inferring depth information has become a key technology in the field of automatic driving; its research content is rich, and many researchers at home and abroad have devoted great effort to the problem of monocular depth estimation.
With the continuous development of computer vision, three-dimensional modelling based on computer vision has made great progress, and real-time interactive virtual three-dimensional scenes have been widely popularized and applied. In the field of automatic driving, the conventional technology mainly acquires traffic-light and traffic-sign information through cameras, completes the three-dimensional reconstruction of the road scene in real time through a series of technologies such as radar ranging, laser ranging and ultrasonic positioning, and lets the in-vehicle system make the judgements that realize the automatic driving function. Such three-dimensional reconstruction is realized with vision and ranging sensors. Stereoscopic vision, the most important distance-sensing technology among the passive ranging methods, directly imitates the way human vision processes images and can flexibly present the stereoscopic information of images under various conditions; its function cannot be replaced by other visual methods. Research on it is therefore of great significance from both the visual-physiology and the engineering-application perspectives.
In 2019, Guoshuang proposed, in the paper "A visual odometry method based on monocular depth estimation", a visual odometry method in which the relative scale in road images is calculated through sparse CNN (convolutional neural network) operations, but its ability to generalize to untrained, unfamiliar environments or motions is limited. In the 2020 paper "Monocular depth estimation combining attention and unsupervised deep learning", Shijie et al. proposed a monocular depth estimation method that combines attention with unsupervised deep learning. In the 2021 paper "A reinforcement-learning end-to-end automatic driving decision method combining images and monocular depth features", Luxiao et al. proposed an improved competitive self-supervision algorithm that reduces the projection loss, the occlusion effect and motion interference, but the image edges are blurred and the output data are not smooth. In the same year, Shenjiafeng combined self-supervised monocular depth estimation with semantic segmentation in the thesis "Research on multi-task prediction of intelligent-driving depth estimation and semantic segmentation for complex scenes" to analyse road conditions, but the recognition of near-field objects such as pedestrians, tree trunks and signs is relatively poor. These four methods estimate road depth in different ways, but they are affected by many factors and have certain limitations. In the 2021 patent "Method for detecting and marking obstacles for automatic driving, device and storage medium", an obstacle detection method for automatic driving was proposed that obtains an environmental image collected by a camera and a laser point cloud collected by a lidar, constructs a pseudo point cloud corresponding to the environmental image by three-dimensional reconstruction, and inputs the target point cloud into an obstacle detector to obtain the obstacle detection result. However, this system is expensive and does not plan the driving path.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: depth estimation and three-dimensional image reconstruction are performed with a depth residual network to obtain real-time road information, including the distribution and distances of the vehicles and pedestrians on the road, and this information is fed back so that the driving path is planned in real time. The method increases the speed of information processing and enhances driving safety, and it solves the problems of the prior art that the hardware is expensive, the amount of information to be processed is large and implementation is difficult.
The invention adopts the following technical scheme for solving the technical problems:
a driving path planning method comprises the following steps:
step 1, respectively arranging a monocular camera at the position of a vehicle driving recorder and the middle position of a rear windshield, collecting front and rear road video stream data of a vehicle by using the monocular camera, and performing frame extraction processing on the front and rear road video stream data;
step 2, constructing an improved depth residual error network, and performing depth estimation operation on the frame image extracted in the step 1 by using the improved depth residual error network to obtain a depth map;
step 3, iterative matching integration is carried out on the depth maps of the front road and the rear road corresponding to the same frame extracting time by using an improved depth residual error network to form a 3D cloud map with the vehicle as the center;
step 4, based on the improved depth residual error network, performing scene semantic segmentation simultaneously in the depth estimation operation process to segment other vehicles, obstacles, pedestrians, identifiers and road lines in the road;
step 5, according to the 3D cloud picture, calculating the distances between other vehicles, obstacles, marks and road lines shot by the front road frame picture and the vehicles respectively, and calculating the distances between other vehicles, obstacles, marks and road lines shot by the rear road frame picture and the vehicles respectively;
step 6, for the extracted frame images, tracking other vehicles in the frame images by using a Deep-SORT multi-target tracking algorithm, calculating the minimum distance between the vehicle closest to the vehicle and the vehicle in each lane, and estimating the running speed of the vehicle by using the vehicles in front of and behind the vehicle;
step 7, judging the driving angle of the vehicle according to the road line, judging whether an emergency occurs according to the identification and the road information, adjusting the driving line in time and enabling the vehicle to drive according to the plan;
and 8, planning a path and controlling the vehicle from the position of the current frame to the position of the vehicle right in front of the current frame according to the steps.
In a preferred embodiment of the present invention, in step 1 the frame-extraction interval t0 applied to the front road video stream data and to the rear road video stream data is the same, and equals the sum of the depth-estimation operation time of step 2 and the iterative matching and integration time of step 3.
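For illustration, a minimal sketch of such a frame-extraction loop is given below in Python with OpenCV; the camera indices, the 10 ms value of t0 and the loop structure are assumptions made for the example and are not specified by the invention.

    import time
    import cv2

    T0 = 0.010  # assumed frame-extraction interval in seconds (about 10 ms)

    front_cap = cv2.VideoCapture(0)  # camera at the dash-cam position (assumed device index)
    rear_cap = cv2.VideoCapture(1)   # camera at the rear windshield (assumed device index)

    def grab_frame_pair():
        """Grab one front/rear frame pair for the current planning cycle."""
        ok_f, front = front_cap.read()
        ok_r, rear = rear_cap.read()
        if not (ok_f and ok_r):
            raise RuntimeError("camera read failed")
        return front, rear

    while True:
        t_start = time.time()
        front_frame, rear_frame = grab_frame_pair()
        # depth estimation (step 2) and iterative matching/integration (step 3) go here
        elapsed = time.time() - t_start
        if elapsed < T0:
            time.sleep(T0 - elapsed)  # keep the interval between extracted frames close to t0

With this arrangement the effective interval automatically grows to match the processing time whenever the latter exceeds the nominal value.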
As a preferred embodiment of the present invention, the specific process of step 2 is as follows:
Step 2.1, an improved depth residual network is constructed with a Laplacian pyramid structure: the Laplacian pyramid is used in the decoder and the depth residuals are defined by the pyramid decomposition. The frame image extracted in step 1 is the input of the improved depth residual network and is processed at 5 levels, each level being reduced to 1/2 of the previous one, level 1 being the frame image extracted in step 1. For each level n (n = 1, ..., 4), the up-sampled image of the next level is subtracted from the image of that level to obtain the preliminary difference contour feature Ln of that level. For the level-5 frame image of 1/16 size, one part is passed through a pooling layer and then subjected to semantic segmentation and ASPP convolution respectively; the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the physical contours of the objects are selected and assigned to the ASPP convolution result, which gives the coarse depth contour information R5; the up-sampling convolution result of the other part, the up-sampling result of R5 and L4 are residual-linked to obtain the fifth residual-link result. The fifth residual-link result is divided into two parts: one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines are compared with the ASPP contour lines, the lines closer to the physical contours are selected and assigned to the ASPP convolution result, and superposition verification with L4 gives R4; the up-sampling convolution result of the other part, the up-sampling result of R4 and L3 are residual-linked to obtain the fourth residual-link result. In the same way, the fourth residual-link result gives R3 by superposition verification with L3, and its other part, the up-sampling result of R3 and L2 are residual-linked to obtain the third residual-link result; the third residual-link result gives R2 by superposition verification with L2, and its other part, the up-sampling result of R2 and L1 are residual-linked to obtain the second residual-link result; one part of the second residual-link result is subjected to semantic segmentation and ASPP convolution, the segmentation lines are compared with the ASPP contour lines, the lines closer to the physical contours are selected and assigned to the ASPP convolution result, and superposition verification with L1 gives R1. The notation used in the corresponding formulas is as follows:
S_n is the segmentation line obtained by the semantic segmentation at level n; L_n is the preliminary difference contour feature obtained at level n; conv is the convolution operation in the semantic segmentation; B_n is the optimal convolution result obtained at level n; A_n is the contour line resulting from the ASPP convolution at level n; U_n is the result up-sampled at level n by bilinear interpolation; up denotes the interpolation up-sampling; p denotes an image pixel; and ⊕ denotes the superposition verification.
The up-sampling convolution result of R5 is superposition-verified with R4 to obtain the level-4 depth information D4; the up-sampling convolution result of D4 is superposition-verified with R3 to obtain the level-3 depth information D3; the up-sampling convolution result of D3 is superposition-verified with R2 to obtain the level-2 depth information D2; and the up-sampling convolution result of D2 is superposition-verified with R1 to obtain the level-1 depth information D1, i.e. the depth map.
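A minimal sketch of the pyramid decomposition of step 2.1 is given below, assuming OpenCV's pyrDown for the 1/2 reduction and bilinear resizing for the up-sampling; the network layers themselves (ASPP convolution, semantic segmentation, superposition verification) are omitted, so it only illustrates how the level images and the preliminary difference contour features L1-L4 are obtained.

    import cv2
    import numpy as np

    def laplacian_levels(frame, num_levels=5):
        """Build the 5 pyramid levels and the difference contour features L1..L4."""
        levels = [frame.astype(np.float32)]
        for _ in range(num_levels - 1):
            levels.append(cv2.pyrDown(levels[-1]))  # each level is 1/2 of the previous one

        contours = []
        for n in range(num_levels - 1):
            h, w = levels[n].shape[:2]
            up = cv2.resize(levels[n + 1], (w, h), interpolation=cv2.INTER_LINEAR)
            contours.append(levels[n] - up)          # Ln = level image minus up-sampled next level
        return levels, contours

Here levels[4] plays the role of the 1/16-size level-5 image fed to the pooling and ASPP branch, and contours[n-1] corresponds to Ln in the text.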
Step 2.2, a pooling layer is arranged after each convolution layer of the ASPP convolution, and all pooling layers are attention-based pooling layers; the Softplus activation function is selected as the activation function of the depth residual network.
Step 2.3, the conversion relation between the monocular camera and the actual distance is calculated to obtain the depth map; the calculation follows the pinhole camera model, which relates pixel coordinates to world coordinates through the camera intrinsic and extrinsic parameters:
u = f_x · X / Z + c_x,  v = f_y · Y / Z + c_y,
where (u, v) are the coordinates of a pixel in the depth map obtained by the depth estimation, (X, Y, Z) are the coordinates in the world coordinate system, f_x and f_y are the camera intrinsic parameters, the camera extrinsic parameters describe the camera pose, and c_x and c_y are the offsets of the pixel coordinates u and v respectively.
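This conversion can be pictured by back-projecting a depth-map pixel into metric camera coordinates; the focal lengths and offsets in the sketch below are placeholder values, not the calibration of the invention, and true world coordinates would additionally require the camera extrinsics.

    import numpy as np

    FX, FY = 1000.0, 1000.0  # assumed focal lengths in pixels
    CX, CY = 640.0, 360.0    # assumed offsets c_x, c_y of the pixel coordinates u, v

    def pixel_to_camera(u, v, depth_m):
        """Back-project pixel (u, v) with estimated depth (metres) to 3D camera coordinates."""
        x = (u - CX) * depth_m / FX
        y = (v - CY) * depth_m / FY
        return np.array([x, y, depth_m])

    def distance_to_point(u, v, depth_m):
        """Straight-line distance from the camera (own vehicle) to the imaged point."""
        return float(np.linalg.norm(pixel_to_camera(u, v, depth_m)))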
As a preferred embodiment of the present invention, the specific process of step 3 is as follows:
For the front and rear road frame images corresponding to the same frame-extraction time, the intermediate results of the two images are spliced level by level during the depth estimation: the R5 obtained from the front road frame image is spliced with the R5 obtained from the rear road frame image, the level-4 depth information D4 of the front image is spliced with the D4 of the rear image, the D3 of the front image with the D3 of the rear image, the D2 of the front image with the D2 of the rear image, and the D1 of the front image with the D1 of the rear image; finally all splicing results are integrated to form a 3D-reconstructed road-information summary, a 3D cloud map centred on the own vehicle.
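One way to picture this integration is to back-project the front and rear depth maps into point clouds and merge them in one vehicle-centred frame; the rigid transform below (rear camera rotated 180 degrees about the vertical axis and offset along the vehicle axis) is an assumed example, not a calibrated value from the invention.

    import numpy as np

    def depth_to_cloud(depth, fx, fy, cx, cy):
        """Back-project a depth map (H x W, in metres) into an N x 3 point cloud."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    def merge_front_rear(front_depth, rear_depth, intrinsics, rear_offset_m=4.5):
        """Merge the front and rear clouds into one cloud centred on the own vehicle."""
        fx, fy, cx, cy = intrinsics
        front_cloud = depth_to_cloud(front_depth, fx, fy, cx, cy)
        rear_cloud = depth_to_cloud(rear_depth, fx, fy, cx, cy)
        R = np.diag([-1.0, 1.0, -1.0])            # assumed: rear camera looks backwards
        t = np.array([0.0, 0.0, -rear_offset_m])  # assumed: rear camera sits behind the front one
        rear_cloud = rear_cloud @ R.T + t
        return np.vstack([front_cloud, rear_cloud])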
As a preferred embodiment of the present invention, the specific process of step 4 is as follows:
Scene semantic segmentation is performed simultaneously during the depth estimation operation. The fifth residual-link result is divided into two parts; one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the physical contours of the objects are selected and assigned to the semantic segmentation result, and superposition verification with L4 gives R*4. The fourth residual-link result is treated in the same way, with superposition verification against L3 giving R*3; the third residual-link result gives R*2 by superposition verification with L2; and the second residual-link result gives R*1 by superposition verification with L1, thereby obtaining the final road segmentation information.
As a preferred embodiment of the present invention, the distance in step 5 is obtained by back-projecting each depth-map pixel into the world coordinate system with the camera parameters:
X = (u - c_x) · Z / f_x,  Y = (v - c_y) · Z / f_y,
where P = (X, Y, Z) is the coordinate information of the pixel P in the world coordinate system, f_x and f_y are the camera intrinsic parameters, the camera extrinsic parameters describe the camera pose, c_x and c_y are the offsets of the pixel coordinates, and (u, v) are the coordinates of the pixel in the depth map obtained by the depth estimation, with Z its estimated depth; the distance between the imaged object and the own vehicle is then computed from these world coordinates.
As a preferred embodiment of the present invention, the specific process of step 6 is as follows:
6.1, for the extracted frame images, the other vehicles in the frame images are tracked with the Deep-SORT multi-target tracking algorithm, and the minimum distance between the own vehicle and the vehicle closest to it in each lane is calculated as the Euclidean distance between the two vehicles:
d_front = sqrt((x_f - x_0)^2 + (y_f - y_0)^2),  d_rear = sqrt((x_r - x_0')^2 + (y_r - y_0')^2),
where d_front is the minimum distance between the own vehicle and the closest vehicle in a given lane ahead, d_rear is the minimum distance between the own vehicle and the closest vehicle in a given lane behind, (x_f, y_f) are the coordinates of the closest vehicle in the lane ahead in a coordinate system centred on the front monocular camera, (x_0, y_0) are the coordinates of the own vehicle in that coordinate system, (x_r, y_r) are the coordinates of the closest vehicle in the lane behind in a coordinate system centred on the rear monocular camera, and (x_0', y_0') are the coordinates of the own vehicle in that coordinate system;
6.2, for the previous frame image and the current frame image, the speeds of the vehicle directly ahead and the vehicle directly behind over the interval t0 between the two frames are calculated from the minimum distances:
v_front = v_0 + (d_f2 - d_f1) / t0,  v_rear = v_0 + (d_b1 - d_b2) / t0,
where v_rear is the speed of the vehicle directly behind over the interval t0, v_front is the speed of the vehicle directly ahead over the interval t0, v_0 is the initial speed of the own vehicle, d_b1 and d_b2 are the minimum distances between the vehicle directly behind and the own vehicle in the previous frame image and the current image respectively, and d_f1 and d_f2 are the minimum distances between the vehicle directly ahead and the own vehicle in the previous frame image and the current image respectively;
6.3, the driving time t required for the own vehicle to reach the position occupied by the vehicle directly ahead in the current frame is estimated from these distances and speeds;
6.4, the speed v_req required for the own vehicle to reach the position of the vehicle directly ahead is calculated as
v_req = s / t,
where s is the distance to be travelled by the own vehicle, i.e. the distance between the vehicle directly ahead in the current frame and the own vehicle.
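The speed and required-speed estimates of steps 6.2-6.4 can be sketched as follows, under the assumption stated in the comments that a tracked vehicle's speed equals the ego speed plus the change of the minimum gap over one frame interval.

    T0 = 0.010  # assumed frame interval in seconds

    def front_vehicle_speed(d_f1, d_f2, v_0):
        """Assumed model: a growing front gap means the front vehicle is faster than the own vehicle."""
        return v_0 + (d_f2 - d_f1) / T0

    def rear_vehicle_speed(d_b1, d_b2, v_0):
        """Assumed model: a shrinking rear gap means the rear vehicle is faster than the own vehicle."""
        return v_0 + (d_b1 - d_b2) / T0

    def required_speed(s, t):
        """Speed needed to cover the gap s to the vehicle directly ahead within the travel time t."""
        return s / t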
As a preferred embodiment of the present invention, the specific process of step 7 is as follows:
7.1, the road lines are tracked and, combined with GPS or Beidou navigation, used to judge whether the vehicle is driving obliquely with respect to the road ahead, i.e. the angle θ by which the vehicle deviates from the road line is calculated (using the arccos function) from s1, the distance between the vehicle and the road line on its right in the driving direction in the previous image, s2, the distance between the vehicle and the right lane line in the driving direction in the current frame image, and s, the distance travelled by the own vehicle;
the distance between the nearest of all lane lines on the left of the own vehicle and the nearest of all lane lines on its right is taken as the width of road on which the own vehicle can drive;
7.2, the vehicle is adjusted in time according to the obtained road width and deviation angle, as follows: if the deviation angle θ is within the preset range, the vehicle realigns slowly until the deviation angle returns to 0; if θ exceeds the preset range, or the vehicle is about to leave the drivable road width or press a line, the vehicle is prompted to decelerate and realign until the deviation angle returns to 0, or, after observing the conditions of the left and right lanes, to enter a lane-change operation to adjust the vehicle pose;
7.3, whether signs or pedestrians are present in the road or at the road edge is judged, and an avoidance or driving mode is selected according to the situation, as follows: if a pedestrian is recognized on the road or about to enter the road from the roadside, the vehicle decelerates, stops and yields to the pedestrian; if a sign is detected, the corresponding operation indicated by the sign is carried out; if an obstacle is detected, the vehicle changes lanes or turns back after observing the conditions of the left and right lanes.
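A sketch of the deviation-angle calculation of step 7.1 follows; because the exact expression appears only as an image in the original publication, the arccos form below is one plausible reading, equivalent to taking the arcsine of the lateral change divided by the distance travelled.

    import math

    def deviation_angle(s1, s2, travelled):
        """Angle between the heading and the lane direction, from the lane-line distances.

        s1, s2: distance to the right lane line in the previous / current frame.
        travelled: distance driven by the own vehicle between the two frames.
        """
        if travelled <= 0:
            return 0.0
        lateral = min(abs(s1 - s2), travelled)        # guard against noisy inputs
        longitudinal = math.sqrt(travelled**2 - lateral**2)
        return math.acos(longitudinal / travelled)    # radians; 0 means driving parallel to the lane

    def drivable_width(dist_to_left_line, dist_to_right_line):
        """Width of road available to the own vehicle (step 7.1)."""
        return dist_to_left_line + dist_to_right_line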
As a preferred embodiment of the present invention, the specific process of step 8 is as follows:
Using the speed v_req, calculated in step 6, that the own vehicle needs in order to travel from its position in the current frame to the position of the vehicle directly ahead, together with the speeds and distances of the other vehicles in the road and the presence of pedestrians, signs and obstacles, the route is planned reasonably until the position of the vehicle ahead is reached:
a. if the distances from the own vehicle to the vehicle directly ahead and to the vehicle directly behind do not change, the own vehicle keeps driving at a constant speed;
b. if the speed v_req is 0 and the distance between the vehicle directly ahead and the own vehicle is not 0, the own vehicle gradually reduces its speed until it stops, and sounds the horn to remind the vehicle ahead;
c. if the speed v_req is 0 and the distance between the vehicle directly ahead and the own vehicle is increasing, the own vehicle accelerates slowly until the distance to the vehicle ahead reaches the set safe distance;
d. if the speed v_req is 0, the distance between the vehicle directly ahead and the own vehicle keeps decreasing and is smaller than the safe distance, and the distance between the vehicle directly behind and the own vehicle exceeds the set safe distance, the own vehicle drives slowly backwards, i.e. reverses, until the distance to the vehicle ahead reaches the set safe distance, then stops and sounds the horn to remind the vehicle ahead;
e. if the speed v_req is 0, the distance between the vehicle directly ahead and the own vehicle keeps decreasing and is smaller than the set safe following distance, and the distance between the vehicle directly behind and the own vehicle is also smaller than the set safe following distance, another lane with sufficient clearance is searched for and the route is re-planned.
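These rules can be summarized as a simple selection; the sketch below encodes rules a-e directly, with the safe following distance as an assumed parameter since the invention leaves its value open.

    SAFE_GAP_M = 10.0  # assumed safe following distance in metres

    def plan_action(v_req, gap_front, gap_rear, gap_front_prev, gap_rear_prev):
        """Choose a driving action following rules a-e of step 8 (a sketch, not the full controller)."""
        if gap_front == gap_front_prev and gap_rear == gap_rear_prev:
            return "keep constant speed"                                            # rule a
        if v_req == 0:
            if gap_front > gap_front_prev:
                return "accelerate slowly up to the safe following distance"        # rule c
            if gap_front < SAFE_GAP_M and gap_front < gap_front_prev:
                if gap_rear > SAFE_GAP_M:
                    return "reverse slowly to the safe distance, stop, sound horn"  # rule d
                return "search another lane with enough clearance and re-plan"      # rule e
            if gap_front > 0:
                return "decelerate gradually to a stop and sound the horn"          # rule b
        return "re-evaluate on the next frame"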
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. according to the invention, the Laplacian pyramid structure is introduced into the decoder, and the coding characteristics are input into different video streams to perform decoding depth residual error operation so as to obtain the information of the contour size, distance and the like of objects around the vehicle in the driving process, thereby improving the accuracy of distance estimation, speed estimation and the like.
2. The monocular camera is adopted for depth estimation, compared with a binocular camera, the monocular camera is higher in timeliness, can reflect road information in a shorter time, and is convenient to install and lower in cost.
3. The method uses simple characteristic base lines to calculate the segmentation contour in the process of object identification and segmentation, and is quicker and stronger in performance.
Drawings
FIG. 1 is an overall architecture diagram of a driving path planning method of the present invention;
FIG. 2 is a diagram of a depth residual network architecture used by the present invention;
FIG. 3(a) is a network architecture diagram of a semantic segmentation implementation;
FIG. 3(b) is an extracted feature point and feedback network structure;
FIG. 4 is an effect diagram of implementing semantic segmentation;
FIG. 5 is a diagram of the effect of using monocular depth estimation in the present invention;
fig. 6 is a graph comparing the effects of depth estimation and other methods used by the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
Most existing methods that use depth estimation for driving-path planning rely on a decoding process that simply repeats an up-sampling operation and, during encoding, cannot fully exploit the latent features of well-characterized targets for monocular depth estimation. In the invention, a Laplacian pyramid structure is introduced into the decoder, and the encoded features are fed into different streams for decoding depth-residual operations, so that the contour size, distance and other information of the objects around the vehicle during driving is obtained and the driving path is planned. Fig. 1 is the work flow chart of the invention; with reference to fig. 1, the main steps of the invention are as follows:
step 1, a camera is used for collecting road video stream data information, and frame extraction processing is carried out on the video stream data. The camera uses a monocular camera, the frame extraction is to make the processing become smooth, the path is better calculated by using algorithms such as tracking and the like, and the calculation time is determined according to the path
Figure 36655DEST_PATH_IMAGE058
The operation time involved in taking a frame of image is determined by the depth estimation operation time and the model calculation reconstruction time in the subsequent steps, and the sampling time in the invention
Figure 270190DEST_PATH_IMAGE058
The value is around 10 ms.
Step 2, using the improved depth residual error network to carry out depth estimation operation on the extracted frame picture, wherein the process of carrying out depth estimation comprises the following steps:
(1) an improved depth residual network is constructed using the laplacian pyramid structure, as shown in fig. 2:
the laplacian pyramid structure is used in the decoder, and the depth residual is defined using the decomposition of the pyramid. The output of the convolution network is gradually integrated, and the depth boundary of the target in the road condition can be accurately estimated and the global condition can be known from the small and coarse depth map to the final comprehensive and fine depth map.
The input of the residual error network is a shot color image, the input image is subjected to hierarchical level operation, the 4 levels are images with Conv1, Conv 2, Conv 3 and Conv4 residual error links respectively, and differences of all the levels are extracted (specifically, the levels are directly up-sampled (ascending levels), and then the up-sampled images are subtracted from the original level images), so that preliminary difference contour characteristics L1-L4 of an object are extracted, one part of the original level images are put into an ASPP convolutional layer (such as a convolutional layer on the left side of R5), then the convolution is continued, and the other part of the original level images are up-sampled and convolved. And performing residual error linkage on the two parts and the corresponding difference characteristics, putting the two parts into a convolutional layer for convolution, and verifying to obtain R4. And residual linking results are repeated for four times, so that the depth image with the corresponding format can be obtained.
(2) Activation function modification
The activation function is located in the convolution layers; its role is to introduce non-linearity so that the convolution can obtain accurate feature values of the objects. In the invention the Softplus activation function is selected, defined as
softplus(x) = ln(1 + e^x).
Softplus is chosen because the data it produces are smoother than with the original ReLU function, while the one-sided suppression of ReLU is retained.
(3) Pooling layer modification
In the invention, to prevent the over-fitting caused by limited model data, the pooling layers are modified: in addition to retaining the original pooling layers, a pooling layer is added before the first convolution layer (at S/16) to imitate the mechanism of human visual attention. All pooling layers are attention-based pooling layers. The pooling layers are data-driven and select different features for different data to compress the amount of data and parameters, thereby mitigating over-fitting. This modification effectively improves the performance of the model and makes it more stable.
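A minimal PyTorch-style sketch of a convolution block with the Softplus activation followed by an attention-based pooling layer is shown below; the attention mechanism (a 1x1 convolution producing per-pixel weights) is an assumed implementation, since the text only names the layer type.

    import torch
    import torch.nn as nn

    class AttentionPool2d(nn.Module):
        """Data-driven pooling: per-pixel attention weights decide what each region keeps."""
        def __init__(self, channels):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel attention score

        def forward(self, x):
            w = torch.sigmoid(self.score(x))            # attention weights in (0, 1)
            return nn.functional.avg_pool2d(x * w, 2)   # weight the features, then halve the resolution

    class ConvBlock(nn.Module):
        """Convolution -> Softplus activation -> attention-based pooling."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            self.act = nn.Softplus()   # smoother than ReLU while keeping one-sided suppression
            self.pool = AttentionPool2d(out_ch)

        def forward(self, x):
            return self.pool(self.act(self.conv(x)))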
(4) Semantic segmentation and validation
The semantic segmentation algorithm is used in the convolution layers to verify the effect in the 3D reconstruction and the contour positions of the depth estimation. Specifically, semantic segmentation is performed on the data image of each layer, the basic segmentation line of each object is identified and matched with the rough object contour calculated in the residual network, the better of the two is taken, and the result is superposed or spliced with the segmentation result obtained by segmentation convolution of the next layer's data image; this is repeated until the final segmentation result and depth estimation result of the object are obtained.
The formulas and the notation are the same as those given for step 2.1 above: S_n is the segmentation line obtained by the semantic segmentation at level n, L_n the preliminary difference contour feature of level n, conv the convolution operation in the semantic segmentation, B_n the optimal convolution result of level n, A_n the contour line from the ASPP convolution at level n, U_n the level-n bilinear-interpolation up-sampling result, up the interpolation up-sampling, p an image pixel, and ⊕ the superposition verification.
the semantic segmentation verification enables the contour analysis to be faster, and the depth estimation and reconstruction speed to be faster; the fuzzy contour boundary of the estimated object in the depth estimation can be reduced; enhancing the effect of image boundary estimation; the amount of information required to be processed in the subsequent step of analyzing the road information is reduced.
(5) Calculating the conversion relation between the camera and the actual distance
Because cameras differ, the distance that the depth estimation can reflect also differs. The actual distance corresponding to a pixel in the image follows the same camera model as in step 2.3:
u = f_x · X / Z + c_x,  v = f_y · Y / Z + c_y,
where (u, v) are the pixel coordinates in the converted depth map, (X, Y, Z) is the actual distance in metres, f_x and f_y are the camera intrinsic parameters, which convert the actual distance (metres) into pixels, the extrinsic parameter is the actual focal distance of the camera, and c_x and c_y are the offsets of the pixel coordinates u and v respectively.
Step 3, the estimated images are 3D-reconstructed and then combined into a complete 3D cloud map of the road situation. Specifically, iterative matching and integration is carried out on the depth images taken in front and behind: in the 3D reconstruction, the object feature points obtained at each network level from the front and rear camera images are extracted and matched with the corresponding objects in the images and with each other; the number of iterations equals the number of convolutions, the feature points with the best matching fit are selected once the convolutions are finished, and the feature values with good matching fit are superposed, spliced and integrated during up-sampling, so that the converted 3D information is more complete, blurred contours are reduced and the data are smoother. After this step a 3D-reconstructed summary of the road information centred on the own vehicle is formed, which is more convenient for analysis.
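The iterative matching of feature points can be illustrated with an off-the-shelf detector and matcher; ORB and brute-force matching below are stand-ins chosen only for the sketch, whereas the invention matches the features produced at each network level.

    import cv2

    def match_feature_points(img_a, img_b, keep=100):
        """Detect and match feature points between two frames (illustrative stand-in)."""
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
        return matches[:keep]  # keep the best-fitting correspondences for splicing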
Step 4, the road information is analysed, and the road lines, pedestrians, signs and similar information are distinguished.
Scene semantic segmentation is performed on the road picture, as shown in fig. 3(a) and 3(b): a class label is assigned to each final feature value obtained by down-sampling, a specific instance of each object is detected and segmented according to the latest pose-analysis result, the first result is up-sampled and spliced with the input of the next convolution layer, and the matching and segmentation operations continue to form the segmentation result. Finally, the IoU is used to determine the segmentation line of each object, which is iteratively registered with the 3D contour extracted in the previous step until the contour approaches the threshold and the matching is complete, giving the final road information. These steps quickly and accurately segment the pedestrians, obstacles, signs and road lines in the road, as shown in fig. 4, which facilitates the subsequent calculation, processing and judgement. The purpose of the IoU is to measure how accurately the corresponding object is detected in a particular data set, in order to determine the final contour of the object.
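The IoU used to decide when a segmentation line and the 3D contour match can be computed directly on binary masks; this is a generic sketch, and the 0.9 threshold is an assumption made for illustration.

    import numpy as np

    def iou(mask_a, mask_b):
        """Intersection over union of two boolean masks (segmentation line vs. 3D contour)."""
        inter = np.logical_and(mask_a, mask_b).sum()
        union = np.logical_or(mask_a, mask_b).sum()
        return inter / union if union else 0.0

    def contours_match(seg_mask, contour_mask, threshold=0.9):
        """Accept the contour once the IoU reaches the threshold (assumed value)."""
        return iou(seg_mask, contour_mask) >= threshold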
Step 5, the distances to the other photographed vehicles, roadside obstacles, road marking lines and signs in the 3D cloud map are calculated. The distance to the vehicle ahead, to obstacles and so on is computed from the obtained depth map: the depth-map pixels are back-projected with the camera parameters into the world coordinate system, where P = (X, Y, Z) is the coordinate information of a pixel in the world coordinate system, and from these coordinates the distances between the own vehicle and the photographed vehicle ahead and the obstacles in the field of view are obtained.
Step 6, the distances between the own vehicle and the vehicles ahead and behind are calculated using a tracking algorithm and the own speed of motion, on the basis of the distances recorded in the previous steps: the recognized targets are tracked with the Deep-SORT multi-target tracking algorithm, the change of each target's position between the previous frame image and the current frame image is recorded, and the target positions in the two images are compared to obtain the speed and distance of each tracked target:
d_front = min over the vehicles i in all lanes directly ahead of sqrt((x_i - x_0)^2 + (y_i - y_0)^2), and d_rear is defined analogously over the vehicles in all lanes directly behind,
where d_front is the calculated minimum distance to the vehicles of all lanes directly ahead, and d_rear is the calculated minimum distance to the vehicles of all lanes directly behind the current vehicle.
After the minimum distances of the vehicles in all lanes ahead of and behind the own vehicle are obtained, the speeds v_front and v_rear of the nearest vehicles ahead and behind over a period t0 are calculated:
v_front = v_0 + (d_f2 - d_f1) / t0,  v_rear = v_0 + (d_b1 - d_b2) / t0,
where t0 is the computation time involved in taking one frame image, v_rear is the speed of the vehicle directly behind over the interval t0, v_front is the speed of the vehicle directly ahead over the interval t0, v_0 is the initial speed of the own vehicle, d_b1 and d_b2 are the minimum distances between the vehicle directly behind and the own vehicle in the previous frame image and the current image respectively, and d_f1 and d_f2 are the minimum distances between the vehicle directly ahead and the own vehicle in the previous frame image and the current image respectively.
Secondly, an estimate of the travel-time range t to the planned target point (i.e. the position of the vehicle directly ahead) is made from the obtained minimum distances of the vehicles in all lanes directly ahead and directly behind.
Finally, from the obtained distance information of the front and rear vehicles and the time-range information, the speed required for the travelling vehicle to reach the planned target point is obtained:
v_req = s / t,
where s is the distance to be travelled by the own vehicle, i.e. the distance between the vehicle directly ahead in the current frame and the own vehicle, and t is the time range determined from the distances to the front and rear vehicles; the travelling vehicle is required to reach the planned point within time t, and the speed is calculated in real time.
Step 7, the driving angle of the vehicle is judged from the road lines by analysing the road environment, whether an emergency has occurred is judged from the signs and road-surface information, and the driving route is adjusted in time so that the vehicle drives as planned. The road information is analysed from the images and fed back to the vehicle, and the position of the vehicle on the road and its driving pose are calculated, so that the driving direction, attitude and speed can be adjusted in time and the vehicle is prevented from colliding, pressing a road line or running onto the road edge. The main steps are as follows:
1. Tracking the road lines and using navigation positioning to judge whether the vehicle is driving obliquely with respect to the road ahead
The road lines are tracked and combined with GPS or Beidou navigation to judge whether the vehicle is driving obliquely with respect to the road ahead. For one road line the deviation angle θ is calculated using the arccos function (the inverse of cos) from s1, the distance between the vehicle and the road line on its right in the driving direction in the previous image, s2, the distance between the vehicle and the right lane line in the driving direction in the current frame image, and s, the distance travelled by the own vehicle.
The distance between the nearest of all lane lines on the left of the own vehicle and the nearest of all lane lines on its right is taken as the width of road on which the own vehicle can drive.
The vehicle is adjusted in time according to the obtained road width and deviation angle: if the deviation angle θ is within the preset range, the vehicle slowly realigns; if it exceeds the preset range, or the vehicle is about to leave the road width or press a lane line, the vehicle is prompted to decelerate and realign, or to observe the conditions of the left and right lanes and enter a lane-change operation to adjust its pose.
2. Judging whether the road and the road edge have marks or pedestrians, and selecting avoidance or driving modes according to conditions
If a pedestrian is recognized on the road or about to enter the road from the roadside, the vehicle decelerates, stops and yields; if a sign or an obstacle is detected, the vehicle recognizes what the sign indicates and changes lanes or turns back after observing the conditions of the left and right lanes.
Step 8, path planning and vehicle control are carried out according to the obtained conditions. The actual road environment obtained in the previous steps, such as the speeds and distances of the vehicles in the road, whether pedestrians are present, and warning signs, is used to plan reasonably according to the navigation until the target is reached. Since the acquisition interval of the speed estimation is 10 ms, the speed can be adjusted once every 10 ms; the specific speed value is determined by the conditions of the front, rear and side vehicles, with the following possibilities:
a. if the distances from the own vehicle to the vehicle directly ahead and to the vehicle directly behind do not change, the own vehicle always keeps driving at a constant speed;
b. if the required speed v_req is 0 and the distance to the vehicle directly ahead is not 0, the speed is gradually reduced until the vehicle stops, and the horn is sounded to remind the vehicle ahead;
c. if the required speed v_req is 0 and the distance between the vehicle directly ahead and the own vehicle is still increasing, the own vehicle accelerates slowly until the safe distance is reached;
d. if the required speed v_req is 0, the distance between the vehicle directly ahead and the own vehicle keeps decreasing and is less than the safe distance, while the distance to the vehicle directly behind still leaves the set safe distance, the own vehicle drives slowly backwards until it stops at the safe distance, and sounds the horn to remind the vehicle ahead;
e. if the required speed v_req is 0, the distance to the vehicle directly ahead keeps decreasing and is smaller than the safe following distance, and the distance to the vehicle directly behind is also smaller than the safe following distance, another lane with sufficient clearance is searched for and the route is re-planned.
FIG. 5 is a diagram of the effect of using monocular depth estimation in the present invention; FIG. 6 is a graph comparing the effect of depth estimation and other methods used by the present invention, wherein the first row is the input image; the second row is ground real depth information; the third row is the parallax network calculation effect based on the reconstruction loss of the binocular camera; the fourth row is a semi-supervised monocular depth estimation effect based on sparse true depth; the fifth row is the result of monocular depth estimation trained using DCNN; the sixth row is the monocular prediction results using multi-scale planar regression; the last row is the effect graph of the monocular depth estimation based on the improved depth residual network used in the present invention.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (9)

1. A driving path planning method is characterized by comprising the following steps:
step 1, respectively arranging a monocular camera at the position of a vehicle driving recorder and the middle position of a rear windshield, collecting front and rear road video stream data of a vehicle by using the monocular camera, and performing frame extraction processing on the front and rear road video stream data;
step 2, constructing an improved depth residual error network, and performing a depth estimation operation on the frame images extracted in step 1 by using the improved depth residual error network to obtain depth maps;
step 3, carrying out iterative matching integration on the depth maps of the front road and the rear road corresponding to the same frame-extraction time by using the improved depth residual error network, so as to form a 3D cloud map with the own vehicle as the center;
step 4, based on the improved depth residual error network, performing scene semantic segmentation simultaneously in the depth estimation operation process, so as to segment other vehicles, obstacles, pedestrians, signs and road lines in the road;
step 5, according to the 3D cloud map, calculating the distances between the own vehicle and the other vehicles, obstacles, signs and road lines captured in the front road frame images, and the distances between the own vehicle and the other vehicles, obstacles, signs and road lines captured in the rear road frame images;
step 6, for the extracted frame images, tracking the other vehicles in the frame images by using the Deep-SORT multi-target tracking algorithm, calculating, for each lane, the minimum distance between the vehicle closest to the own vehicle and the own vehicle, and estimating the running speed of the own vehicle by using the vehicles in front of and behind it;
step 7, judging the driving angle of the vehicle according to the road lines, judging whether an emergency occurs according to the signs and the road information, and adjusting the driving route in time so that the vehicle drives according to plan;
and 8, planning a path and controlling the vehicle from the position of the current frame to the position of the vehicle right in front of the current frame according to the steps.
2. The driving path planning method according to claim 1, wherein in step 1, the front and rear road video stream data are frame-extracted at the same interval, and this interval is equal to the sum of the operation time of the depth estimation in step 2 and the iterative matching integration time in step 3.
3. The driving path planning method according to claim 1, wherein the specific process of step 2 is as follows:
step 2.1, constructing an improved depth residual error network by using a Laplacian pyramid structure, in which pyramid decomposition in the decoder is used to define the depth residuals; the frame image extracted in step 1 is taken as the input of the improved depth residual error network and is processed at 5 levels, each level being reduced to 1/2 of the previous level, the 1st level being the frame image extracted in step 1; for each level n, the image up-sampled from the next level is subtracted from the image at that level to obtain the preliminary difference contour feature L_n of that level, n = 1, …, 4; for the 1/16-size frame image at the 5th level, one part is passed through a pooling layer and then subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the entity contour of the object are selected and assigned to the ASPP convolution result to obtain the fuzzy depth contour information R5, and the up-sampling convolution result of the other part, the up-sampling result of R5 and L4 are residual-linked to obtain a fifth residual linking result; the fifth residual linking result is divided into two parts: one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the entity contour of the object are selected and assigned to the ASPP convolution result, which is then superposition-verified with L4 to obtain R4, and the up-sampling convolution result of the other part, the up-sampling result of R4 and L3 are residual-linked to obtain a fourth residual linking result; the fourth residual linking result is treated in the same way, with superposition verification against L3 giving R3 and residual linking of the up-sampling convolution result of the other part, the up-sampling result of R3 and L2 giving a third residual linking result; the third residual linking result is treated in the same way, with superposition verification against L2 giving R2 and residual linking of the up-sampling convolution result of the other part, the up-sampling result of R2 and L1 giving a second residual linking result; the second residual linking result is divided into two parts, one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the entity contour of the object are selected and assigned to the ASPP convolution result, which is then superposition-verified with L1 to obtain R1; the specific formulas are as follows:
[The five level-wise formulas defining R5 to R1 appear only as images in the source publication.]
The quantities used in these formulas are: the segmentation line obtained by the semantic segmentation at level n; the preliminary difference contour feature L_n obtained at level n; the convolution operation in the semantic segmentation; the optimal convolution result obtained at level n; the contour line obtained by the ASPP convolution at level n; the result of up-sampling at level n by bilinear interpolation; the interpolation up-sampling value; the image pixels; and the symbol representing the superposition (overlay) verification;
the up-sampling convolution result of R5 is superposition-verified with R4 to obtain the 4th-level depth information; the up-sampling convolution result of the 4th-level depth information is superposition-verified with R3 to obtain the 3rd-level depth information; the up-sampling convolution result of the 3rd-level depth information is superposition-verified with R2 to obtain the 2nd-level depth information; the up-sampling convolution result of the 2nd-level depth information is superposition-verified with R1 to obtain the 1st-level depth information, i.e. the depth map;
step 2.2, a pooling layer is arranged behind each convolution layer of the ASPP convolution, and all pooling layers are Attention-based pooling layers; the Softplus activation function is selected as the activation function of the depth residual error network;
step 2.3, calculating the conversion relation between the monocular camera and the actual distance so as to obtain the depth map; the calculation follows the pinhole projection relation between the pixel coordinate system and the world coordinate system,
Z_c · [u, v, 1]^T = K · [R t] · [X_w, Y_w, Z_w, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]],
wherein (u, v) are the coordinates of a pixel point inside the depth map obtained by the depth estimation, (X_w, Y_w, Z_w) are the coordinates in the world coordinate system, f_x, f_y, u_0 and v_0 are the camera intrinsic parameters, u_0 and v_0 being the biases of the pixel coordinate axes u and v respectively, [R t] are the camera extrinsic parameters, and Z_c is the depth of the point in the camera coordinate system.
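To make the pyramid decomposition of step 2.1 concrete, the sketch below builds the five levels (each 1/2 the size of the previous one) and the preliminary difference contour features L1 to L4 with plain OpenCV resizing. It is a minimal reading of the claim text under our assumptions, not the patented network; the function name and the use of cv2.resize for the up/down-sampling are ours.

```python
import cv2
import numpy as np

def laplacian_levels(frame, num_levels=5):
    """Build 5 pyramid levels and the preliminary difference contour features L1..L4,
    where L_n = level_n - upsample(level_{n+1}) (bilinear resizing)."""
    levels = [frame.astype(np.float32)]
    for _ in range(num_levels - 1):
        h, w = levels[-1].shape[:2]
        levels.append(cv2.resize(levels[-1], (w // 2, h // 2),
                                 interpolation=cv2.INTER_LINEAR))
    diffs = []
    for n in range(num_levels - 1):          # n = 0..3 corresponds to L1..L4
        h, w = levels[n].shape[:2]
        up = cv2.resize(levels[n + 1], (w, h), interpolation=cv2.INTER_LINEAR)
        diffs.append(levels[n] - up)
    return levels, diffs

# A 320x320 input yields levels of size 320, 160, 80, 40 and 20 (1/16 of the input).
levels, L = laplacian_levels(np.zeros((320, 320, 3), np.uint8))
```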
4. The driving path planning method according to claim 3, wherein the specific process of step 3 is as follows:
for the frame images of the front and rear roads corresponding to the same frame-extraction time, in the process of obtaining the depth maps by depth estimation, the R5 obtained from the front road frame image is spliced with the R5 obtained from the rear road frame image; the 4th-level depth information obtained from the front road frame image is spliced with the 4th-level depth information obtained from the rear road frame image; the 3rd-level depth information obtained from the front road frame image is spliced with that obtained from the rear road frame image; the 2nd-level depth information obtained from the front road frame image is spliced with that obtained from the rear road frame image; the 1st-level depth information obtained from the front road frame image is spliced with that obtained from the rear road frame image; finally, all splicing results are integrated to form a 3D cloud map, centered on the own vehicle, that summarizes the 3D-reconstructed road information.
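For intuition about the splicing in this claim, here is a toy sketch that merges front- and rear-camera point sets into one vehicle-centered cloud by a rigid transform. The patent splices the level-wise depth information (R5 and the 1st to 4th-level depth maps) via iterative matching; the simple 180-degree rotation and the offset value below are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

def splice_clouds(front_pts, rear_pts, rear_offset=(0.0, 0.0, -4.0)):
    """Merge front- and rear-camera 3D points (N x 3) into one vehicle-centered cloud,
    assuming the rear camera looks backwards (180-degree yaw) and sits rear_offset
    away from the front camera."""
    yaw_180 = np.array([[-1.0, 0.0, 0.0],
                        [ 0.0, 1.0, 0.0],
                        [ 0.0, 0.0, -1.0]])
    rear_in_vehicle = rear_pts @ yaw_180.T + np.asarray(rear_offset)
    return np.vstack([front_pts, rear_in_vehicle])

# A point 5 m ahead of the front camera and one 5 m behind the rear camera
# end up on opposite sides of the merged, vehicle-centered cloud.
cloud = splice_clouds(np.array([[0.0, 0.0, 5.0]]), np.array([[0.0, 0.0, 5.0]]))
print(cloud)
```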
5. The driving path planning method according to claim 3, wherein the specific process of step 4 is as follows:
performing scene semantic segmentation simultaneously in the depth estimation operation process: the fifth residual linking result is divided into two parts, one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines obtained by the semantic segmentation are compared with the contour lines obtained by the ASPP convolution, the lines closer to the entity contour of the object are selected and assigned to the semantic segmentation result, which is then superposition-verified with L4 to obtain R*4; the fourth residual linking result is divided into two parts, one part is subjected to semantic segmentation and ASPP convolution respectively, the segmentation lines are compared with the contour lines, the lines closer to the entity contour of the object are selected and assigned to the semantic segmentation result, which is then superposition-verified with L3 to obtain R*3; the third residual linking result is treated in the same way and superposition-verified with L2 to obtain R*2; the second residual linking result is treated in the same way and superposition-verified with L1 to obtain R*1, thereby obtaining the final road segmentation information.
6. The driving path planning method according to claim 1, wherein the distance in step 5 is calculated as follows: the world coordinates of a pixel point P are recovered from its coordinates (u, v) inside the depth map obtained by the depth estimation by inverting the camera projection relation of step 2.3, using the camera intrinsic parameters, the camera extrinsic parameters and the biases of the pixel coordinate axes u and v, and the distance between the corresponding object and the own vehicle is then computed from the recovered world coordinates.
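A minimal sketch of the distance computation, under the standard pinhole reading of the projection relation in claim 3: a depth-map pixel (u, v) with estimated depth d is back-projected into the camera frame and its Euclidean distance to the own vehicle is taken. The intrinsic values and the identity extrinsics below are illustrative placeholders, not calibration data from the patent.

```python
import numpy as np

def pixel_to_point(u, v, depth, fx, fy, u0, v0):
    """Pinhole back-projection of a depth-map pixel to camera coordinates:
    X = (u - u0) * depth / fx, Y = (v - v0) * depth / fy, Z = depth."""
    return np.array([(u - u0) * depth / fx, (v - v0) * depth / fy, depth])

def distance_to_vehicle(u, v, depth, fx=700.0, fy=700.0, u0=640.0, v0=360.0):
    """Distance between the back-projected point and the own vehicle, assuming the
    camera frame coincides with the vehicle frame (identity extrinsics)."""
    return float(np.linalg.norm(pixel_to_point(u, v, depth, fx, fy, u0, v0)))

# A pixel at the principal point with 12 m estimated depth lies 12 m from the vehicle.
print(distance_to_vehicle(640, 360, 12.0))
```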
7. The driving path planning method according to claim 1, wherein the specific process of step 6 is as follows:
step 6.1, for the extracted frame images, tracking the other vehicles in the frame images by using the Deep-SORT multi-target tracking algorithm, and calculating, for each lane, the minimum distance between the vehicle closest to the own vehicle and the own vehicle; the minimum distance formulas are
d_f = sqrt((x_1 - x_0)^2 + (y_1 - y_0)^2),
d_r = sqrt((x_2 - x_0')^2 + (y_2 - y_0')^2),
wherein d_f is the minimum distance between the own vehicle and the closest vehicle in a lane ahead, d_r is the minimum distance between the own vehicle and the closest vehicle in a lane behind, (x_1, y_1) are the coordinates of the vehicle closest to the own vehicle in a lane ahead in the coordinate system centered on the front monocular camera, (x_0, y_0) are the coordinates of the own vehicle in the coordinate system centered on the front monocular camera, (x_2, y_2) are the coordinates of the vehicle closest to the own vehicle in a lane behind in the coordinate system centered on the rear monocular camera, and (x_0', y_0') are the coordinates of the own vehicle in the coordinate system centered on the rear monocular camera;
step 6.2, for the previous frame image and the current frame image, calculating, from the minimum distances, the speeds of the vehicle directly in front and the vehicle directly behind during the interval t1 between the two frames:
v_front = v_0 + (d_f_cur - d_f_prev) / t1,
v_rear = v_0 + (d_r_prev - d_r_cur) / t1,
wherein v_rear is the speed of the vehicle directly behind during the interval t1, v_front is the speed of the vehicle directly in front during the interval t1, v_0 is the initial speed of the own vehicle, d_r_prev and d_r_cur are the minimum distances between the vehicle directly behind and the own vehicle in the previous frame image and the current frame image respectively, and d_f_prev and d_f_cur are the minimum distances between the vehicle directly in front and the own vehicle in the previous frame image and the current frame image respectively;
step 6.3, estimating the driving time t required for the own vehicle to reach the position of the vehicle directly in front in the current frame;
step 6.4, calculating the speed v required for the own vehicle to reach the position of the vehicle directly in front:
v = s / t,
wherein s, the distance to be traveled by the own vehicle, is the distance between the vehicle directly in front in the current frame and the own vehicle.
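The sketch below walks through steps 6.1 to 6.4 numerically under our reading of the image-only formulas: Euclidean minimum distances, front/rear speeds obtained from the change of those distances over the frame interval, and the required speed as distance over time. The variable names and the sign conventions are assumptions made for this sketch.

```python
import math

def min_distance(p_other, p_self):
    """Step 6.1 (assumed form): Euclidean distance between two planar positions."""
    return math.hypot(p_other[0] - p_self[0], p_other[1] - p_self[1])

def front_rear_speeds(v0, d_front_prev, d_front_cur, d_rear_prev, d_rear_cur, dt):
    """Step 6.2 (assumed form): own speed plus the rate of change of the gaps."""
    v_front = v0 + (d_front_cur - d_front_prev) / dt
    v_rear = v0 + (d_rear_prev - d_rear_cur) / dt
    return v_front, v_rear

def required_speed(d_front_cur, t):
    """Step 6.4 (assumed form): speed needed to cover the current front gap in time t."""
    return d_front_cur / t

# Worked example with a 10 ms frame interval and an own speed of 15 m/s:
v_front, v_rear = front_rear_speeds(15.0, 20.00, 20.05, 10.00, 9.95, 0.01)
print(v_front, v_rear)             # both 20.0 m/s: front pulling away, rear closing in
print(required_speed(20.05, 2.0))  # about 10 m/s to reach the front position in 2 s
```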
8. The driving path planning method according to claim 1, wherein the specific process of step 7 is as follows:
step 7.1, tracking the road lines and judging, in combination with GPS or BeiDou navigation, whether the vehicle is traveling in a direction inclined to the road ahead, i.e. calculating the angle theta by which the vehicle deviates from the road course:
theta = arcsin((d1 - d2) / s),
wherein d1 is the distance between the vehicle and the road line on the right in the driving direction in the previous frame image, d2 is the distance between the vehicle and the road line on the right in the driving direction in the current frame image, and s is the distance traveled by the vehicle; the distance between the closest of all road lines on the left of the own vehicle and the closest of all road lines on the right of the own vehicle is taken as the width of the road on which the own vehicle can travel;
step 7.2, adjusting the vehicle in time according to the obtained road width and deviation angle, specifically: if the deviation angle theta is within the set threshold range, the vehicle is aligned slowly until the deviation angle returns to 0; if theta exceeds the set threshold, or the vehicle is about to run out of the road width or press a line, the vehicle is prompted to decelerate and recover until the deviation angle returns to 0, or a lane-change operation is performed after observing the conditions of the left and right lanes so as to adjust the vehicle pose;
step 7.3, judging whether signs and pedestrians exist in the road and at the road edge, and selecting an avoiding or driving mode according to the situation, specifically: if a pedestrian is identified to be on the road or preparing to cross at the roadside, the vehicle decelerates to a stop and avoids the pedestrian; if a sign is detected, the corresponding operation is performed according to the meaning indicated by the sign; if an obstacle is detected, a lane change or turning around is performed after observing the conditions of the left and right lanes.
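A small numeric sketch of the deviation-angle reading used in step 7.1 above: the change of the lateral distance to the right road line between two frames, divided by the distance actually traveled, gives the sine of the deviation angle. The arcsin form and the clamping are our reconstruction of the image-only formula, not a quotation from the patent.

```python
import math

def deviation_angle(d_prev, d_cur, traveled):
    """Assumed step-7.1 formula: theta = arcsin((d_prev - d_cur) / traveled),
    with the ratio clamped to [-1, 1] so asin stays defined."""
    ratio = max(-1.0, min(1.0, (d_prev - d_cur) / traveled))
    return math.degrees(math.asin(ratio))

# Drifting 0.2 m towards the right road line over 10 m of travel is about 1.15 degrees.
print(round(deviation_angle(1.5, 1.3, 10.0), 2))
```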
9. The driving path planning method according to claim 1, wherein the specific process of step 8 is as follows:
according to the speed v, calculated in step 6, required by the own vehicle to reach the position of the vehicle directly in front from its position in the current frame, together with the speeds and distances of the other vehicles in the road and whether pedestrians, signs and obstacles exist in the road, a road route is reasonably planned until the position of the vehicle directly in front is reached:
a. if the distance between the vehicle directly in front and the own vehicle and the distance between the vehicle directly behind and the own vehicle are unchanged, the own vehicle keeps running at a constant speed;
b. if, at the speed v, the distance between the vehicle directly in front and the own vehicle is not 0, the speed of the own vehicle is gradually reduced until it stops, and the horn is sounded to remind the vehicle in front;
c. if the speed v is 0 and the distance between the vehicle directly in front and the own vehicle is increasing, the own vehicle accelerates slowly until the distance to the vehicle in front reaches the set safe distance;
d. if the speed v is 0, the distance between the vehicle directly in front and the own vehicle keeps decreasing and is smaller than the safe distance, and the distance between the vehicle directly behind and the own vehicle exceeds the set safe distance, the own vehicle drives slowly backwards, i.e. backs up, until the distance to the vehicle in front reaches the set safe distance, then stops, and the horn is sounded to remind the vehicle in front;
e. if the speed v is 0, the distance between the vehicle directly in front and the own vehicle keeps decreasing and is smaller than the set safe distance, and the distance between the vehicle directly behind and the own vehicle is also smaller than the set safe distance, another lane with a sufficient gap is searched for and the road route is planned accordingly.
CN202210092210.2A 2022-01-26 2022-01-26 Driving path planning method Active CN114119896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210092210.2A CN114119896B (en) 2022-01-26 2022-01-26 Driving path planning method

Publications (2)

Publication Number Publication Date
CN114119896A true CN114119896A (en) 2022-03-01
CN114119896B CN114119896B (en) 2022-04-15

Family

ID=80361584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210092210.2A Active CN114119896B (en) 2022-01-26 2022-01-26 Driving path planning method

Country Status (1)

Country Link
CN (1) CN114119896B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040184789A1 (en) * 2002-12-02 2004-09-23 Hitachi Global Storage Technologies Netherlands B.V Recording and reproducing apparatus, content reproducing apparatus, magnetic disk device, and control method thereof
CN112801074A (en) * 2021-04-15 2021-05-14 速度时空信息科技股份有限公司 Depth map estimation method based on traffic camera
CN113834463A (en) * 2021-09-01 2021-12-24 重庆邮电大学 Intelligent vehicle side pedestrian/vehicle monocular depth distance measuring method based on absolute size

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI Jianfeng et al., "Multi-scale image semantic segmentation method combining ASPP and improved HRNet", Chinese Journal of Liquid Crystals and Displays (《液晶与显示》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI832686B (en) * 2023-01-23 2024-02-11 國立陽明交通大學 Path planning system and path planning method thereof
CN116048096A (en) * 2023-02-23 2023-05-02 南京理工大学 Unmanned vehicle movement planning method based on hierarchical depth perception
CN116048096B (en) * 2023-02-23 2024-04-30 南京理工大学 Unmanned vehicle movement planning method based on hierarchical depth perception
CN116164770A (en) * 2023-04-23 2023-05-26 禾多科技(北京)有限公司 Path planning method, path planning device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN114119896B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
EP3735675B1 (en) Image annotation
CN114119896B (en) Driving path planning method
CN109829386B (en) Intelligent vehicle passable area detection method based on multi-source information fusion
US11636686B2 (en) Structure annotation
Zhou et al. Bridging the view disparity between radar and camera features for multi-modal fusion 3d object detection
WO2018055378A1 (en) Autonomous route determination
Gern et al. Advanced lane recognition-fusing vision and radar
CN105667518A (en) Lane detection method and device
CN101941438B (en) Intelligent detection control device and method of safe interval
Shunsuke et al. GNSS/INS/on-board camera integration for vehicle self-localization in urban canyon
WO2019092439A1 (en) Detecting static parts of a scene
CN109917359B (en) Robust vehicle distance estimation method based on vehicle-mounted monocular vision
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
Filatov et al. Any motion detector: Learning class-agnostic scene dynamics from a sequence of lidar point clouds
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN117576652A (en) Road object identification method and device, storage medium and electronic equipment
Goyat et al. Tracking of vehicle trajectory by combining a camera and a laser rangefinder
CN116189138A (en) Visual field blind area pedestrian detection algorithm based on vehicle-road cooperation
Boschenriedter et al. Multi-session visual roadway mapping
Gao et al. 3D reconstruction for road scene with obstacle detection feedback
Yuan et al. Real-time long-range road estimation in unknown environments
Shapovalov et al. Robust localization of a self-driving vehicle in a lane
Li et al. Improving Vehicle Localization with Lane Marking Detection Based on Visual Perception and Geographic Information
Xie et al. Lane-level vehicle self-localization in under-bridge environments based on multi-level sensor fusion
Cattaneo Machine Learning Techniques for Urban Vehicle Localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant