CN108133484B - Automatic driving processing method and device based on scene segmentation and computing equipment - Google Patents

Automatic driving processing method and device based on scene segmentation and computing equipment

Info

Publication number
CN108133484B
Authority
CN
China
Prior art keywords
frame image
layer
current frame
scene segmentation
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711405705.1A
Other languages
Chinese (zh)
Other versions
CN108133484A (en)
Inventor
董健
韩玉刚
颜水成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201711405705.1A priority Critical patent/CN108133484B/en
Publication of CN108133484A publication Critical patent/CN108133484A/en
Application granted granted Critical
Publication of CN108133484B publication Critical patent/CN108133484B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/174 — Segmentation; Edge detection involving the use of two or more images
    • G05D 1/0246 — Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means, using a video camera in combination with image processing means
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10016 — Video; Image sequence
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30252 — Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electromagnetism (AREA)
  • Multimedia (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an automatic driving processing method and apparatus based on scene segmentation, and a computing device. The method groups the frame images contained in a captured and/or recorded video and comprises: acquiring, in real time, a current frame image from the video captured and/or recorded by an image acquisition device while the vehicle is driving; inputting the current frame image into a trained neural network and performing scene segmentation on it according to its frame position within the group to which it belongs, to obtain a scene segmentation result of the current frame image; determining a driving route and/or driving instruction according to the scene segmentation result; and performing automatic driving control of the vehicle according to the determined driving route and/or driving instruction. The invention segments each frame image differently depending on its frame position within the group to which it belongs. The driving route and/or driving instruction are accurately determined from the scene segmentation result, improving the safety of automatic driving.

Description

Automatic driving processing method and device based on scene segmentation and computing equipment
Technical Field
The invention relates to the field of image processing, and in particular to an automatic driving processing method and apparatus based on scene segmentation, and a computing device.
Background
Image scene segmentation is mainly based on fully convolutional neural networks in deep learning. These methods apply the idea of transfer learning: a network pre-trained on a large-scale classification data set is transferred to an image segmentation data set for further training, yielding a segmentation network that is then used to perform scene segmentation on images. Automatic driving based on scene segmentation places high demands on both the timeliness and the accuracy of segmentation in order to guarantee driving safety.
In the prior art, each frame image in the video data is usually segmented as an independent image, producing a scene segmentation result for every frame. However, this approach applies the same processing to every frame and ignores the correlation between frame images in the video data, making the processing slow and time-consuming.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a method and apparatus for automatic driving processing based on scene segmentation, and a computing device, which overcome or at least partially solve the above problems.
According to one aspect of the present invention, an automatic driving processing method based on scene segmentation is provided, which performs grouping processing on frame images included in a captured and/or recorded video, and includes:
acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time in a vehicle driving process;
inputting the current frame image into a trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain a scene segmentation result of the current frame image;
determining a driving route and/or a driving instruction according to a scene segmentation result;
and performing automatic driving control on the vehicle according to the determined driving route and/or driving instructions.
Optionally, determining the driving route and/or the driving instruction according to the scene segmentation result further comprises:
determining the outline information of a specific object according to a scene segmentation result;
calculating the relative position relation between the vehicle and the specific object according to the contour information of the specific object;
and determining a driving route and/or a driving instruction according to the calculated relative position relation.
Optionally, the relative positional relationship of the own vehicle and the specific object includes distance information and/or angle information between the own vehicle and the specific object.
Optionally, determining the driving route and/or the driving instruction according to the scene segmentation result further comprises:
and determining the driving route and/or driving instructions of the vehicle according to the traffic sign information contained in the scene segmentation result.
Optionally, determining the driving route and/or the driving instruction according to the scene segmentation result further comprises:
and determining a driving route and/or a driving instruction according to the traffic light information contained in the scene segmentation result.
Optionally, inputting the current frame image into the trained neural network and performing scene segmentation on the current frame image according to its frame position within the group to which it belongs, to obtain the scene segmentation result of the current frame image, further includes:
judging whether the current frame image is the 1st frame image of any group;
if so, inputting the current frame image into the trained neural network and obtaining the scene segmentation result of the current frame image after the operations of all convolution layers and deconvolution layers of the neural network;
if not, inputting the current frame image into the trained neural network and computing only up to the i-th convolution layer of the neural network to obtain the i-th convolution layer result, acquiring the j-th deconvolution layer result obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network, and directly performing image fusion on the i-th convolution layer result and the j-th deconvolution layer result to obtain the scene segmentation result of the current frame image; wherein i and j are natural numbers.
Optionally, after judging that the current frame image is not the 1st frame image of any group, the method further comprises:
calculating the frame distance between the current frame image and the 1st frame image of the group to which the current frame image belongs;
determining the values of i and j according to the frame distance, wherein the layer distance between the i-th convolution layer and the last convolution layer is in inverse proportion to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer is in direct proportion to the frame distance.
Optionally, the method further comprises: presetting the correspondence between the frame distance and the values of i and j.
Optionally, after directly performing image fusion on the i-th convolution layer result and the j-th deconvolution layer result, the method further includes:
if the j-th deconvolution layer is the last deconvolution layer of the neural network, inputting the image fusion result into the output layer to obtain the scene segmentation result of the current frame image;
if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the (j+1)-th deconvolution layer and obtaining the scene segmentation result of the current frame image through the operations of the subsequent deconvolution layers and the output layer.
Optionally, inputting the current frame image into a trained neural network, and obtaining the scene segmentation result of the current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network further includes: after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
Optionally, before the operation on the ith convolutional layer of the neural network obtains the operation result of the ith convolutional layer, the method further includes: after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
Optionally, each group of video contains n frame images; wherein n is a fixed preset value.
According to another aspect of the present invention, there is provided an automatic driving processing apparatus based on scene segmentation, which performs grouping processing on frame images included in a captured and/or recorded video, including:
the acquisition module is suitable for acquiring a current frame image in a video shot and/or recorded by the image acquisition equipment in real time in the driving process of the vehicle;
the segmentation module is suitable for inputting the current frame image into the trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain a scene segmentation result of the current frame image;
the determining module is suitable for determining a driving route and/or a driving instruction according to a scene segmentation result;
and the control module is suitable for carrying out automatic driving control on the vehicle according to the determined driving route and/or driving instruction.
Optionally, the determining module is further adapted to:
determining the outline information of a specific object according to a scene segmentation result; calculating the relative position relation between the vehicle and the specific object according to the contour information of the specific object; and determining a driving route and/or a driving instruction according to the calculated relative position relation.
Optionally, the relative positional relationship of the own vehicle and the specific object includes distance information and/or angle information between the own vehicle and the specific object.
Optionally, the determining module is further adapted to:
and determining the driving route and/or driving instructions of the vehicle according to the traffic sign information contained in the scene segmentation result.
Optionally, the determining module is further adapted to:
and determining a driving route and/or a driving instruction according to the traffic light information contained in the scene segmentation result.
Optionally, the segmentation module further comprises:
the judging unit is adapted to judge whether the current frame image is the 1st frame image of any group; if so, the first segmentation unit is executed; otherwise, the second segmentation unit is executed;
the first segmentation unit is adapted to input the current frame image into the trained neural network and obtain the scene segmentation result of the current frame image after the operations of all convolution layers and deconvolution layers of the neural network;
the second segmentation unit is adapted to input the current frame image into the trained neural network, compute only up to the i-th convolution layer of the neural network to obtain the i-th convolution layer result, acquire the j-th deconvolution layer result obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network, and directly perform image fusion on the i-th convolution layer result and the j-th deconvolution layer result to obtain the scene segmentation result of the current frame image; wherein i and j are natural numbers.
Optionally, the segmentation module further comprises:
the frame distance calculating unit is adapted to calculate the frame distance between the current frame image and the 1st frame image of the group to which the current frame image belongs;
the determining unit is adapted to determine the values of i and j according to the frame distance, wherein the layer distance between the i-th convolution layer and the last convolution layer is in inverse proportion to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer is in direct proportion to the frame distance.
Optionally, the segmentation module further comprises:
and the presetting unit is suitable for presetting the corresponding relation between the frame interval and the values of i and j.
Optionally, the second segmentation unit is further adapted to:
if the j-th deconvolution layer is the last deconvolution layer of the neural network, inputting the image fusion result into the output layer to obtain the scene segmentation result of the current frame image;
if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the (j+1)-th deconvolution layer and obtaining the scene segmentation result of the current frame image through the operations of the subsequent deconvolution layers and the output layer.
Optionally, the first segmentation unit is further adapted to:
after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
Optionally, the second segmentation unit is further adapted to:
after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
Optionally, each group of video contains n frame images; wherein n is a fixed preset value.
According to yet another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the automatic driving processing method based on the scene segmentation.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the above-mentioned scene segmentation-based automatic driving processing method.
According to the automatic driving processing method and apparatus based on scene segmentation and the computing device provided by the invention, a current frame image is acquired in real time from the video captured and/or recorded by the image acquisition device while the vehicle is driving; the current frame image is input into a trained neural network and scene segmentation is performed on it according to its frame position within the group to which it belongs, obtaining its scene segmentation result; a driving route and/or driving instruction is determined according to the scene segmentation result; and automatic driving control of the vehicle is performed according to the determined driving route and/or driving instruction. The invention exploits the continuity and correlation between frame images in the video: during scene segmentation the video is processed in groups, and each frame image is segmented according to its frame position within the group to which it belongs. Specifically, the 1st frame image in each group passes through all convolution layers and deconvolution layers of the neural network, while the other frame images are computed only up to the i-th convolution layer and then image-fused with the multiplexed j-th deconvolution layer result of the 1st frame image. This greatly reduces the computation of the neural network and improves the speed of scene segmentation. The driving route and/or driving instruction are accurately determined from the scene segmentation result, improving the safety of automatic driving.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of a method of automated driving processing based on scene segmentation according to one embodiment of the invention;
FIG. 2 shows a flow diagram of a method of automated driving processing based on scene segmentation according to another embodiment of the invention;
FIG. 3 shows a functional block diagram of an automatic driving processing device based on scene segmentation according to an embodiment of the present invention;
FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flowchart of an automatic driving processing method based on scene segmentation according to an embodiment of the present invention. As shown in fig. 1, the automatic driving processing method based on scene segmentation specifically includes the following steps:
and step S101, acquiring a current frame image in the video shot and/or recorded by the image acquisition equipment in the driving process of the vehicle in real time.
The image acquisition device in this embodiment is described by taking a camera mounted on an autonomous vehicle as an example. In order to realize automatic driving, road condition information around the vehicle can be collected by the camera arranged on the autonomous vehicle; in step S101, the current frame image of the video being recorded or captured by the camera is obtained in real time.
This embodiment exploits the continuity and correlation between frame images in the video captured and/or recorded by the image acquisition device while the vehicle is driving: before scene segmentation, the frame images in the video are grouped. During grouping, closely related frame images are placed in the same group according to the correlation between them. Different groups may contain the same or different numbers of frame images; assuming each group contains n frame images, n may be a fixed or non-fixed value, set according to the implementation. When the current frame image is acquired in real time, it is assigned to a group, i.e., it is determined whether it belongs to the current group or is the 1st frame image of a new group. Specifically, grouping may be performed according to the correlation between the current frame image and the preceding frame image(s). For example, when a tracking algorithm is used, if the tracking result for the current frame image is valid, the current frame image is assigned to the current group; if the tracking result is invalid, the current frame image becomes the 1st frame image of a new group. Alternatively, every two or three adjacent frames may simply be grouped together in sequence. Taking groups of three frames as an example, the 1st frame image in the video is the 1st frame of the first group, the 2nd frame image is the 2nd frame of the first group, the 3rd frame image is the 3rd frame of the first group, the 4th frame image is the 1st frame of the second group, the 5th frame image is the 2nd frame of the second group, the 6th frame image is the 3rd frame of the second group, and so on. The specific grouping manner is determined by the implementation and is not limited here.
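For illustration only, the two grouping strategies mentioned above could be sketched as follows; the function and class names are hypothetical and the group size n = 3 is just the example used in this description, not a value fixed by the method.

```python
def frame_position_in_group(frame_index: int, n: int = 3) -> tuple[int, int]:
    """Return (group_id, position_in_group) for a fixed group size n.

    Position 0 corresponds to the '1st frame image' of a group, which
    receives the full convolution + deconvolution forward pass.
    """
    return frame_index // n, frame_index % n


class TrackingGrouper:
    """Hypothetical tracking-based variant: a new group starts whenever the
    tracker reports an invalid result for the current frame."""

    def __init__(self) -> None:
        self.position = -1  # position of the previous frame within its group

    def assign(self, tracking_valid: bool) -> int:
        # Invalid tracking result -> current frame is the 1st frame of a new group.
        self.position = self.position + 1 if tracking_valid else 0
        return self.position


if __name__ == "__main__":
    # Fixed grouping: frames 0..5 -> (0,0) (0,1) (0,2) (1,0) (1,1) (1,2)
    print([frame_position_in_group(k) for k in range(6)])
```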
Step S102, inputting the current frame image into the trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain the scene segmentation result of the current frame image.
After the current frame image is input into the trained neural network, scene segmentation is performed on it according to its frame position within the group to which it belongs. The segmentation processing differs depending on that frame position.
Specifically, it is judged whether the current frame image is the 1st frame image of any group. If so, the current frame image is input into the trained neural network, which performs all convolution layer operations and deconvolution layer operations on it in sequence, finally obtaining the scene segmentation result of the current frame image. For example, if the neural network contains 4 convolution layers and 3 deconvolution layers, the current frame image is passed through all 4 convolution layers and all 3 deconvolution layers.
If the current frame image is judged not to be the 1st frame image of any group, it is also input into the trained neural network, but the network does not need to perform all convolution and deconvolution layer operations on it. The computation proceeds only up to the i-th convolution layer to obtain the i-th convolution layer result; the j-th deconvolution layer result obtained when the 1st frame image of the group was input into the network is acquired directly; and the i-th convolution layer result and the j-th deconvolution layer result are image-fused to obtain the scene segmentation result of the current frame image. Here i and j correspond to each other in the sense that the output dimensions of the i-th convolution layer result and the j-th deconvolution layer result are the same; i and j are natural numbers, i does not exceed the index of the last convolution layer of the neural network, and j does not exceed the index of the last deconvolution layer. For example, the current frame image may be computed up to the 1st convolution layer of the network to obtain the 1st convolution layer result, the 3rd deconvolution layer result of the group's 1st frame image may be acquired directly, and the two results fused; in this case the output dimensions of the 1st convolution layer result and the 3rd deconvolution layer result are the same. By multiplexing the j-th deconvolution layer result computed for the 1st frame image of the group, the computation performed on the current frame image is reduced, which greatly accelerates the processing and improves the computational efficiency of the neural network. Further, if the j-th deconvolution layer is the last deconvolution layer of the network, the image fusion result is input into the output layer to obtain the scene segmentation result of the current frame image; if it is not the last deconvolution layer, the image fusion result is input into the (j+1)-th deconvolution layer, and the scene segmentation result is obtained through the operations of the subsequent deconvolution layers and the output layer.
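The two execution paths described above can be sketched in PyTorch-style Python as below. This is a minimal sketch under assumptions: the layer widths, the additive fusion, the factor-2 downsampling inside run_convs (see the downsampling discussion later in this description) and all class, method and parameter names are illustrative, not the network actually used by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedSegNet(nn.Module):
    """Illustrative 4-convolution / 3-deconvolution segmentation network."""

    def __init__(self, in_ch: int = 3, num_classes: int = 19):
        super().__init__()
        chans = [in_ch, 32, 64, 128, 256]
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[k], chans[k + 1], 3, padding=1), nn.ReLU())
            for k in range(4))
        # Deconvolution layers 1..3: 256->128, 128->64, 64->32, each doubling resolution.
        self.deconvs = nn.ModuleList(
            nn.ConvTranspose2d(c, c // 2, kernel_size=2, stride=2) for c in (256, 128, 64))
        self.output = nn.Conv2d(32, num_classes, kernel_size=1)  # output layer

    def run_convs(self, x: torch.Tensor, upto: int) -> torch.Tensor:
        """Convolution layers 1..upto; every result before the last executed layer
        is downsampled by 2 (the raw input frame itself is not downsampled)."""
        for k in range(upto):
            x = self.convs[k](x)
            if k < upto - 1:
                x = F.avg_pool2d(x, 2)
        return x

    def full_forward(self, frame: torch.Tensor):
        """1st frame of a group: all conv and deconv layers plus the output layer.
        The intermediate deconv results are returned so later frames can reuse them."""
        x = self.run_convs(frame, upto=4)
        deconv_results = []
        for deconv in self.deconvs:
            x = deconv(x)
            deconv_results.append(x)
        return self.output(x), deconv_results

    def partial_forward(self, frame, i, j, cached_deconv_results):
        """Non-1st frame: conv layers 1..i only, then fuse with the cached j-th
        deconvolution result of the group's 1st frame (matching output dimensions),
        and finish with the remaining deconv layers (if any) and the output layer."""
        x = self.run_convs(frame, upto=i)
        fused = x + cached_deconv_results[j - 1]   # simple additive image fusion (assumed)
        for deconv in self.deconvs[j:]:            # deconvolution layers j+1 .. 3
            fused = deconv(fused)
        return self.output(fused)


net = GroupedSegNet()
seg1, cache = net.full_forward(torch.randn(1, 3, 64, 64))          # 1st frame of the group
seg2 = net.partial_forward(torch.randn(1, 3, 64, 64), i=1, j=3,
                           cached_deconv_results=cache)            # later frame of the group
```

With this layout the dimension-matching pairs are (i = 1, j = 3), (i = 2, j = 2) and (i = 3, j = 1), consistent with the examples given in this description.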
For a current frame image that is not the 1st frame image of any group, the values of i and j need to be determined. After judging that the current frame image is not the 1st frame image of any group, the frame distance between the current frame image and the 1st frame image of the group to which it belongs is calculated. For example, if the current frame image is the 3rd frame image of its group, the frame distance between it and the 1st frame image of the group is 2. From the obtained frame distance, the value of i for the i-th convolution layer of the neural network and the value of j for the j-th deconvolution layer of the 1st frame image can be determined.
When determining i and j, the layer distance between the i-th convolution layer and the last convolution layer (the bottleneck layer of the convolution layers) can be taken as inversely proportional to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer as directly proportional to the frame distance. The larger the frame distance, the smaller the layer distance between the i-th convolution layer and the last convolution layer, the larger the value of i, and the more convolution layers need to be computed; likewise, the larger the layer distance between the j-th deconvolution layer and the output layer, the smaller the value of j, and the result of an earlier deconvolution layer needs to be acquired. Take a neural network containing convolution layers 1 to 4, where the 4th convolution layer is the last one, plus deconvolution layers 1 to 3 and an output layer. When the frame distance is 1, the layer distance between the i-th convolution layer and the last convolution layer is determined to be 3, so i = 1, i.e., computation stops at the 1st convolution layer; the layer distance between the j-th deconvolution layer and the output layer is determined to be 1, so j = 3, and the result of the 3rd deconvolution layer is acquired. When the frame distance is 2, the layer distance between the i-th convolution layer and the last convolution layer is determined to be 2, so i = 2, i.e., computation stops at the 2nd convolution layer; the layer distance between the j-th deconvolution layer and the output layer is determined to be 2, so j = 2, and the result of the 2nd deconvolution layer is acquired. The specific layer distances depend on the number of convolution and deconvolution layers in the neural network and on the effect to be achieved in the actual implementation; the above are only examples.
Alternatively, when determining i and j, the correspondence between the frame distance and the values of i and j may be preset directly. Specifically, different values of i and j may be preset for different frame distances, e.g., frame distance 1 corresponds to i = 1 and j = 3, and frame distance 2 corresponds to i = 2 and j = 2. The same values of i and j may also be set for all frame distances, e.g., i = 2 and j = 2 regardless of the frame distance. Or the same values of i and j may be set for some of the frame distances, e.g., frame distances 1 and 2 correspond to i = 1 and j = 3, while frame distances 3 and 4 correspond to i = 2 and j = 2. This is set according to the implementation and is not limited here.
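A minimal sketch of both strategies for choosing i and j, assuming the 4-convolution / 3-deconvolution example above; the proportionality rule and the lookup-table values follow the worked examples in this description, while the function names and the behaviour for frame distances not mentioned in the text are assumptions.

```python
NUM_CONV = 4     # convolution layers 1..4, the 4th being the last (bottleneck) layer
NUM_DECONV = 3   # deconvolution layers 1..3, followed by the output layer


def ij_from_frame_distance(frame_distance: int) -> tuple[int, int]:
    """Proportionality rule: the layer distance between the i-th conv layer and the
    last conv layer shrinks as the frame distance grows (inverse proportion), while
    the layer distance between the j-th deconv layer and the output layer grows with
    the frame distance (direct proportion)."""
    conv_layer_distance = max(NUM_CONV - frame_distance, 0)    # frame distance 1 -> 3
    deconv_layer_distance = min(frame_distance, NUM_DECONV)    # frame distance 1 -> 1
    i = NUM_CONV - conv_layer_distance                         # frame distance 1 -> i = 1
    j = NUM_DECONV + 1 - deconv_layer_distance                 # frame distance 1 -> j = 3
    return i, j


# Preset correspondence table (the alternative strategy): frame distance -> (i, j).
PRESET_IJ = {1: (1, 3), 2: (2, 2), 3: (2, 2), 4: (2, 2)}

if __name__ == "__main__":
    assert ij_from_frame_distance(1) == (1, 3)
    assert ij_from_frame_distance(2) == (2, 2)
    print(PRESET_IJ.get(3, (2, 2)))
```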
Further, in order to increase the computation speed of the neural network, if the current frame image is judged to be the 1st frame image of a group, the result of each convolution layer before the last convolution layer is downsampled after that layer is computed; if the current frame image is judged not to be the 1st frame image of any group, the result of each convolution layer before the i-th convolution layer is downsampled after that layer is computed. That is, after the current frame image is input into the neural network, the result of the 1st convolution layer is downsampled to reduce its resolution, the 2nd convolution layer then operates on the downsampled result, the result of the 2nd convolution layer is downsampled in turn, and so on, until the last convolution layer (namely the bottleneck layer of the convolution layers) or the i-th convolution layer is reached. Taking the 4th convolution layer as the last (or i-th) layer as an example, no downsampling is performed after the result of the 4th convolution layer. Downsampling the result of each convolution layer before the 4th one reduces the resolution of the frame image input into each subsequent convolution layer, so the computation speed of the neural network can be improved. It should be noted that the first convolution layer of the neural network receives the current frame image acquired in real time without downsampling, so the details of the current frame image are better preserved; downsampling only the outputs of the convolution layers therefore does not affect these details while still improving the computation speed of the neural network.
And step S103, determining a driving route and/or a driving instruction according to the scene segmentation result.
The scene segmentation result contains various objects. According to the relationships between these objects and the own vehicle, the prompts these objects imply for the vehicle, and so on, the driving route of the vehicle within a preset time interval can be determined, and/or a driving instruction can be determined. Specifically, a driving instruction may be an instruction to start driving, stop driving, drive at a certain speed, or accelerate or decelerate at a certain acceleration. The preset time interval can be set by those skilled in the art according to actual needs and is not limited here.
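For illustration only, the driving route and driving instruction mentioned here could be represented by simple data structures such as the following; the field names and the waypoint-based route representation are assumptions, not structures defined by the patent.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class InstructionType(Enum):
    START = auto()
    STOP = auto()
    KEEP_SPEED = auto()
    ACCELERATE = auto()
    DECELERATE = auto()


@dataclass
class DrivingInstruction:
    kind: InstructionType
    target_speed_mps: float | None = None      # used with KEEP_SPEED
    acceleration_mps2: float | None = None     # used with ACCELERATE / DECELERATE


@dataclass
class DrivingRoute:
    horizon_s: float                           # the preset time interval
    waypoints: list[tuple[float, float]] = field(default_factory=list)  # (x, y) in the vehicle frame
```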
And step S104, performing automatic driving control on the vehicle according to the determined driving route and/or driving instruction.
After the driving route and/or driving instruction is determined, automatic driving control of the own vehicle can be performed accordingly. For example, assuming the determined driving instruction is to decelerate at an acceleration of 6 m/s², then in step S104 the braking system of the own vehicle is controlled so that the vehicle decelerates at 6 m/s².
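A hedged sketch of how such an instruction might be handed to the vehicle's actuators; the BrakeSystem interface and function names are hypothetical stand-ins for whatever braking control the autonomous vehicle actually exposes.

```python
class BrakeSystem:
    """Hypothetical actuator interface of the autonomous vehicle."""

    def request_deceleration(self, magnitude_mps2: float) -> None:
        print(f"braking: target deceleration {magnitude_mps2} m/s^2")


def execute_deceleration_instruction(acceleration_mps2: float, brakes: BrakeSystem) -> None:
    # Step S104: translate the determined driving instruction (decelerate at a
    # given acceleration) into an actuator command for the braking system.
    brakes.request_deceleration(abs(acceleration_mps2))


# The worked example from the description: decelerate at 6 m/s^2.
execute_deceleration_instruction(-6.0, BrakeSystem())
```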
According to the automatic driving processing method based on scene segmentation provided by the invention, a current frame image is acquired in real time from the video captured and/or recorded by the image acquisition device while the vehicle is driving; the current frame image is input into a trained neural network and scene segmentation is performed on it according to its frame position within the group to which it belongs, obtaining its scene segmentation result; a driving route and/or driving instruction is determined according to the scene segmentation result; and automatic driving control of the vehicle is performed according to the determined driving route and/or driving instruction. The invention exploits the continuity and correlation between frame images in the video: during scene segmentation the video is processed in groups, and each frame image is segmented according to its frame position within the group to which it belongs. Specifically, the 1st frame image in each group passes through all convolution layers and deconvolution layers of the neural network, while the other frame images are computed only up to the i-th convolution layer and then image-fused with the multiplexed j-th deconvolution layer result of the 1st frame image. This greatly reduces the computation of the neural network and improves the speed of scene segmentation. The driving route and/or driving instruction are accurately determined from the scene segmentation result, improving the safety of automatic driving.
Fig. 2 shows a flowchart of an automatic driving processing method based on scene segmentation according to another embodiment of the present invention. As shown in fig. 2, the automatic driving processing method based on scene segmentation specifically includes the following steps:
step S201, acquiring a current frame image in a video shot and/or recorded by an image acquisition device in real time in the driving process of the vehicle.
Step S202, inputting the current frame image into the trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain the scene segmentation result of the current frame image.
The above steps refer to steps S101-S102 in the embodiment of fig. 1, and are not described herein again.
Step S203 determines contour information of the specific object according to the scene segmentation result.
Specifically, the specific object may include an object such as a vehicle, a pedestrian, a road, an obstacle, or the like. The person skilled in the art can set specific objects according to actual needs, and the specific objects are not limited herein. After the scene segmentation result corresponding to the current frame image is obtained, the contour information of the specific object such as the vehicle, the pedestrian, the road and the like can be determined according to the scene segmentation result corresponding to the current frame image, so that the relative position relationship between the vehicle and the specific object can be calculated subsequently.
Step S204, calculating the relative position relation between the vehicle and the specific object according to the contour information of the specific object.
Assuming that the contour information of the vehicle 1 and the contour information of the vehicle 2 are determined to be obtained in step S203, the relative positional relationship of the own vehicle and the vehicle 1 and the relative positional relationship of the own vehicle and the vehicle 2 may be calculated from the contour information of the vehicle 1 and the contour information of the vehicle 2 in step S204.
The relative positional relationship between the own vehicle and the specific object includes distance information between them, for example, the straight-line distance between the own vehicle and vehicle 1 is 200 m; it also includes angle information between them, for example, the own vehicle is in the 10-degree direction to the right rear of vehicle 1.
And step S205, determining a driving route and/or a driving instruction according to the calculated relative position relation.
According to the calculated relative position relationship between the self-vehicle and the specific object, the driving route of the self-vehicle in a preset time interval can be determined, and/or the driving instruction can be determined. Specifically, the travel instruction may include an instruction to start travel, stop travel, travel at a certain travel speed, or travel with acceleration or deceleration at a certain acceleration. The skilled person can set the preset time interval according to actual needs, and the preset time interval is not limited herein.
For example, if the calculated relative positional relationship shows a pedestrian 10 m ahead of the own vehicle, the determined driving instruction may be to decelerate at an acceleration of 6 m/s²; or, if the calculated relative positional relationship shows vehicle 1 at a position 200 m to the right front of the own vehicle and vehicle 2 at a position 2 m away at a 45-degree angle to the left of the own vehicle, the determined driving route may be a route toward the right front.
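The relative-position computation of steps S203-S204 is not spelled out in detail in this description; the sketch below shows one plausible way to derive distance and bearing from a segmented object's contour, under the assumption (not stated in the patent) that the contour points have already been projected into a ground-plane coordinate frame centred on the own vehicle.

```python
import math


def relative_position(contour_xy: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (distance_m, angle_deg) from the own vehicle (origin, facing +x)
    to the nearest point of a specific object's contour, given contour points
    already expressed in the vehicle's ground-plane coordinate frame."""
    nearest = min(contour_xy, key=lambda p: math.hypot(p[0], p[1]))
    distance = math.hypot(nearest[0], nearest[1])
    angle = math.degrees(math.atan2(nearest[1], nearest[0]))  # + left of heading, - right
    return distance, angle


# Example: a pedestrian contour roughly 10 m ahead and slightly to the left.
print(relative_position([(10.2, 0.4), (10.0, 0.6), (10.4, 0.5)]))
```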
Step S206, determining the driving route and/or driving instruction of the vehicle according to the traffic sign information contained in the scene segmentation result.
The scene segmentation result may contain various kinds of traffic sign information, for example warning signs: roundabout ahead, sharp left turn, continuous curves, tunnel ahead, and the like; prohibition signs: no straight-through traffic, no entry, and the like; indication signs: speed limit, lane direction, U-turn permitted, and the like; road construction safety signs: construction ahead, left lane closed, and the like; as well as guide signs, tourist area signs, auxiliary signs, and so on. Based on this specific traffic sign information, the driving route and/or driving instruction of the own vehicle can be determined.
For example, if the current speed of the own vehicle is 100 km/h and the scene segmentation result contains traffic sign information indicating a speed limit of 80 km/h 500 m ahead, a deceleration driving instruction for the vehicle is determined; or, if the scene segmentation result contains traffic sign information indicating that the left lane is closed 200 m ahead, it is determined that the vehicle should drive in the right lane.
Step S207, determining a driving route and/or a driving instruction according to the traffic light information contained in the scene segmentation result.
The scene segmentation result may also contain traffic signal light information, such as traffic light information; according to this information it can be determined whether to continue driving along the current route, or a driving route and/or driving instruction such as decelerating and stopping can be determined.
For example, if the scene segmentation result contains red light information 10 m ahead, it is determined that the vehicle should decelerate and stop; or, if it contains green light information 10 m ahead, it is determined that the vehicle continues driving on the current road.
Further, the above steps S205, S206 and S207 may be executed in parallel, and the driving route and/or driving instruction may be determined by comprehensively considering the relative positional relationship calculated from the scene segmentation result together with the traffic sign information and/or traffic signal light information it contains.
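A toy rule-based sketch of how the outputs of steps S205-S207 might be combined; every threshold and rule here is an illustrative assumption drawn from the examples above, not decision logic specified by the patent.

```python
def decide(relative_positions, speed_limit_kmh, current_speed_kmh, red_light_ahead):
    """Combine the relative positions from S205, the traffic-sign information from S206
    and the traffic-light information from S207 into a single driving instruction.
    relative_positions is a list of (distance_m, angle_deg) pairs to specific objects."""
    if red_light_ahead:
        return "decelerate and stop"
    if any(distance < 10.0 for distance, _angle in relative_positions):
        return "decelerate at 6 m/s^2"            # e.g. a pedestrian closer than 10 m
    if speed_limit_kmh is not None and current_speed_kmh > speed_limit_kmh:
        return f"decelerate to {speed_limit_kmh} km/h"
    return "keep current route and speed"


# A pedestrian 9.5 m ahead takes priority over the speed-limit rule.
print(decide([(9.5, 2.0)], speed_limit_kmh=80, current_speed_kmh=100, red_light_ahead=False))
```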
Step S208, according to the determined driving route and/or driving instruction, automatic driving control is carried out on the vehicle.
After the travel route and/or the travel instruction are determined, the self-vehicle can be automatically driven and controlled according to the determined travel route and/or the determined travel instruction.
According to the automatic driving processing method based on scene segmentation provided by the invention, the continuity and correlation between frame images in the video are exploited: during scene segmentation the frame images in the video are grouped, and each frame image is segmented according to its frame position within the group to which it belongs, yielding the scene segmentation result of the current frame image. Furthermore, from the obtained scene segmentation result the relative positional relationship between the own vehicle and specific objects such as other vehicles, pedestrians and roads can be calculated more accurately, and the driving route and/or driving instruction can be determined more precisely from the calculated relative positional relationship. Together with the traffic sign information and traffic signal light information contained in the scene segmentation result, this helps the autonomous vehicle better comply with traffic regulations and drive safely and accurately, improving the safety of automatic driving and optimizing the automatic driving processing.
Fig. 3 shows a functional block diagram of an automatic driving processing apparatus based on scene segmentation according to an embodiment of the present invention. As shown in fig. 3, the automatic driving processing device based on scene segmentation includes the following modules:
the acquiring module 310 is adapted to acquire the current frame image in the video of the vehicle driving way captured and/or recorded by the image capturing device in real time.
The image acquisition device in this embodiment is described by taking a camera mounted on an autonomous vehicle as an example. In order to realize automatic driving, road condition information around the vehicle can be collected by the camera arranged on the autonomous vehicle; the acquiring module 310 obtains, in real time, the current frame image of the video being recorded or captured by the camera.
This embodiment exploits the continuity and correlation between frame images in the video captured and/or recorded by the image acquisition device while the vehicle is driving: before scene segmentation, the frame images in the video are grouped. During grouping, closely related frame images are placed in the same group according to the correlation between them. Different groups may contain the same or different numbers of frame images; assuming each group contains n frame images, n may be a fixed or non-fixed value, set according to the implementation. When the current frame image is acquired in real time, it is assigned to a group, i.e., it is determined whether it belongs to the current group or is the 1st frame image of a new group. Specifically, grouping may be performed according to the correlation between the current frame image and the preceding frame image(s). For example, when a tracking algorithm is used, if the tracking result for the current frame image is valid, the current frame image is assigned to the current group; if the tracking result is invalid, the current frame image becomes the 1st frame image of a new group. Alternatively, every two or three adjacent frames may simply be grouped together in sequence. Taking groups of three frames as an example, the 1st frame image in the video data is the 1st frame of the first group, the 2nd frame image is the 2nd frame of the first group, the 3rd frame image is the 3rd frame of the first group, the 4th frame image is the 1st frame of the second group, the 5th frame image is the 2nd frame of the second group, the 6th frame image is the 3rd frame of the second group, and so on. The specific grouping manner is determined by the implementation and is not limited here.
The segmentation module 320 is adapted to input the current frame image into the trained neural network, and perform scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs, so as to obtain a scene segmentation result of the current frame image.
After the segmentation module 320 inputs the current frame image into the trained neural network, it performs scene segmentation on the current frame image according to its frame position within the group to which it belongs. The segmentation processing performed by the segmentation module 320 differs depending on that frame position.
The segmentation module 320 includes a judgment unit 321, a first segmentation unit 322, and a second segmentation unit 323.
Specifically, the judging unit 321 judges whether the current frame image is the 1st frame image of any group. If so, the first segmentation unit 322 inputs the current frame image into the trained neural network, which performs all convolution layer operations and deconvolution layer operations on it in sequence, finally obtaining the scene segmentation result of the current frame image. For example, if the neural network contains 4 convolution layers and 3 deconvolution layers, the first segmentation unit 322 passes the current frame image through all 4 convolution layers and all 3 deconvolution layers.
If the judging unit 321 judges that the current frame image is not the 1st frame image of any group, the second segmentation unit 323 inputs the current frame image into the trained neural network, but the network does not need to perform all convolution and deconvolution layer operations on it. The second segmentation unit 323 computes only up to the i-th convolution layer of the network to obtain the i-th convolution layer result, directly acquires the j-th deconvolution layer result obtained when the 1st frame image of the group was input into the network, and performs image fusion on the i-th convolution layer result and the j-th deconvolution layer result to obtain the scene segmentation result of the current frame image. Here i and j correspond to each other in the sense that the output dimensions of the i-th convolution layer result and the j-th deconvolution layer result are the same; i and j are natural numbers, i does not exceed the index of the last convolution layer of the neural network, and j does not exceed the index of the last deconvolution layer. For example, the second segmentation unit 323 may compute the current frame image up to the 1st convolution layer of the network to obtain the 1st convolution layer result, directly acquire the 3rd deconvolution layer result of the group's 1st frame image, and fuse the two results; the output dimensions of the 1st convolution layer result and the 3rd deconvolution layer result are the same. By multiplexing the j-th deconvolution layer result computed for the 1st frame image of the group, the second segmentation unit 323 reduces the computation performed on the current frame image, greatly accelerating the processing and improving the computational efficiency of the neural network. Further, if the j-th deconvolution layer is the last deconvolution layer of the network, the second segmentation unit 323 inputs the image fusion result into the output layer to obtain the scene segmentation result of the current frame image; if it is not the last deconvolution layer, the second segmentation unit 323 inputs the image fusion result into the (j+1)-th deconvolution layer, and the scene segmentation result is obtained through the operations of the subsequent deconvolution layers and the output layer.
The segmentation module 320 further comprises a frame interval calculation unit 324, a determining unit 325 and/or a preset unit 326.
When the current frame image is not the 1st frame image of any group, the segmentation module 320 needs to determine the values of i and j. After the judging unit 321 judges that the current frame image is not the 1st frame image of any group, the frame interval calculation unit 324 calculates the frame interval between the current frame image and the 1st frame image of the group to which it belongs. For example, if the current frame image is the 3rd frame image of a group, the frame interval calculation unit 324 calculates that the frame interval between the current frame image and the 1st frame image of that group is 2. The determining unit 325 may then determine, according to the obtained frame interval, the value of i for the i-th convolution layer to be run in the neural network and the value of j for the j-th deconvolution layer of the 1st frame image whose operation result is to be obtained.
When determining i and j, the determining unit 325 may follow the rule that the layer distance between the i-th convolution layer and the last convolution layer (the bottleneck layer of the convolution layers) is inversely proportional to the frame interval, while the layer distance between the j-th deconvolution layer and the output layer is directly proportional to the frame interval. That is, the larger the frame interval, the smaller the layer distance between the i-th convolution layer and the last convolution layer, so the larger the value of i and the more convolution layer operations need to be run; likewise, the larger the frame interval, the larger the layer distance between the j-th deconvolution layer and the output layer, so the smaller the value of j and the shallower the deconvolution layer whose operation result needs to be obtained. Take a neural network comprising convolution layers 1 to 4, the 4th convolution layer being the last, together with deconvolution layers 1 to 3 and an output layer. When the frame interval calculation unit 324 calculates a frame interval of 1, the determining unit 325 sets the layer distance between the i-th convolution layer and the last convolution layer to 3, so i is 1 and the second segmentation unit 323 runs the network up to the 1st convolution layer; it sets the layer distance between the j-th deconvolution layer and the output layer to 1, so j is 3 and the second segmentation unit 323 obtains the operation result of the 3rd deconvolution layer. When the frame interval is 2, the determining unit 325 sets the layer distance between the i-th convolution layer and the last convolution layer to 2, so i is 2 and the second segmentation unit 323 runs up to the 2nd convolution layer; it sets the layer distance between the j-th deconvolution layer and the output layer to 2, so j is 2 and the second segmentation unit 323 obtains the operation result of the 2nd deconvolution layer. The specific layer distances depend on the numbers of convolution layers and deconvolution layers in the neural network and on the effect to be achieved in a given implementation; the above is only an example.
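A minimal sketch of this proportionality rule, assuming the 4-convolution-layer, 3-deconvolution-layer network of the example above; the clamping behaviour for very large frame intervals is an added assumption.

```python
def choose_layers(frame_interval, num_conv=4, num_deconv=3):
    """Map a frame interval to (i, j): the layer distance between the i-th conv layer
    and the bottleneck shrinks as the interval grows, so i grows; the layer distance
    between the j-th deconv layer and the output layer grows, so j shrinks."""
    i = min(frame_interval, num_conv - 1)          # never beyond the layer before the bottleneck
    j = max(num_deconv - (frame_interval - 1), 1)  # never shallower than the 1st deconv layer
    return i, j

# choose_layers(1) -> (1, 3); choose_layers(2) -> (2, 2), matching the example above
```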
Alternatively, the preset unit 326 may directly preset the correspondence between the frame interval and the values of i and j. Specifically, the preset unit 326 may preset different values of i and j for different frame intervals: for example, for a frame interval of 1 it sets i to 1 and j to 3, and for a frame interval of 2 it sets i to 2 and j to 2. The preset unit 326 may instead set the same values of i and j regardless of the frame interval, for example always setting i to 2 and j to 2. Or it may set the same values of i and j for some of the frame intervals, for example setting i to 1 and j to 3 for frame intervals of 1 and 2, and i to 2 and j to 2 for frame intervals of 3 and 4. The correspondence is set according to the implementation conditions and is not limited here.
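Such a preset correspondence can be written as a plain lookup table; the values below mirror the last example above and are otherwise illustrative assumptions.

```python
# Frame interval -> (i, j); intervals 1 and 2 share (1, 3), intervals 3 and 4 share (2, 2).
PRESET_I_J = {1: (1, 3), 2: (1, 3), 3: (2, 2), 4: (2, 2)}

def preset_layers(frame_interval, default=(2, 2)):
    """Fall back to a fixed (i, j) for frame intervals not listed in the table."""
    return PRESET_I_J.get(frame_interval, default)
```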
Further, to increase the operation speed of the neural network, if the judging unit 321 judges that the current frame image is the 1st frame image of a group, the first segmentation unit 322 performs downsampling on the operation result of each convolution layer before the last convolution layer of the neural network; if the judging unit 321 judges that the current frame image is not the 1st frame image of any group, the second segmentation unit 323 performs downsampling on the operation result of each convolution layer before the i-th convolution layer. That is, after the first segmentation unit 322 or the second segmentation unit 323 inputs the current frame image into the neural network, the operation result of the 1st convolution layer is downsampled to reduce its resolution, the downsampled result is fed to the 2nd convolution layer, whose result is downsampled in turn, and so on until the last convolution layer (the bottleneck layer) or the i-th convolution layer is reached. Taking the 4th convolution layer as the last or the i-th layer, the operation result of the 4th convolution layer is no longer downsampled. Because the result of every convolution layer before the 4th is downsampled, the resolution of the frame image fed into each subsequent convolution layer is reduced, which increases the operation speed of the neural network. It should be noted that the first convolution layer operates on the current frame image acquired in real time without downsampling, so that the details of the current frame image are better preserved; downsampling only the subsequent operation results does not affect these details while still speeding up the network.
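This downsampling step can be sketched separately from the toy network above, which folded downsampling into strided convolutions. In the sketch below, convs is assumed to be a list of stride-1 convolution layers, and max pooling is an assumed choice of downsampling operator.

```python
import torch
import torch.nn.functional as F

def conv_stack_with_downsampling(frame, convs, stop_at):
    """Run conv layers 1..stop_at (the bottleneck layer or the i-th layer). The raw
    frame enters the 1st conv layer at full resolution; every result before the
    stop_at-th layer is downsampled before being fed to the next conv layer."""
    x = frame                                    # no downsampling of the input frame itself
    for k, conv in enumerate(convs[:stop_at], start=1):
        x = torch.relu(conv(x))
        if k < stop_at:
            x = F.max_pool2d(x, kernel_size=2)   # halve resolution to speed up later layers
    return x
```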
The determining module 330 is adapted to determine a driving route and/or a driving instruction according to the scene segmentation result.
Specifically, the specific object may include objects such as vehicles, pedestrians, roads, and obstacles. Those skilled in the art may set the specific objects according to actual needs, which are not limited here. After the scene segmentation result corresponding to the current frame image is obtained, the determining module 330 may determine the contour information of specific objects such as vehicles, pedestrians, and roads according to that result. For example, if the determining module 330 obtains the contour information of a vehicle 1 and of a vehicle 2, it calculates the relative positional relationship between the host vehicle and vehicle 1, and between the host vehicle and vehicle 2, according to this contour information.
The relative positional relationship between the host vehicle and a specific object includes distance information between them; for example, the determining module 330 determines that the straight-line distance between the host vehicle and vehicle 1 is 200 meters. It also includes angle information between the host vehicle and the specific object; for example, the determining module 330 determines that the host vehicle is at a 10-degree angle to the right rear of vehicle 1.
According to the calculated relative positional relationship between the host vehicle and the specific objects, the determining module 330 can determine a driving route of the host vehicle within a preset time interval and/or a driving instruction. For example, if the determining module 330 finds from the calculated relative positions that there is a pedestrian 10 meters directly ahead, it may determine a driving instruction to decelerate at 6 m/s²; or, if it finds that vehicle 1 is 200 meters directly ahead and vehicle 2 is 2 meters away at a 45-degree angle to the left, the driving route it determines may be to continue straight ahead.
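A rough geometric sketch of how contour information could be turned into the distance/angle relationship and then a simple instruction is shown below; the image-to-ground homography H, the choice of the lowest contour point, and the decision thresholds are assumptions not specified by this embodiment.

```python
import numpy as np

def relative_position(contour_pixels, H):
    """contour_pixels: (N, 2) image coordinates of the object outline; H: 3x3 homography
    mapping image points to ground-plane metres in the ego frame (x right, y forward)."""
    base = contour_pixels[np.argmax(contour_pixels[:, 1])]  # lowest contour point on the road
    p = H @ np.array([base[0], base[1], 1.0])
    x, y = p[0] / p[2], p[1] / p[2]
    distance = float(np.hypot(x, y))
    angle = float(np.degrees(np.arctan2(x, y)))             # 0 degrees = straight ahead
    return distance, angle

def simple_instruction(distance, angle, decel=6.0):
    """E.g. a pedestrian about 10 m straight ahead -> decelerate at about 6 m/s^2."""
    if distance < 15.0 and abs(angle) < 10.0:
        return ("decelerate", decel)
    return ("keep_route", 0.0)
```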
The determining module 330 is further adapted to determine the driving route and/or driving instruction of the vehicle according to the traffic sign information included in the scene segmentation result.
The scene segmentation result may include various kinds of traffic sign information, such as warning signs (roundabout ahead, sharp left turn, continuous curves, tunnel ahead, and the like), prohibition signs (no straight-through traffic, no entry, and the like), indication signs (speed limits, lane diversion, U-turn permitted, and the like), road construction safety signs (construction ahead, left lane closed, and the like), as well as route guidance signs, tourist area signs, auxiliary signs, and so on. The determining module 330 may determine the driving route and/or driving instruction of the host vehicle according to the specific traffic sign information.
For example, when the current speed of the host vehicle is 100 km/h and the scene segmentation result contains traffic sign information indicating a speed limit of 80 km/h 500 m ahead, the determining module 330 determines a deceleration instruction for the host vehicle; or, when the scene segmentation result contains traffic sign information indicating that the left lane is closed 200 m ahead, the determining module 330 determines that the host vehicle should move to the right-hand lane.
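The speed-limit example can be checked with a short uniform-deceleration calculation; the uniform-deceleration model itself is an assumption made only for illustration.

```python
def decel_for_speed_limit(current_kmh, limit_kmh, distance_m):
    """Deceleration needed to reach limit_kmh within distance_m, from v^2 = u^2 + 2as."""
    v, u = current_kmh / 3.6, limit_kmh / 3.6    # km/h -> m/s
    return max((v * v - u * u) / (2.0 * distance_m), 0.0)

# decel_for_speed_limit(100, 80, 500) is roughly 0.28 m/s^2: a gentle deceleration instruction
```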
The determining module 330 is further adapted to determine a driving route and/or driving instructions according to the traffic light information contained in the scene segmentation result.
The scene segmentation result may also include traffic signal light information, such as red and green light information, and the determining module 330 may determine, according to this information, whether to continue along the current route or to decelerate and stop, among other driving routes and/or driving instructions.
For example, the determining module 330 determines that the host vehicle should decelerate and stop according to red light information 10 m ahead contained in the scene segmentation result; alternatively, it determines that the host vehicle should continue along the current road according to green light information 10 m ahead contained in the scene segmentation result.
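A minimal mapping from such signal light information to a driving instruction might look as follows; the label names and the 10 m stopping threshold are assumptions.

```python
def light_instruction(light_colour, distance_m):
    """Map a detected light colour and its distance to a coarse driving instruction."""
    if light_colour == "red" and distance_m <= 10.0:
        return "decelerate_and_stop"
    if light_colour == "green":
        return "keep_current_route"
    return "prepare_to_stop"                     # e.g. amber, or a red light still far ahead
```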
The control module 340 is adapted to perform automatic driving control on the host vehicle according to the determined driving route and/or driving instruction.
After the determining module 330 determines the driving route and/or driving instruction, the control module 340 may perform automatic driving control on the host vehicle accordingly. For example, if the driving instruction determined by the determining module 330 is to decelerate at 6 m/s², the control module 340 controls the braking system of the host vehicle so that the vehicle decelerates at an acceleration of 6 m/s².
According to the scene-segmentation-based automatic driving processing device provided by this embodiment, a current frame image in a video shot and/or recorded by an image acquisition device during driving is acquired in real time; the current frame image is input into a trained neural network and scene segmentation is performed on it according to its frame position within the group to which it belongs, so as to obtain its scene segmentation result; a driving route and/or driving instruction is determined according to the scene segmentation result; and automatic driving control is performed on the host vehicle according to the determined driving route and/or driving instruction. The invention exploits the continuity and correlation between the frame images of a video: the video is processed in groups, and each frame image is segmented according to its position within its group. The 1st frame image of each group is processed by the neural network through all convolution layers and deconvolution layers, while the other frame images are only processed up to the i-th convolution layer and then fused with the multiplexed operation result of the j-th deconvolution layer of the 1st frame image, which greatly reduces the amount of computation of the neural network and increases the speed of scene segmentation. Based on the obtained scene segmentation result, the relative positional relationships between the host vehicle and specific objects such as other vehicles, pedestrians, and roads can be calculated more accurately, and the driving route and/or driving instruction can be determined more accurately from those relationships. Based on the traffic sign information and traffic signal light information contained in the scene segmentation result, the host vehicle can drive automatically in better compliance with traffic regulations, safely and accurately, which improves the safety of automatic driving and optimizes the automatic driving processing mode.
The application also provides a non-volatile computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the automatic driving processing method based on scene segmentation in any method embodiment.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically execute relevant steps in the above-described automatic driving processing method embodiment based on scene segmentation.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
The memory 406 is configured to store a program 410. The memory 406 may comprise a high-speed RAM memory and may also include a non-volatile memory, such as at least one disk memory.
The program 410 may be specifically configured to cause the processor 402 to execute an automatic driving processing method based on scene segmentation in any of the method embodiments described above. For specific implementation of each step in the program 410, reference may be made to corresponding steps and corresponding descriptions in units in the above automatic driving processing embodiment based on scene segmentation, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of an apparatus for scene segmentation based autopilot processing according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (24)

1. An automatic driving processing method based on scene segmentation, which carries out grouping processing on frame images contained in a shot and/or recorded video, comprises the following steps:
acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time in a vehicle driving process;
inputting the current frame image into a trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain a scene segmentation result of the current frame image;
determining a driving route and/or a driving instruction according to the scene segmentation result;
according to the determined driving route and/or driving instruction, carrying out automatic driving control on the vehicle;
wherein, the inputting the current frame image into the trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs, and obtaining the scene segmentation result of the current frame image further includes:
judging whether the current frame image is the 1 st frame image of any group;
if so, inputting the current frame image into a trained neural network, and obtaining a scene segmentation result of the current frame image after the operation of all convolution layers and deconvolution layers of the neural network;
if not, inputting the current frame image into the trained neural network, after calculating to the ith layer of convolution layer of the neural network to obtain the calculation result of the ith layer of convolution layer, obtaining the calculation result of the jth layer of deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the calculation result of the ith layer of convolution layer and the calculation result of the jth layer of deconvolution layer to obtain the scene segmentation result of the current frame image; wherein i and j are natural numbers.
2. The method of claim 1, wherein said determining a driving route and/or driving instructions according to said scene segmentation result further comprises:
determining the contour information of a specific object according to the scene segmentation result;
calculating the relative position relation between the vehicle and the specific object according to the contour information of the specific object;
and determining a driving route and/or a driving instruction according to the calculated relative position relation.
3. The method according to claim 2, wherein the relative positional relationship of the own vehicle and the specific object includes distance information and/or angle information between the own vehicle and the specific object.
4. The method of claim 1, wherein said determining a driving route and/or driving instructions according to said scene segmentation result further comprises:
and determining the driving route and/or driving instructions of the vehicle according to the traffic sign information contained in the scene segmentation result.
5. The method of claim 1, wherein said determining a driving route and/or driving instructions according to said scene segmentation result further comprises:
and determining a driving route and/or a driving instruction according to the traffic light information contained in the scene segmentation result.
6. The method of claim 1, wherein after determining that the current frame image is not the 1 st frame image of any group, the method further comprises:
calculating the frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
determining values of i and j according to the frame interval; the layer distance between the ith convolutional layer and the last convolutional layer is in inverse proportion to the frame distance, and the layer distance between the jth deconvolution layer and the output layer is in direct proportion to the frame distance.
7. The method of claim 6, wherein the method further comprises: and presetting the corresponding relation between the frame interval and the values of i and j.
8. The method of claim 7, wherein after said image fusing directly the operation result of the i-th convolutional layer with the operation result of the j-th anti-convolutional layer, the method further comprises:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting an image fusion result to an output layer to obtain a scene segmentation result of the current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the j + 1-th deconvolution layer, and obtaining the scene segmentation result of the current frame image through the subsequent operation of the deconvolution layer and the output layer.
9. The method of claim 8, wherein the inputting the current frame image into the trained neural network, and obtaining the scene segmentation result of the current frame image after the operation of all convolutional layers and deconvolution layers of the neural network further comprises: after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
10. The method of claim 9, wherein before computing to an ith convolutional layer of the neural network results in a result of the computation of the ith convolutional layer, the method further comprises: after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
11. The method of claim 10, wherein the video comprises n frame images per group; wherein n is a fixed preset value.
12. An automatic driving processing device based on scene segmentation, which carries out grouping processing on frame images contained in a shot and/or recorded video, comprising:
the acquisition module is suitable for acquiring a current frame image in a video shot and/or recorded by the image acquisition equipment in real time in the driving process of the vehicle;
the segmentation module is suitable for inputting the current frame image into the trained neural network, and performing scene segmentation on the current frame image according to the frame position of the current frame image in the group to which the current frame image belongs to obtain a scene segmentation result of the current frame image;
the determining module is suitable for determining a driving route and/or a driving instruction according to the scene segmentation result;
the control module is suitable for automatically driving and controlling the vehicle according to the determined driving route and/or driving instruction;
wherein the segmentation module further comprises:
the judging unit is suitable for judging whether the current frame image is the 1 st frame image of any group, and if so, the first dividing unit is executed; otherwise, executing a second segmentation unit;
the first segmentation unit is suitable for inputting the current frame image into a trained neural network, and obtaining a scene segmentation result of the current frame image after the operation of all convolutional layers and deconvolution layers of the neural network;
the second segmentation unit is suitable for inputting the current frame image into a trained neural network, after operating up to the ith layer convolution layer of the neural network to obtain the operation result of the ith layer convolution layer, obtaining the operation result of the jth layer deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the operation result of the ith layer convolution layer and the operation result of the jth layer deconvolution layer to obtain the scene segmentation result of the current frame image; wherein i and j are natural numbers.
13. The apparatus of claim 12, wherein the determination module is further adapted to:
determining the contour information of a specific object according to the scene segmentation result; calculating the relative position relation between the vehicle and the specific object according to the contour information of the specific object; and determining a driving route and/or a driving instruction according to the calculated relative position relation.
14. The apparatus according to claim 13, wherein the relative positional relationship of the own vehicle and the specific object includes distance information and/or angle information between the own vehicle and the specific object.
15. The apparatus of claim 12, wherein the determination module is further adapted to:
and determining the driving route and/or driving instructions of the vehicle according to the traffic sign information contained in the scene segmentation result.
16. The apparatus of claim 12, wherein the determination module is further adapted to:
and determining a driving route and/or a driving instruction according to the traffic light information contained in the scene segmentation result.
17. The apparatus of claim 12, wherein the means for segmenting further comprises:
the frame distance calculating unit is suitable for calculating the frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
the determining unit is suitable for determining values of i and j according to the frame interval; the layer distance between the ith convolutional layer and the last convolutional layer is in inverse proportion to the frame distance, and the layer distance between the jth deconvolution layer and the output layer is in direct proportion to the frame distance.
18. The apparatus of claim 17, wherein the means for segmenting further comprises:
and the presetting unit is suitable for presetting the corresponding relation between the frame interval and the values of i and j.
19. The apparatus according to claim 18, wherein the second segmentation unit is further adapted to:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting an image fusion result to an output layer to obtain a scene segmentation result of the current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the j + 1-th deconvolution layer, and obtaining the scene segmentation result of the current frame image through the subsequent operation of the deconvolution layer and the output layer.
20. The apparatus of claim 19, wherein the first segmentation unit is further adapted to:
after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
21. The apparatus according to claim 20, wherein the second segmentation unit is further adapted to:
after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
22. The apparatus of claim 21, wherein the video comprises n frame images per group; wherein n is a fixed preset value.
23. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the scene segmentation based automatic driving processing method according to any one of claims 1-11.
24. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the scene segmentation based autopilot processing method of any one of claims 1-11.
CN201711405705.1A 2017-12-22 2017-12-22 Automatic driving processing method and device based on scene segmentation and computing equipment Active CN108133484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711405705.1A CN108133484B (en) 2017-12-22 2017-12-22 Automatic driving processing method and device based on scene segmentation and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711405705.1A CN108133484B (en) 2017-12-22 2017-12-22 Automatic driving processing method and device based on scene segmentation and computing equipment

Publications (2)

Publication Number Publication Date
CN108133484A CN108133484A (en) 2018-06-08
CN108133484B true CN108133484B (en) 2022-01-28

Family

ID=62391516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711405705.1A Active CN108133484B (en) 2017-12-22 2017-12-22 Automatic driving processing method and device based on scene segmentation and computing equipment

Country Status (1)

Country Link
CN (1) CN108133484B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10520940B2 (en) * 2017-08-14 2019-12-31 GM Global Technology Operations LLC Autonomous operation using deep spatio-temporal learning
CN108985194B (en) * 2018-06-29 2022-06-10 华南理工大学 Intelligent vehicle travelable area identification method based on image semantic segmentation
CN109165562B (en) * 2018-07-27 2021-06-04 深圳市商汤科技有限公司 Neural network training method, lateral control method, device, equipment and medium
US10796434B1 (en) * 2019-01-31 2020-10-06 Stradvision, Inc Method and device for detecting parking area using semantic segmentation in automatic parking system
CN109919067A (en) * 2019-02-27 2019-06-21 智慧海派科技有限公司 A kind of image recognition method and safe automatic Pilot method
US11320820B2 (en) * 2019-03-26 2022-05-03 GM Global Technology Operations LLC Hyperassociation in episode memory
US11157813B1 (en) * 2020-04-24 2021-10-26 StradVision, Inc. Method and device for on-vehicle active learning to be used for training perception network of autonomous vehicle
CN112669335A (en) * 2021-01-27 2021-04-16 东软睿驰汽车技术(沈阳)有限公司 Vehicle sensing method and device, electronic equipment and machine-readable storage medium
CN117111624B (en) * 2023-10-23 2024-02-02 江苏苏启智能科技有限公司 Anti-unmanned aerial vehicle method and system based on electromagnetic anti-control technology

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718870A (en) * 2016-01-15 2016-06-29 武汉光庭科技有限公司 Road marking line extracting method based on forward camera head in automatic driving
CN107169468A (en) * 2017-05-31 2017-09-15 北京京东尚科信息技术有限公司 Method for controlling a vehicle and device
CN107226087A (en) * 2017-05-26 2017-10-03 西安电子科技大学 A kind of structured road automatic Pilot transport vehicle and control method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956268B (en) * 2016-04-29 2018-01-02 百度在线网络技术(北京)有限公司 Test scene construction method and device applied to pilotless automobile

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718870A (en) * 2016-01-15 2016-06-29 武汉光庭科技有限公司 Road marking line extracting method based on forward camera head in automatic driving
CN107226087A (en) * 2017-05-26 2017-10-03 西安电子科技大学 A kind of structured road automatic Pilot transport vehicle and control method
CN107169468A (en) * 2017-05-31 2017-09-15 北京京东尚科信息技术有限公司 Method for controlling a vehicle and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xizhou Zhu et al., "Deep Feature Flow for Video Recognition", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017-11-09; abstract and left column, paragraph 1, on page 4141; sections 3-4 on pages 4143-4146. *

Also Published As

Publication number Publication date
CN108133484A (en) 2018-06-08

Similar Documents

Publication Publication Date Title
CN108133484B (en) Automatic driving processing method and device based on scene segmentation and computing equipment
CN112069643B (en) Automatic driving simulation scene generation method and device
CN112512890B (en) Abnormal driving behavior recognition method
US11527078B2 (en) Using captured video data to identify pose of a vehicle
CN110909587A (en) Scene classification
KR20210044960A (en) Apparatus for controlling lane change of autonomous vehicle and method thereof
CN112706785B (en) Method and device for selecting cognitive target of driving environment of automatic driving vehicle and storage medium
CN108154119B (en) Automatic driving processing method and device based on self-adaptive tracking frame segmentation
KR20210005395A (en) Apparatus for determining u-turn strategy of autonomous vehicle and method thereof
JP7226368B2 (en) Object state identification device
CN110634324A (en) Vehicle-mounted terminal based reminding method and system for courtesy pedestrians and vehicle-mounted terminal
CN111976585A (en) Projection information recognition device and method based on artificial neural network
JP7454685B2 (en) Detection of debris in vehicle travel paths
US20220270480A1 (en) Signal control apparatus and method based on reinforcement learning
WO2022006777A1 (en) Method and system for performing lane-change classification on surrounding moving objects, and computer device and storage medium
CN111210411B (en) Method for detecting vanishing points in image, method for training detection model and electronic equipment
US20210004016A1 (en) U-turn control system for autonomous vehicle and method therefor
CN116434156A (en) Target detection method, storage medium, road side equipment and automatic driving system
US20230126957A1 (en) Systems and methods for determining fault for a vehicle accident
CN108022250B (en) Automatic driving processing method and device based on self-adaptive threshold segmentation
CN113753038A (en) Trajectory prediction method and apparatus, electronic device and storage medium
US20240020964A1 (en) Method and device for improving object recognition rate of self-driving car
CN109407662A (en) Automatic driving vehicle control method and device
CN115235497A (en) Path planning method and device, automobile and storage medium
CN110807804B (en) Method, apparatus, device and readable storage medium for target tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant