CN112634324A - Optical flow field estimation method based on deep convolutional neural network - Google Patents

Optical flow field estimation method based on deep convolutional neural network

Info

Publication number
CN112634324A
CN112634324A CN202011416778.2A CN202011416778A CN112634324A CN 112634324 A CN112634324 A CN 112634324A CN 202011416778 A CN202011416778 A CN 202011416778A CN 112634324 A CN112634324 A CN 112634324A
Authority
CN
China
Prior art keywords
optical flow
flow field
network
ofnm
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011416778.2A
Other languages
Chinese (zh)
Inventor
韩荣
蒋伟
许祎晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202011416778.2A priority Critical patent/CN112634324A/en
Publication of CN112634324A publication Critical patent/CN112634324A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an optical flow field estimation method based on a deep convolutional neural network, which comprises the following specific steps: Optical-Flow-Net-S and Optical-Flow-Net-C, OFNS and OFNC for short, are constructed on the basis of a convolutional neural network; an optical flow field estimation model Optical-Flow-Net-Mix, OFNM for short, is constructed on the basis of OFNS and OFNC; the OFNM model is pre-trained and then retrained using the optical flow datasets Flying Chairs and Flying Things3D in sequence. The OFNM model designed by the invention effectively improves the precision of optical flow field estimation for moving targets.

Description

Optical flow field estimation method based on deep convolutional neural network
Technical Field
The invention relates to the technical field of unmanned driving and artificial intelligence, in particular to an optical flow field estimation method based on a deep convolutional neural network.
Background
Since the first automobile came into existence, mankind has strived to realize the dream of unmanned driving; in the past three years, companies such as ***, Toyota, Tesla, Baidu, XPeng, NIO and SAIC have begun large-scale road tests of unmanned vehicles. At present, there are two main ways of detecting a moving target with a camera. One is based on traditional computer vision algorithms: detection uses lane lines and the known geometric features, texture and color of vehicles, or the optical flow field between adjacent video frames is estimated by an optical flow method to obtain the motion information of the target. The other is detection based on deep neural networks: by training a deep neural network to learn the multi-level features it needs, the moving target can be estimated more accurately. Reconstructing a three-dimensional motion scene from the two-dimensional image and video information provided by a vehicle-mounted camera is a typical under-determined problem, and when targets in an unmanned-driving environment exhibit strong occlusion, coexisting large and small displacements, and highly dynamic change, building an accurate, fast and robust optical flow estimation model remains a great challenge.
Disclosure of Invention
In view of this, the present invention provides an optical flow field estimation method based on a deep convolutional neural network, including the following steps:
s1, using optical flow data sets FlyingChairs and FlyingThings3D as training data sets;
s2, constructing an Optical-Flow-Network-S (OFNS for short) model on the basis of the convolutional neural network;
s3, constructing an Optical-Flow-Network-C (OFNC for short) model on the basis of the convolutional neural network;
s4, constructing an Optical-Flow-Network-Mix (OFNM for short) model on the basis of the models constructed in S2 and S3;
s5, the OFNM model constructed in S4 is pre-trained using the optical flow dataset FlyingChairs from S1 to obtain a pre-trained model, the pre-trained model is retrained using the optical flow dataset FlyingThings3D to obtain the parameters of the model, and the trained model is then used to test the test data.
The technical scheme provided by the invention has the following beneficial effects: the precision of the estimated optical flow field and the clarity of the extracted target contour are greatly improved; the performance of the method is comparable to that of EpicFlow, the best-performing traditional method, and even exceeds it on some datasets.
Drawings
FIG. 1 is a flow chart of an optical flow field estimation method based on a deep convolutional neural network according to the present invention;
FIG. 2 is a schematic diagram of an OFNS feature extraction part in the optical flow field estimation method based on the deep convolutional neural network of the present invention;
FIG. 3 is a schematic diagram of the OFNS optical flow field estimation part in the optical flow field estimation method based on the deep convolutional neural network of the present invention;
FIG. 4 is a schematic diagram of an OFNC feature extraction part in the optical flow field estimation method based on the deep convolutional neural network of the present invention;
FIG. 5 is a schematic diagram of the correlation between two image blocks calculated by a correlation layer in the optical flow field estimation method based on the deep convolutional neural network of the present invention;
FIG. 6 is a schematic diagram of the relevance of two characteristic graphs calculated by a relevance layer in the optical flow field estimation method based on the deep convolutional neural network of the present invention;
FIG. 7 is a schematic diagram of a portion of an OFNM optical flow field estimation in an optical flow field estimation method based on a deep convolutional neural network according to the present invention;
FIG. 8 is a schematic diagram of the learning rate and the iteration number of OFNM training in the optical flow field estimation method based on the deep convolutional neural network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the optical flow field estimation method based on the deep convolutional neural network provided by the present invention specifically includes the following steps:
s1: the optical-flow datasets FlyingChairs and FlyingThins 3D used were both synthetic scene optical-flow datasets, a public dataset made by Freiburg university, Germany, each sample of the dataset containing two images, one image pair and their corresponding optical-flow fields.
S2: and constructing an OFNS model based on the convolutional neural network.
The OFNS is divided into a feature extraction part and an optical flow field estimation part. The feature extraction part consists of convolution layers and aims to extract abstract features of the images, so the feature map gradually shrinks; as shown in FIG. 2, the feature extraction part has nine convolution layers, each followed by a ReLU operation. The network has no fully connected layers, so it can accept input images of arbitrary size; as the number of convolution layers increases, the size of the convolution kernels gradually decreases, and the number of feature map channels doubles at each layer (except conv3_1, conv4_1, conv5_1 and conv6). conv3_1, conv4_1, conv5_1 and conv6 feed the optical flow field estimation part shown in FIG. 3, and the feature map size is unchanged after these convolutions.
A CNN excels at extracting highly abstract image features by alternating convolution and pooling, i.e. spatially shrinking the feature maps. Pooling is necessary for training, but it reduces resolution, whereas optical flow requires dense pixel-by-pixel prediction, so the pooled feature maps must be refined. Because of the repeated reduction, the resolution of the raw optical flow map is only 1/64 of the original image; the optical flow field estimation part of FIG. 3 is therefore needed to raise the resolution of the optical flow map.
The OFNS optical flow field estimation part shown in FIG. 3 consists mainly of transposed convolution layers. Starting from conv3_1, conv4_1, conv5_1 and conv6 obtained by the feature extraction part, transposed convolution operations are performed backward; at the same time, an optical flow field is predicted directly on the small feature map, and this prediction, after bilinear interpolation, is combined with the feature map produced by the transposed convolution. The same operation is repeated four times backward, gradually expanding the resolution of the predicted optical flow field to 1/4 of the original image; finally, the full-resolution optical flow field is obtained by direct bilinear interpolation of the 1/4-resolution flow field.
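To make this coarse-to-fine refinement concrete, the following PyTorch sketch shows a heavily reduced two-level version of the contract-then-refine idea: the flow predicted on a small feature map is bilinearly upsampled and concatenated with a transposed-convolution feature map before the next prediction. The layer sizes and names are illustrative assumptions, not the exact nine-layer OFNS.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(in_ch, out_ch, stride=2):
    # Each convolution is followed by a ReLU, as in the feature extraction part.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride, 1), nn.ReLU(inplace=True))

class TinyOFNS(nn.Module):
    """Illustrative two-level version of the contract-then-refine structure."""
    def __init__(self):
        super().__init__()
        self.conv1 = conv(6, 64)                              # image pair stacked along channels
        self.conv2 = conv(64, 128)
        self.conv3 = conv(128, 256)
        self.predict3 = nn.Conv2d(256, 2, 3, 1, 1)            # flow predicted at 1/8 resolution
        self.deconv2 = nn.ConvTranspose2d(256, 128, 4, 2, 1)  # expand features back to 1/4
        self.predict2 = nn.Conv2d(128 + 128 + 2, 2, 3, 1, 1)  # refined flow at 1/4 resolution

    def forward(self, img1, img2):
        x1 = self.conv1(torch.cat([img1, img2], dim=1))
        x2 = self.conv2(x1)
        x3 = self.conv3(x2)
        flow3 = self.predict3(x3)
        up_flow3 = F.interpolate(flow3, scale_factor=2, mode="bilinear", align_corners=False)
        up_feat3 = self.deconv2(x3)
        flow2 = self.predict2(torch.cat([x2, up_feat3, up_flow3], dim=1))
        # In the patent this refinement repeats until 1/4 resolution is reached, then a
        # final bilinear interpolation yields the full-resolution optical flow field.
        return F.interpolate(flow2, scale_factor=4, mode="bilinear", align_corners=False)
```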
S3: and constructing an OFNC model on the basis of the convolutional neural network.
As shown in FIG. 4, the OFNC framework first extracts features from the two images and then fuses them in a higher-level correlation layer. The correlation layer compares all image blocks in the two feature maps, as shown in FIG. 5, where (a) is the image block of f_1 centered at x_1 and (b) is the image block of f_2 centered at x_2. Given two feature maps f_1 and f_2 coming from the two processing streams, the correlation layer lets the network compare each image block of f_1 with each image block of f_2; the correlation between the image block of f_1 centered at x_1 and the image block of f_2 centered at x_2 is defined as:
c(x_1, x_2) = \sum_{o \in [-K, K] \times [-K, K]} \langle f_1(x_1 + o), f_2(x_2 + o) \rangle
where o denotes the offset from x_1 and x_2, and K defines the size of the neighborhood of x_1 and x_2, so that the image block size is 2K+1. This operation is similar to that of a convolution layer, but instead of convolving data with a filter, data is convolved with data, so it has no weights to learn.
The relative position of the feature points x_1 and x_2 is not fixed; without a limit, the total computation of the correlation layer would be very large, so a maximum value is imposed on the relative distance, namely the maximum displacement d of a feature point. As shown in FIG. 6, (a) is a channel of f_1 and (b) is a channel of f_2; x_1 defines a maximum-displacement zone, and the correlation is computed only when x_2 lies within that zone.
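A minimal sketch of such a correlation layer is given below, assuming the single-pixel case (block size 1, i.e. K = 0) with a maximum displacement d; the layer described above additionally aggregates over (2K+1)x(2K+1) blocks, so this is a readable simplification rather than the exact OFNC layer.

```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """
    Naive correlation volume: for each position of f1, take the channel-wise dot
    product with f2 shifted by every offset within +/- max_disp (the maximum
    displacement d). Like a convolution, but data is "convolved" with data,
    so there are no learnable weights.
    """
    b, c, h, w = f1.shape
    f2p = F.pad(f2, (max_disp, max_disp, max_disp, max_disp))
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2p[:, :, dy:dy + h, dx:dx + w]
            out.append((f1 * shifted).sum(dim=1, keepdim=True) / c)
    return torch.cat(out, dim=1)  # shape: (b, (2d+1)^2, h, w)
```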
S4: the OFNM model was constructed based on the models constructed at S2 and S3.
As shown in FIG. 7, the OFNC and OFNS networks are cascaded as sub-networks to form a larger network, OFNM; a warp operation is introduced between the sub-networks to connect them, yielding the OFNM network.
The input picture is passed through a sub-network to obtain an estimated optical flow field F':
I_1(x_i, y_i) \& I_2(x_i, y_i) \rightarrow F_i'(u_i', v_i')    (2)

where I_1 and I_2 denote the two images of an input image pair, (x_i, y_i) denotes a pixel in the image, and (u_i', v_i') denotes an optical flow vector in the optical flow field. The second frame image I_2 and F' are then subjected to a warp operation, i.e. each pixel of I_2 is shifted according to its estimated flow vector:

\tilde{I}_1 = \mathrm{warp}(I_2, F')

The essence of the warp operation is to reconstruct I_1: \tilde{I}_1 is the first frame image reconstructed from I_2 and F'. For the next-level sub-network, the input is no longer the first frame image I_1 and the second frame image I_2, but the first frame image I_1 and the reconstructed first frame image \tilde{I}_1. After the warp operation, the next-level sub-network can focus on the optical flow between I_1 and \tilde{I}_1; compared with I_1 and I_2, the feature points of I_1 and \tilde{I}_1 are easier to match.
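A common way to realise such a warp in a deep-learning framework is bilinear resampling of I_2 at positions displaced by the estimated flow. The PyTorch sketch below assumes F' maps frame 1 to frame 2 (backward warping); it is a generic sketch under that assumption, not necessarily the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp img2 with flow so the result approximates img1 (i.e. the reconstructed I~1)."""
    b, _, h, w = img2.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(img2.device)  # (1, 2, H, W)
    new_pos = grid + flow                      # each pixel samples I2 at (x + u', y + v')
    # normalise coordinates to [-1, 1] for grid_sample
    new_x = 2.0 * new_pos[:, 0] / (w - 1) - 1.0
    new_y = 2.0 * new_pos[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((new_x, new_y), dim=3)                  # (B, H, W, 2)
    return F.grid_sample(img2, sample_grid, align_corners=True)
```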
The OFNM uses the weighted sum of the losses from multiple levels as the overall cost function value, computed as follows:
s41, loading an image pair and an optical flow field true value through a network;
s42, carrying out forward propagation and upsampling in the expanding part to obtain a low-resolution estimated optical flow field;
s43, downsampling the optical flow field truth value to obtain the optical flow field truth value with the same resolution;
s44, calculating average EPE according to the estimated optical flow field and the real value of the optical flow field;
s45, calculating an L1-norm regularization weight decay term and adding it to the average EPE to obtain flow_loss;
s46, repeating the operations of S42-S45 to obtain flow_loss values for optical flow fields of gradually increasing resolution;
s47: the five flow_loss values are weighted and summed to obtain the overall loss.

loss_weight is the weight of flow_loss and represents the importance of the loss produced by different levels; the 5 loss_weights are fixed values of 0.32, 0.08, 0.02, 0.01 and 0.005 respectively.
Let LW(i) denote the 5 loss_weights and FS(i) the 5 flow_loss values; the cost function Loss of OFNM is then calculated as follows:
Loss = \sum_{i=1}^{5} LW(i) \cdot FS(i)
where FS (flow_loss), i.e. FS(i), denotes the flow_loss produced by the i-th level of the OFNM optical flow field estimation part, and LW (loss_weight), i.e. LW(i), denotes the weight of FS(i); the 5 loss_weights are fixed values of 0.32, 0.08, 0.02, 0.01 and 0.005 respectively.
The calculation formula of FS is as follows:
FS(i) = \frac{1}{N} \sum_{r=1}^{N} \left\| F_r(u, v) - F_r'(u, v) \right\| + \lambda \sum_{l=1}^{N_l} \sum_{t=1}^{t_l} \sum_{f=1}^{f_l} \sum_{s=1}^{s_l} \left| W_{t,f,s}^{l} \right|
where F_r(u, v) denotes the downsampled true value of the optical flow field, F_r'(u, v) denotes the estimated optical flow field obtained by upsampling, and \| F_r(u, v) - F_r'(u, v) \| is the difference between the two optical flow vectors, i.e. the EPE error. N is the batch size used in training the OFNM network, \lambda is the regularization coefficient, N_l is the number of layers of the OFNM network, t_l is the number of convolution kernels in the l-th layer, f_l is the number of channels of the convolution kernels in the l-th layer, s_l is the size of the convolution kernels in the l-th layer, and W_{t,f,s}^{l} is the value of the s-th weight of the f-th channel of the t-th convolution kernel in the l-th layer of the OFNM network.
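The sketch below assembles a cost function of this form: an average EPE per prediction level, an L1 weight-decay term, and the fixed weighting of the five levels. The ground-truth downsampling and the value of the regularization coefficient (weight_decay) are assumptions consistent with steps S41-S47; the patent does not state the coefficient's value.

```python
import torch
import torch.nn.functional as F

LOSS_WEIGHTS = [0.32, 0.08, 0.02, 0.01, 0.005]  # fixed weights for the 5 prediction levels

def epe(flow_pred, flow_gt):
    """Average endpoint error between predicted and ground-truth flow fields."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()

def ofnm_loss(multi_scale_preds, flow_gt, model, weight_decay=4e-4):
    """Weighted sum of per-level flow_loss values (average EPE + L1 weight decay)."""
    l1_reg = sum(p.abs().sum() for p in model.parameters())   # S45: L1-norm weight decay
    total = 0.0
    for w, pred in zip(LOSS_WEIGHTS, multi_scale_preds):
        # S43: downsample the ground truth to the prediction's resolution
        # (a full implementation would also rescale the flow magnitudes).
        gt = F.interpolate(flow_gt, size=pred.shape[-2:], mode="bilinear",
                           align_corners=False)
        flow_loss = epe(pred, gt) + weight_decay * l1_reg      # S44 + S45
        total = total + w * flow_loss                          # S47: weighted sum
    return total
```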
S5: the OFNM model constructed in S4 is pre-trained and retrained using the datasets from S1 to obtain the model parameters. The training schedule is shown in FIG. 8: pre-training starts on FlyingChairs with an initial learning rate of 0.0001 and is completed after 1,200,000 iterations; retraining is then carried out on the training set FlyingThings3D with an initial learning rate of 0.00001 and is completed after 500,000 iterations.
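A hedged sketch of this two-stage schedule is shown below; the optimiser (Adam) and the data-loading interface are assumptions, while the learning rates and iteration counts follow the schedule described above.

```python
import torch

def train_ofnm(model, chairs_loader, things3d_loader, loss_fn):
    # Stage 1: pre-train on FlyingChairs, initial learning rate 1e-4, 1,200,000 iterations.
    # Stage 2: retrain on FlyingThings3D, initial learning rate 1e-5, 500,000 iterations.
    for lr, loader, iters in [(1e-4, chairs_loader, 1_200_000),
                              (1e-5, things3d_loader, 500_000)]:
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        it = 0
        while it < iters:
            for img1, img2, flow_gt in loader:
                preds = model(img1, img2)             # multi-scale flow predictions
                loss = loss_fn(preds, flow_gt, model)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                it += 1
                if it >= iters:
                    break
    return model
```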
The performance of the invention in estimating the optical flow field will be demonstrated experimentally.
OFNM was tested on FlyingChairs and several other test sets; the results are shown in Table 1:
TABLE 1 EPE comparison of OFNM and other optical flow algorithms on respective test data sets
[Table 1 is reproduced as an image in the original publication.]
According to the data in Table 1, although the error of OFNM is still not small on the complex synthetic dataset Sintel and the real-scene dataset KITTI, it is lower than that of EpicFlow, the best-performing traditional method, except on Middlebury and KITTI where OFNM's error is slightly higher; on the other datasets, the error of OFNM is smaller than that of the other algorithms.

Claims (7)

1. An optical flow field estimation method based on a deep convolutional neural network is characterized by comprising the following steps:
s1, using optical flow data sets Flying Chairs and Flying Things3D as training data sets;
s2, constructing an Optical-Flow-Network-S (OFNS for short) model on the basis of the convolutional neural network;
s3, constructing an Optical-Flow-Network-C (OFNC for short) model on the basis of the convolutional neural network;
s4, constructing an Optical-Flow-Network-Mix (OFNM for short) model on the basis of the models constructed in S2 and S3;
s5, the OFNM model constructed in S4 is pre-trained using the optical flow dataset Flying Chairs from S1 to obtain a pre-trained model, the pre-trained model is retrained using the optical flow dataset Flying Things3D to obtain the parameters of the model, and the trained model is then used to test the test data.
2. The method for estimating the optical flow field based on the deep convolutional neural network as claimed in claim 1, wherein the optical flow datasets Flying Chairs and Flying Things3D used in step S1 are synthetic-scene optical flow datasets, and each sample of a dataset comprises two images, i.e. one image pair, together with their corresponding optical flow field.
3. The method for estimating the optical flow field based on the deep convolutional neural network as claimed in claim 1, wherein the step S2 is as follows:
s21, constructing the feature extraction part of the OFNS, wherein the feature extraction part comprises nine convolution layers, each followed by a ReLU activation operation; the size of the convolution kernels gradually decreases as the number of convolution layers increases, and the number of convolution kernels doubles from one convolution layer to the next;
s22, constructing the optical flow field estimation part of the OFNS, wherein the optical flow field estimation part consists of transposed convolution layers: transposed convolution operations are performed backward on the feature maps obtained by the feature extraction part; at the same time, an optical flow field is predicted directly on the small feature map, and this prediction, after bilinear interpolation, is combined with the feature map produced by the transposed convolution; the same operation is repeated four times backward, gradually expanding the resolution of the predicted optical flow field to 1/4 of the original image, and finally a full-resolution optical flow field is obtained by direct bilinear interpolation of the 1/4-resolution optical flow field;
and S23, combining the feature extraction part and the optical flow field estimation part to obtain an OFNS model.
4. The method for estimating the optical flow field based on the deep convolutional neural network as claimed in claim 3, wherein the step S3 is as follows:
s31, constructing the feature extraction part of the OFNC, wherein the feature extraction part first creates two separate but identical processing streams for the image pair and extracts the features of the two images respectively;
s32, carrying out feature fusion in a higher-level correlation layer, wherein the correlation layer compares all image blocks in the two feature maps extracted above; given two feature maps f_1 and f_2 coming from the two processing streams, the correlation layer lets the network compare each image block of f_1 with each image block of f_2, and the correlation between the image block of f_1 centered at x_1 and the image block of f_2 centered at x_2 is defined as:
c(x_1, x_2) = \sum_{o \in [-K, K] \times [-K, K]} \langle f_1(x_1 + o), f_2(x_2 + o) \rangle
wherein o denotes the offset from x_1 and x_2, K defines the size of the neighborhood of x_1 and x_2, and the size of the image block is 2K+1; this operation is similar to the operation of a convolution layer;
s33, constructing an optical flow field estimation portion of the OFNC, which is the same as the optical flow field estimation portion of the OFNS in step S22;
and S34, combining the feature extraction part and the optical flow field estimation part to obtain an OFNC model.
5. The method for estimating the optical flow field based on the deep convolutional neural network as claimed in claim 1, wherein the step S4 is as follows:
the OFNC and OFNS networks are cascaded as sub-networks to form a larger network, OFNM; a warp operation is introduced between the sub-networks to connect them, yielding the OFNM network;
the input picture is passed through a sub-network to obtain an estimated optical flow field F':
I_1(x_i, y_i) \& I_2(x_i, y_i) \rightarrow F_i'(u_i', v_i')

wherein I_1 and I_2 denote the two images of an input image pair, (x_i, y_i) denotes a pixel in the image, and (u_i', v_i') denotes an optical flow vector in the optical flow field; the second frame image I_2 and F' are then subjected to a warp operation, i.e. each pixel of I_2 is shifted according to its estimated flow vector:

\tilde{I}_1 = \mathrm{warp}(I_2, F')

the essence of the warp operation is to reconstruct I_1: \tilde{I}_1 is the first frame image reconstructed from I_2 and F'; for the next-level sub-network, the input is no longer the first frame image I_1 and the second frame image I_2 but the first frame image I_1 and the reconstructed first frame image \tilde{I}_1; after the warp operation, the next-level sub-network focuses on the optical flow between I_1 and \tilde{I}_1.
6. The method for estimating the optical flow field based on the deep convolutional neural network as claimed in claim 1, wherein the specific steps of step S5 are as follows:
s51, pre-training the data set Flying Chairs with an initial learning rate of 0.0001 to obtain a pre-training model;
s52, retraining the pre-trained model on the data set Flying Things3D with an initial learning rate of 0.00001.
7. The optical flow field estimation method based on the deep convolutional neural network as claimed in claim 1, wherein the calculation formula of the cost function Loss of OFNM is as follows:
Loss = \sum_{i=1}^{5} LW(i) \cdot FS(i)
wherein FS (flow_loss), i.e. FS(i), denotes the flow_loss produced by the i-th level of the OFNM optical flow field estimation part, LW (loss_weight), i.e. LW(i), denotes the weight of FS(i), and the 5 loss_weights are fixed values of 0.32, 0.08, 0.02, 0.01 and 0.005 respectively; the calculation formula of FS is as follows:
FS(i) = \frac{1}{N} \sum_{r=1}^{N} \left\| F_r(u, v) - F_r'(u, v) \right\| + \lambda \sum_{l=1}^{N_l} \sum_{t=1}^{t_l} \sum_{f=1}^{f_l} \sum_{s=1}^{s_l} \left| W_{t,f,s}^{l} \right|
wherein F_r(u, v) denotes the downsampled true value of the optical flow field, F_r'(u, v) denotes the estimated optical flow field obtained by upsampling, \| F_r(u, v) - F_r'(u, v) \| denotes the difference between the two optical flow vectors, i.e. the EPE error, N denotes the batch size used in training the OFNM network, \lambda is the regularization coefficient, N_l denotes the number of layers of the OFNM network, t_l denotes the number of convolution kernels of the l-th layer of the OFNM network, f_l denotes the number of channels of the convolution kernels of the l-th layer of the OFNM network, s_l denotes the size of the convolution kernels of the l-th layer of the OFNM network, and W_{t,f,s}^{l} denotes the value of the s-th weight of the f-th channel of the t-th convolution kernel of the l-th layer of the OFNM network.
CN202011416778.2A 2020-12-07 2020-12-07 Optical flow field estimation method based on deep convolutional neural network Pending CN112634324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011416778.2A CN112634324A (en) 2020-12-07 2020-12-07 Optical flow field estimation method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011416778.2A CN112634324A (en) 2020-12-07 2020-12-07 Optical flow field estimation method based on deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN112634324A true CN112634324A (en) 2021-04-09

Family

ID=75308492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011416778.2A Pending CN112634324A (en) 2020-12-07 2020-12-07 Optical flow field estimation method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN112634324A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320020A * 2018-02-07 2018-07-24 深圳市唯特视科技有限公司 An end-to-end unsupervised learning method based on bidirectional optical flow
CN110111366A * 2019-05-06 2019-08-09 北京理工大学 An end-to-end optical flow estimation method based on multi-level loss
CN111369595A * 2019-10-15 2020-07-03 西北工业大学 Optical flow calculation method based on an adaptive correlation convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320020A * 2018-02-07 2018-07-24 深圳市唯特视科技有限公司 An end-to-end unsupervised learning method based on bidirectional optical flow
CN110111366A * 2019-05-06 2019-08-09 北京理工大学 An end-to-end optical flow estimation method based on multi-level loss
CN111369595A * 2019-10-15 2020-07-03 西北工业大学 Optical flow calculation method based on an adaptive correlation convolutional neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEXEY DOSOVITSKIY ET AL.: "FlowNet: Learning Optical Flow with Convolutional Networks", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. 2015》 *
EDDY ILG ET AL.: "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. 2017》 *
ZHOU YONG ET AL.: "Target optical flow detection method based on the FlowNet2.0 network", 《JOURNAL OF LONGYAN UNIVERSITY》 *
WANG ZHENGLAI ET AL.: "Optical flow detection method for moving targets based on a deep convolutional neural network", 《OPTO-ELECTRONIC ENGINEERING》 *

Similar Documents

Publication Publication Date Title
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN113673307B (en) Lightweight video action recognition method
CN111739078B (en) Monocular unsupervised depth estimation method based on context attention mechanism
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
CN111259945B (en) Binocular parallax estimation method introducing attention map
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN111508013B (en) Stereo matching method
CN111968123B (en) Semi-supervised video target segmentation method
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
CN113962858A (en) Multi-view depth acquisition method
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN111833400B (en) Camera pose positioning method
CN113159236A (en) Multi-focus image fusion method and device based on multi-scale transformation
CN113344869A (en) Driving environment real-time stereo matching method and device based on candidate parallax
CN115100090A (en) Monocular image depth estimation system based on space-time attention
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network
CN115272670A (en) SAR image ship instance segmentation method based on mask attention interaction
CN112200752B (en) Multi-frame image deblurring system and method based on ER network
CN111275751B (en) Unsupervised absolute scale calculation method and system
Saunders et al. Dyna-dm: Dynamic object-aware self-supervised monocular depth maps
CN113256546A (en) Depth map completion method based on color map guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210409