CN113538527A - Efficient lightweight optical flow estimation method - Google Patents

Efficient lightweight optical flow estimation method

Info

Publication number
CN113538527A
CN113538527A
Authority
CN
China
Prior art keywords
layer
optical flow
estimation
flow estimation
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110773332.3A
Other languages
Chinese (zh)
Other versions
CN113538527B (en)
Inventor
吴飞
胡毅轩
熊玉洁
朱海
张玉金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Engineering Science filed Critical Shanghai University of Engineering Science
Priority to CN202110773332.3A priority Critical patent/CN113538527B/en
Publication of CN113538527A publication Critical patent/CN113538527A/en
Application granted granted Critical
Publication of CN113538527B publication Critical patent/CN113538527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing and aims to provide an efficient lightweight optical flow estimation method that adopts a six-layer pyramid network structure. The method downsamples the input image pair to form a six-layer pyramid structure comprising a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer. The output end of the cost calculation layer is connected with the decoupling optical flow estimation layer, the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after large displacement change, and the output end of the feature input layer is directly connected with the decoupling optical flow estimation layer to obtain a preliminary optical flow prediction; the weight of each parameter in the model is then updated. While keeping the same estimation accuracy, the method greatly improves the running performance of optical flow estimation and also improves the situation in which accuracy drops when the estimated object is partially occluded.

Description

Efficient lightweight optical flow estimation method
Technical Field
The invention relates to the technical field of image processing, in particular to an efficient lightweight optical flow estimation method.
Background
Optical flow is, as the name implies, the flow of light, such as a meteor streaking across the night sky as perceived by the human eye. In computer vision, it describes the movement of objects in an image, which may be caused by camera motion or by object motion. Specifically, the displacement of a pixel representing the same object from one frame of a video to the next is represented by a two-dimensional vector. Dense optical flow describes, for every pixel of the image, its motion to the next frame.
Optical flow estimation is a classical problem in computer vision and is widely used in fields such as motion tracking, action recognition, video segmentation, three-dimensional reconstruction and video inpainting. In 1981, Horn and Schunck first proposed the basic conservation assumption of optical flow and a corresponding estimation method, an energy minimization approach that couples brightness constancy and spatial smoothness in an energy function; it was the most accurate calculation method of its time, but its computational cost is large and it cannot meet the requirements of real-time applications. Brox et al. then theoretically derived a warping-based optical flow estimation method. Sun et al. improved on the method and model of Horn and Schunck and introduced a non-local term to recover motion details. The FlowFields local-matching method achieves high accuracy but still requires a large amount of computation. Variational methods remain popular today, yet they need to solve a complex optimization problem and do not meet the basic requirements of real-time programs.
With the advancement of deep learning, many traditional image problems are now addressed with convolutional neural networks (CNNs), and many optical flow estimation algorithms use CNNs. In supervised learning, FlowNet by Dosovitskiy et al. creatively applied a U-Net style Encoder-Decoder architecture to optical flow estimation and proposed computing the matching cost of features between the image pair to link the two frames, proving the feasibility of directly estimating the optical flow of an image sequence with a convolutional neural network. To address FlowNet's low accuracy and inaccurate prediction of small displacements, FlowNet2 (2017) stacked FlowNetC and FlowNetS models, greatly improving optical flow accuracy; however, the model needs 640 MB of storage and its running speed is not high enough, so it is unsuitable for mobile terminals and embedded devices. Ranjan and Black combined the idea of the classical spatial pyramid with a convolutional neural network and proposed the SPyNet model, which significantly reduces the number of parameters; but because its structure is simple, its estimation accuracy is low although its running speed is high. In 2018, Sun et al. proposed the PWC-Net structure, whose input adopts a pyramid structure to improve the confidence of the input feature maps and which redesigns, according to warping-based optical flow estimation theory, the way an image pair is turned into an optical flow feature map; a dilated-convolution estimation network then gives the network good accuracy even for small-displacement optical flow. PWC-Net improves accuracy over SPyNet while reducing time consumption. In 2019, the VCN of Yang and Deva proposed a different image matching scheme with good accuracy; related work performs matching correction between the previous and next frames of an occluded region to address optical flow estimation in partially occluded regions. IRR-PWC is an improvement of PWC-Net that mainly fuses information from the preceding and following frames through an iterative scheme to improve accuracy, but its estimation speed is relatively slow.
Unmanned systems have recently attracted much attention as a popular research direction. If optical flow estimation data can be used by unmanned vehicles and unmanned aerial vehicles for autonomous navigation and target tracking, the reliability of autonomous operation of such systems can be greatly improved.
Disclosure of Invention
The invention aims to provide an efficient lightweight optical flow estimation method. Existing traditional optical flow methods are accurate but computationally very expensive; optical flow estimation neural network models, although somewhat cheaper than traditional calculation methods, still cannot meet the real-time requirements of embedded or mobile terminal devices. Existing optical flow estimation methods also suffer reduced accuracy when an occlusion is present. The proposed method greatly improves the running performance of optical flow estimation while keeping the same estimation accuracy, and alleviates the drop in accuracy that occurs when the main estimated object is partially occluded.
the technical scheme adopted by the invention is as follows: in one aspect, a method for efficient lightweight optical flow estimation includes the following steps:
step S1: the unmanned navigation system takes the collected image pair as input and sends it to the trained pyramid network model, wherein the unmanned navigation system obtains a control instruction for the hardware equipment from the collected image pair;
step S2: the pyramid network model comprises six layers of neural network structures; the six pyramid levels are convolved twice simultaneously to obtain feature maps whose six sets of parameters do not interfere with each other, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
step S3: the hardware equipment of the unmanned aerial vehicle navigation system receives and executes the new control instruction.
Preferably, in step S1, the training method of the pyramid network model includes the following steps:
step S21: respectively carrying out downsampling on the input image pair to form a six-layer pyramid structure, wherein the six-layer pyramid structure comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer;
step S22: carrying out common convolution operation on the image pair and sending the image pair to a characteristic input layer, wherein the former image is output to a cost calculation layer through the characteristic input layer, the latter image is output to a distortion layer through the characteristic input layer, and the output end of the distortion layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupling optical flow estimation layer, and the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after the large displacement change;
step S24: the output of the characteristic input layer is also directly connected with the decoupling light stream estimation layer to obtain preliminary light stream prediction;
step S25: the optical flow predictions generated in steps S23 and S24 are superposed to generate a batch of training results, the weight of each parameter in the model is updated, and the upsampled optical flow in the training results is input into the warping layer as the upsampled optical flow of the upper-level neural network.
Preferably, in the feature input layer, the input image is downsampled five times, the pixels of each sampling being half of those of the previous layer; the six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other, the stride of the first convolution being 2 and the stride of the second convolution being 1.
Preferably, in step S22, in the warping layer, the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used to warp the second input feature map c_2^l of layer l, giving the warped feature map

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; at the lowest layer l_6 the optical flow estimate is set to 0. The matching cost calculation layer matches associated pixels between the two processed feature maps, and the correlation between the first map and the warped second map is defined as the matching cost. The matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1). A limiting parameter d is set so that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is reduced proportionally at each pyramid layer; the image warping layer and the matching cost calculation layer are pure calculation layers.
Preferably, the input to the optical flow estimation layer is the matching cost, the feature map of the first image, and the upsampled optical flow prediction map of the previous layer; the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction.
In another aspect, a computer-readable storage medium has stored thereon one or more computer programs which, when executed by one or more processors, implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation apparatus includes:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation system includes: a pyramid network model through which an optical flow estimation value in the image is calculated, wherein the pyramid network model comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, and the pyramid network model is built on the basis of PWC-Net.
Preferably, during training of the pyramid network model, a random erasing enhancement strategy is applied to randomly select a rectangular region of an image in a training set, and pixels of the rectangular region are erased by using random values in the range of 0-255.
Preferably, the efficient lightweight optical flow estimation method is applied to autonomous navigation and tracking of unmanned vehicles and unmanned planes.
Compared with the prior art, the invention has the beneficial effects that:
1. the parameters of the optical flow estimation model are reduced, and the estimation speed of the model is greatly accelerated;
2. by reducing the negative influence of the low-resolution condition on high-resolution estimation, the convergence speed of the model and the final convergence effect of the model are improved;
3. by analyzing the ground-truth optical flow values, data erasure is performed on the key information of the data set, which effectively improves the associative estimation capability of the model.
Drawings
FIG. 1 is a schematic diagram of the operation of the present invention;
FIG. 2 is a diagram of the operating principle of the depth separable convolution structure of the present invention;
FIG. 3 is a schematic illustration of the principle of depth separability in one embodiment of the present invention;
FIG. 4 is a graph of the predicted effect of the model in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to FIG. 1; it is obvious that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "counterclockwise", "clockwise", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are used for convenience of description only, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be considered as limiting.
Example 1:
a high-efficiency lightweight optical flow estimation method totally adopts a six-layer pyramid network structure, the results after six-layer down-sampling are input into six same neural network structures, and the final output result of the next layer is input into the cost estimation layer of the previous layer.
First, the input image pair is downsampled to form a six-layer pyramid structure, and an ordinary convolution operation is applied to each level of the pyramid of the image pair to form the input feature maps. Second, the second input picture of the image pair is warped to compensate for part of the distortion caused by deformation from shooting and other hardware factors, making the input more accurate and smooth. Third, the matching cost of the pyramid input feature maps is calculated at each layer, connecting the two feature input maps. Fourth, a depthwise separable convolution is applied to the feature map obtained in the previous step, followed by upsampling to generate a preliminary optical flow estimation map. Fifth, the feature map of the previous step is fed into a context network to predict large displacement changes of the optical flow. Finally, the optical flow predictions generated in the fourth and fifth steps are superposed to produce a batch of training results, which are compared with the ground-truth optical flow labels, and the resulting loss value is back-propagated to update the weights of the parameters in the model.
Feature input layer: the network downsamples the input image pair five times, each level having half the pixels of the previous one, forming a six-layer pyramid structure. The six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other; the stride of the first convolution is 2 and the stride of the second is 1. Using the convolution feature maps as the input layer improves the confidence of the subsequent network structure during calculation.
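A minimal PyTorch-style sketch of such a six-level feature input layer follows; it is illustrative only, the channel widths chosen here are assumptions rather than values taken from the patent, and one reading of the text is that the stride-2 convolution itself performs the downsampling:

import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Six-level feature extractor: each level halves the resolution with a
    stride-2 convolution followed by a stride-1 convolution."""
    def __init__(self, channels=(16, 32, 64, 96, 128, 196)):
        super().__init__()
        self.levels = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            self.levels.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),  # first convolution, stride 2
                nn.LeakyReLU(0.1),
                nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),  # second convolution, stride 1
                nn.LeakyReLU(0.1)))
            in_ch = out_ch

    def forward(self, img):
        feats = []
        x = img
        for level in self.levels:
            x = level(x)
            feats.append(x)   # feats[0] is the finest level, feats[-1] the coarsest
        return feats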
Image warping calculation layer: the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used by the image warping layer to warp the second input feature map c_2^l of layer l, giving

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; the optical flow of the lowest layer l_6 is set to 0. This method can correct a certain amount of geometric distortion, overcome the influence of large displacement motion and occlusion on optical flow estimation, and make the input smoother.
Matching cost calculation layer: the matching cost calculation layer matches associated pixels between the two processed feature maps. PWC-Net proposes a new matching cost calculation method that defines the correlation between the first map and the warped second map as the matching cost. The matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)    (1)

In formula (1), T is the transpose operation and N is the length of the column vector c_1^l(x_1). To avoid an excessive amount of computation, a limiting parameter d is set such that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is scaled down proportionally at each pyramid layer. The image warping layer and the matching cost calculation layer are pure calculation layers with no weight parameters to train, which reduces the size and parameter count of the model.
Decoupled improved optical flow estimation layer: the input to the optical flow estimation layer is the matching cost corr, the feature map of the first image, and the upsampled optical flow prediction map of the previous layer; the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction. The basic structure of this layer follows the PWC-Net sub-structure PWC-Net-s, which removes the residual structure of the conventional design. Five depthwise separable convolution structures replace the conventional convolutions of the original structure; each depthwise separable convolution structure consists of a depthwise separable convolution layer, a BN layer and a LeakyReLU layer. The pre-posed BN and activation layers allow gradients from deep layers to be transferred to arbitrarily shallow layers during back-propagation, so the gradient does not vanish even when the parameters are small; the depthwise separable principle is shown schematically in FIG. 3. Decoupling conventional convolution layers into depthwise separable convolution layers greatly reduces the parameter count of the model while retaining the expressive capability of the optical flow estimation layer. The depthwise separable convolution layers used in the network are shown below.
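As an illustration of the depthwise separable structure described above, a minimal PyTorch-style block is sketched below; the pre-activation ordering of BN and LeakyReLU is one reading of the description, and the channel counts are left as arguments:

import torch.nn as nn

class DSConvBlock(nn.Module):
    """Depthwise separable convolution block: pre-posed BN + LeakyReLU,
    depthwise 3x3 convolution, then pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.act = nn.LeakyReLU(0.1)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch)    # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 cross-channel mixing

    def forward(self, x):
        x = self.act(self.bn(x))
        return self.pointwise(self.depthwise(x))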
Pyramid-structured improved optical flow estimation layer: in the low-level pyramid of the PWC-Net-s network structure, the amount of feature data and the number of parameters after downsampling are small, and the error never converges to a low level. When such optical flow prediction data is used as training input for the upper-level network and its information is weighted too heavily, it becomes an interference term in the late stage of network convergence and prevents the network from converging to a lower value. Therefore, the network described here applies a weighting coefficient σ, scaled by the pyramid layer number l, to the upsampled previous-layer optical flow used in the input of the upper-level network, and correspondingly to the output:

output_l = net(input_l) + l × σ × flow_{l-1}    (7)

As shown in FIG. 1, the weight of the upsampled previous optical flow in the input and in the output increases with the pyramid layer number. The weighting coefficient σ is calculated as

σ = k × U    (8)

where U is the average endpoint error of the layer-0 optical flow prediction after training with the previous training data set, and k is a constant with an empirical value of 1.1. The input and output of the first-layer optical flow estimation layer are the same as in the literature.
Large displacement optical flow estimation layer: a large displacement estimation layer serves as the post-processing network layer of the model and further improves the estimation accuracy of large-displacement optical flow. The layer is built from dilated (hole) convolutions, which enlarge the range over which pixel information is gathered, while the design reduces the gridding effect that dilated convolution can cause. A large-displacement optical flow estimation network composed of dilated convolutions effectively increases the receptive field of the network without increasing the number of parameters and improves the correlation between distant pixels of the convolution feature map. The large displacement estimation layer does not use a residual network structure, which prevents the high-frequency signals produced by the gridding effect of dilated convolution from propagating further down the network.
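A hedged sketch of a large-displacement (context) estimation layer built from dilated convolutions without residual connections is shown below; the particular dilation rates and channel widths follow the usual PWC-Net context-network pattern and are assumptions, not values from the patent:

import torch.nn as nn

def make_context_network(in_ch):
    """Plain stack of dilated 3x3 convolutions (no residual connections);
    growing dilation enlarges the receptive field without adding parameters."""
    dilations = (1, 2, 4, 8, 16, 1)
    channels = (128, 128, 96, 64, 32, 2)   # last layer outputs a 2-channel flow refinement
    layers = []
    ch = in_ch
    for out_ch, d in zip(channels, dilations):
        layers.append(nn.Conv2d(ch, out_ch, 3, padding=d, dilation=d))
        if out_ch != 2:
            layers.append(nn.LeakyReLU(0.1))
        ch = out_ch
    return nn.Sequential(*layers)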
Let Θ be the set of trainable parameters in the neural network, covering the feature pyramid layer, the optical flow estimation layer and the large displacement optical flow estimation layer; the warping layer and the cost calculation layer contain no trainable parameters and are pure calculation layers. Let flow_Θ^l denote the optical flow predicted at the l-th pyramid layer and flow_GT^l the corresponding ground-truth optical flow value. The calculation of the loss value uses a multiscale endpoint-error loss:

L(Θ) = Σ_l α_l Σ_x | flow_Θ^l(x) - flow_GT^l(x) |_2    (9)
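A minimal sketch of this multiscale endpoint-error loss follows; the per-level weights alpha_l and the optional regularisation weight gamma are assumed hyper-parameters, and the normalisation by batch size is a design choice of the sketch:

import torch

def multiscale_epe_loss(pred_flows, gt_flows, alphas, params=None, gamma=0.0004):
    """Weighted sum over pyramid levels of the L2 endpoint error, plus an
    optional L2 regularisation of the trainable parameters Theta."""
    loss = 0.0
    for pred, gt, alpha in zip(pred_flows, gt_flows, alphas):
        epe = torch.norm(pred - gt, p=2, dim=1)      # per-pixel endpoint error
        loss = loss + alpha * epe.sum() / pred.shape[0]
    if params is not None:
        loss = loss + gamma * sum(p.pow(2).sum() for p in params)
    return loss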
the real optical flow in the real world is very difficult to acquire and cannot be manually labeled by a human. Butler et al automatically produced image pairs and associated light flow graphs via a game engine, but the amount of data for optical flow learning is still too small, since the optical flow dataset is an image dataset, only 2 tens of thousands of data occupy 75G capacity. However, the data in two ten-thousandths is not sufficient to train a good model. In order to make up for the defect of non-real data of the data set, the image pair and the optical flow data thereof are subjected to the same random cutting and random rotation, and the image mirror image color enhancement and noise superposition are performed. There is a need to expand the samples of the data set using data enhancement methods in order to improve the robustness of the model and reduce the risk of over-fitting.
Improved image and object-aware random erasing based on optical flow ground-truth values:
the random erasure enhancement strategy is to randomly select a rectangular region of the image in the training set and erase its pixels using random values in the range of 0-255. Generating a training image with an occlusion level also reduces the risk of over-fitting and makes the model somewhat robust to occlusions. A method of image and object aware random erasure (I + ORE) is used. And reading the cut label optical flow data and detecting the boundary of the object in the optical flow graph. And selecting the area position of the random erasing block of the image in the boundary, wherein the size of the area position is 0.02-0.15 times of the resolution of the image, the length-width ratio is uniformly distributed between 0.33 and 3.33, the random numbers are taken, and the random pixel values are used as masks to fill the image in the erasing area. The method can effectively shield partial key information and improve the associative learning capability of the network.
The overall improvement is as follows: the optical flow estimation speed is greatly increased without reducing the optical flow estimation accuracy.
Improvement point 1 reduces the parameters of the optical flow estimation model and greatly accelerates the estimation speed of the model.
The principle of depthwise separable convolution is shown in FIG. 2: a convolution is first applied separately to each channel, and the result is then combined by C_out pointwise convolutions of size 1 × 1 × C_in, so that the output feature map still matches the output of a conventional convolution. K_h and K_w are the height and width of the convolution kernel, and C_in and C_out are the numbers of input and output channels. F_h is the height of the feature map minus the height of the convolution kernel, and F_w is the width of the feature map minus the width of the convolution kernel.
Parameter count of conventional convolution:

P_conv = K_h × K_w × C_in × C_out    (10)

Parameter count of depthwise separable convolution:

P_depth = K_h × K_w × C_in
P_point = 1 × 1 × C_in × C_out
P_dsconv = P_depth + P_point    (11)

The parameter count of the depthwise separable convolution is the linear sum of the parameter counts of the depthwise convolution and the pointwise convolution, and is clearly smaller than that of the conventional convolution.

Computation amount of the conventional convolution:

C_conv = K_h × K_w × C_in × C_out × F_h × F_w    (12)

Computation amount of the depthwise separable convolution:

C_depth = K_h × K_w × C_in × F_h × F_w
C_point = C_in × C_out × F_h × F_w
C_dsconv = C_depth + C_point    (13)

By comparison, the reduction ratio of the computation amount after replacing the conventional convolution with the depthwise separable convolution is:

C_dsconv / C_conv = (K_h × K_w × C_in × F_h × F_w + C_in × C_out × F_h × F_w) / (K_h × K_w × C_in × C_out × F_h × F_w) = 1/C_out + 1/(K_h × K_w)    (14)
the network expression ability of this structure is demonstrated in the literature to be substantially similar to conventional convolution. In MobileNet, network decoupling using a deep separable convolutional layer reduces the amount of computation to one ninth of the original, while the recognition accuracy only drops by 1.7%.
Improvement point 2 improves the convergence speed of the model and its final convergence effect by reducing the negative influence of the low-resolution case on the high-resolution estimation.
The optical flow estimation model at low resolution always converges to a higher value, and when the error brought by the low resolution is larger than the convergence loss at high resolution, the low-resolution optical flow data becomes an interference term for the convergence of the high-resolution optical flow model. Reducing the weight given to the low-resolution prediction when estimating at high resolution reduces the loss of the high-resolution model and thus improves accuracy.
Improvement point 3 analyzes the ground-truth optical flow values in order to erase key information from the data set, which effectively improves the associative estimation capability of the model.
Random erasing is a new data enhancement technique: during training, a rectangular area of an image is randomly selected and its pixels are erased with random values. Generating training images with a level of occlusion reduces the risk of over-fitting and makes the model robust to occlusion. The improvement here is to use the optical flow values to find the erasing points of interest, which reduces the time needed to compute the points of interest and improves the accuracy of their selection, thereby improving the optical flow estimation effect on partially occluded objects.
It should be noted that the random erasing algorithm of the present invention is:
Algorithm 1: Random erasing step
Input:
input image I
height H and width W of the image
area S of the image
erasing probability p
area ratio range of the erased region, s_l to s_h (lower and upper limits)
aspect ratio range of the erased region, r_1 to r_2 (lower and upper limits)
Output:
erased image I
Initialization: p1 is a random number in (0, 1).
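A self-contained sketch of the random erasing step of Algorithm 1 is given below; the default ranges follow the values mentioned earlier in the text (area ratio 0.02-0.15, aspect ratio 0.33-3.33), and the function and variable names are illustrative:

import math
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.15, r_1=0.33, r_2=3.33):
    """Randomly erase one rectangle of img (H x W x C, uint8) with random pixel values."""
    if random.random() > p:          # p1 > p: return the image unchanged
        return img
    h, w = img.shape[:2]
    area = h * w
    for _ in range(100):             # retry until the rectangle fits inside the image
        target_area = random.uniform(s_l, s_h) * area
        aspect = random.uniform(r_1, r_2)
        eh = int(round(math.sqrt(target_area * aspect)))
        ew = int(round(math.sqrt(target_area / aspect)))
        if eh < h and ew < w:
            y = random.randint(0, h - eh)
            x = random.randint(0, w - ew)
            img = img.copy()
            img[y:y + eh, x:x + ew] = np.random.randint(0, 256, (eh, ew, img.shape[2]))
            return img
    return img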
It is worth mentioning a processor and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the efficient lightweight optical flow estimation method of the present invention. Because the program logic of each step differs, a dedicated processor or a general-purpose chip can be used to execute the corresponding step, improving the processing efficiency of the whole program while keeping cost reasonable. Those skilled in the art can therefore adapt the design and adjust the optical flow calculation to the specific application.
In summary, the implementation principle of the invention is as follows: by building the pyramid network model, the change of optical flow transferred in the image is obtained, and a new estimation method whose calculation speed is superior to existing calculation models is obtained by training and optimizing the model; see FIG. 4, which shows the prediction effect of the optimized model.

Claims (10)

1. An efficient lightweight optical flow estimation method, characterized by comprising the steps of:
step S1: the unmanned navigation system takes the collected image pair as input and sends the input to the trained pyramid network model, wherein the unmanned navigation system obtains a control instruction of the hardware equipment through the collected image pair;
step S2: the pyramid network model comprises six layers of neural network structures, six layers of pyramids are subjected to convolution twice at the same time to obtain feature maps with six parameters which do not interfere with each other, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
in step S3: and determining the shielded perception object in the image according to the image optical flow estimation result, and executing and receiving a new control instruction by hardware equipment of the unmanned aerial vehicle navigation system to avoid or approach the perception object.
2. The method for efficient lightweight optical flow estimation according to claim 1, wherein in step S1, the method for training the pyramid network model comprises the following steps:
step S21: respectively carrying out downsampling on the input image pair to form a six-layer pyramid structure, wherein the six-layer pyramid structure comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer;
step S22: carrying out common convolution operation on the image pair and sending the image pair to a characteristic input layer, wherein the former image is output to a cost calculation layer through the characteristic input layer, the latter image is output to a distortion layer through the characteristic input layer, and the output end of the distortion layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupling optical flow estimation layer, and the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after the large displacement change;
step S24: the output of the characteristic input layer is also directly connected with the decoupling light stream estimation layer to obtain preliminary light stream prediction;
step S25: the optical flow predictions generated in steps S23 and S24 are superposed to generate a batch of training results, the weight of each parameter in the model is updated, and the upsampled optical flow in the training results is input into the warping layer as the upsampled optical flow of the upper-level neural network.
3. The method as claimed in claim 2, wherein in the feature input layer, the input image is downsampled five times, the pixels of each sampling being half of those of the previous layer; the six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other, the stride of the first convolution being 2 and the stride of the second convolution being 1.
4. The method as claimed in claim 2, wherein in step S22, in the warping layer, the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used to warp the second input feature map c_2^l of layer l, giving the warped feature map

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; at the lowest layer l_6 the optical flow estimate is set to 0; the matching cost calculation layer matches associated pixels between the two processed feature maps, the correlation between the first map and the warped second map being defined as the matching cost, and the matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1); a limiting parameter d is set so that |x_1 - x_2|_∞ ≤ d; since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is reduced proportionally at each pyramid layer, and the image warping layer and the matching cost calculation layer are calculation layers.
5. The efficient lightweight optical flow estimation method as claimed in claim 4, wherein the input to the optical flow estimation layer is the matching cost, the feature map of the first image and the upsampled optical flow prediction map of the previous layer, and the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction.
6. A computer-readable storage medium, having one or more computer programs stored thereon, which when executed by one or more processors implement the method for efficient lightweight optical flow estimation as claimed in any of claims 1 to 5.
7. An efficient lightweight optical flow estimation apparatus, comprising:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method as recited in any of claims 1-5.
8. An efficient lightweight optical flow estimation system, comprising: and calculating an optical flow estimation value in the image through a pyramid network model, wherein the pyramid network model comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, and the pyramid network model is established based on PWC-Net.
9. The system of claim 8, wherein the pyramid network model is trained by applying a random erasure enhancement strategy to randomly select rectangular regions of images in the training set, and erasing their pixels using random values from 0 to 255.
10. Use of the efficient lightweight optical flow estimation method according to any one of claims 1-5 for autonomous navigation and tracking of unmanned vehicles and unmanned aerial vehicles.
CN202110773332.3A 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device Active CN113538527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Publications (2)

Publication Number Publication Date
CN113538527A true CN113538527A (en) 2021-10-22
CN113538527B CN113538527B (en) 2023-09-26

Family

ID=78127164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773332.3A Active CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Country Status (1)

Country Link
CN (1) CN113538527B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092337A (en) * 2022-01-19 2022-02-25 苏州浪潮智能科技有限公司 Method and device for super-resolution amplification of image at any scale
CN114581493A (en) * 2022-03-04 2022-06-03 三星电子(中国)研发中心 Bidirectional optical flow estimation method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
US20200394752A1 (en) * 2018-02-27 2020-12-17 Portland State University Context-aware synthesis for video frame interpolation
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200394752A1 (en) * 2018-02-27 2020-12-17 Portland State University Context-aware synthesis for video frame interpolation
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. G. HOWARD et al.: "MobileNets: efficient convolutional neural networks for mobile vision applications", arXiv: https://arxiv.org/abs/1704.04861, pages 1-9 *
DEQING SUN et al.: "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8934-8943 *
胡毅轩 et al.: "Multi-layer weight and lightweight improved optical flow estimation algorithm based on PWC-Net" (基于PWC-Net的多层权值和轻量化改进光流估计算法), Application Research of Computers (计算机应用研究), vol. 39, no. 1, pages 291-295 *


Also Published As

Publication number Publication date
CN113538527B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
US11501415B2 (en) Method and system for high-resolution image inpainting
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN112052886A (en) Human body action attitude intelligent estimation method and device based on convolutional neural network
CN111508013B (en) Stereo matching method
CN113538527B (en) Efficient lightweight optical flow estimation method, storage medium and device
US11915383B2 (en) Methods and systems for high definition image manipulation with neural networks
CN111161306A (en) Video target segmentation method based on motion attention
CN113283525A (en) Image matching method based on deep learning
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN113554039B (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN112465872B (en) Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN112785636A (en) Multi-scale enhanced monocular depth estimation method
CN115239581A (en) Image processing method and related device
US20220215617A1 (en) Viewpoint image processing method and related device
Yang et al. UGC-YOLO: underwater environment object detection based on YOLO with a global context block
CN112115786A (en) Monocular vision odometer method based on attention U-net
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
CN115115860A (en) Image feature point detection matching network based on deep learning
CN114066750B (en) Self-encoder deblurring method based on domain transformation
CN115482280A (en) Visual positioning method based on adaptive histogram equalization
CN114494339A (en) Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm
Ghosh et al. Two-stage cross-fusion network for stereo event-based depth estimation
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism
CN112288738B (en) Single image snowflake removing method and device based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant