CN113538527A - Efficient lightweight optical flow estimation method - Google Patents

Efficient lightweight optical flow estimation method

Info

Publication number
CN113538527A
CN113538527A
Authority
CN
China
Prior art keywords
layer
optical flow
estimation
flow estimation
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110773332.3A
Other languages
Chinese (zh)
Other versions
CN113538527B (en)
Inventor
吴飞
胡毅轩
熊玉洁
朱海
张玉金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Engineering Science filed Critical Shanghai University of Engineering Science
Priority to CN202110773332.3A priority Critical patent/CN113538527B/en
Publication of CN113538527A publication Critical patent/CN113538527A/en
Application granted granted Critical
Publication of CN113538527B publication Critical patent/CN113538527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing and aims to provide an efficient lightweight optical flow estimation method that adopts a six-layer pyramid network structure. The method downsamples the input image pair to form a six-layer pyramid structure comprising a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer. The output end of the cost calculation layer is connected with the decoupling optical flow estimation layer, the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after large displacement change, and the output end of the feature input layer is directly connected with the decoupling optical flow estimation layer to obtain a preliminary optical flow prediction; the weight of each parameter in the model is then updated. While keeping the same estimation accuracy, the method greatly improves the running performance of optical flow estimation and also improves the situation in which accuracy drops when the estimated object is partially occluded.

Description

Efficient lightweight optical flow estimation method
Technical Field
The invention relates to the technical field of image processing, in particular to an efficient lightweight optical flow estimation method.
Background
Optical flow is, as the name implies, the flow of light, such as a meteor streaking across the night sky as perceived by the human eye. In computer vision, it describes the movement of objects in an image, which may be caused by camera motion or by object motion. Specifically, the displacement of a pixel representing the same object from one frame of a video to the next is represented by a two-dimensional vector. Dense optical flow describes, for every pixel of the image, its motion to the next frame.
Optical flow estimation is a classical problem in computer vision and is widely used in fields such as motion tracking, action recognition, video segmentation, three-dimensional reconstruction and video inpainting. In 1981, Horn and Schunck first proposed the basic conservation assumption of optical flow and a corresponding estimation method, an energy minimization approach that couples brightness constancy and spatial smoothness in an energy function; it was the most accurate calculation method of its time, but its computational cost is large and it cannot meet the requirements of real-time applications. Brox et al. then theoretically derived a warping-based optical flow estimation method. Sun et al. improved on the method and model of Horn and Schunck and introduced a non-local term to recover motion details. The FlowFields local-matching method achieves high accuracy but still requires a large amount of computation. Variational methods remain popular today, yet they need to solve a complex optimization problem and do not meet the basic requirements of real-time programs.
With the advancement of deep learning, many traditional image problems are now addressed with convolutional neural networks (CNNs), and many optical flow estimation algorithms use CNNs. In supervised learning, FlowNet by Dosovitskiy et al. creatively applied a U-Net style Encoder-Decoder architecture to optical flow estimation and proposed computing the matching cost of features between the image pair to link the two frames, proving the feasibility of directly estimating the optical flow of an image sequence with a convolutional neural network. To address FlowNet's low accuracy and inaccurate prediction of small displacements, FlowNet2 (2017) stacked FlowNetC and FlowNetS models, greatly improving optical flow accuracy; however, the model needs 640 MB of storage and its running speed is not high enough, so it is unsuitable for mobile terminals and embedded devices. Ranjan and Black combined the idea of the classical spatial pyramid with a convolutional neural network and proposed the SPyNet model, which significantly reduces the number of parameters; but because its structure is simple, its estimation accuracy is low although its running speed is high. In 2018, Sun et al. proposed the PWC-Net structure, whose input adopts a pyramid structure to improve the confidence of the input feature maps and which redesigns, according to warping-based optical flow estimation theory, the way an image pair is turned into an optical flow feature map; a dilated-convolution estimation network then gives the network good accuracy even for small-displacement optical flow. PWC-Net improves accuracy over SPyNet while reducing time consumption. In 2019, the VCN of Yang and Deva proposed a different image matching scheme with good accuracy; related work performs matching correction between the previous and next frames of an occluded region to address optical flow estimation in partially occluded regions. IRR-PWC is an improvement of PWC-Net that mainly fuses information from the preceding and following frames through an iterative scheme to improve accuracy, but its estimation speed is relatively slow.
Unmanned systems have recently attracted much attention as a popular research direction. If optical flow estimation data can be used by unmanned vehicles and unmanned aerial vehicles for autonomous navigation and target tracking, the reliability of autonomous operation of such systems can be greatly improved.
Disclosure of Invention
The invention aims to provide an efficient lightweight optical flow estimation method. Existing traditional optical flow methods are accurate but computationally very expensive; optical flow estimation neural network models, although somewhat cheaper than traditional calculation methods, still cannot meet the real-time requirements of embedded or mobile terminal devices. Existing optical flow estimation methods also suffer reduced accuracy when an occlusion is present. The proposed method greatly improves the running performance of optical flow estimation while keeping the same estimation accuracy, and alleviates the drop in accuracy that occurs when the main estimated object is partially occluded.
the technical scheme adopted by the invention is as follows: in one aspect, a method for efficient lightweight optical flow estimation includes the following steps:
step S1: the unmanned navigation system takes the collected image pair as input and sends it to the trained pyramid network model, wherein the unmanned navigation system obtains a control instruction for the hardware equipment from the collected image pair;
step S2: the pyramid network model comprises six layers of neural network structures; the six pyramid levels are convolved twice simultaneously to obtain feature maps whose six sets of parameters do not interfere with each other, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
step S3: the hardware equipment of the unmanned aerial vehicle navigation system receives and executes the new control instruction.
Preferably, in step S1, the training method of the pyramid network model includes the following steps:
step S21: respectively carrying out downsampling on the input image pair to form a six-layer pyramid structure, wherein the six-layer pyramid structure comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer;
step S22: carrying out common convolution operation on the image pair and sending the image pair to a characteristic input layer, wherein the former image is output to a cost calculation layer through the characteristic input layer, the latter image is output to a distortion layer through the characteristic input layer, and the output end of the distortion layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupling optical flow estimation layer, and the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after the large displacement change;
step S24: the output of the characteristic input layer is also directly connected with the decoupling light stream estimation layer to obtain preliminary light stream prediction;
step S25: the optical flow predictions generated in steps S23 and S24 are superposed to generate a batch of training results, the weight of each parameter in the model is updated, and the upsampled optical flow in the training results is input into the warping layer as the upsampled optical flow of the upper-level neural network.
Preferably, in the feature input layer, the input image is downsampled five times, the pixels of each sampling being half of those of the previous layer; the six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other, the stride of the first convolution being 2 and the stride of the second convolution being 1.
Preferably, in step S22, in the warping layer, the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used to warp the second input feature map c_2^l of layer l, giving the warped feature map

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; at the lowest layer l_6 the optical flow estimate is set to 0. The matching cost calculation layer matches associated pixels between the two processed feature maps, and the correlation between the first map and the warped second map is defined as the matching cost. The matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1). A limiting parameter d is set so that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is reduced proportionally at each pyramid layer; the image warping layer and the matching cost calculation layer are pure calculation layers.
Preferably, the input to the optical flow estimation layer is the matching cost, the feature map of the first image, and the upsampled optical flow prediction map of the previous layer; the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction.
In another aspect, a computer-readable storage medium has stored thereon one or more computer programs which, when executed by one or more processors, implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation apparatus includes:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation system includes: a pyramid network model through which an optical flow estimation value in the image is calculated, wherein the pyramid network model comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, and the pyramid network model is built on the basis of PWC-Net.
Preferably, during training of the pyramid network model, a random erasing enhancement strategy is applied to randomly select a rectangular region of an image in a training set, and pixels of the rectangular region are erased by using random values in the range of 0-255.
Preferably, the efficient lightweight optical flow estimation method is applied to autonomous navigation and tracking of unmanned vehicles and unmanned planes.
Compared with the prior art, the invention has the beneficial effects that:
1. the parameters of the optical flow estimation model are reduced, and the estimation speed of the model is greatly accelerated;
2. by reducing the negative influence of the low-resolution condition on high-resolution estimation, the convergence speed of the model and the final convergence effect of the model are improved;
3. by analyzing the ground-truth optical flow values, data erasure is performed on the key information of the data set, which effectively improves the associative estimation capability of the model.
Drawings
FIG. 1 is a schematic diagram of the operation of the present invention;
FIG. 2 is a diagram of the operating principle of the depth separable convolution structure of the present invention;
FIG. 3 is a schematic illustration of the principle of depth separability in one embodiment of the present invention;
FIG. 4 is a graph of the predicted effect of the model in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to FIG. 1; it is obvious that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "counterclockwise", "clockwise", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are used for convenience of description only, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be considered as limiting.
Example 1:
a high-efficiency lightweight optical flow estimation method totally adopts a six-layer pyramid network structure, the results after six-layer down-sampling are input into six same neural network structures, and the final output result of the next layer is input into the cost estimation layer of the previous layer.
First, the input image pair is downsampled to form a six-layer pyramid structure, and an ordinary convolution operation is applied to each level of the pyramid of the image pair to form the input feature maps. Second, the second input picture of the image pair is warped to compensate for part of the distortion caused by deformation from shooting and other hardware factors, making the input more accurate and smooth. Third, the matching cost of the pyramid input feature maps is calculated at each layer, connecting the two feature input maps. Fourth, a depthwise separable convolution is applied to the feature map obtained in the previous step, followed by upsampling to generate a preliminary optical flow estimation map. Fifth, the feature map of the previous step is fed into a context network to predict large displacement changes of the optical flow. Finally, the optical flow predictions generated in the fourth and fifth steps are superposed to produce a batch of training results, which are compared with the ground-truth optical flow labels, and the resulting loss value is back-propagated to update the weights of the parameters in the model.
Feature input layer: the network downsamples the input image pair five times, each level having half the pixels of the previous one, forming a six-layer pyramid structure. The six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other; the stride of the first convolution is 2 and the stride of the second is 1. Using the convolution feature maps as the input layer improves the confidence of the subsequent network structure during calculation.
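A minimal PyTorch-style sketch of such a six-level feature input layer follows; it is illustrative only, the channel widths chosen here are assumptions rather than values taken from the patent, and one reading of the text is that the stride-2 convolution itself performs the downsampling:

import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Six-level feature extractor: each level halves the resolution with a
    stride-2 convolution followed by a stride-1 convolution."""
    def __init__(self, channels=(16, 32, 64, 96, 128, 196)):
        super().__init__()
        self.levels = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            self.levels.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),  # first convolution, stride 2
                nn.LeakyReLU(0.1),
                nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),  # second convolution, stride 1
                nn.LeakyReLU(0.1)))
            in_ch = out_ch

    def forward(self, img):
        feats = []
        x = img
        for level in self.levels:
            x = level(x)
            feats.append(x)   # feats[0] is the finest level, feats[-1] the coarsest
        return feats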
Image warping calculation layer: the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used by the image warping layer to warp the second input feature map c_2^l of layer l, giving

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; the optical flow of the lowest layer l_6 is set to 0. This method can correct a certain amount of geometric distortion, overcome the influence of large displacement motion and occlusion on optical flow estimation, and make the input smoother.
Matching cost calculation layer: the matching cost calculation layer matches associated pixels between the two processed feature maps. PWC-Net proposes a new matching cost calculation method that defines the correlation between the first map and the warped second map as the matching cost. The matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)    (1)

In formula (1), T is the transpose operation and N is the length of the column vector c_1^l(x_1). To avoid an excessive amount of computation, a limiting parameter d is set such that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is scaled down proportionally at each pyramid layer. The image warping layer and the matching cost calculation layer are pure calculation layers with no weight parameters to train, which reduces the size and parameter count of the model.
Decoupled improved optical flow estimation layer: the input to the optical flow estimation layer is the matching cost corr, the feature map of the first image, and the upsampled optical flow prediction map of the previous layer; the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction. The basic structure of this layer follows the PWC-Net sub-structure PWC-Net-s, which removes the residual structure of the conventional design. Five depthwise separable convolution structures replace the conventional convolutions of the original structure; each depthwise separable convolution structure consists of a depthwise separable convolution layer, a BN layer and a LeakyReLU layer. The pre-posed BN and activation layers allow gradients from deep layers to be transferred to arbitrarily shallow layers during back-propagation, so the gradient does not vanish even when the parameters are small; the depthwise separable principle is shown schematically in FIG. 3. Decoupling conventional convolution layers into depthwise separable convolution layers greatly reduces the parameter count of the model while retaining the expressive capability of the optical flow estimation layer. The depthwise separable convolution layers used in the network are shown below.
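As an illustration of the depthwise separable structure described above, a minimal PyTorch-style block is sketched below; the pre-activation ordering of BN and LeakyReLU is one reading of the description, and the channel counts are left as arguments:

import torch.nn as nn

class DSConvBlock(nn.Module):
    """Depthwise separable convolution block: pre-posed BN + LeakyReLU,
    depthwise 3x3 convolution, then pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.act = nn.LeakyReLU(0.1)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch)    # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 cross-channel mixing

    def forward(self, x):
        x = self.act(self.bn(x))
        return self.pointwise(self.depthwise(x))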
Pyramid-structured improved optical flow estimation layer: in the low-level pyramid of the PWC-Net-s network structure, the amount of feature data and the number of parameters after downsampling are small, and the error never converges to a low level. When such optical flow prediction data is used as training input for the upper-level network and its information is weighted too heavily, it becomes an interference term in the late stage of network convergence and prevents the network from converging to a lower value. Therefore, the network described here applies a weighting coefficient σ, scaled by the pyramid layer number l, to the upsampled previous-layer optical flow used in the input of the upper-level network, and correspondingly to the output:

output_l = net(input_l) + l × σ × flow_{l-1}    (7)

As shown in FIG. 1, the weight of the upsampled previous optical flow in the input and in the output increases with the pyramid layer number. The weighting coefficient σ is calculated as

σ = k × U    (8)

where U is the average endpoint error of the layer-0 optical flow prediction after training with the previous training data set, and k is a constant with an empirical value of 1.1. The input and output of the first-layer optical flow estimation layer are the same as in the literature.
Large displacement optical flow estimation layer: a large displacement estimation layer serves as the post-processing network layer of the model and further improves the estimation accuracy of large-displacement optical flow. The layer is built from dilated (hole) convolutions, which enlarge the range over which pixel information is gathered, while the design reduces the gridding effect that dilated convolution can cause. A large-displacement optical flow estimation network composed of dilated convolutions effectively increases the receptive field of the network without increasing the number of parameters and improves the correlation between distant pixels of the convolution feature map. The large displacement estimation layer does not use a residual network structure, which prevents the high-frequency signals produced by the gridding effect of dilated convolution from propagating further down the network.
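A hedged sketch of a large-displacement (context) estimation layer built from dilated convolutions without residual connections is shown below; the particular dilation rates and channel widths follow the usual PWC-Net context-network pattern and are assumptions, not values from the patent:

import torch.nn as nn

def make_context_network(in_ch):
    """Plain stack of dilated 3x3 convolutions (no residual connections);
    growing dilation enlarges the receptive field without adding parameters."""
    dilations = (1, 2, 4, 8, 16, 1)
    channels = (128, 128, 96, 64, 32, 2)   # last layer outputs a 2-channel flow refinement
    layers = []
    ch = in_ch
    for out_ch, d in zip(channels, dilations):
        layers.append(nn.Conv2d(ch, out_ch, 3, padding=d, dilation=d))
        if out_ch != 2:
            layers.append(nn.LeakyReLU(0.1))
        ch = out_ch
    return nn.Sequential(*layers)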
Let Θ be the set of trainable parameters in the neural network, covering the feature pyramid layer, the optical flow estimation layer and the large displacement optical flow estimation layer; the warping layer and the cost calculation layer contain no trainable parameters and are pure calculation layers. Let flow_Θ^l denote the optical flow predicted at the l-th pyramid layer and flow_GT^l the corresponding ground-truth optical flow value. The calculation of the loss value uses a multiscale endpoint-error loss:

L(Θ) = Σ_l α_l Σ_x | flow_Θ^l(x) - flow_GT^l(x) |_2    (9)
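A minimal sketch of this multiscale endpoint-error loss follows; the per-level weights alpha_l and the optional regularisation weight gamma are assumed hyper-parameters, and the normalisation by batch size is a design choice of the sketch:

import torch

def multiscale_epe_loss(pred_flows, gt_flows, alphas, params=None, gamma=0.0004):
    """Weighted sum over pyramid levels of the L2 endpoint error, plus an
    optional L2 regularisation of the trainable parameters Theta."""
    loss = 0.0
    for pred, gt, alpha in zip(pred_flows, gt_flows, alphas):
        epe = torch.norm(pred - gt, p=2, dim=1)      # per-pixel endpoint error
        loss = loss + alpha * epe.sum() / pred.shape[0]
    if params is not None:
        loss = loss + gamma * sum(p.pow(2).sum() for p in params)
    return loss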
the real optical flow in the real world is very difficult to acquire and cannot be manually labeled by a human. Butler et al automatically produced image pairs and associated light flow graphs via a game engine, but the amount of data for optical flow learning is still too small, since the optical flow dataset is an image dataset, only 2 tens of thousands of data occupy 75G capacity. However, the data in two ten-thousandths is not sufficient to train a good model. In order to make up for the defect of non-real data of the data set, the image pair and the optical flow data thereof are subjected to the same random cutting and random rotation, and the image mirror image color enhancement and noise superposition are performed. There is a need to expand the samples of the data set using data enhancement methods in order to improve the robustness of the model and reduce the risk of over-fitting.
Improved image and object-aware random erasing based on optical flow ground-truth values:
the random erasure enhancement strategy is to randomly select a rectangular region of the image in the training set and erase its pixels using random values in the range of 0-255. Generating a training image with an occlusion level also reduces the risk of over-fitting and makes the model somewhat robust to occlusions. A method of image and object aware random erasure (I + ORE) is used. And reading the cut label optical flow data and detecting the boundary of the object in the optical flow graph. And selecting the area position of the random erasing block of the image in the boundary, wherein the size of the area position is 0.02-0.15 times of the resolution of the image, the length-width ratio is uniformly distributed between 0.33 and 3.33, the random numbers are taken, and the random pixel values are used as masks to fill the image in the erasing area. The method can effectively shield partial key information and improve the associative learning capability of the network.
The overall improvement is as follows: the optical flow estimation speed is greatly increased without reducing the optical flow estimation accuracy.
Improvement point 1 reduces the parameters of the optical flow estimation model and greatly accelerates the estimation speed of the model.
The principle of depthwise separable convolution is shown in FIG. 2: a convolution is first applied separately to each channel, and the result is then combined by C_out pointwise convolutions of size 1 × 1 × C_in, so that the output feature map still matches the output of a conventional convolution. K_h and K_w are the height and width of the convolution kernel, and C_in and C_out are the numbers of input and output channels. F_h is the height of the feature map minus the height of the convolution kernel, and F_w is the width of the feature map minus the width of the convolution kernel.
Parameter count of conventional convolution:

P_conv = K_h × K_w × C_in × C_out    (10)

Parameter count of depthwise separable convolution:

P_depth = K_h × K_w × C_in
P_point = 1 × 1 × C_in × C_out
P_dsconv = P_depth + P_point    (11)

The parameter count of the depthwise separable convolution is the linear sum of the parameter counts of the depthwise convolution and the pointwise convolution, and is clearly smaller than that of the conventional convolution.

Computation amount of the conventional convolution:

C_conv = K_h × K_w × C_in × C_out × F_h × F_w    (12)

Computation amount of the depthwise separable convolution:

C_depth = K_h × K_w × C_in × F_h × F_w
C_point = C_in × C_out × F_h × F_w
C_dsconv = C_depth + C_point    (13)

By comparison, the reduction ratio of the computation amount after replacing the conventional convolution with the depthwise separable convolution is:

C_dsconv / C_conv = (K_h × K_w × C_in × F_h × F_w + C_in × C_out × F_h × F_w) / (K_h × K_w × C_in × C_out × F_h × F_w) = 1/C_out + 1/(K_h × K_w)    (14)
the network expression ability of this structure is demonstrated in the literature to be substantially similar to conventional convolution. In MobileNet, network decoupling using a deep separable convolutional layer reduces the amount of computation to one ninth of the original, while the recognition accuracy only drops by 1.7%.
Improvement point 2 improves the convergence speed of the model and its final convergence effect by reducing the negative influence of the low-resolution case on the high-resolution estimation.
The optical flow estimation model at low resolution always converges to a higher value, and when the error brought by the low resolution is larger than the convergence loss at high resolution, the low-resolution optical flow data becomes an interference term for the convergence of the high-resolution optical flow model. Reducing the weight given to the low-resolution prediction when estimating at high resolution reduces the loss of the high-resolution model and thus improves accuracy.
Improvement point 3 analyzes the ground-truth optical flow values in order to erase key information from the data set, which effectively improves the associative estimation capability of the model.
Random erasing is a new data enhancement technique: during training, a rectangular area of an image is randomly selected and its pixels are erased with random values. Generating training images with a level of occlusion reduces the risk of over-fitting and makes the model robust to occlusion. The improvement here is to use the optical flow values to find the erasing points of interest, which reduces the time needed to compute the points of interest and improves the accuracy of their selection, thereby improving the optical flow estimation effect on partially occluded objects.
It should be noted that the random erasing algorithm of the present invention is:
Algorithm 1: Random erasing step
Input:
input image I
height H and width W of the image
area S of the image
erasing probability p
area ratio range of the erased region, s_l to s_h (lower and upper limits)
aspect ratio range of the erased region, r_1 to r_2 (lower and upper limits)
Output:
erased image I
Initialization: p1 is a random number in (0, 1).
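A self-contained sketch of the random erasing step of Algorithm 1 is given below; the default ranges follow the values mentioned earlier in the text (area ratio 0.02-0.15, aspect ratio 0.33-3.33), and the function and variable names are illustrative:

import math
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.15, r_1=0.33, r_2=3.33):
    """Randomly erase one rectangle of img (H x W x C, uint8) with random pixel values."""
    if random.random() > p:          # p1 > p: return the image unchanged
        return img
    h, w = img.shape[:2]
    area = h * w
    for _ in range(100):             # retry until the rectangle fits inside the image
        target_area = random.uniform(s_l, s_h) * area
        aspect = random.uniform(r_1, r_2)
        eh = int(round(math.sqrt(target_area * aspect)))
        ew = int(round(math.sqrt(target_area / aspect)))
        if eh < h and ew < w:
            y = random.randint(0, h - eh)
            x = random.randint(0, w - ew)
            img = img.copy()
            img[y:y + eh, x:x + ew] = np.random.randint(0, 256, (eh, ew, img.shape[2]))
            return img
    return img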
It is worth mentioning a processor and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the efficient lightweight optical flow estimation method of the present invention. Because the program logic of each step differs, a dedicated processor or a general-purpose chip can be used to execute the corresponding step, improving the processing efficiency of the whole program while keeping cost reasonable. Those skilled in the art can therefore adapt the design and adjust the optical flow calculation to the specific application.
In summary, the implementation principle of the invention is as follows: by building the pyramid network model, the change of optical flow transferred in the image is obtained, and a new estimation method whose calculation speed is superior to existing calculation models is obtained by training and optimizing the model; see FIG. 4, which shows the prediction effect of the optimized model.

Claims (10)

1. An efficient lightweight optical flow estimation method, characterized by comprising the steps of:
step S1: the unmanned navigation system takes the collected image pair as input and sends the input to the trained pyramid network model, wherein the unmanned navigation system obtains a control instruction of the hardware equipment through the collected image pair;
step S2: the pyramid network model comprises six layers of neural network structures, six layers of pyramids are subjected to convolution twice at the same time to obtain feature maps with six parameters which do not interfere with each other, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
in step S3: and determining the shielded perception object in the image according to the image optical flow estimation result, and executing and receiving a new control instruction by hardware equipment of the unmanned aerial vehicle navigation system to avoid or approach the perception object.
2. The method for efficient lightweight optical flow estimation according to claim 1, wherein in step S1, the method for training the pyramid network model comprises the following steps:
step S21: respectively carrying out downsampling on the input image pair to form a six-layer pyramid structure, wherein the six-layer pyramid structure comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer;
step S22: carrying out common convolution operation on the image pair and sending the image pair to a characteristic input layer, wherein the former image is output to a cost calculation layer through the characteristic input layer, the latter image is output to a distortion layer through the characteristic input layer, and the output end of the distortion layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupling optical flow estimation layer, and the output end of the decoupling optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after the large displacement change;
step S24: the output of the characteristic input layer is also directly connected with the decoupling light stream estimation layer to obtain preliminary light stream prediction;
step S25: the optical flow predictions generated in steps S23 and S24 are superposed to generate a batch of training results, the weight of each parameter in the model is updated, and the upsampled optical flow in the training results is input into the warping layer as the upsampled optical flow of the upper-level neural network.
3. The method as claimed in claim 2, wherein in the feature input layer, the input image is downsampled five times, the pixels of each sampling being half of those of the previous layer; the six pyramid levels are convolved twice simultaneously to form feature maps whose six sets of parameters do not interfere with each other, the stride of the first convolution being 2 and the stride of the second convolution being 1.
4. The method as claimed in claim 2, wherein in step S22, in the warping layer, the optical flow flow^{l+1} predicted by layer l+1 of the pyramid is upsampled by a factor of 2 and used to warp the second input feature map c_2^l of layer l, giving the warped feature map

c_w^l(x) = c_2^l(x + up_2(flow^{l+1})(x)),

which brings it closer to the first feature map; at the lowest layer l_6 the optical flow estimate is set to 0; the matching cost calculation layer matches associated pixels between the two processed feature maps, the correlation between the first map and the warped second map being defined as the matching cost, and the matching cost of the feature map pair is computed by a convolution-like operation:

cv^l(x_1, x_2) = (1/N) (c_1^l(x_1))^T c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1); a limiting parameter d is set so that |x_1 - x_2|_∞ ≤ d; since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is reduced proportionally at each pyramid layer, and the image warping layer and the matching cost calculation layer are calculation layers.
5. The efficient lightweight optical flow estimation method as claimed in claim 4, wherein the input to the optical flow estimation layer is the matching cost, the feature map of the first image and the upsampled optical flow prediction map of the previous layer, and the output is the optical flow prediction map of the current layer together with a partial weight of the previous layer's optical flow prediction.
6. A computer-readable storage medium, having one or more computer programs stored thereon, which when executed by one or more processors implement the method for efficient lightweight optical flow estimation as claimed in any of claims 1 to 5.
7. An efficient lightweight optical flow estimation apparatus, comprising:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method as recited in any of claims 1-5.
8. An efficient lightweight optical flow estimation system, comprising: and calculating an optical flow estimation value in the image through a pyramid network model, wherein the pyramid network model comprises a feature input layer, a distortion layer, a cost calculation layer, a decoupling optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, and the pyramid network model is established based on PWC-Net.
9. The system of claim 8, wherein the pyramid network model is trained by applying a random erasure enhancement strategy to randomly select rectangular regions of images in the training set, and erasing their pixels using random values from 0 to 255.
10. Use of the efficient lightweight optical flow estimation method according to any one of claims 1-5 for autonomous navigation and tracking of unmanned vehicles and unmanned aerial vehicles.
CN202110773332.3A 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device Active CN113538527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Publications (2)

Publication Number Publication Date
CN113538527A true CN113538527A (en) 2021-10-22
CN113538527B CN113538527B (en) 2023-09-26

Family

ID=78127164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773332.3A Active CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Country Status (1)

Country Link
CN (1) CN113538527B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092337A (en) * 2022-01-19 2022-02-25 苏州浪潮智能科技有限公司 Method and device for super-resolution amplification of image at any scale
CN114581493A (en) * 2022-03-04 2022-06-03 三星电子(中国)研发中心 Bidirectional optical flow estimation method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
US20200394752A1 (en) * 2018-02-27 2020-12-17 Portland State University Context-aware synthesis for video frame interpolation
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200394752A1 (en) * 2018-02-27 2020-12-17 Portland State University Context-aware synthesis for video frame interpolation
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. G. HOWARD et al.: "MobileNets: efficient convolutional neural networks for mobile vision applications", arXiv: https://arxiv.org/abs/1704.04861, pages 1-9 *
DEQING SUN et al.: "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8934-8943 *
胡毅轩 et al.: "Multi-layer weight and lightweight improved optical flow estimation algorithm based on PWC-Net" (基于PWC-Net的多层权值和轻量化改进光流估计算法), Application Research of Computers (计算机应用研究), vol. 39, no. 1, pages 291-295 *


Also Published As

Publication number Publication date
CN113538527B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
US11501415B2 (en) Method and system for high-resolution image inpainting
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN112052886A (en) Human body action attitude intelligent estimation method and device based on convolutional neural network
CN111508013B (en) Stereo matching method
CN113538527B (en) Efficient lightweight optical flow estimation method, storage medium and device
US11915383B2 (en) Methods and systems for high definition image manipulation with neural networks
CN111161306A (en) Video target segmentation method based on motion attention
CN113283525A (en) Image matching method based on deep learning
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN113554039B (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN112465872B (en) Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN112785636A (en) Multi-scale enhanced monocular depth estimation method
CN115239581A (en) Image processing method and related device
US20220215617A1 (en) Viewpoint image processing method and related device
Yang et al. UGC-YOLO: underwater environment object detection based on YOLO with a global context block
CN112115786A (en) Monocular vision odometer method based on attention U-net
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
CN115115860A (en) Image feature point detection matching network based on deep learning
CN114066750B (en) Self-encoder deblurring method based on domain transformation
CN115482280A (en) Visual positioning method based on adaptive histogram equalization
CN114494339A (en) Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm
Ghosh et al. Two-stage cross-fusion network for stereo event-based depth estimation
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism
CN112288738B (en) Single image snowflake removing method and device based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant