CN113538527B - Efficient lightweight optical flow estimation method, storage medium and device - Google Patents

Efficient lightweight optical flow estimation method, storage medium and device

Info

Publication number
CN113538527B
CN113538527B (application number CN202110773332.3A)
Authority
CN
China
Prior art keywords
layer
optical flow
estimation
image
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110773332.3A
Other languages
Chinese (zh)
Other versions
CN113538527A (en)
Inventor
吴飞
胡毅轩
熊玉洁
朱海
张玉金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Engineering Science filed Critical Shanghai University of Engineering Science
Priority to CN202110773332.3A priority Critical patent/CN113538527B/en
Publication of CN113538527A publication Critical patent/CN113538527A/en
Application granted granted Critical
Publication of CN113538527B publication Critical patent/CN113538527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G06T7/269 — Analysis of motion using gradient-based methods
    • G06N3/045 — Combinations of networks
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of image processing and provides an efficient lightweight optical flow estimation method. The method adopts a six-layer pyramid network structure: the input images are each downsampled to form a six-layer pyramid comprising a feature input layer, a warping layer, a cost calculation layer, a decoupled optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer. The output of the cost calculation layer is connected to the decoupled optical flow estimation layer, whose output end is in turn connected to the large displacement estimation layer to obtain the optical flow prediction after large displacement changes; the output of the feature input layer is connected directly to the decoupled optical flow estimation layer to obtain a preliminary optical flow prediction. While the accuracy of each parameter in the model is maintained when the weights are updated, the operational performance of optical flow estimation is greatly improved, and the drop in accuracy that occurs when an object is occluded is alleviated.

Description

Efficient lightweight optical flow estimation method, storage medium and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, a storage medium, and an apparatus for efficient lightweight optical flow estimation.
Background
Optical flow is, as the name suggests, the flow of light, such as the streak of a meteor perceived by the human eye in the night sky. In computer vision it describes the apparent motion of objects in an image, which may be caused by camera motion or object motion. Specifically, the displacement of a pixel belonging to the same object from one video frame to the next is represented by a two-dimensional vector. Dense optical flow describes this motion for every pixel of the image.
Optical flow estimation is a classical problem in computer vision and is widely used in fields such as motion tracking, action recognition, video segmentation, three-dimensional reconstruction and video inpainting. In 1981, Horn and Schunck first proposed the basic conservation assumptions of optical flow and a corresponding estimation method; they then coupled brightness constancy and spatial smoothness in an energy function and minimized it, which was the most accurate calculation method at the time, but its computational cost is large and cannot meet the requirements of real-time applications. Brox et al. later derived the warping-based optical flow estimation method theoretically. Sun et al. improved on the methods and models of Horn and Schunck and proposed a non-local term to recover motion details. The FlowFields method based on local matching achieves high accuracy, but its computational cost remains large. Variational methods are still popular today; however, they require solving a complex optimization problem and do not meet the basic requirements of real-time programs.
With the advance of deep learning, many traditional image problems are now addressed with convolutional neural networks (Convolutional Neural Networks, CNNs), and many optical flow estimation algorithms use CNNs. In supervised learning, FlowNet by Dosovitskiy et al. is a supervised optical flow estimation model that was the first to apply a U-Net-style Encoder-Decoder architecture to optical flow estimation: it computes the cost of features between an image pair to relate the two frames, and it proved the feasibility of estimating the optical flow of an image sequence directly with a convolutional neural network. To address FlowNet's low accuracy and inaccurate small-displacement prediction, FlowNet2 (2017) stacks FlowNetC and FlowNetS models, greatly improving optical flow accuracy; however, the model needs 640 MB of storage and is not particularly fast, so it is unsuitable for mobile terminals and embedded devices. Ranjan and Black combined the idea of the classical spatial pyramid with a convolutional neural network and proposed the SpyNet model, which significantly reduces the number of parameters; but the SpyNet structure is simple, so while it runs fast, its estimation accuracy is low. In 2018 Sun et al. proposed the PWC-Net structure, whose input adopts a pyramid structure that improves the confidence of the input feature maps and which, following the warping-based optical flow estimation theory, redesigns the way image pairs form the optical flow feature map; finally, a dilated-convolution estimation network gives small-displacement optical flow estimation good accuracy. While reducing time consumption, PWC-Net also improves accuracy over SpyNet. In 2019, the VCN of Yang and Ramanan proposed a different image-pair matching method with good accuracy, and the literature corrects the frame-to-frame matching of occluded regions to address optical flow estimation in partially occluded areas. IRR-PWC is an improvement of PWC-Net that raises the accuracy of optical flow estimation mainly by iteratively fusing information from several preceding and following frames, but its estimation speed is relatively slow.
Unmanned systems have recently attracted wide attention as a popular research direction. When unmanned vehicles and unmanned aerial vehicles perform autonomous navigation and target tracking, the reliability of autonomous operation could be greatly improved if optical flow estimation data were available. Existing optical flow estimation methods are much faster than traditional calculation methods, but they still cannot meet the requirements of deep-learning edge-computing hardware in an unmanned-system environment.
Disclosure of Invention
The application aims to provide an efficient lightweight optical flow estimation method, a storage medium and a device. Existing traditional optical flow methods are accurate but computationally expensive, and although optical flow estimation neural network models require less computation than traditional methods, they still cannot meet the real-time requirements of embedded or mobile terminal devices. In addition, existing optical flow estimation methods suffer reduced estimation accuracy when an object is occluded. The optical flow estimation method of the application greatly improves the operational performance of optical flow estimation while maintaining the same estimation accuracy, and alleviates the drop in accuracy when an object is occluded.
the technical scheme adopted by the application is as follows: in one aspect, a method for efficient lightweight optical flow estimation includes the steps of:
step S1: the unmanned navigation system sends the acquired image pair as input to a trained pyramid network model, wherein the unmanned navigation system obtains a control instruction of hardware equipment through the acquired image pair;
step S2: the pyramid network model comprises a six-layer neural network structure; the six pyramid levels are each convolved twice to obtain six feature maps whose parameters do not interfere with one another, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
step S3: the hardware equipment of the unmanned navigation system receives and executes a new control instruction.
Preferably, in the step S1, the training method of the pyramid network model includes the following steps:
step S21: downsampling the input images respectively to form a six-layer pyramid structure, which comprises a feature input layer, a warping layer, a cost calculation layer, a decoupled optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer;
step S22: the image pair is subjected to an ordinary convolution operation and sent to the feature input layer; the former image is output through the feature input layer to the cost calculation layer, the latter image is output through the feature input layer to the warping layer, and the output end of the warping layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupled optical flow estimation layer, and the output end of the decoupled optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after large displacement change;
step S24: the output of the feature input layer is directly connected with the decoupled optical flow estimation layer to obtain a preliminary optical flow prediction;
step S25: superposing the optical flow predictions generated in S23 and S24 to generate a batch of training results, updating the weights of all parameters in the model, and inputting the upsampled optical flow in the training results into the warping layer of the upper neural network level as its upsampled optical flow.
Preferably, in the feature input layer the input image is downsampled five times, the pixels sampled each time being half of those of the previous layer; the six pyramid levels are convolved twice simultaneously to form six feature maps whose parameters do not interfere with one another, wherein the stride of the first convolution is 2 and that of the second convolution is 1.
Preferably, in the step S22 the warping layer upsamples the optical flow flow^(l+1) predicted by layer l+1 of the pyramid by a factor of 2 and uses it to warp the second input feature map c_2^l of layer l, obtaining c_w^l(x) = c_2^l(x + up_2(flow^(l+1))(x)), so that it becomes more similar to the first feature map; the optical flow estimate of the lowest layer l_6 is set to 0. The matching cost calculation layer treats the two processed feature maps as a matching of associated pixels and defines the correlation between the first feature map and the warped second feature map as the matching cost; the matching cost of the feature map pair is computed with a convolution-like operation as follows:

cv^l(x_1, x_2) = (1/N) · (c_1^l(x_1))^T · c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1). A limiting parameter d is set such that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is scaled down accordingly at each pyramid layer. The image warping layer and the matching cost calculation layer are both calculation layers.
Preferably, the input to the optical flow estimation layer is the matching cost, the first image x_1 and the upsampled optical flow prediction map flow^l of the layer above; the output is the optical flow prediction map of the current layer and the partial weight of the previous layer's optical flow prediction.
In another aspect, a computer readable storage medium has one or more computer programs stored thereon, which when executed by one or more processors implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation device includes:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method as described above.
In another aspect, an efficient lightweight optical flow estimation system comprises: computing the optical flow estimate in an image through a pyramid network model, wherein the pyramid network model comprises a feature input layer, a warping layer, a cost calculation layer, a decoupled optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, and the pyramid network model is built based on PWC-Net.
Preferably, during training the pyramid network model applies a random erasing enhancement strategy: rectangular regions of images in the training set are randomly selected and their pixels are erased with random values in the range 0 to 255.
Preferably, the efficient lightweight optical flow estimation method is applied to autonomous navigation and tracking of unmanned vehicles and unmanned aerial vehicles.
Compared with the prior art, the application has the beneficial effects that:
1. parameters of an optical flow estimation model are reduced, and the estimation speed of the model is greatly increased;
2. by reducing the negative influence of the low-resolution condition on the high-resolution estimation, the model convergence speed and the final model convergence effect are improved;
3. erasing key information of the data set based on analysis of the real optical flow values can effectively improve the model's associative estimation capability.
Drawings
FIG. 1 is a schematic diagram of the operation of the present application;
FIG. 2 is a schematic diagram of the operation of the depth separable convolution structure of the present application;
FIG. 3 is a schematic diagram of the principle of depth separation in one embodiment of the application;
FIG. 4 is a model predictive effect graph in accordance with one embodiment of the application.
Detailed Description
The following clearly and completely describes the embodiments of the present application with reference to fig. 1 of the drawings; it is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. Based on the embodiments of the present application, a person of ordinary skill in the art can obtain all other embodiments without undue burden.
In the description of the present application, it should be understood that the terms "counterclockwise," "clockwise," "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, are merely for convenience in describing the present application, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application.
Example 1:
An efficient lightweight optical flow estimation method adopts a six-layer pyramid network structure overall: the six downsampled results are input into six identical neural network structures, and the final output of the lower (coarser) layer is input into the cost calculation layer of the layer above.
First, the input images are each downsampled to form six-layer pyramid structures, and each level of the image-pair pyramids is then passed through an ordinary convolution operation to form the input feature maps. Second, the second image of the pair is warped (warping), compensating for the partial distortion caused by deformation from the camera and other hardware, so that the input becomes more accurate and smooth. Third, the cost volume is computed for each level of the pyramid input feature maps, connecting the two input feature maps. Fourth, a depth-separable convolution is applied to the feature map obtained in the previous step, which is then upsampled to generate a preliminary optical flow estimation map. Fifth, the feature map of the previous step is fed into a context network to predict large displacement changes of the optical flow. Finally, the optical flow predictions generated in the fourth and fifth steps are superposed to generate a batch of training results, which are compared with the ground-truth optical flow labels, and the resulting loss value is back-propagated to update the weights of the parameters in the model.
Feature input layer: the network downsamples the input image five times, each sampling halving the resolution of the previous layer, thereby forming a six-layer pyramid structure. The six pyramid levels are each convolved twice to form six feature maps whose parameters do not interfere with one another; the stride of the first convolution is 2 and that of the second is 1. The convolutional feature maps serve as the input layer and help improve the confidence of the calculations in the network structure below.
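As an illustration only, the following is a minimal sketch of such a six-level feature input layer, assuming a PyTorch implementation; the module name, channel counts and activation are assumptions of this sketch and are not specified by the patent.

import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    # Six-level feature pyramid: each level halves the resolution with a
    # stride-2 convolution followed by a stride-1 convolution, so the six
    # feature maps have mutually independent parameters.
    def __init__(self, in_ch=3, channels=(16, 32, 64, 96, 128, 196)):
        super().__init__()
        self.levels = nn.ModuleList()
        prev = in_ch
        for ch in channels:
            self.levels.append(nn.Sequential(
                nn.Conv2d(prev, ch, 3, stride=2, padding=1),  # first convolution, stride 2 (downsampling)
                nn.LeakyReLU(0.1),
                nn.Conv2d(ch, ch, 3, stride=1, padding=1),    # second convolution, stride 1
                nn.LeakyReLU(0.1)))
            prev = ch

    def forward(self, img):
        feats = []
        x = img
        for level in self.levels:
            x = level(x)
            feats.append(x)    # c^1 (finest) ... c^6 (coarsest)
        return feats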
Image warping calculation layer: the image warping layer upsamples the optical flow flow^(l+1) predicted for pyramid layer l+1 by a factor of 2 and uses it to warp the second input feature map c_2^l of layer l, obtaining c_w^l, which is thereby made more similar to the first feature map; the optical flow of the lowest layer l_6 is set to 0. This operation can correct a certain amount of geometric distortion, overcome the influence of large displacements and occlusion on optical flow estimation, and make the input smoother.
Matching cost calculation layer: the matching cost calculation layer treats the two processed feature maps as a matching of associated pixels. PWC-Net proposes a new matching cost calculation method that defines the correlation between the first feature map and the warped second feature map as the matching cost. The matching cost of the feature map pair is computed with a convolution-like operation as follows:

cv^l(x_1, x_2) = (1/N) · (c_1^l(x_1))^T · c_w^l(x_2)   (1)

In equation (1), T is the transpose operation and N is the length of the column vector c_1^l(x_1). To avoid an excessive amount of computation, the method sets a limiting parameter d such that |x_1 - x_2|_∞ ≤ d. Since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is scaled down at each pyramid layer. The image warping layer and the matching cost calculation layer are both calculation layers; they need no trainable weight parameters, which reduces the size and the number of parameters of the model.
Decoupled, improved optical flow estimation layer: the input to the optical flow estimation layer is the matching cost corr, the first image x_1 and the upsampled optical flow prediction map flow^l of the layer above; the output is the optical flow prediction map of the current layer and the partial weight of the previous layer's optical flow prediction. The infrastructure of this layer follows the PWC-Net-s substructure of PWC-Net, which removes the residual structure found in conventional structures. The conventional convolutions in the original structure are replaced by 5 depth-separable convolution structures, each composed of a depth-separable convolution layer, a BN layer and a LeakyReLU layer; during back-propagation the BN and activation layers allow the gradient of a deep layer to be transferred to any shallow layer, so the gradient does not vanish even with fewer parameters. The depth-separation principle is shown in fig. 3. Decoupling conventional convolution layers into depth-separable convolution layers can greatly reduce the parameters of the model while maintaining the expressive power of the optical flow estimation layer; the depth-separable convolution layers in the network are shown in fig. 2.
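A sketch of one depth-separable convolution structure (a per-channel depth-wise convolution decoupled from a 1×1 point-wise convolution, followed by BN and LeakyReLU), again assuming PyTorch; the kernel size and the exact ordering of the point-wise convolution relative to BN are assumptions of this sketch.

import torch.nn as nn

class DSConvBlock(nn.Module):
    # Depth-separable convolution block used in the decoupled estimation layer:
    # depthwise conv -> pointwise (1x1) conv -> BN -> LeakyReLU.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)  # per-channel convolution
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)                                # 1x1 cross-channel mixing
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))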
Optical flow estimation layer with an improved pyramid structure: the low-level pyramid network structures in PWC-Net-s have little feature data and few parameters because of downsampling, and their error never converges to a low level. If such optical flow predictions are used as training input in an upper-level network with too high a weight, they become an interference term in the later stage of convergence and prevent the network from converging to a lower value. The network therefore adds, to the input used for upper-layer network training, a weight coefficient σ based on the pyramid layer number l:

output^l = net(input^l) + l × σ × flow^(l-1)   (7)
As shown in fig. 1, the weight of the upsampled optical flow of the previous layer in the input and output increases with the pyramid layer number. The weight coefficient σ is calculated as follows:

σ = k × U   (8)

where U is the average endpoint error of the layer-0 optical flow predictions trained on the previous training data set, and k is a constant with an empirical value of 1.1. The input and output of the optical flow estimation layer for the first data set are the same as in the literature.
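The layer-number weighting of formulas (7) and (8) amounts to the small helpers below; the function and argument names are illustrative only.

def weight_coefficient(avg_epe_u, k=1.1):
    # sigma = k * U, with U the average endpoint error of the layer-0 flow
    # predictions on the previous training data set (k = 1.1 per the description).
    return k * avg_epe_u

def fuse_with_upsampled_flow(net_output, up_flow, layer_idx, sigma):
    # output^l = net(input^l) + l * sigma * flow^(l-1): the contribution of the
    # coarser-level (upsampled) flow grows with the pyramid layer index l.
    return net_output + layer_idx * sigma * up_flow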
Large displacement optical flow estimation layer: the large displacement estimation layer serves as a post-processing network layer of the model and further improves the estimation accuracy of large-displacement optical flow. It is built from dilated (hole) convolutions, which enlarge the range over which pixel information is gathered while reducing the gridding effect that dilated convolution can cause. A large-displacement optical flow estimation network built from dilated convolutions effectively increases the receptive field of the network without increasing the number of parameters and improves the correlation between distant pixels of the convolutional feature map. The large displacement estimation layer does not use a residual network structure, preventing high-frequency signals produced by the gridding effect of dilated convolution from continuing to propagate downwards.
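One possible form of such a dilated-convolution post-processing network, assuming PyTorch; the specific dilation rates and channel widths below are illustrative assumptions, not values specified by the patent.

import torch.nn as nn

class LargeDisplacementNet(nn.Module):
    # Post-processing context network built from dilated (hole) convolutions;
    # increasing dilation rates enlarge the receptive field without adding
    # parameters per layer, and no residual connections are used.
    def __init__(self, in_ch):
        super().__init__()
        dilations = (1, 2, 4, 8, 16, 1)
        chs = (128, 128, 96, 64, 32, 2)
        layers, prev = [], in_ch
        for ch, dil in zip(chs, dilations):
            layers += [nn.Conv2d(prev, ch, 3, padding=dil, dilation=dil),
                       nn.LeakyReLU(0.1)]
            prev = ch
        layers = layers[:-1]   # drop the final activation so the last conv outputs the 2-channel flow refinement
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)     # refinement added to the preliminary flow prediction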
Training loss: define Θ as the set of trainable parameters in the neural network, including the feature pyramid layer, the optical flow estimation layer and the large-displacement optical flow estimation layer; the warping layer and the cost layer contain no trainable parameters and are only calculation layers. Define w_Θ^l as the optical flow predicted by the layer-l pyramid and w_GT^l as the corresponding ground-truth optical flow. The loss value is computed with a multi-scale endpoint-error loss:

L(Θ) = Σ_l α_l Σ_x | w_Θ^l(x) − w_GT^l(x) |_2 + γ |Θ|_2^2

where α_l weights pyramid level l, the inner sum runs over all pixels x, and γ weights the L2 regularization of the trainable parameters.
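A sketch of this multi-scale endpoint-error loss under the assumption of a PyTorch implementation; the per-level weights and the regularization coefficient gamma are assumptions of this sketch.

import torch

def multiscale_epe_loss(pred_flows, gt_flows, level_weights, params=None, gamma=0.0004):
    # Weighted sum over pyramid levels of the per-pixel L2 endpoint error
    # between predicted flow (B, 2, H, W) and ground-truth flow, plus an
    # optional L2 penalty on the trainable parameters Theta.
    loss = 0.0
    for alpha, pred, gt in zip(level_weights, pred_flows, gt_flows):
        loss = loss + alpha * torch.norm(pred - gt, p=2, dim=1).sum()
    if params is not None:
        loss = loss + gamma * sum(p.pow(2).sum() for p in params)
    return loss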
the real optical flow in the real world is very difficult to obtain and cannot be manually noted by hand. Butler et al automatically produced image pairs and associated light flow graphs through the game engine, but still had too little data for light flow learning, since the light flow dataset was an image dataset, only 2 tens of thousands of data occupied 75G capacity. But two parts per million data is not enough to train a good model. In order to make up for the defect of the data set of unreal data, the image pair and the optical flow data thereof are subjected to the same random clipping and random rotation, and the image mirror image color is enhanced and the noise is superposed. This requires expanding the samples of the data set using data enhancement methods in order to increase model robustness and reduce the risk of overfitting.
Improved image- and object-aware random erasing based on optical flow ground truth: the random erasing enhancement strategy randomly selects a rectangular region of an image in the training set and erases its pixels with random values in the range 0-255. Generating training images with various levels of occlusion also reduces the risk of overfitting and makes the model somewhat robust to occlusion. The method of image and object aware random erasing (I+ORE) is used: the cropped ground-truth optical flow data are read and the boundaries of objects in the optical flow map are detected. The position of the randomly erased block is selected within these boundaries; its area is a random number between 0.02 and 0.15 times the image resolution, its aspect ratio is uniformly distributed between 0.33 and 3.33, and the erased region is filled with random pixel values as a mask. This method effectively occludes part of the key information and improves the associative learning capability of the network.
The total improvement is as follows: the optical flow estimation speed is greatly improved without reducing the optical flow estimation accuracy.
Improvement point 1 reduces the parameters of the optical flow estimation model and greatly accelerates its estimation speed.
The principle of the depth-separable convolution is shown in fig. 2: the convolution layers of the different channels are convolved separately, and then C_out convolution kernels of size 1 × 1 × C_in are applied; the output feature map still matches the convolution output of a conventional convolution. K_h and K_w are the height and width of the convolution kernel, C_in and C_out are the numbers of input and output channels, F_h is the height of the feature map minus the height of the convolution kernel, and F_w is the width of the feature map minus the width of the convolution kernel.
Number of parameters of a conventional convolution:

P_conv = K_h × K_w × C_in × C_out   (10)

Number of parameters of a depth-separable convolution:

P_depth = K_h × K_w × C_in
P_point = 1 × 1 × C_in × C_out
P_dsconv = P_depth + P_point   (11)

The parameter count of the depth-separable convolution is the linear sum of the depth-wise and point-wise convolution parameters and is clearly smaller than that of a conventional convolution.
Computation of a conventional convolution:

C_conv = K_h × K_w × C_in × C_out × F_h × F_w   (12)

Computation of a depth-separable convolution:

C_depth = K_h × K_w × C_in × F_h × F_w
C_point = C_in × C_out × F_h × F_w
C_dsconv = C_depth + C_point   (13)

Comparing the two gives the reduction ratio of the computation after replacing a conventional convolution with a depth-separable convolution:

C_dsconv / C_conv = 1/C_out + 1/(K_h × K_w)
the network expression capacity of this structure has been demonstrated in the literature to be substantially similar to conventional convolution. In MobileNet, the use of depth separable convolutional layers for network decoupling reduces its computational effort to one-ninth of the original, while the recognition accuracy is only reduced by 1.7%.
The improvement point 2 increases the model convergence rate and the final convergence effect of the model by reducing the negative impact of the low resolution case on the high resolution estimation.
An optical flow estimation model at low resolution always converges to a higher error value; when the error introduced at low resolution is larger than the convergence loss at high resolution, the low-resolution optical flow data become an interference term for the convergence of the high-resolution optical flow model. Reducing the weight of the low-resolution contribution to the high-resolution result reduces the loss of the high-resolution model and thus improves accuracy.
Improvement point 3 erases key information of the data set based on analysis of the real optical flow values, which effectively improves the model's associative estimation capability.
Random erasing is a new data augmentation technique: during training, a rectangular region of the image is randomly selected and its pixels are erased with random values. Generating training images with various levels of occlusion reduces the risk of overfitting and makes the model robust to occlusion. The improvement here is to use the optical flow values to find the erasure points of interest, which reduces the time needed to compute the points of interest and improves the precision of their selection, thereby improving the optical flow estimation for partially occluded objects.
It should be noted that the random erasing procedure of the present application is as follows (a code sketch is given after the listing):
Algorithm 1: Random Erasing
Input:
Input image I
Image height H and width W
Image area S
Erasing probability p
Erase-region area ratio range s_l and s_h (lower and upper limits)
Erase-region aspect ratio range r_1 and r_2 (lower and upper limits)
Output:
Erased image I*
Initialization: p1 is a random number in (0, 1).
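A hedged sketch of the image- and object-aware random erasing described above, assuming NumPy arrays (an H × W × C image and an H × W × 2 ground-truth flow); the boundary detection by flow-gradient thresholding, the threshold value and the parameter names are assumptions for illustration only.

import random
import numpy as np

def flow_aware_random_erase(image, flow_gt, p=0.5, s_l=0.02, s_h=0.15,
                            r_1=0.33, r_2=3.33, grad_thresh=1.0):
    # Image- and object-aware random erasing: object boundaries are located
    # from gradients of the ground-truth flow magnitude, a rectangle whose
    # area is 0.02-0.15 of the image (aspect ratio in [0.33, 3.33]) is placed
    # near a boundary point, and its pixels are replaced with values in 0-255.
    if random.random() > p:
        return image
    h, w = image.shape[:2]
    mag = np.linalg.norm(flow_gt, axis=2)                   # flow magnitude per pixel
    gy, gx = np.gradient(mag)
    boundary = np.argwhere(np.hypot(gx, gy) > grad_thresh)  # candidate object-boundary pixels
    if len(boundary) == 0:
        return image
    cy, cx = boundary[random.randrange(len(boundary))]
    area = random.uniform(s_l, s_h) * h * w
    ratio = random.uniform(r_1, r_2)
    eh = int(round(np.sqrt(area * ratio)))
    ew = int(round(np.sqrt(area / ratio)))
    y0, x0 = max(0, cy - eh // 2), max(0, cx - ew // 2)
    y1, x1 = min(h, y0 + eh), min(w, x0 + ew)
    out = image.copy()
    out[y0:y1, x0:x1] = np.random.randint(0, 256, (y1 - y0, x1 - x0) + image.shape[2:])
    return out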
It should be noted that the computer-readable storage medium stores a computer program which, when executed by the processor, implements the efficient lightweight optical flow estimation method of the present application. Because the program logic of the individual steps differs, a dedicated processor or a general-purpose chip may be used to execute the corresponding step, so as to improve the processing efficiency of the whole program and control cost reasonably. Therefore, those skilled in the art can adaptively design and adjust the optical flow calculation according to the specific application.
In summary, the implementation principle of the application is as follows: a pyramid network model is constructed to obtain the optical flow changes transferred in the image, and by training and optimizing the model a new estimation method is obtained whose calculation speed is superior to existing calculation models; referring to fig. 4, fig. 4 shows the optimized prediction effect.

Claims (6)

1. An efficient lightweight optical flow estimation method, characterized by comprising the following steps:
step S1: the unmanned navigation system sends the acquired image pair as input to a trained pyramid network model, wherein the unmanned navigation system obtains a control instruction of hardware equipment through the acquired image pair;
step S2: the pyramid network model comprises a six-layer neural network structure; the six pyramid levels are each convolved twice to obtain six feature maps whose parameters do not interfere with one another, and the trained pyramid network model outputs high-precision image optical flow estimation results at high speed;
step S3: a perceived object that is occluded in the image is determined according to the image optical flow estimation result, and the hardware equipment of the unmanned navigation system receives and executes a new control instruction to avoid or approach the perceived object;
in the step S1, the training method of the pyramid network model includes the following steps:
step S21: downsampling the input images respectively to form a six-layer pyramid structure comprising a feature input layer, a warping layer, a cost calculation layer, a decoupled optical flow estimation layer, an optical flow estimation layer with an improved pyramid structure and a large displacement estimation layer, wherein the pyramid network model is built based on PWC-Net;
step S22: the image pair is subjected to an ordinary convolution operation and sent to the feature input layer; the former image is output through the feature input layer to the cost calculation layer, the latter image is output through the feature input layer to the warping layer, and the output end of the warping layer is connected with the cost calculation layer;
step S23: the output of the cost calculation layer is connected with the decoupled optical flow estimation layer, and the output end of the decoupled optical flow estimation layer is connected with the large displacement estimation layer to obtain the optical flow prediction after large displacement change;
step S24: the output of the feature input layer is directly connected with the decoupled optical flow estimation layer to obtain a preliminary optical flow prediction;
step S25: superposing the optical flow predictions generated in S23 and S24 to generate a batch of training results, updating the weights of all parameters in the model, and inputting the upsampled optical flow in the training results into the warping layer of the upper neural network level as its upsampled optical flow;
calculating an optical flow estimated value in the image through a pyramid network model;
when the pyramid network model is trained, a random erasure enhancement strategy is applied to randomly select a rectangular region of an image in a training set, and pixels of the rectangular region are erased by using random values in 0-255;
decoupled optical flow estimation layer: the input to the optical flow estimation layer is the matching cost corr, the first image x_1 and the upsampled optical flow prediction map flow^l of the layer above; the output is the optical flow prediction map of the current layer and the partial weight of the previous layer's optical flow prediction; the infrastructure of this layer follows the PWC-Net-s substructure of PWC-Net;
optical flow estimation layer with an improved pyramid structure: a weight coefficient σ based on the pyramid layer number l is added to the input used for upper-layer network training:

output^l = net(input^l) + l × σ × flow^(l-1)

the weight of the upsampled optical flow of the previous layer in the input and output increases with the pyramid layer number, and the weight coefficient σ is calculated as follows:

σ = k × U
u is the average endpoint error value of the layer 0 optical flow prediction data after the training of the last training data set, k is a constant, and k is an empirical value of 1.1;
large displacement estimation layer: the large displacement estimation layer serves as a post-processing network layer of the network model and improves the estimation accuracy of large-displacement optical flow; it is built from dilated (hole) convolutions, which enlarge the range over which pixel information is gathered and reduce the gridding effect caused by dilated convolution; built in this way, the layer can increase the receptive field of the network without increasing the number of parameters and improves the correlation between distant pixels of the convolutional feature map.
2. The efficient lightweight optical flow estimation method according to claim 1, wherein in the feature input layer the input image is downsampled five times, each sampled pixel being half of the previous layer, and the six pyramid levels are convolved twice simultaneously to form six feature maps whose parameters do not interfere with one another, wherein the stride of the first convolution is 2 and that of the second convolution is 1.
3. The efficient lightweight optical flow estimation method according to claim 2, wherein in the step S22 the warping layer upsamples the optical flow flow^(l+1) predicted by layer l+1 of the pyramid by a factor of 2 and uses it to warp the second input feature map c_2^l of layer l, obtaining c_w^l(x) = c_2^l(x + up_2(flow^(l+1))(x)), so that it becomes more similar to the first feature map; the optical flow estimate of the lowest layer l_6 is set to 0; the matching cost calculation layer treats the two processed feature maps as a matching of associated pixels and defines the correlation between the first feature map and the warped second feature map as the matching cost, the matching cost of the feature map pair being computed with a convolution-like operation as follows:

cv^l(x_1, x_2) = (1/N) · (c_1^l(x_1))^T · c_w^l(x_2)

where T is the transpose operation and N is the length of the column vector c_1^l(x_1); a limiting parameter d is set such that |x_1 − x_2|_∞ ≤ d; since the movement of one pixel at the top of the pyramid corresponds to a movement of 2^(L-1) pixels in the full-resolution image, d is scaled down at each pyramid layer, wherein the image warping layer and the matching cost calculation layer are both calculation layers.
4. The efficient lightweight optical flow estimation method according to any one of claims 1 to 3, characterized in that the method is applied to autonomous navigation and tracking of unmanned vehicles and unmanned aerial vehicles.
5. A computer-readable storage medium, having stored thereon one or more computer programs, which when executed by one or more processors, implement the efficient lightweight optical flow estimation method of any of claims 1-3.
6. An efficient lightweight optical flow estimation device, comprising:
one or more processors;
a computer readable storage medium storing one or more computer programs; the one or more computer programs, when executed by the one or more processors, implement the efficient lightweight optical flow estimation method of any of claims 1-3.
CN202110773332.3A 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device Active CN113538527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773332.3A CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Publications (2)

Publication Number Publication Date
CN113538527A (en) 2021-10-22
CN113538527B true CN113538527B (en) 2023-09-26

Family

ID=78127164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773332.3A Active CN113538527B (en) 2021-07-08 2021-07-08 Efficient lightweight optical flow estimation method, storage medium and device

Country Status (1)

Country Link
CN (1) CN113538527B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092337B (en) * 2022-01-19 2022-04-22 苏州浪潮智能科技有限公司 Method and device for super-resolution amplification of image at any scale
CN114581493A (en) * 2022-03-04 2022-06-03 三星电子(中国)研发中心 Bidirectional optical flow estimation method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475536B2 (en) * 2018-02-27 2022-10-18 Portland State University Context-aware synthesis for video frame interpolation

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020150264A1 (en) * 2019-01-15 2020-07-23 Portland State University Feature pyramid warping for video frame interpolation
CN109871789A (en) * 2019-01-30 2019-06-11 电子科技大学 Vehicle checking method under a kind of complex environment based on lightweight neural network
CN110378348A (en) * 2019-07-11 2019-10-25 北京悉见科技有限公司 Instance of video dividing method, equipment and computer readable storage medium
CN111144465A (en) * 2019-12-17 2020-05-12 上海工程技术大学 Multi-scene-oriented smoke detection algorithm and electronic equipment applying same
CN111275746A (en) * 2020-01-19 2020-06-12 浙江大学 Dense optical flow computing system and method based on FPGA
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN112288630A (en) * 2020-10-27 2021-01-29 武汉大学 Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications; A. G. Howard et al.; arXiv: https://arxiv.org/abs/1704.04861; 1-9 *
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume; Deqing Sun et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 8934-8943 *
Multi-layer weight and lightweight improved optical flow estimation algorithm based on PWC-Net; Hu Yixuan et al.; Application Research of Computers; Vol. 39, No. 1; 291-295 *

Also Published As

Publication number Publication date
CN113538527A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
US11501415B2 (en) Method and system for high-resolution image inpainting
CN112052886A (en) Human body action attitude intelligent estimation method and device based on convolutional neural network
CN112434655B (en) Gait recognition method based on adaptive confidence map convolution network
CN113538527B (en) Efficient lightweight optical flow estimation method, storage medium and device
US11915383B2 (en) Methods and systems for high definition image manipulation with neural networks
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN114463492B (en) Self-adaptive channel attention three-dimensional reconstruction method based on deep learning
CN113283525A (en) Image matching method based on deep learning
CN115002379B (en) Video frame inserting method, training device, electronic equipment and storage medium
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
US20220215617A1 (en) Viewpoint image processing method and related device
CN113554039A (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN116402679A (en) Lightweight infrared super-resolution self-adaptive reconstruction method
CN117296078A (en) Optical flow techniques and systems for accurately identifying and tracking moving objects
Hara et al. Enhancement of novel view synthesis using omnidirectional image completion
CN112115786A (en) Monocular vision odometer method based on attention U-net
CN115115860A (en) Image feature point detection matching network based on deep learning
CN114066750B (en) Self-encoder deblurring method based on domain transformation
CN113239771A (en) Attitude estimation method, system and application thereof
CN117241065B (en) Video plug-in frame image generation method, device, computer equipment and storage medium
CN117853340B (en) Remote sensing video super-resolution reconstruction method based on unidirectional convolution network and degradation modeling
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism
Li et al. Infrared scene prediction of night unmanned vehicles based on multi-scale feature maps
CN117690188A (en) Hand gesture estimation method and system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant