CN114640885A - Video frame insertion method, training method, device and electronic equipment - Google Patents
- Publication number
- CN114640885A (application number CN202210171767.5A)
- Authority
- CN
- China
- Prior art keywords
- optical flow
- training
- map
- fusion
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440281—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234381—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
Abstract
Embodiments of the present application provide a video frame interpolation method, a training method, an apparatus, and an electronic device, relating to the technical field of image processing, which can improve the accuracy of frame interpolation results. The video frame interpolation method includes: acquiring two adjacent video frames in a video, the two video frames comprising a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video frame interpolation method, a training method, an apparatus, and an electronic device.
Background
Video frame interpolation refers to generating intermediate video frames with an algorithm, in order to increase the video frame rate or to produce a slow-motion special-effect video. However, conventional video frame interpolation methods yield frame interpolation results of low accuracy.
Disclosure of Invention
Embodiments of the present application provide a video frame interpolation method, a training method, an apparatus, and an electronic device that can improve the accuracy of frame interpolation results.
In a first aspect, a video frame interpolation method is provided, including: acquiring two adjacent video frames in a video, the two video frames comprising a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
In a second aspect, a neural network training method for video frame interpolation is provided, including: acquiring a group of training data, the group of training data comprising three consecutive video frames, in order a first training video frame, a second training video frame, and a third training video frame; acquiring a first reference backward optical flow, the first reference backward optical flow being the backward optical flow from the first training video frame to the second training video frame; acquiring a second reference backward optical flow, the second reference backward optical flow being the backward optical flow from the third training video frame to the second training video frame; calculating a first training backward optical flow, the first training backward optical flow being the backward optical flow from the first training video frame to the third training video frame; calculating a second training backward optical flow, the second training backward optical flow being the backward optical flow from the third training video frame to the first training video frame; transforming the first training backward optical flow into a first initial training optical flow based on a preset ratio; transforming the second training backward optical flow into a second initial training optical flow based on the preset ratio; mapping the first training video frame through the first initial training optical flow to obtain a first training map; mapping the third training video frame through the second initial training optical flow to obtain a second training map; inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map, and the second training map into an optical flow correction neural network to obtain a third training backward optical flow and a fourth training backward optical flow output by the optical flow correction neural network, the third training backward optical flow being the corrected backward optical flow from the first training video frame to the second training video frame, and the fourth training backward optical flow being the corrected backward optical flow from the third training video frame to the second training video frame; mapping the first training video frame through the third training backward optical flow to obtain a third training map; mapping the third training video frame through the fourth training backward optical flow to obtain a fourth training map; inputting the first training video frame, the third training backward optical flow, the fourth training backward optical flow, the third training map, and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; performing a fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain a target interpolated frame; and adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame and the second training video frame, the difference between the third training backward optical flow and the first reference backward optical flow, and the difference between the fourth training backward optical flow and the second reference backward optical flow.
In a third aspect, a video frame interpolation apparatus is provided, including: an acquisition module configured to acquire two adjacent video frames in a video, the acquisition module being further configured to calculate the optical flow between the two video frames, to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio, and to map the two video frames through the initial optical flow to obtain an initial map; a correction module configured to correct the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and a frame interpolation module configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow.
In a fourth aspect, an electronic device is provided, comprising: a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method described above.
In a fifth aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when run on a computer, causes the computer to perform the above-mentioned method.
With the video frame interpolation method, training method, apparatus, and electronic device of the embodiments of the present application, the optical flow between two adjacent video frames in a video is calculated, the optical flow is corrected, and the interpolated frame is obtained based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; because it contains the information of the target's motion and expresses the change of the image, the interpolated frame between two video frames can be obtained by using the optical flow between two adjacent video frames of the video. In addition, transforming the optical flow according to the preset ratio yields an initial optical flow corresponding to a position between the two video frames; mapping the video frames through this transformed initial optical flow yields an initial map corresponding to that position; and correcting the optical flow based on the initial map makes the optical flow reflect the change between the two video frames more accurately, thereby improving the accuracy of the frame interpolation result.
Drawings
FIG. 1 is a schematic flowchart of a video frame interpolation method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another video frame interpolation method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of another video frame interpolation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure of an optical flow correction neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model structure of a fusion neural network according to an embodiment of the present application;
FIG. 6 is a block diagram of a video frame interpolation apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a neural network training apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
It should be noted that the flow charts shown in the drawings are only exemplary and do not necessarily include all the contents and operations/steps, nor do they necessarily have to be executed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in FIG. 1, an embodiment of the present application provides a video frame interpolation method, including:
Step 101, acquiring two adjacent video frames in a video. The video is a video into which frames are to be interpolated; the two video frames can be any two adjacent video frames and comprise a previous video frame I1 and a subsequent video frame I3.
Step 102, calculating the optical flow between the two video frames.
Step 103, transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio. The optical flow between the two video frames is calculated from the two video frames themselves; according to the preset ratio, it can be transformed into the optical flow at a preset position between the two video frames, i.e., the initial optical flow.
Step 104, mapping the two video frames through the initial optical flow to obtain an initial map.
Step 105, correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow.
Step 106, obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
After the target interpolated frame between the two video frames is obtained in step 106, target interpolated frames between other pairs of video frames may be obtained by repeating the process of steps 101 to 106. For example, after the target interpolated frame between the first and second frames of the video is obtained, the method may loop to obtain the target interpolated frame between the next pair of adjacent frames after a preset frame interval, and so on, so as to interpolate frames throughout the whole video.
The video frame interpolation method of the embodiments of the present application calculates the optical flow between two adjacent video frames in a video, corrects the optical flow, and obtains the interpolated frame based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; because it contains the information of the target's motion and expresses the change of the image, the interpolated frame between two video frames can be obtained by using the optical flow between two adjacent video frames of the video. In addition, transforming the optical flow according to the preset ratio yields an initial optical flow corresponding to a position between the two video frames; mapping the video frames through this transformed initial optical flow yields an initial map corresponding to that position; and correcting the optical flow based on the initial map makes the optical flow reflect the change between the two video frames more accurately, thereby improving the accuracy of the frame interpolation result.
In one possible implementation, calculating the optical flow between the two video frames in step 102 includes: calculating the optical flow between the two video frames based on a computer vision algorithm, where a computer vision algorithm refers to a traditional image processing method rather than a method based on neural network prediction. Correcting the optical flow between the two video frames based on the initial map in step 105 includes: correcting the optical flow between the two video frames, with the initial map as an input, through a neural network. That is, in step 105, the optical flow calculated in step 102 is corrected by a pre-trained neural network. Since a substantially accurate optical flow has already been obtained by the computer vision algorithm, the neural network only needs to correct it, so the computation cost of the neural network is small.
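As a hedged illustration only (the application does not name a specific computer vision algorithm), the following sketch estimates a single global motion vector between two frames with a Lucas-Kanade-style least-squares fit, which is one classical, non-neural way of computing optical flow:

```python
import numpy as np

def global_flow(frame_a, frame_b):
    """Estimate one global (u, v) offset from frame A to frame B by
    solving the optical-flow constraint Ix*u + Iy*v + It = 0
    in the least-squares sense over the whole image."""
    iy, ix = np.gradient(frame_a)          # spatial derivatives (axis 0 = y)
    it = frame_b - frame_a                 # temporal derivative
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    return np.linalg.solve(a, b)           # (u, v)

# Synthetic test: a smooth Gaussian blob shifted right by 1 pixel.
y, x = np.mgrid[0:64, 0:64]
blob = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / 50.0)
shifted = np.roll(blob, 1, axis=1)         # frame B = frame A moved 1 px right

u, v = global_flow(blob, shifted)
print(round(u, 2), round(v, 2))            # close to the imposed shift (1, 0)
```

A real per-pixel optical flow algorithm solves this constraint in local windows or with regularization; the global version above only shows the underlying arithmetic.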
A traditional video frame interpolation method calculates an optical flow with a computer vision algorithm and then performs optical flow mapping with the calculated optical flow to obtain the target interpolated frame. However, interpolation based on an optical flow obtained in this way has low accuracy. To improve accuracy, the optical flow could instead be predicted entirely with a neural network, but that approach is computationally expensive.
According to the video frame interpolation method of the embodiments of the present application, the optical flow is calculated based on a computer vision algorithm, then corrected based on a neural network, and the interpolated frame is obtained based on the corrected optical flow. Because the optical flow obtained in this way is corrected by neural network prediction, the accuracy of the frame interpolation result is high; for example, edge artifacts around object contours can be reduced, improving the user experience with slow-motion video. In addition, since the neural network only needs to correct an already-computed optical flow, its computation cost is reduced. That is, the computation cost is reduced while the accuracy of the frame interpolation result is improved.
In one possible embodiment, as shown in FIG. 2, correcting the optical flow between the two video frames based on the initial map in step 105 to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
In one possible embodiment, obtaining a target interpolated frame between the two video frames according to the corrected optical flow in step 106 includes:
Step 1063, performing a fusion calculation on the corrected maps based on a fusion parameter map to obtain the target interpolated frame.
In one possible implementation, as shown in FIG. 3, the optical flow between the two video frames comprises a first backward optical flow F3-1 and a second backward optical flow F1-3. The first backward optical flow F3-1 is the backward optical flow from the previous video frame I1 to the subsequent video frame I3, and the second backward optical flow F1-3 is the backward optical flow from the subsequent video frame I3 to the previous video frame I1. That is, step 101 acquires the adjacent previous video frame I1 and subsequent video frame I3 in the video, and step 102 accordingly includes calculating the first backward optical flow F3-1 and the second backward optical flow F1-3.
Here the backward optical flow is also called the reverse optical flow. The optical flow in the embodiments of the present application can be expressed as an optical flow map. For example, for two images A and B, the resolution of the optical flow map is identical to that of image A and image B. The optical flow map records the "offset" of each pixel of one image; this offset has two components, an offset x in the horizontal direction and an offset y in the vertical direction, and its value can be simply understood as the distance (number of pixels) to be moved. "Applying the optical flow to image A", or "mapping image A through the optical flow", means that each pixel of image A is shifted according to the offset values (vertical + horizontal) at the corresponding position of the optical flow map; after the optical flow mapping is completed, a new image, called a map, is obtained. The optical flow calculated from image A to image B is the forward optical flow of image A and, for image B, the backward optical flow of image B. Therefore, for the two images A and B, image A can be mapped through a forward optical flow, or image B can be mapped through a backward optical flow; in short, the forward optical flow refers to the optical flow calculated from image A to image B, and the backward (reverse) optical flow refers to the optical flow calculated from image B to image A.
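To make the "offset" description concrete, here is a minimal backward-warp sketch in NumPy. It is an assumption-laden toy (integer nearest-neighbour sampling with border clamping), not the implementation of the application, which would typically use bilinear sampling:

```python
import numpy as np

def warp_backward(img, flow):
    """Map image `img` through a backward optical flow.

    flow[y, x] = (dx, dy): the output pixel (y, x) is taken from
    img[y + dy, x + dx], i.e. each pixel is shifted by the offset
    recorded at the corresponding position of the optical flow map.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y, x]
            sy = min(max(int(round(y + dy)), 0), h - 1)   # clamp to borders
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y, x] = img[sy, sx]
    return out

img = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0            # every pixel reads its right-hand neighbour
warped = warp_backward(img, flow)
print(warped[0, 0])           # 1.0: pixel (0, 0) now holds img[0, 1]
```

With an all-zero flow map the warp is the identity, which matches the intuition that a zero offset moves no pixel.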
Step 1041, mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2. That is, the first initial optical flow FCV2-1 is applied to the image I1 to perform a backward-warp optical flow mapping, and the mapping yields a warped map, i.e., the first map WF1-2.
Step 1042, mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2.
The previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 are input into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network. The third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2, and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2; that is, the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 constitute the corrected optical flow of step 105.
The neural network model structure of the optical flow correction neural network can be as shown in FIG. 4. The model can include a downsampling module (convolution Conv + activation function ReLU), a feature extraction module (convolution Conv + activation function ReLU), and an upsampling module (deconvolution ConvTranspose + activation function ReLU). The inputs of the model are the above I1, I3, FCV2-1, FCV2-3, WF1-2, and WF3-2. The downsampling module reduces the input size, thereby accelerating prediction and inference while extracting network features. The feature extraction module extracts and transforms features within the network; the extracted features are those produced by the convolution layers, and can be the network's representation of characteristics of the frame image such as edges, contours, and lightness. The upsampling module enlarges the reduced features back to the original input size. The outputs of the model are the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3, i.e., the corrected optical flow from the previous video frame I1 to the target interpolated frame IN2 and the corrected optical flow from the subsequent video frame I3 to the target interpolated frame IN2. In other words, the neural network corrects the first initial optical flow FCV2-1 into the third backward optical flow FCVU2-1 and the second initial optical flow FCV2-3 into the fourth backward optical flow FCVU2-3. Identically labeled modules in the figure indicate module reuse; for example, the same feature extraction module is reused within the model, which reduces the complexity of the network structure and strengthens the representational capability of feature extraction. The training process of the model is described later.
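As a shape-level sketch only, hypothetical and far simpler than the network of FIG. 4, the following NumPy code mimics the three-stage structure: a downsampling stage that shrinks the input, a reused feature extraction stage, and an upsampling stage that restores the original resolution. The residual-addition form (corrected flow = initial flow + predicted correction) is an assumption for illustration, not a claim of the patent:

```python
import numpy as np

def conv_relu(x, k):
    """Toy 'Conv + ReLU': 3x3 convolution with zero padding, then ReLU."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return np.maximum(out, 0.0)

def downsample(x):
    """2x2 average pooling: halves each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling (stand-in for ConvTranspose + ReLU)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def correct_flow(initial_flow, kernel):
    """Predict a correction at low resolution and add it to the initial flow."""
    feat = downsample(initial_flow)      # shrink input to speed up inference
    feat = conv_relu(feat, kernel)       # feature extraction stage ...
    feat = conv_relu(feat, kernel)       # ... reused (same module twice)
    residual = upsample(feat)            # restore the original input size
    return initial_flow + residual

rng = np.random.default_rng(0)
flow = rng.standard_normal((8, 8))
kernel = np.full((3, 3), 1.0 / 9.0)      # hypothetical fixed weights
corrected = correct_flow(flow, kernel)
print(corrected.shape)                   # (8, 8): output matches input size
```

The point of the sketch is the shape flow: downsampling halves the resolution, and upsampling restores it, so the corrected optical flow has the same resolution as the initial optical flow.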
In one possible embodiment, as shown in FIG. 3, obtaining the target interpolated frame between the two video frames according to the corrected optical flow in step 106 includes: step 1061, mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2, and mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2; and step 1062, inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network.
the neural network model structure of the converged neural network may be as shown in fig. 5, and the neural network model may include a convolution Conv + activation function Relu downsampling module and a deconvolution Conv + activation function Relu upsampling module. Wherein, the input of the neural network model is the above I1、I3、FCVU2-1、FCVU2-3、WM1-2And WM3-2(ii) a The neural network model outputs a fusion parameter map m which is used for participating IN calculation IN the subsequent process to obtain a target interpolation frame IN2. The training process of the neural network model will be described later.
In a possible embodiment, step 1063, performing the fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2, includes the following process. The third map WM1-2 is multiplied element-wise by the fusion parameter map m to obtain a first fused map WM1-2 × m: the resolution of the fusion parameter map m is the same as that of any video frame, its pixel values correspond one-to-one to those of the third map WM1-2, each of its pixel values lies in the range 0 to 1, and the products of the corresponding pixel values of WM1-2 and m are the pixel values of the first fused map. The fusion parameter map m is subtracted from 1 to obtain a difference fusion parameter map (1-m): its pixel values correspond one-to-one to those of m, and each of its pixel values is 1 minus the corresponding pixel value of m. The fourth map WM3-2 is multiplied element-wise by the difference fusion parameter map (1-m) to obtain a second fused map WM3-2 × (1-m): the products of the corresponding pixel values of WM3-2 and (1-m) are the pixel values of the second fused map. Finally, the first fused map WM1-2 × m and the second fused map WM3-2 × (1-m) are added to obtain the target interpolated frame IN2: the sums of the corresponding pixel values of the two fused maps are the pixel values of IN2. Expressed as a formula, IN2 = WM1-2 × m + WM3-2 × (1-m); this is the fusion calculation of the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2. In other words, the formula multiplies the third map WM1-2 by the fusion parameter map m point by point to obtain one intermediate result, multiplies the fourth map WM3-2 by the point-by-point difference (1 - m) to obtain another intermediate result, and then adds the two intermediate results point by point. Table 1 below illustrates the correspondence between the target interpolated frame IN2, the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m.
TABLE 1
Assume the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m are all images of 2 × 2 resolution, and the values in Table 1 are pixel values. In all three examples, every pixel value of the third map WM1-2 is 2 and every pixel value of the fourth map WM3-2 is 4. The examples differ as follows. In example 1, every pixel value of the fusion parameter map m is 0, so each pixel value of the target interpolation frame IN2 computed by the formula WM1-2 × m + WM3-2 × (1-m) is 2 × 0 + 4 × (1-0) = 4. In example 2, every pixel value of the fusion parameter map m is 1, so each pixel value of the target interpolation frame IN2 is 2 × 1 + 4 × (1-1) = 2. In example 3, every pixel value of the fusion parameter map m is 0.5, so each pixel value of the target interpolation frame IN2 is 2 × 0.5 + 4 × (1-0.5) = 3.
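As a minimal sketch of the fusion calculation above (assuming NumPy and treating each map as a 2-D array; the function name `fuse` is illustrative, not from the patent), the three examples of Table 1 can be reproduced as follows:

```python
import numpy as np

def fuse(wm_1_2, wm_3_2, m):
    """Pixel-by-pixel fusion: IN2 = WM1-2 * m + WM3-2 * (1 - m)."""
    # Each fusion parameter must lie in the range 0 to 1.
    assert np.all((0.0 <= m) & (m <= 1.0))
    return wm_1_2 * m + wm_3_2 * (1.0 - m)

# Table 1 setup: WM1-2 is all 2s, WM3-2 is all 4s, at 2x2 resolution.
wm_1_2 = np.full((2, 2), 2.0)
wm_3_2 = np.full((2, 2), 4.0)
ex1 = fuse(wm_1_2, wm_3_2, np.full((2, 2), 0.0))  # example 1: every pixel 4
ex2 = fuse(wm_1_2, wm_3_2, np.full((2, 2), 1.0))  # example 2: every pixel 2
ex3 = fuse(wm_1_2, wm_3_2, np.full((2, 2), 0.5))  # example 3: every pixel 3
```

With m = 0 the result equals the fourth map, with m = 1 it equals the third map, and intermediate values blend the two maps linearly.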
In one possible embodiment, step 1031, transforming the first backward optical flow F3-1 into the first initial optical flow FCV2-1 based on a preset ratio, includes: multiplying the first backward optical flow F3-1 by a preset ratio value t to obtain the first initial optical flow FCV2-1, that is, transforming F3-1 into FCV2-1 based on the formula FCV2-1 = t × F3-1, where the preset ratio value ranges from 0.4 to 0.6. Step 1032, transforming the second backward optical flow F1-3 into the second initial optical flow FCV2-3 based on the preset ratio, includes: multiplying the second backward optical flow F1-3 by the preset ratio value t to obtain the second initial optical flow FCV2-3, that is, transforming F1-3 into FCV2-3 based on the formula FCV2-3 = t × F1-3. In other words, scaling the optical flow by the preset ratio yields the optical flow of an intermediate frame at the corresponding position between the two video frames, which facilitates determining the target interpolation frame IN2 from the optical flow in the subsequent calculation. The preset ratio value t may be 0.5; in that case, the transformed optical flow corresponds to the halfway position between the two video frames.
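The scaling in steps 1031 and 1032 can be sketched as follows (a NumPy illustration; the function name `scale_flow` and the toy flow values are assumptions, not from the patent):

```python
import numpy as np

def scale_flow(flow, t=0.5):
    """Scale a backward optical flow field (H x W x 2) by the preset ratio
    t to approximate the flow toward an intermediate frame; t = 0.5 targets
    the halfway position between the two input frames."""
    assert 0.4 <= t <= 0.6, "preset ratio value should lie in [0.4, 0.6]"
    return t * flow

# Toy 1x1 flow field with displacement dx = 4, dy = -2.
f_3_1 = np.array([[[4.0, -2.0]]])
fcv_2_1 = scale_flow(f_3_1, t=0.5)  # halved displacement: dx = 2, dy = -1
```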
The embodiment of the present application further provides a neural network training method for video frame interpolation, which may be used to train the optical flow correction neural network and the fusion neural network. Before training, about 100,000 groups of data may be extracted in advance as training data from 1,000 video segments covering various scenes and motion forms; for example, taking 100 groups of data from each video segment yields 100,000 groups in total. Each group of training data includes three consecutive video frames, and all data are normalized to a uniform resolution, for example 768 × 768, by cropping or scaling. The neural network training method includes the following steps:
step 201, obtaining a group of training data, where the group of training data includes three consecutive video frames: a first training video frame i1, a second training video frame i2, and a third training video frame i3 in sequence; in this step, a group of training data may be randomly selected from the training data;
step 202, obtaining a first reference backward optical flow fg2-1, where the first reference backward optical flow fg2-1 is the backward optical flow from the first training video frame i1 to the second training video frame i2;
step 203, obtaining a second reference backward optical flow fg2-3, where the second reference backward optical flow fg2-3 is the backward optical flow from the third training video frame i3 to the second training video frame i2;
in steps 202 and 203, the first reference backward optical flow fg2-1 and the second reference backward optical flow fg2-3 may be obtained by a state-of-the-art third-party optical flow estimation method; fg2-1 and fg2-3 serve as reference optical flows, which facilitates comparing them later with the results output by the neural networks and adjusting the network parameters accordingly.
Step 204, calculating a first training backward optical flow f3-1, where the first training backward optical flow f3-1 is the backward optical flow from the first training video frame i1 to the third training video frame i3;
step 205, calculating a second training backward optical flow f1-3, where the second training backward optical flow f1-3 is the backward optical flow from the third training video frame i3 to the first training video frame i1;
step 206, transforming the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio, where the first initial training optical flow fcv2-1 is the backward optical flow from the first training video frame i1 to the second training video frame i2; for example, step 206 includes: transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, with t = 0.5;
step 207, transforming the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio, where the second initial training optical flow fcv2-3 is the backward optical flow from the third training video frame i3 to the second training video frame i2; for example, step 207 includes: transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, with t = 0.5.
Step 208, mapping the first training video frame i1 through the first initial training optical flow fcv2-1 to obtain a first training map wf1-2;
step 209, mapping the third training video frame i3 through the second initial training optical flow fcv2-3 to obtain a second training map wf3-2;
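The patent does not spell out how the mapping in steps 208 and 209 is performed; a common realization is backward warping, sketched here with nearest-neighbor sampling as an assumption (bilinear sampling is more typical in practice, and the function name `backward_warp` is illustrative):

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp a single-channel frame with a backward optical flow field:
    each output pixel (y, x) samples the source frame at
    (y + flow[y, x, 1], x + flow[y, x, 0]), rounded and clipped
    to the image bounds (nearest-neighbor sketch)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# Toy 3x3 frame; a uniform flow of dx = 1 shifts content one pixel left.
frame = np.arange(9.0).reshape(3, 3)
flow = np.zeros((3, 3, 2))
flow[..., 0] = 1.0  # every pixel samples its right neighbor
warped = backward_warp(frame, flow)
```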
Step 210, inputting the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training map wf1-2, and the second training map wf3-2 into the optical flow correction neural network to obtain a third training backward optical flow fcvu2-1 and a fourth training backward optical flow fcvu2-3 output by the optical flow correction neural network, where the third training backward optical flow fcvu2-1 is the corrected backward optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training backward optical flow fcvu2-3 is the corrected backward optical flow from the third training video frame i3 to the second training video frame i2;
step 211, mapping the first training video frame i1 through the third training backward optical flow fcvu2-1 to obtain a third training map wm1-2;
step 212, mapping the third training video frame i3 through the fourth training backward optical flow fcvu2-3 to obtain a fourth training map wm3-2;
step 213, inputting the first training video frame i1, the third training video frame i3, the third training backward optical flow fcvu2-1, the fourth training backward optical flow fcvu2-3, the third training map wm1-2, and the fourth training map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network;
step 214, performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolation frame in2; for example, the resolution of the fusion parameter map m is the same as that of any video frame, each pixel value of the fusion parameter map m lies in the range 0 to 1, and the target interpolation frame in2 = wm1-2 × m + wm3-2 × (1-m);
step 215, adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolation frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3.
In the neural network training process, the second training video frame i2 is known, while the target interpolation frame in2 is predicted by the neural networks; therefore, the network parameters may be adjusted based on the difference between in2 and i2 to make the prediction more accurate, and for similar reasons the network parameters may also be adjusted based on the difference between fcvu2-1 and fg2-1 and the difference between fcvu2-3 and fg2-3. Steps 201 to 215 described above constitute one training pass, and the neural networks may perform multiple rounds of training based on the training data. In step 215, specifically, for example, the L1 loss between in2 and i2, the L1 loss between fcvu2-1 and fg2-1, and the L1 loss between fcvu2-3 and fg2-3 are calculated and back-propagated iteratively until the optical flow correction neural network and the fusion neural network converge. That is, over multiple rounds of training, the network parameters of the optical flow correction neural network and the fusion neural network are adjusted according to the L1 losses and thereby continuously optimized until the L1 loss no longer decreases, which indicates that network training is finished and the prediction effect of the neural networks is at its best. After network training is completed, the video frame interpolation method may be used to realize video frame interpolation based on the trained optical flow correction neural network and fusion neural network.
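The training objective of step 215 combines three L1 (mean absolute error) terms. As a minimal sketch (NumPy, with toy arrays standing in for the real maps; `l1_loss` and the variable names are illustrative), the loss computation looks like:

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error: the L1 loss between a prediction and its
    reference, e.g. in2 vs. i2, fcvu2-1 vs. fg2-1, fcvu2-3 vs. fg2-3."""
    return float(np.mean(np.abs(pred - target)))

# Toy 1-D stand-ins for the interpolated frame and the two flows.
in2, i2 = np.array([2.0, 3.0]), np.array([2.0, 5.0])      # frame term
fcvu21, fg21 = np.array([1.0, 1.0]), np.array([1.0, 2.0])  # flow term 1
fcvu23, fg23 = np.array([1.0, 1.0]), np.array([1.0, 2.0])  # flow term 2

# Total training loss: sum of the three L1 terms; in a real trainer this
# scalar would be back-propagated through both networks.
total = l1_loss(in2, i2) + l1_loss(fcvu21, fg21) + l1_loss(fcvu23, fg23)
```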
In one possible embodiment, step 204, calculating the first training backward optical flow f3-1, includes: calculating the first training backward optical flow f3-1 based on a computer vision algorithm; and step 205, calculating the second training backward optical flow f1-3, includes: calculating the second training backward optical flow f1-3 based on a computer vision algorithm.
In one possible embodiment, step 214, performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain the target interpolation frame in2, includes: multiplying the third training map wm1-2 by the fusion parameter map m pixel by pixel to obtain a first fusion map wm1-2 × m, where the resolution of the fusion parameter map m is the same as that of any video frame, the pixel values of the fusion parameter map m correspond one-to-one to the pixel values of the third training map wm1-2, each pixel value of the fusion parameter map m lies in the range 0 to 1, and each pixel value of the first fusion map is the product of the corresponding pixel values of wm1-2 and m; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), whose pixel values correspond one-to-one to those of the fusion parameter map m, each being 1 minus the corresponding pixel value of m; multiplying the fourth training map wm3-2 by the difference fusion parameter map (1-m) pixel by pixel to obtain a second fusion map wm3-2 × (1-m), each of whose pixel values is the product of the corresponding pixel values of wm3-2 and (1-m); and adding the first fusion map wm1-2 × m and the second fusion map wm3-2 × (1-m) pixel by pixel to obtain the target interpolation frame in2, each of whose pixel values is the sum of the corresponding pixel values of the two fusion maps; expressed as a formula, the target interpolation frame in2 = wm1-2 × m + wm3-2 × (1-m).
In one possible embodiment, step 206, transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio, includes: multiplying the first training backward optical flow f3-1 by the preset ratio value t to obtain the first initial training optical flow fcv2-1, that is, transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, where the preset ratio value ranges from 0.4 to 0.6; and
step 207, transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio, includes: multiplying the second training backward optical flow f1-3 by the preset ratio value t to obtain the second initial training optical flow fcv2-3, that is, transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, where the preset ratio value t may be 0.5.
As shown in fig. 6, an embodiment of the present application further provides a video frame interpolation apparatus 3, including: an obtaining module 31, configured to obtain two adjacent video frames in a video, where the two video frames include a previous video frame I1 and a subsequent video frame I3; the obtaining module 31 is further configured to calculate the optical flow between the two video frames, to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio, and to map the two video frames through the initial optical flow to obtain initial maps; a correction module 32, configured to correct the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and a frame interpolation module 33, configured to obtain a target interpolation frame between the two video frames according to the corrected optical flow. The video frame interpolation apparatus may apply the video frame interpolation method of any of the above embodiments; the specific process and principle are not repeated here.
In one possible implementation, calculating the optical flow between the two video frames includes: calculating the optical flow between the two video frames based on a computer vision algorithm; and correcting the optical flow between the two video frames based on the initial maps includes: correcting the optical flow between the two video frames using a neural network, with the initial maps as input.
In one possible embodiment, the process of modifying the optical flow between two video frames based on the initial map and obtaining the modified optical flow includes: and inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain a corrected optical flow output by the optical flow correction neural network.
In one possible implementation, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio includes: transforming the first backward optical flow F3-1 into the first initial optical flow FCV2-1 based on the preset ratio, where the first backward optical flow F3-1 belongs to the optical flow between the two video frames and is the backward optical flow from the previous video frame I1 to the subsequent video frame I3; and transforming the second backward optical flow F1-3 into the second initial optical flow FCV2-3 based on the preset ratio, where the second backward optical flow F1-3 belongs to the optical flow between the two video frames and is the backward optical flow from the subsequent video frame I3 to the previous video frame I1. Mapping the two video frames through the initial optical flow to obtain the initial maps includes: mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2, which belongs to the initial maps; and mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2, which belongs to the initial maps. Inputting the two video frames, the initial optical flows, and the initial maps into the optical flow correction neural network and correcting the initial optical flows through the optical flow correction neural network to obtain the corrected optical flows output by the optical flow correction neural network includes: inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 belong to the corrected optical flows, the third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolation frame IN2, and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolation frame IN2.
In one possible embodiment, deriving the target interpolated frame between two video frames from the modified optical flow comprises: mapping the two video frames by the corrected optical flow to obtain a corrected mapping chart; inputting the two video frames, the corrected optical flow and the corrected mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; and performing fusion calculation on the third mapping chart and the fourth mapping chart based on the fusion parameter chart to obtain the target interpolation frame.
In one possible embodiment, obtaining the target interpolation frame between the two video frames according to the corrected optical flows includes: mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2; mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2; inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network; and performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolation frame IN2.
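The inference pipeline described above (scale, warp, correct, fuse) can be sketched end to end as follows. This is a hypothetical illustration: the two neural networks are replaced by trivial stubs (no correction; a uniform 0.5 blend), the warp uses nearest-neighbor sampling, and all function names are assumptions rather than the patent's implementation.

```python
import numpy as np

def correction_net_stub(i1, i3, fcv21, fcv23, wf12, wf32):
    # Placeholder for the optical flow correction neural network:
    # returns the initial flows unchanged.
    return fcv21, fcv23

def fusion_net_stub(i1, i3, fcvu21, fcvu23, wm12, wm32):
    # Placeholder for the fusion neural network: a uniform 0.5 blend.
    return np.full(i1.shape, 0.5)

def warp(frame, flow):
    # Nearest-neighbor backward warp (sketch) of a single-channel frame.
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[sy, sx]

def interpolate(i1, i3, f31, f13, t=0.5):
    fcv21, fcv23 = t * f31, t * f13                    # scale the flows
    wf12, wf32 = warp(i1, fcv21), warp(i3, fcv23)      # initial maps
    fcvu21, fcvu23 = correction_net_stub(i1, i3, fcv21, fcv23, wf12, wf32)
    wm12, wm32 = warp(i1, fcvu21), warp(i3, fcvu23)    # corrected maps
    m = fusion_net_stub(i1, i3, fcvu21, fcvu23, wm12, wm32)
    return wm12 * m + wm32 * (1.0 - m)                 # IN2 formula

# With zero flows and the stubs above, the result is the frame average.
i1, i3 = np.full((2, 2), 2.0), np.full((2, 2), 4.0)
in2 = interpolate(i1, i3, np.zeros((2, 2, 2)), np.zeros((2, 2, 2)))
```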
In a possible implementation, performing fusion calculation on the third map and the fourth map based on the fusion parameter map to obtain the target interpolation frame includes: multiplying the third map by the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third map, each pixel value of the fusion parameter map lies in the range 0 to 1, the pixel values of the first fusion map correspond one-to-one to those of the third map, and each pixel value of the first fusion map is the product of the corresponding pixel values of the third map and the fusion parameter map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, whose pixel values correspond one-to-one to those of the fusion parameter map, each being 1 minus the corresponding pixel value of the fusion parameter map; multiplying the fourth map by the difference fusion parameter map to obtain a second fusion map, where each pixel value of the second fusion map is the product of the corresponding pixel values of the fourth map and the difference fusion parameter map; and adding the first fusion map and the second fusion map to obtain the target interpolation frame, where each pixel value of the target interpolation frame is the sum of the corresponding pixel values of the first fusion map and the second fusion map.
In one possible embodiment, transforming the first backward optical flow into the first initial optical flow based on the preset ratio includes: multiplying the first backward optical flow by a preset ratio value to obtain the first initial optical flow, where the preset ratio value ranges from 0.4 to 0.6; and transforming the second backward optical flow into the second initial optical flow based on the preset ratio includes: multiplying the second backward optical flow by the preset ratio value to obtain the second initial optical flow.
In one possible embodiment, the preset ratio value is 0.5.
As shown in fig. 7, an embodiment of the present application further provides a neural network training device 4, including: an obtaining module 41, configured to: acquire a group of training data, where the group of training data includes three consecutive video frames, in sequence a first training video frame i1, a second training video frame i2, and a third training video frame i3; obtain a first reference backward optical flow fg2-1, which is the reference backward optical flow from the first training video frame i1 to the second training video frame i2; obtain a second reference backward optical flow fg2-3, which is the reference backward optical flow from the third training video frame i3 to the second training video frame i2; calculate a first training backward optical flow f3-1, which is the backward optical flow from the first training video frame i1 to the third training video frame i3; calculate a second training backward optical flow f1-3, which is the backward optical flow from the third training video frame i3 to the first training video frame i1; transform the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio; transform the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio; map the first training video frame i1 through the first initial training optical flow fcv2-1 to obtain a first training map wf1-2; and map the third training video frame i3 through the second initial training optical flow fcv2-3 to obtain a second training map wf3-2; a correction module 42, configured to: input the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training map wf1-2, and the second training map wf3-2 into the optical flow correction neural network to obtain a third training backward optical flow fcvu2-1 and a fourth training backward optical flow fcvu2-3 output by the optical flow correction neural network, where the third training backward optical flow fcvu2-1 is the corrected backward optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training backward optical flow fcvu2-3 is the corrected backward optical flow from the third training video frame i3 to the second training video frame i2; a frame interpolation module 43, configured to: map the first training video frame i1 through the third training backward optical flow fcvu2-1 to obtain a third training map wm1-2; map the third training video frame i3 through the fourth training backward optical flow fcvu2-3 to obtain a fourth training map wm3-2; and input the first training video frame i1, the third training video frame i3, the third training backward optical flow fcvu2-1, the fourth training backward optical flow fcvu2-3, the third training map wm1-2, and the fourth training map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; the frame interpolation module 43 is further configured to perform fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolation frame in2; and an adjustment module 44, configured to adjust the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolation frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3.
The neural network training device may apply the neural network training method for video interpolation in any of the above embodiments, and the specific process and principle are the same as those in the above embodiments, and are not described herein again.
In a possible embodiment, calculating the first training backward optical flow f3-1 includes: calculating the first training backward optical flow f3-1 based on a computer vision algorithm; and calculating the second training backward optical flow f1-3 includes: calculating the second training backward optical flow f1-3 based on a computer vision algorithm.
In a possible implementation, performing fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain the target interpolation frame includes: multiplying the third training map by the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third training map, each pixel value of the fusion parameter map lies in the range 0 to 1, the pixel values of the first fusion map correspond one-to-one to those of the third training map, and each pixel value of the first fusion map is the product of the corresponding pixel values of the third training map and the fusion parameter map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, whose pixel values correspond one-to-one to those of the fusion parameter map, each being 1 minus the corresponding pixel value of the fusion parameter map; multiplying the fourth training map by the difference fusion parameter map to obtain a second fusion map, where each pixel value of the second fusion map is the product of the corresponding pixel values of the fourth training map and the difference fusion parameter map; and adding the first fusion map and the second fusion map to obtain the target interpolation frame, where each pixel value of the target interpolation frame is the sum of the corresponding pixel values of the first fusion map and the second fusion map.
In one possible implementation, transforming the first training backward optical flow into the first initial training optical flow based on the preset ratio includes: multiplying the first training backward optical flow by a preset ratio value to obtain the first initial training optical flow, where the preset ratio value ranges from 0.4 to 0.6; and transforming the second training backward optical flow into the second initial training optical flow based on the preset ratio includes: multiplying the second training backward optical flow by the preset ratio value to obtain the second initial training optical flow.
In one possible embodiment, the preset ratio value is 0.5.
It should be understood that the above division of the video frame interpolation apparatus or the neural network training device into modules is only a division of logical functions; in actual implementation the modules may be wholly or partially integrated into one physical entity or physically separated. These modules may all be implemented as software invoked by a processing element, or all in hardware, or partly as software invoked by a processing element and partly in hardware. For example, any one of the obtaining module, the correction module, and the frame interpolation module may be a separately established processing element, or may be integrated into a chip of the video frame interpolation apparatus, or may be stored in a memory of the video frame interpolation apparatus in the form of a program, with a processing element of the apparatus calling and executing the functions of the module; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each module above, may be implemented by an integrated logic circuit of hardware in a processor element or by instructions in the form of software. Moreover, the video frame interpolation apparatus and the neural network training device may be the same device or different devices.
For example, the video frame interpolation apparatus or the neural network training apparatus may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of a program scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of invoking programs. As yet another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
As shown in fig. 8, an embodiment of the present application further provides an electronic device, including: a processor 51 and a memory 52, the memory 52 being configured to store at least one instruction which is loaded and executed by the processor 51 to implement the method in any of the embodiments described above, including the video frame interpolation method or the neural network training method for video frame interpolation. The specific process and principle of the video frame interpolation method or the neural network training method for video frame interpolation are the same as those of the above embodiments, and are not described herein again.
The number of processors 51 may be one or more, and the processor 51 and the memory 52 may be connected by a bus 53 or by other means. The memory 52, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the data processing device in the embodiments of the present application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory, that is, implements the method in any of the above method embodiments. The memory may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store necessary data and the like. Further, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. The electronic device may be, for example, a server, a computer, a mobile phone, or another electronic product.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk), among others.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b and c" may represent: a, b, c, a and b, a and c, b and c, or a, b and c, where a, b and c may each be single or multiple.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (17)
1. A method for video frame insertion, comprising:
acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a next video frame;
calculating an optical flow between the two video frames;
transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
mapping the two video frames through the initial optical flow to obtain an initial mapping chart;
correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and obtaining a target interpolation frame between the two video frames according to the corrected optical flow.
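For illustration only, the five claimed steps can be strung together as the following sketch; the optical flow estimator and the correction step are placeholder stubs (the claim does not fix a particular algorithm or network), and a nearest-neighbour warp stands in for the mapping operation:

```python
import numpy as np

def estimate_flow(prev_frame, next_frame):
    # Placeholder for a computer-vision optical flow algorithm (assumption):
    # a constant one-pixel horizontal motion field of shape (H, W, 2).
    flow = np.zeros(prev_frame.shape[:2] + (2,), dtype=np.float32)
    flow[..., 0] = 1.0
    return flow

def warp(frame, flow):
    # Nearest-neighbour backward mapping of a frame through an optical flow.
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[sy, sx]

def interpolate_frame(prev_frame, next_frame, ratio=0.5):
    flow = estimate_flow(prev_frame, next_frame)  # optical flow between frames
    initial_flow = flow * ratio                   # transform by preset proportion
    initial_map = warp(prev_frame, initial_flow)  # initial mapping map (would be
                                                  # fed to the correction network)
    corrected_flow = initial_flow                 # correction stubbed as identity
    return warp(prev_frame, corrected_flow)       # target interpolation frame

prev = np.arange(16, dtype=np.float32).reshape(4, 4)
mid = interpolate_frame(prev, prev)
```

In the claimed method the identity correction above would be replaced by a trained optical flow correction neural network that takes the two frames, the initial optical flows and the initial mapping maps as input.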
2. The method of claim 1,
the calculating optical flow between the two video frames comprises: computing an optical flow between the two video frames based on a computer vision algorithm;
the correcting the optical flow between the two video frames based on the initial mapping map comprises: correcting the optical flow between the two video frames based on a neural network that takes the initial mapping map as an input.
3. The method of claim 1, wherein the correcting the optical flow between the two video frames based on the initial mapping map comprises:
and inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
4. The method of claim 3,
the transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion comprises:
transforming a first inverse optical flow, belonging to the optical flow between the two video frames, into a first initial optical flow based on the preset proportion, the first inverse optical flow being an inverse optical flow from the previous video frame to the subsequent video frame;
transforming a second inverse optical flow, belonging to the optical flow between the two video frames, into a second initial optical flow based on the preset proportion, the second inverse optical flow being an inverse optical flow from the subsequent video frame to the previous video frame;
the mapping the two video frames through the initial optical flow to obtain an initial mapping comprises:
mapping the previous video frame by the first initial optical flow to obtain a first mapping map, wherein the first mapping map belongs to the initial mapping map;
mapping the next video frame through the second initial optical flow to obtain a second mapping map, wherein the second mapping map belongs to the initial mapping map;
the process of inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network comprises:
inputting the previous video frame, the next video frame, the first initial optical flow, the second initial optical flow, the first mapping map and the second mapping map into the optical flow correction neural network, and obtaining a third inverse optical flow and a fourth inverse optical flow output by the optical flow correction neural network, wherein the third inverse optical flow and the fourth inverse optical flow belong to the corrected optical flow, the third inverse optical flow is a corrected inverse optical flow from the previous video frame to the target interpolation frame, and the fourth inverse optical flow is a corrected inverse optical flow from the next video frame to the target interpolation frame.
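For illustration only, the "mapping ... through the optical flow" operations recited in these claims are backward warps: each output pixel samples the source frame at the location indicated by the inverse optical flow. A nearest-neighbour sketch (bilinear sampling would be the usual choice in practice; the NumPy layout and names are assumptions):

```python
import numpy as np

def backward_warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Map `frame` through an inverse optical flow `flow` of shape
    (H, W, 2), where flow[y, x] = (dx, dy) points from the output
    pixel back into the source frame. Nearest-neighbour sampling,
    clamped at the image borders, for brevity."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

frame = np.arange(16, dtype=np.float32).reshape(4, 4)
shift = np.zeros((4, 4, 2), dtype=np.float32)
shift[..., 0] = 1.0              # every output pixel samples one column right
warped = backward_warp(frame, shift)
```

Warping the previous frame through the first initial optical flow and the next frame through the second initial optical flow yields the first and second mapping maps that the correction network receives.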
5. The method of claim 1,
the obtaining a target interpolation frame between the two video frames according to the corrected optical flow comprises:
mapping the two video frames through the corrected optical flow to obtain a corrected mapping map;
inputting the two video frames, the corrected optical flow and the corrected mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the corrected mapping map based on the fusion parameter map to obtain the target interpolation frame.
6. The method of claim 4,
the obtaining a target interpolation frame between the two video frames according to the corrected optical flow comprises:
mapping the previous video frame through the third inverse optical flow to obtain a third mapping map;
mapping the next video frame through the fourth inverse optical flow to obtain a fourth mapping map;
inputting the previous video frame, the next video frame, the third inverse optical flow, the fourth inverse optical flow, the third mapping map and the fourth mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame.
7. The method of claim 6,
the process of performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame comprises:
multiplying the third mapping map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map correspond one-to-one to a plurality of pixel values of the third mapping map, each pixel value of the fusion parameter map is in the range 0-1, the plurality of pixel values of the first fusion map correspond one-to-one to the plurality of pixel values of the third mapping map, and a plurality of product values obtained by multiplying the plurality of pixel values of the third mapping map and the plurality of pixel values of the fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the first fusion map;
subtracting each pixel value of the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond one-to-one to the plurality of pixel values of the fusion parameter map, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are respectively the plurality of pixel values of the difference fusion parameter map;
multiplying the fourth mapping map and the difference fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by multiplying a plurality of pixel values of the fourth mapping map and the plurality of pixel values of the difference fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the second fusion map;
and adding the first fusion map and the second fusion map to obtain the target interpolation frame, wherein a plurality of pixel values obtained by adding the plurality of pixel values of the first fusion map and the plurality of pixel values of the second fusion map in one-to-one correspondence are respectively the plurality of pixel values of the target interpolation frame.
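For illustration only, the fusion calculation of claim 7 amounts to a per-pixel convex combination, target = W * M3 + (1 - W) * M4, where W is the fusion parameter map. A minimal NumPy sketch under the assumption that all maps share the same shape:

```python
import numpy as np

def fuse(map3: np.ndarray, map4: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Fuse the third and fourth mapping maps with a fusion parameter
    map whose pixel values all lie in [0, 1]."""
    assert weight.min() >= 0.0 and weight.max() <= 1.0
    first_fusion = map3 * weight            # first fusion map
    difference = 1.0 - weight               # difference fusion parameter map
    second_fusion = map4 * difference       # second fusion map
    return first_fusion + second_fusion     # target interpolation frame

m3 = np.full((2, 2), 10.0)
m4 = np.full((2, 2), 20.0)
w = np.full((2, 2), 0.25)
out = fuse(m3, m4, w)  # 0.25 * 10 + 0.75 * 20 = 17.5 at every pixel
```

Because each weight lies in [0, 1], every output pixel stays between the corresponding pixels of the two mapping maps, which is what makes the fusion parameter map act as a per-pixel blending mask.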
8. The method according to claim 4 or 6 or 7,
the transforming the first inverse optical flow into a first initial optical flow based on a preset proportion comprises:
multiplying the first inverse optical flow by a preset proportional value to obtain the first initial optical flow, wherein the preset proportional value is in the range of 0.4-0.6;
the transforming the second inverse optical flow into a second initial optical flow based on the preset proportion comprises:
and multiplying the second inverse optical flow by the preset proportional value to obtain the second initial optical flow.
9. The method of claim 8,
the preset proportional value is 0.5.
10. A neural network training method for video frame interpolation, comprising:
acquiring a group of training data, wherein the group of training data comprises three continuous video frames which are a first training video frame, a second training video frame and a third training video frame in sequence;
obtaining a first reference inverse optical flow, the first reference inverse optical flow being an inverse optical flow from the first training video frame to the second training video frame;
obtaining a second reference inverse optical flow, the second reference inverse optical flow being an inverse optical flow from the third training video frame to the second training video frame;
calculating a first training inverse optical flow, the first training inverse optical flow being an inverse optical flow from the first training video frame to the third training video frame;
calculating a second training inverse optical flow, the second training inverse optical flow being an inverse optical flow from the third training video frame to the first training video frame;
transforming the first training inverse optical flow into a first initial training optical flow based on a preset proportion;
transforming the second training inverse optical flow into a second initial training optical flow based on the preset proportion;
mapping the first training video frame through the first initial training optical flow to obtain a first training mapping map;
mapping the third training video frame through the second initial training optical flow to obtain a second training mapping map;
inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training mapping map and the second training mapping map into an optical flow modification neural network to obtain a third training inverse optical flow and a fourth training inverse optical flow output by the optical flow modification neural network, wherein the third training inverse optical flow is a modified inverse optical flow from the first training video frame to the second training video frame, and the fourth training inverse optical flow is a modified inverse optical flow from the third training video frame to the second training video frame;
mapping the first training video frame through the third training inverse optical flow to obtain a third training mapping map;
mapping the third training video frame through the fourth training inverse optical flow to obtain a fourth training mapping map;
inputting the first training video frame, the third training video frame, the third training inverse optical flow, the fourth training inverse optical flow, the third training mapping map and the fourth training mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
based on the fusion parameter map, performing fusion calculation on the third training mapping map and the fourth training mapping map to obtain the target interpolation frame;
adjusting network parameters of the optical flow modification neural network and the fusion neural network based on a difference between the target interpolation frame and the second training video frame, a difference between the third training inverse optical flow and the first reference inverse optical flow, and a difference between the fourth training inverse optical flow and the second reference inverse optical flow.
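For illustration only, the parameter adjustment in the last training step combines three differences; the application does not name a particular distance, so the L1 means and unit weights below are assumptions:

```python
import numpy as np

def training_loss(pred_frame, gt_frame,
                  flow3, ref_flow1,
                  flow4, ref_flow2,
                  w_recon=1.0, w_flow=1.0):
    """Scalar training signal built from the three claimed differences:
    the target interpolation frame vs. the middle (second) training frame,
    and each modified training inverse optical flow vs. its reference
    inverse optical flow. Weights are illustrative only."""
    recon = np.abs(pred_frame - gt_frame).mean()
    flow_a = np.abs(flow3 - ref_flow1).mean()
    flow_b = np.abs(flow4 - ref_flow2).mean()
    return w_recon * recon + w_flow * (flow_a + flow_b)

pred = np.zeros((2, 2))
gt = np.ones((2, 2))
f = np.zeros((2, 2, 2))
loss = training_loss(pred, gt, f, f, f, f)  # only the frame term is nonzero
```

In an actual training loop this scalar would drive a gradient step on the parameters of both the optical flow modification neural network and the fusion neural network.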
11. The method of claim 10,
said calculating a first trained inverse optical flow comprises: computing the first trained inverse optical flow based on a computer vision algorithm;
said calculating a second trained inverse optical flow comprises: calculating the second trained inverse optical flow based on a computer vision algorithm.
12. The method of claim 10,
the process of performing fusion calculation on the third training mapping map and the fourth training mapping map based on the fusion parameter map to obtain the target interpolation frame comprises:
multiplying the third training mapping map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map correspond one-to-one to a plurality of pixel values of the third training mapping map, each pixel value of the fusion parameter map is in the range 0-1, the plurality of pixel values of the first fusion map correspond one-to-one to the plurality of pixel values of the third training mapping map, and a plurality of product values obtained by multiplying the plurality of pixel values of the third training mapping map and the plurality of pixel values of the fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the first fusion map;
subtracting each pixel value of the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond one-to-one to the plurality of pixel values of the fusion parameter map, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are respectively the plurality of pixel values of the difference fusion parameter map;
multiplying the fourth training mapping map and the difference fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by multiplying a plurality of pixel values of the fourth training mapping map and the plurality of pixel values of the difference fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the second fusion map;
and adding the first fusion map and the second fusion map to obtain the target interpolation frame, wherein a plurality of values obtained by adding the plurality of pixel values of the first fusion map and the plurality of pixel values of the second fusion map in one-to-one correspondence are respectively the plurality of pixel values of the target interpolation frame.
13. The method of claim 10,
the transforming the first training inverse optical flow into a first initial training optical flow based on a preset proportion comprises:
multiplying the first training inverse optical flow by a preset proportional value to obtain the first initial training optical flow, wherein the preset proportional value is in the range of 0.4-0.6;
the transforming the second training inverse optical flow into a second initial training optical flow based on the preset proportion comprises:
and multiplying the second training inverse optical flow by the preset proportional value to obtain the second initial training optical flow.
14. The method of claim 13,
the preset proportional value is 0.5.
15. A video frame interpolation apparatus, comprising:
an acquisition module, used for acquiring two adjacent video frames in a video;
the acquisition module being further used for calculating an optical flow between the two video frames;
the acquisition module being further used for transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
the acquisition module being further used for mapping the two video frames through the initial optical flow to obtain an initial mapping map;
a correction module, used for correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and a frame interpolation module, used for obtaining a target interpolation frame between the two video frames according to the corrected optical flow.
16. An electronic device, comprising:
a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method of any one of claims 1 to 14.
17. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method of any one of claims 1 to 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210171767.5A CN114640885B (en) | 2022-02-24 | 2022-02-24 | Video frame inserting method, training device and electronic equipment |
PCT/CN2023/075807 WO2023160426A1 (en) | 2022-02-24 | 2023-02-14 | Video frame interpolation method and apparatus, training method and apparatus, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210171767.5A CN114640885B (en) | 2022-02-24 | 2022-02-24 | Video frame inserting method, training device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114640885A true CN114640885A (en) | 2022-06-17 |
CN114640885B CN114640885B (en) | 2023-12-22 |
Family
ID=81948635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210171767.5A Active CN114640885B (en) | 2022-02-24 | 2022-02-24 | Video frame inserting method, training device and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114640885B (en) |
WO (1) | WO2023160426A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023160426A1 (en) * | 2022-02-24 | 2023-08-31 | 影石创新科技股份有限公司 | Video frame interpolation method and apparatus, training method and apparatus, and electronic device |
CN117115210A (en) * | 2023-10-23 | 2023-11-24 | 黑龙江省农业科学院农业遥感与信息研究所 | Intelligent agricultural monitoring and adjusting method based on Internet of things |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978728A (en) * | 2014-04-08 | 2015-10-14 | 南京理工大学 | Image matching system of optical flow method |
WO2016187776A1 (en) * | 2015-05-25 | 2016-12-01 | 北京大学深圳研究生院 | Video frame interpolation method and system based on optical flow method |
CN112104830A (en) * | 2020-08-13 | 2020-12-18 | 北京迈格威科技有限公司 | Video frame insertion method, model training method and corresponding device |
CN113365110A (en) * | 2021-07-14 | 2021-09-07 | 北京百度网讯科技有限公司 | Model training method, video frame interpolation method, device, equipment and storage medium |
CN114007135A (en) * | 2021-10-29 | 2022-02-01 | 广州华多网络科技有限公司 | Video frame insertion method and device, equipment, medium and product thereof |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10776688B2 (en) * | 2017-11-06 | 2020-09-15 | Nvidia Corporation | Multi-frame video interpolation using optical flow |
CN109949221B (en) * | 2019-01-30 | 2022-05-17 | 深圳大学 | Image processing method and electronic equipment |
CN110191299B (en) * | 2019-04-15 | 2020-08-04 | 浙江大学 | Multi-frame interpolation method based on convolutional neural network |
CN113727141B (en) * | 2020-05-20 | 2023-05-12 | 富士通株式会社 | Interpolation device and method for video frames |
CN113949926B (en) * | 2020-07-17 | 2024-07-30 | 武汉Tcl集团工业研究院有限公司 | Video frame inserting method, storage medium and terminal equipment |
CN112995715B (en) * | 2021-04-20 | 2021-09-03 | 腾讯科技(深圳)有限公司 | Video frame insertion processing method and device, electronic equipment and storage medium |
CN114066730B (en) * | 2021-11-04 | 2022-10-28 | 西北工业大学 | Video frame interpolation method based on unsupervised dual learning |
CN114640885B (en) * | 2022-02-24 | 2023-12-22 | 影石创新科技股份有限公司 | Video frame inserting method, training device and electronic equipment |
- 2022-02-24: CN application CN202210171767.5A granted as patent CN114640885B (status: Active)
- 2023-02-14: WO application PCT/CN2023/075807 published as WO2023160426A1 (status: unknown)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978728A (en) * | 2014-04-08 | 2015-10-14 | 南京理工大学 | Image matching system of optical flow method |
WO2016187776A1 (en) * | 2015-05-25 | 2016-12-01 | 北京大学深圳研究生院 | Video frame interpolation method and system based on optical flow method |
US20180176574A1 (en) * | 2015-05-25 | 2018-06-21 | Peking University Shenzhen Graduate School | Method and system for video frame interpolation based on optical flow method |
CN112104830A (en) * | 2020-08-13 | 2020-12-18 | 北京迈格威科技有限公司 | Video frame insertion method, model training method and corresponding device |
CN113365110A (en) * | 2021-07-14 | 2021-09-07 | 北京百度网讯科技有限公司 | Model training method, video frame interpolation method, device, equipment and storage medium |
CN114007135A (en) * | 2021-10-29 | 2022-02-01 | 广州华多网络科技有限公司 | Video frame insertion method and device, equipment, medium and product thereof |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023160426A1 (en) * | 2022-02-24 | 2023-08-31 | 影石创新科技股份有限公司 | Video frame interpolation method and apparatus, training method and apparatus, and electronic device |
CN117115210A (en) * | 2023-10-23 | 2023-11-24 | 黑龙江省农业科学院农业遥感与信息研究所 | Intelligent agricultural monitoring and adjusting method based on Internet of things |
CN117115210B (en) * | 2023-10-23 | 2024-01-26 | 黑龙江省农业科学院农业遥感与信息研究所 | Intelligent agricultural monitoring and adjusting method based on Internet of things |
Also Published As
Publication number | Publication date |
---|---|
WO2023160426A1 (en) | 2023-08-31 |
CN114640885B (en) | 2023-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304755B (en) | Training method and device of neural network model for image processing | |
Zeng et al. | Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time | |
WO2021088473A1 (en) | Image super-resolution reconstruction method, image super-resolution reconstruction apparatus, and computer-readable storage medium | |
WO2023160426A1 (en) | Video frame interpolation method and apparatus, training method and apparatus, and electronic device | |
CN111372087B (en) | Panoramic video frame insertion method and device and corresponding storage medium | |
CN106780336B (en) | Image reduction method and device | |
CN108271022A (en) | A kind of method and device of estimation | |
CN113935934A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
JP2018106316A (en) | Image correction processing method and image correction processing apparatus | |
US20050088531A1 (en) | Automatic stabilization control apparatus, automatic stabilization control method, and computer readable recording medium having automatic stabilization control program recorded thereon | |
WO2020215263A1 (en) | Image processing method and device | |
CN110830848B (en) | Image interpolation method, image interpolation device, computer equipment and storage medium | |
CN115564655A (en) | Video super-resolution reconstruction method, system and medium based on deep learning | |
US20230196721A1 (en) | Low-light video processing method, device and storage medium | |
CN111093045B (en) | Method and device for scaling video sequence resolution | |
JP2015197818A (en) | Image processing apparatus and method of the same | |
CN103618904A (en) | Motion estimation method and device based on pixels | |
CN115809959A (en) | Image processing method and device | |
CN113469880A (en) | Image splicing method and device, storage medium and electronic equipment | |
CN112802079A (en) | Disparity map acquisition method, device, terminal and storage medium | |
CN114257759B (en) | System for image completion | |
CN113658321B (en) | Three-dimensional reconstruction method, system and related equipment | |
JP2018084997A (en) | Image processing device, and image processing method | |
RU2576490C1 (en) | Background hybrid retouch method for 2d to 3d conversion | |
CN113556581A (en) | Method and device for generating video interpolation frame and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||