CN114640885A - Video frame insertion method, training method, device and electronic equipment - Google Patents

Video frame insertion method, training method, device and electronic equipment

Info

Publication number
CN114640885A
Authority
CN
China
Prior art keywords: optical flow, training, map, fusion, initial
Prior art date
Legal status
Granted
Application number
CN202210171767.5A
Other languages
Chinese (zh)
Other versions
CN114640885B (en)
Inventor
吕朋伟
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202210171767.5A priority Critical patent/CN114640885B/en
Publication of CN114640885A publication Critical patent/CN114640885A/en
Priority to PCT/CN2023/075807 priority patent/WO2023160426A1/en
Application granted granted Critical
Publication of CN114640885B publication Critical patent/CN114640885B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

The embodiments of the present application provide a video frame interpolation method, a training method, an apparatus, and an electronic device, relate to the technical field of image processing, and can improve the accuracy of frame interpolation results. The video frame interpolation method comprises the following steps: acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.

Description

Video frame insertion method, training method, device and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video frame interpolation method, a training method, an apparatus, and an electronic device.
Background
Video frame interpolation refers to generating intermediate video frames with an algorithm, in order to increase the video frame rate or to produce a slow-motion special-effect video. However, conventional video frame interpolation methods yield frame interpolation results of low accuracy.
Disclosure of Invention
Provided are a video frame interpolation method, a training method, an apparatus, and an electronic device, which can improve the accuracy of frame interpolation results.
In a first aspect, a video frame interpolation method is provided, including: acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
In a second aspect, a neural network training method for video frame interpolation is provided, including: acquiring a group of training data, wherein the group of training data comprises three consecutive video frames, which are, in order, a first training video frame, a second training video frame, and a third training video frame; acquiring a first reference backward optical flow, which is the backward optical flow from the first training video frame to the second training video frame; acquiring a second reference backward optical flow, which is the backward optical flow from the third training video frame to the second training video frame; calculating a first training backward optical flow, which is the backward optical flow from the first training video frame to the third training video frame; calculating a second training backward optical flow, which is the backward optical flow from the third training video frame to the first training video frame; transforming the first training backward optical flow into a first initial training optical flow based on a preset ratio; transforming the second training backward optical flow into a second initial training optical flow based on the preset ratio; mapping the first training video frame through the first initial training optical flow to obtain a first training map; mapping the third training video frame through the second initial training optical flow to obtain a second training map; inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map, and the second training map into an optical flow correction neural network to obtain a third training backward optical flow and a fourth training backward optical flow output by the optical flow correction neural network, wherein the third training backward optical flow is the corrected backward optical flow from the first training video frame to the second training video frame, and the fourth training backward optical flow is the corrected backward optical flow from the third training video frame to the second training video frame; mapping the first training video frame through the third training backward optical flow to obtain a third training map; mapping the third training video frame through the fourth training backward optical flow to obtain a fourth training map; inputting the first training video frame, the third training video frame, the third training backward optical flow, the fourth training backward optical flow, the third training map, and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; performing fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain a target interpolated frame; and adjusting network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame and the second training video frame, the difference between the third training backward optical flow and the first reference backward optical flow, and the difference between the fourth training backward optical flow and the second reference backward optical flow.
In a third aspect, a video frame interpolation apparatus is provided, including: an acquisition module, configured to acquire two adjacent video frames in a video; the acquisition module is further configured to calculate an optical flow between the two video frames; the acquisition module is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module is further configured to map the two video frames through the initial optical flow to obtain an initial map; a correction module, configured to correct the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and a frame interpolation module, configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow.
In a fourth aspect, an electronic device is provided, comprising: a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method described above.
In a fifth aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when run on a computer, causes the computer to perform the above-mentioned method.
The video frame interpolation method, training method, apparatus, and electronic device of the embodiments of the present application calculate the optical flow between two adjacent video frames in a video, correct the optical flow, and obtain an interpolated frame based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains information about the motion of the target and expresses the change of the image, so the interpolated frame between two video frames can be obtained using the optical flow between the two adjacent video frames. In addition, by transforming the optical flow according to the preset ratio, an initial optical flow corresponding to a position between the two video frames can be obtained; the video frames are mapped according to the transformed initial optical flow to obtain an initial map corresponding to that position, and the optical flow is corrected based on the initial map, so that the optical flow reflects the change between the two video frames more accurately and the accuracy of the frame interpolation result is improved.
Drawings
Fig. 1 is a schematic flowchart illustrating a video frame interpolation method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating another video frame interpolation method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another video frame interpolation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure of an optical flow modified neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model structure of a converged neural network according to an embodiment of the present application;
FIG. 6 is a block diagram illustrating an exemplary video frame interpolation apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a neural network training device according to an embodiment of the present disclosure;
fig. 8 is a block diagram of an electronic device in an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
It should be noted that the flow charts shown in the drawings are only exemplary and do not necessarily include all the contents and operations/steps, nor do they necessarily have to be executed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present application provides a video frame interpolation method, including:
step 101, two adjacent video frames in a video are obtained;
The video is the video into which frames are to be inserted; the two video frames may be any two adjacent video frames and comprise a previous video frame I1 and a subsequent video frame I3.
Step 102, calculating an optical flow between the two video frames;
Step 103, transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio, wherein the optical flow between the two video frames is calculated from the two video frames and can be transformed, according to the preset ratio, into the optical flow at a preset position between the two video frames, namely the initial optical flow;
Step 104, mapping the two video frames through the initial optical flow to obtain an initial map;
Step 105, correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow.
Step 106, obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
After the target interpolated frame between the two video frames is obtained in step 106, target interpolated frames between other pairs of video frames may be obtained by repeating steps 101 to 106. For example, after the target interpolated frame between the first frame and the second frame of the video is obtained, the process may be repeated to obtain the target interpolated frame between the next pair of adjacent frames after a preset frame interval, and so on, so as to interpolate the whole video.
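The overall flow of steps 101 to 106 can be summarized with the minimal Python sketch below. It is only an illustration of the order of operations described above; compute_flow, warp, correct_net, and fuse are hypothetical callables standing in for the computer vision optical flow calculation, the optical flow mapping, the optical flow correction neural network, and the fusion step, and are not part of the original disclosure.

```python
# Minimal sketch of steps 101-106; the callables passed in are placeholders
# (assumed to be implemented elsewhere), not the patent's actual modules.
def interpolate_pair(I1, I3, compute_flow, warp, correct_net, fuse, t=0.5):
    F31 = compute_flow(I1, I3)                      # step 102: backward flow for I1 -> I3
    F13 = compute_flow(I3, I1)                      # step 102: backward flow for I3 -> I1
    FCV21, FCV23 = t * F31, t * F13                 # step 103: scale by preset ratio t
    WF12, WF32 = warp(I1, FCV21), warp(I3, FCV23)   # step 104: initial maps
    FCVU21, FCVU23 = correct_net(I1, I3, FCV21, FCV23, WF12, WF32)  # step 105
    WM12, WM32 = warp(I1, FCVU21), warp(I3, FCVU23) # step 106: corrected maps
    return fuse(I1, I3, FCVU21, FCVU23, WM12, WM32) # step 106: target frame IN2

def interpolate_video(frames, **helpers):
    out = []
    for prev_frame, next_frame in zip(frames, frames[1:]):  # adjacent pairs
        out += [prev_frame, interpolate_pair(prev_frame, next_frame, **helpers)]
    return out + [frames[-1]]
```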
The video frame interpolation method of the embodiment of the present application calculates the optical flow between two adjacent video frames in a video, corrects the optical flow, and obtains an interpolated frame based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains information about the motion of the target and expresses the change of the image, so the interpolated frame between two video frames can be obtained using the optical flow between the two adjacent video frames. In addition, by transforming the optical flow according to the preset ratio, an initial optical flow corresponding to a position between the two video frames can be obtained; the video frames are mapped according to the transformed initial optical flow to obtain an initial map corresponding to that position, and the optical flow is corrected based on the initial map, so that the optical flow reflects the change between the two video frames more accurately and the accuracy of the frame interpolation result is improved.
In one possible implementation, the step 102 of calculating optical flow between two video frames includes: calculating optical flow between two video frames based on a computer vision algorithm, wherein the computer vision algorithm refers to a traditional image processing method and is not a method based on neural network prediction; step 105, modifying the optical flow between two video frames based on the initial map comprises: based on the neural network, the optical flow between two video frames is modified using the initial map as an input. In step 105, the optical flow calculated in step 102 is corrected based on a neural network trained in advance. In this step, since a substantially accurate optical flow has been calculated by the computer vision algorithm, the neural network only needs to correct the optical flow, and thus the calculation amount of the neural network is small.
The conventional video frame interpolation method calculates an optical flow with a computer vision algorithm and then performs optical flow mapping with the calculated optical flow to obtain a target interpolated frame. However, the accuracy of the interpolation result obtained from an optical flow calculated this way is low. To improve the accuracy, the optical flow could instead be predicted entirely by a neural network and the target interpolated frame obtained from it, but such a method involves a large amount of calculation.
According to the video frame interpolation method of the present application, the optical flow is calculated based on a computer vision algorithm, then corrected based on a neural network, and the interpolated frame is obtained based on the corrected optical flow. Because the optical flow obtained in this way has been corrected by neural network prediction, the accuracy of the frame interpolation result is high; for example, edge artifacts around object contours can be reduced, improving the user experience for slow-motion video. In addition, since the neural network only needs to correct the already computed optical flow, its amount of calculation is reduced. That is, the amount of calculation is reduced while the accuracy of the frame interpolation result is improved.
In one possible embodiment, as shown in fig. 2, the process of step 105 of correcting the optical flow between the two video frames based on the initial map to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
In one possible embodiment, the process of step 106 of obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes:
Step 1061, mapping the two video frames through the corrected optical flow to obtain a corrected map;
Step 1062, inputting the two video frames, the corrected optical flow, and the corrected map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
Step 1063, performing fusion calculation on the corrected map based on the fusion parameter map to obtain the target interpolated frame.
In one possible implementation, as shown in fig. 3, the optical flow between the two video frames comprises a first backward optical flow F3-1 and a second backward optical flow F1-3, where the first backward optical flow F3-1 is the backward optical flow from the previous video frame I1 to the subsequent video frame I3, and the second backward optical flow F1-3 is the backward optical flow from the subsequent video frame I3 to the previous video frame I1; that is, step 101 obtains the adjacent previous video frame I1 and subsequent video frame I3 in the video. Accordingly, step 102 includes:
Step 1021, calculating the first backward optical flow F3-1 based on a computer vision algorithm, i.e. the backward optical flow from the previous video frame I1 to the subsequent video frame I3;
Step 1022, calculating the second backward optical flow F1-3 based on a computer vision algorithm, i.e. the backward optical flow from the subsequent video frame I3 to the previous video frame I1.
Here, the backward optical flow is also called the reverse optical flow. The optical flow in the embodiments of the present application can be expressed as an optical flow map. For example, for two images A and B, the resolution of the optical flow map is identical to that of image A and image B. The optical flow map records the 'offset' of each pixel point of one of the images; the offset has two components, an offset x in the left-right direction and an offset y in the up-down direction, and its value can simply be understood as the distance (number of pixels) to be moved. 'Applying the optical flow to image A', or 'mapping image A by the optical flow', means that each pixel point of image A is shifted according to the offset value (up-down plus left-right) at the corresponding position of the optical flow map; after the optical flow mapping is completed, a new image, called a map, is obtained. The optical flow calculated from image A to image B is the forward optical flow for image A and the reverse optical flow for image B. Therefore, for the two images A and B, image A is mapped by the forward optical flow, or image B is mapped by the backward optical flow, where the forward optical flow refers to the optical flow calculated from image A to image B and the reverse/backward optical flow refers to the optical flow calculated from image B to image A.
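As an illustration of 'mapping an image by an optical flow' as described above, the following is a minimal NumPy sketch of backward warping; the function name and the use of nearest-neighbour (rather than bilinear) sampling are simplifying assumptions for brevity and are not taken from the original disclosure.

```python
import numpy as np

def backward_warp(image, flow):
    """Map `image` by a backward optical flow: each output pixel (x, y) is taken
    from image at (x + flow_x, y + flow_y), i.e. shifted by the per-pixel offset
    recorded in the optical flow map."""
    h, w = flow.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(grid_x + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(grid_y + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]
```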
Step 103, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio, includes:
Step 1031, transforming the first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on the preset ratio, where the first initial optical flow FCV2-1 serves as the backward optical flow from the previous video frame I1 to the target interpolated frame IN2. Since the target interpolated frame IN2 is located at a position between the two video frames I1 and I3, the optical flow between the two video frames can be approximately transformed based on the preset ratio; for example, if the preset ratio is set to 0.5, computing F3-1 × 0.5 approximately gives the optical flow of an intermediate frame halfway between the two video frames;
Step 1032, transforming the second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, where the second initial optical flow FCV2-3 serves as the backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2.
Step 104, mapping the two video frames through the initial optical flow to obtain the initial map, includes:
Step 1041, mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2;
that is, optical flow mapping (backward warp) is performed on the image I1 using the first initial optical flow FCV2-1, and the map obtained by this mapping (WarpMask, also called optical flow map WarpFlow) is the first map WF1-2.
Step 1042, mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2; that is, the initial map obtained in step 104 includes the first map WF1-2 and the second map WF3-2.
Step 105, correcting the initial optical flow through an optical flow correction neural network based on the two video frames, the initial optical flow, and the initial map, to obtain the corrected optical flow, includes the following process:
inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2 and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2; that is, the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 constitute the corrected optical flow in step 105.
The neural network model structure of the optical flow correction neural network can be as shown in fig. 4. The model may include a convolution Conv + activation function Relu downsampling module, a convolution Conv + activation function Relu feature extraction module, and a deconvolution ConvTranspose + activation function Relu upsampling module. The input of the model is the above I1, I3, FCV2-1, FCV2-3, WF1-2, and WF3-2. The downsampling module reduces the input size, which speeds up prediction and inference while extracting network features; the feature extraction module extracts and transforms features within the network, where the extracted features are the features obtained after the convolution layer operations and may represent edges, contours, light and shade, and similar characteristics of the frame picture within the network; the upsampling module enlarges the reduced features back to the original input size. The output of the model is the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3, i.e. the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2 and the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2. In other words, the neural network corrects the first initial optical flow FCV2-1 into the third backward optical flow FCVU2-1 and corrects the second initial optical flow FCV2-3 into the fourth backward optical flow FCVU2-3. Modules of the same kind in the figure are reused: for example, the same feature extraction module is reused within the model, which reduces the complexity of the network structure and enhances the representational capability of the feature extraction. The training process of the neural network model will be described later.
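A compact PyTorch sketch of such a structure is given below, only to make the downsample / shared feature extraction / upsample arrangement concrete. The channel widths, the number of layers, and the linear (non-ReLU) final flow head are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class FlowCorrectionNet(nn.Module):
    """Sketch of Fig. 4: Conv+ReLU downsampling, a reused Conv+ReLU feature
    extraction module, ConvTranspose+ReLU upsampling, and a flow output head."""
    def __init__(self, ch=32):
        super().__init__()
        # input: I1(3) + I3(3) + FCV2-1(2) + FCV2-3(2) + WF1-2(3) + WF3-2(3) = 16 channels
        self.down = nn.Sequential(nn.Conv2d(16, ch, 3, stride=2, padding=1), nn.ReLU())
        self.feat = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU())
        self.head = nn.Conv2d(ch, 4, 3, padding=1)   # 2 channels per corrected flow

    def forward(self, I1, I3, FCV21, FCV23, WF12, WF32):
        x = torch.cat([I1, I3, FCV21, FCV23, WF12, WF32], dim=1)
        x = self.down(x)
        x = self.feat(self.feat(x))     # the same feature module reused (shared weights)
        out = self.head(self.up(x))
        return out[:, :2], out[:, 2:]   # FCVU2-1, FCVU2-3
```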
In one possible embodiment, as shown in fig. 3, the process of step 106 of obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes:
Step 10611, mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2;
Step 10612, mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2;
Step 1062, inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network;
the neural network model structure of the converged neural network may be as shown in fig. 5, and the neural network model may include a convolution Conv + activation function Relu downsampling module and a deconvolution Conv + activation function Relu upsampling module. Wherein, the input of the neural network model is the above I1、I3、FCVU2-1、FCVU2-3、WM1-2And WM3-2(ii) a The neural network model outputs a fusion parameter map m which is used for participating IN calculation IN the subsequent process to obtain a target interpolation frame IN2. The training process of the neural network model will be described later.
Step 1063, performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.
In a possible embodiment, the process of step 1063 of performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2 includes: multiplying the pixel values of the third map WM1-2 by the corresponding pixel values of the fusion parameter map m to obtain a first fusion map WM1-2 × m, where the resolution of the fusion parameter map m is the same as the resolution of any video frame, the pixel values of the fusion parameter map m correspond one to one to the pixel values of the third map WM1-2, each pixel value of the fusion parameter map m ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third map WM1-2, and the products obtained by multiplying the pixel values of the third map WM1-2 and the pixel values of the fusion parameter map m one to one are the pixel values of the first fusion map WM1-2 × m; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map (1-m) correspond one to one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting the pixel values of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map (1-m); multiplying the fourth map WM3-2 by the difference fusion parameter map (1-m) to obtain a second fusion map WM3-2 × (1-m), where the products obtained by multiplying the pixel values of the fourth map WM3-2 and the pixel values of the difference fusion parameter map (1-m) one to one are the pixel values of the second fusion map; and adding the first fusion map WM1-2 × m and the second fusion map WM3-2 × (1-m) to obtain the target interpolated frame IN2, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame IN2. Expressed as a formula, the target interpolated frame IN2 = WM1-2 × m + WM3-2 × (1-m); that is, the third map WM1-2 is multiplied point by point by the fusion parameter map m to obtain one intermediate result, the fourth map WM3-2 is multiplied point by point by (1-m), the result of subtracting the fusion parameter map m from 1 point by point, to obtain another intermediate result, and the two intermediate results are added point by point. For example, Table 1 below compares the target interpolated frame IN2, the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m.
TABLE 1

Example | WM1-2 pixel values | WM3-2 pixel values | m pixel values | IN2 pixel values
1       | 2                  | 4                  | 0              | 4
2       | 2                  | 4                  | 1              | 2
3       | 2                  | 4                  | 0.5            | 3
Assume that the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m are all images of 2 × 2 resolution and that the numerical values in Table 1 are pixel values. In all three examples, the pixel values of the third map WM1-2 are all 2 and the pixel values of the fourth map WM3-2 are all 4. The difference lies in the fusion parameter map m. In example 1, the pixel values of the fusion parameter map m are all 0, and each pixel value of the target interpolated frame IN2 calculated by the formula WM1-2 × m + WM3-2 × (1-m) is 4, since 2 × 0 + 4 × (1-0) = 4. In example 2, the pixel values of the fusion parameter map m are all 1, and each pixel value of the target interpolated frame IN2 is 2, since 2 × 1 + 4 × (1-1) = 2. In example 3, the pixel values of the fusion parameter map m are all 0.5, and each pixel value of the target interpolated frame IN2 is 3, since 2 × 0.5 + 4 × (1-0.5) = 3.
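The three examples in Table 1 can be checked with a few lines of NumPy implementing the point-wise fusion formula; the variable names are illustrative only.

```python
import numpy as np

WM12 = np.full((2, 2), 2.0)   # third map WM1-2, all pixel values 2
WM32 = np.full((2, 2), 4.0)   # fourth map WM3-2, all pixel values 4

for m_val in (0.0, 1.0, 0.5):                 # examples 1, 2, 3
    m = np.full((2, 2), m_val)                # fusion parameter map m
    IN2 = WM12 * m + WM32 * (1.0 - m)         # point-wise fusion calculation
    print(m_val, IN2[0, 0])                   # prints 4.0, 2.0 and 3.0 respectively
```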
In one possible embodiment, the process of step 1031 of transforming the first backward optical flow F3-1 into the first initial optical flow FCV2-1 based on the preset ratio includes: multiplying the first backward optical flow F3-1 by a preset ratio value t to obtain the first initial optical flow FCV2-1, i.e. transforming F3-1 into FCV2-1 based on the formula FCV2-1 = t × F3-1, where the preset ratio value ranges from 0.4 to 0.6. The process of step 1032 of transforming the second backward optical flow F1-3 into the second initial optical flow FCV2-3 based on the preset ratio includes: multiplying the second backward optical flow F1-3 by the preset ratio value t to obtain the second initial optical flow FCV2-3, i.e. transforming F1-3 into FCV2-3 based on the formula FCV2-3 = t × F1-3. In other words, by transforming the optical flow according to the preset ratio, the optical flow of an intermediate frame at the corresponding position between the two video frames can be obtained, which facilitates determining the target interpolated frame IN2 based on the optical flow in the subsequent calculation. The preset ratio value t may be 0.5; if t is 0.5, the transformed optical flow is the optical flow at the halfway position between the two video frames.
An embodiment of the present application further provides a neural network training method for video frame interpolation, which may be used to train the optical flow correction neural network and the fusion neural network. Before training, about 100,000 groups of data may be extracted in advance as training data from 1,000 video segments covering various scenes and motion forms; for example, with 100 groups taken from each video segment, 100,000 groups of training data can be extracted from the 1,000 videos in total. Each group of training data includes three consecutive video frames, and all data are normalized to a uniform resolution, for example to 768 × 768, by cropping or scaling. The neural network training method includes the following steps:
Step 201, obtaining a group of training data, where the group of training data includes three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2, and a third training video frame i3; in this step, a group of training data may be randomly selected from the training data;
Step 202, obtaining a first reference backward optical flow fg2-1, which is the backward optical flow from the first training video frame i1 to the second training video frame i2;
Step 203, obtaining a second reference backward optical flow fg2-3, which is the backward optical flow from the third training video frame i3 to the second training video frame i2;
In steps 202 and 203, the first reference backward optical flow fg2-1 and the second reference backward optical flow fg2-3 may be obtained by a state-of-the-art third-party optical flow acquisition method; fg2-1 and fg2-3 serve as reference optical flows, which facilitates comparing them with the results output by the neural network and, in turn, adjusting the network parameters.
Step 204, calculating a first training backward optical flow f3-1, which is the backward optical flow from the first training video frame i1 to the third training video frame i3;
Step 205, calculating a second training backward optical flow f1-3, which is the backward optical flow from the third training video frame i3 to the first training video frame i1;
Step 206, transforming the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on the preset ratio, where the first initial training optical flow fcv2-1 serves as the backward optical flow from the first training video frame i1 to the second training video frame i2;
For example, step 206 of transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio includes: transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, with t = 0.5;
Step 207, transforming the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio, where the second initial training optical flow fcv2-3 serves as the backward optical flow from the third training video frame i3 to the second training video frame i2;
For example, step 207 of transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio includes: transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, with t = 0.5.
Step 208, mapping the first training video frame i1 through the first initial training optical flow fcv2-1 to obtain a first training map wf1-2;
Step 209, mapping the third training video frame i3 through the second initial training optical flow fcv2-3 to obtain a second training map wf3-2;
Step 210, the first training video frame i1A third training video frame i3First initial training light flow fcv2-1Second initial training light flow fcv2-3A first training map wf1-2And a second training map wf3-2Inputting the third training inverse optical flow fcvu to the optical flow correction neural network to obtain the output of the optical flow correction neural network2-1And a fourth training inverse optical flow fcvu2-3Third training inverse light flow fcvu2-1For modified slave first training video frames i1To a second training video frame i2Of the fourth training inverse optical flow fcvu2-3For modified secondary training video frames i3To the second training video frame i2The backward light flow of (2);
step 211, inverse optical flow fcvu through third training2-1For the first training video frame i1Mapping to obtain a third training mapping map wm1-2
Step 212, inverse optical flow fcvu by fourth training2-3For the third training video frame i3Mapping to obtain a fourth training mapping wm3-2
Step 213, the first training video frame i1And the third trainingVideo frame i3Third training inverse light flow fcvu2-1Fourth training inverse light flow fcvu2-3First training map wm1-2And a second training map wm3-2Inputting the fusion neural network to obtain a fusion parameter graph m output by the fusion neural network;
Step 214, performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2;
For example, the resolution of the fusion parameter map m is the same as that of any video frame, each pixel value of the fusion parameter map m ranges from 0 to 1, and the target interpolated frame in2 = wm1-2 × m + wm3-2 × (1-m).
Step 215, adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3.
During neural network training, the second training video frame i2 is known, while the target interpolated frame in2 is predicted by the neural networks; therefore, the network parameters can be adjusted based on the difference between in2 and i2 so that the prediction of the neural networks becomes more accurate, and for similar reasons the network parameters can be adjusted based on the difference between fcvu2-1 and fg2-1 and the difference between fcvu2-3 and fg2-3. Steps 201 to 215 described above constitute one training pass, and the neural networks may be trained for multiple rounds based on the training data. In step 215, specifically, for example, the L1 loss between in2 and i2, the L1 loss between fcvu2-1 and fg2-1, and the L1 loss between fcvu2-3 and fg2-3 are calculated and back-propagated iteratively until the optical flow correction neural network and the fusion neural network converge; that is, over multiple rounds of training, the network parameters of the optical flow correction neural network and the fusion neural network are adjusted according to the L1 losses and continuously optimized until the L1 losses no longer decrease, which indicates that the network training is finished and the prediction effect of the neural networks is best. After the network training is completed, video frame interpolation can be realized with the above video frame interpolation method based on the trained optical flow correction neural network and the trained fusion neural network.
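One training pass over a batch, with the three L1 losses of step 215, might look like the following PyTorch sketch. The data loader, the warp function, the two networks (for example as sketched above), the optimizer choice, and the learning rate are all assumptions made for illustration; the patent itself only specifies the L1 losses and the iterative back-propagation.

```python
import torch
import torch.nn.functional as F

# flow_net, fusion_net, warp and loader are assumed to be defined elsewhere
# (e.g. as in the sketches above); each batch supplies the three frames, the
# reference backward flows fg2-1/fg2-3 and the CV-computed flows f3-1/f1-3.
opt = torch.optim.Adam(list(flow_net.parameters()) + list(fusion_net.parameters()), lr=1e-4)

for i1, i2, i3, fg21, fg23, f31, f13 in loader:
    fcv21, fcv23 = 0.5 * f31, 0.5 * f13                 # steps 206-207, t = 0.5
    wf12, wf32 = warp(i1, fcv21), warp(i3, fcv23)       # steps 208-209
    fcvu21, fcvu23 = flow_net(i1, i3, fcv21, fcv23, wf12, wf32)   # step 210
    wm12, wm32 = warp(i1, fcvu21), warp(i3, fcvu23)     # steps 211-212
    m = fusion_net(i1, i3, fcvu21, fcvu23, wm12, wm32)  # step 213
    in2 = wm12 * m + wm32 * (1.0 - m)                   # step 214
    loss = (F.l1_loss(in2, i2)                          # step 215: three L1 losses
            + F.l1_loss(fcvu21, fg21)
            + F.l1_loss(fcvu23, fg23))
    opt.zero_grad()
    loss.backward()
    opt.step()
```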
In one possible embodiment, step 204 of calculating the first training backward optical flow f3-1 includes: calculating the first training backward optical flow f3-1 based on a computer vision algorithm; and step 205 of calculating the second training backward optical flow f1-3 includes: calculating the second training backward optical flow f1-3 based on a computer vision algorithm.
In one possible embodiment, the process of step 214 of performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain the target interpolated frame in2 includes: multiplying the pixel values of the third training map wm1-2 by the corresponding pixel values of the fusion parameter map m to obtain a first fusion map wm1-2 × m, where the resolution of the fusion parameter map m is the same as the resolution of any video frame, the pixel values of the fusion parameter map m correspond one to one to the pixel values of the third training map wm1-2, each pixel value of the fusion parameter map m ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third training map wm1-2, and the products obtained by multiplying the pixel values of the third training map wm1-2 and the pixel values of the fusion parameter map m one to one are the pixel values of the first fusion map; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map correspond one to one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting the pixel values of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth training map wm3-2 by the difference fusion parameter map (1-m) to obtain a second fusion map wm3-2 × (1-m), where the products obtained by multiplying the pixel values of the fourth training map wm3-2 and the pixel values of the difference fusion parameter map one to one are the pixel values of the second fusion map; and adding the first fusion map wm1-2 × m and the second fusion map wm3-2 × (1-m) to obtain the target interpolated frame in2, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame in2, which is expressed by the formula in2 = wm1-2 × m + wm3-2 × (1-m).
In one possible embodiment, step 206 of transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio includes: multiplying the first training backward optical flow f3-1 by the preset ratio value t to obtain the first initial training optical flow fcv2-1, i.e. transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, where the preset ratio value ranges from 0.4 to 0.6;
Step 207 of transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio includes: multiplying the second training backward optical flow f1-3 by the preset ratio value t to obtain the second initial training optical flow fcv2-3, i.e. transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, where the preset ratio value t may be 0.5.
As shown in fig. 6, an embodiment of the present application further provides a video frame interpolation apparatus 3, including: an acquisition module 31, configured to acquire two adjacent video frames in a video, where the two video frames comprise a previous video frame I1 and a subsequent video frame I3; the acquisition module 31 is further configured to calculate an optical flow between the two video frames; the acquisition module 31 is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module 31 is further configured to map the two video frames through the initial optical flow to obtain an initial map; a correction module 32, configured to correct the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and a frame interpolation module 33, configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow. The video frame interpolation apparatus may apply the video frame interpolation method of any of the above embodiments, and the specific process and principle are not repeated here.
In one possible implementation, calculating optical flow between two video frames includes: calculating an optical flow between two video frames based on a computer vision algorithm; modifying the optical flow between two video frames based on the initial map comprises: based on the neural network, the optical flow between two video frames is modified using the initial map as an input.
In one possible embodiment, the process of correcting the optical flow between the two video frames based on the initial map to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
In one possible implementation, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio includes: transforming the first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on the preset ratio, where the first backward optical flow F3-1 belongs to the optical flow between the two video frames and is the backward optical flow from the previous video frame I1 to the subsequent video frame I3; and transforming the second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, where the second backward optical flow F1-3 belongs to the optical flow between the two video frames and is the backward optical flow from the subsequent video frame I3 to the previous video frame I1. Mapping the two video frames through the initial optical flow to obtain the initial map includes: mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2, where the first map WF1-2 belongs to the initial map; and mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2, where the second map WF3-2 belongs to the initial map. Inputting the two video frames, the initial optical flow, and the initial map into the optical flow correction neural network, correcting the initial optical flow through the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network includes: inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 belong to the corrected optical flow, the third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2, and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2.
In one possible embodiment, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: mapping the two video frames through the corrected optical flow to obtain a corrected map; inputting the two video frames, the corrected optical flow, and the corrected map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; and performing fusion calculation on the corrected map based on the fusion parameter map to obtain the target interpolated frame.
In one possible embodiment, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2; mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2; inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; and performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.
In a possible implementation, the process of performing fusion calculation on the third map and the fourth map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third map by the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one to one to the pixel values of the third map, each pixel value of the fusion parameter map ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third map, and the products obtained by multiplying the pixel values of the third map and the pixel values of the fusion parameter map one to one are the pixel values of the first fusion map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one to one to the pixel values of the fusion parameter map, and the differences obtained by subtracting the pixel values of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth map by the difference fusion parameter map to obtain a second fusion map, where the products obtained by multiplying the pixel values of the fourth map and the pixel values of the difference fusion parameter map one to one are the pixel values of the second fusion map; and adding the first fusion map and the second fusion map to obtain the target interpolated frame, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame.
In one possible embodiment, transforming the first backward optical flow into the first initial optical flow based on the preset ratio includes: multiplying the first backward optical flow by a preset ratio value to obtain the first initial optical flow, where the preset ratio value ranges from 0.4 to 0.6; and transforming the second backward optical flow into the second initial optical flow based on the preset ratio includes: multiplying the second backward optical flow by the preset ratio value to obtain the second initial optical flow.
In one possible embodiment, the preset ratio value is 0.5.
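As a hedged illustration of this scaling step, assuming the interpolated frame is intended to sit at the temporal midpoint of the two input frames, the transformation is a simple scalar multiplication of the flow field:

    # Illustrative only: scaling the frame-to-frame reverse optical flow to obtain an
    # initial estimate of the flow toward the interpolated frame. The value 0.5 targets
    # the temporal midpoint; the description allows 0.4-0.6.
    PRESET_RATIO = 0.5

    def to_initial_flow(reverse_flow, ratio=PRESET_RATIO):
        return reverse_flow * ratio   # e.g. FCV2-1 = F3-1 * 0.5, FCV2-3 = F1-3 * 0.5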
As shown in fig. 7, an embodiment of the present application further provides a neural network training device 4, including: an obtaining module 41, configured to: acquire a group of training data, wherein the group of training data comprises three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2 and a third training video frame i3; obtain a first reference reverse optical flow fg2-1, which is a reference reverse optical flow from the first training video frame i1 to the second training video frame i2; obtain a second reference reverse optical flow fg2-3, which is a reference reverse optical flow from the third training video frame i3 to the second training video frame i2; calculate a first training reverse optical flow f3-1, which is a reverse optical flow from the first training video frame i1 to the third training video frame i3; calculate a second training reverse optical flow f1-3, which is a reverse optical flow from the third training video frame i3 to the first training video frame i1; transform the first training reverse optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio; transform the second training reverse optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio; map the first training video frame i1 by the first initial training optical flow fcv2-1 to obtain a first training map wf1-2; and map the third training video frame i3 by the second initial training optical flow fcv2-3 to obtain a second training map wf3-2; a correction module 42, configured to: input the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training map wf1-2 and the second training map wf3-2 into the optical flow correction neural network to obtain a third training reverse optical flow fcvu2-1 and a fourth training reverse optical flow fcvu2-3 output by the optical flow correction neural network, the third training reverse optical flow fcvu2-1 being a corrected reverse optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training reverse optical flow fcvu2-3 being a corrected reverse optical flow from the third training video frame i3 to the second training video frame i2; a frame interpolation module 43, configured to: map the first training video frame i1 by the third training reverse optical flow fcvu2-1 to obtain a third training map wm1-2; map the third training video frame i3 by the fourth training reverse optical flow fcvu2-3 to obtain a fourth training map wm3-2; and input the first training video frame i1, the third training video frame i3, the third training reverse optical flow fcvu2-1, the fourth training reverse optical flow fcvu2-3, the third training map wm1-2 and the fourth training map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; the frame interpolation module 43 being further configured to perform fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2; and an adjustment module 44, configured to adjust network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training reverse optical flow fcvu2-1 and the first reference reverse optical flow fg2-1, and the difference between the fourth training reverse optical flow fcvu2-3 and the second reference reverse optical flow fg2-3. The neural network training device may apply the neural network training method for video frame interpolation in any of the above embodiments; the specific process and principle are the same as those of the above embodiments and are not repeated here.
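A minimal, hypothetical sketch of the adjustment module's parameter update is given below, assuming PyTorch, L1 losses and equal loss weights; none of these choices (nor the optimizer) are specified by this description, so they are assumptions for illustration only.

    # Hypothetical training step combining the frame difference and the two flow
    # differences into one loss, then backpropagating into the correction and fusion
    # networks via a user-supplied optimizer. Loss type and weights are assumptions.
    import torch

    def training_step(in2, i2, fcvu2_1, fg2_1, fcvu2_3, fg2_3, optimizer):
        frame_loss = torch.nn.functional.l1_loss(in2, i2)
        flow_loss = (torch.nn.functional.l1_loss(fcvu2_1, fg2_1) +
                     torch.nn.functional.l1_loss(fcvu2_3, fg2_3))
        loss = frame_loss + flow_loss        # joint supervision on frame and flows
        optimizer.zero_grad()
        loss.backward()                      # adjusts both networks' parameters
        optimizer.step()
        return loss.item()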
In a possible embodiment, calculating the first training reverse optical flow f3-1 comprises: computing the first training reverse optical flow f3-1 based on a computer vision algorithm; and calculating the second training reverse optical flow f1-3 comprises: computing the second training reverse optical flow f1-3 based on a computer vision algorithm.
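The description does not name a particular computer vision algorithm. As one illustrative possibility (an assumption, not the patent's method), OpenCV's Farneback estimator can produce a dense optical flow between two grayscale frames:

    # Illustrative only: dense optical flow via OpenCV's Farneback algorithm.
    # Which frame is passed first determines the flow direction, so the reverse-flow
    # convention used in this document should be respected by the caller.
    import cv2

    def farneback_flow(frame_a_gray, frame_b_gray):
        # Returns an (H, W, 2) array of per-pixel displacements from frame_a to frame_b.
        return cv2.calcOpticalFlowFarneback(frame_a_gray, frame_b_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)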
In a possible implementation, the fusion calculation performed on the third training map and the fourth training map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third training map by the fusion parameter map to obtain a first fusion map, wherein the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third training map, each pixel value of the fusion parameter map lies in the range 0-1, and each pixel value of the first fusion map is the product of the corresponding pixel values of the third training map and the fusion parameter map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein each pixel value of the difference fusion parameter map is 1 minus the corresponding pixel value of the fusion parameter map; multiplying the fourth training map by the difference fusion parameter map to obtain a second fusion map, wherein each pixel value of the second fusion map is the product of the corresponding pixel values of the fourth training map and the difference fusion parameter map; and adding the first fusion map and the second fusion map to obtain the target interpolated frame, wherein each pixel value of the target interpolated frame is the sum of the corresponding pixel values of the first fusion map and the second fusion map.
In one possible implementation, transforming the first training reverse optical flow into the first initial training optical flow based on the preset ratio comprises: multiplying the first training reverse optical flow by a preset proportional value to obtain the first initial training optical flow, wherein the preset proportional value ranges from 0.4 to 0.6; and transforming the second training reverse optical flow into the second initial training optical flow based on the preset ratio comprises: multiplying the second training reverse optical flow by the preset proportional value to obtain the second initial training optical flow.
In one possible embodiment, the preset ratio value is 0.5.
It should be understood that the above division of the video frame interpolation apparatus or the neural network training apparatus into modules is only a division of logical functions; in an actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, may all be implemented in hardware, or some may be implemented as software invoked by a processing element while others are implemented in hardware. For example, any one of the obtaining module, the correction module and the frame interpolation module may be a separately arranged processing element, may be integrated into a chip of the video frame interpolation apparatus, or may be stored in a memory of the video frame interpolation apparatus in the form of a program that a processing element of the apparatus calls to execute the functions of the module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, the steps of the above methods or the above modules may be completed by integrated logic circuits of hardware in a processor element or by instructions in the form of software. In addition, the video frame interpolation apparatus and the neural network training apparatus may be the same apparatus or different apparatuses.
For example, the video frame interpolation apparatus or the neural network training apparatus may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented by a processing element scheduling a program, the processing element may be a general purpose processor, such as a central processing unit (CPU) or another processor capable of invoking the program. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
As shown in fig. 8, an embodiment of the present application further provides an electronic device, including: a processor 51 and a memory 52, the memory 52 being configured to store at least one instruction which is loaded and executed by the processor 51 to implement the method in any of the embodiments described above, including the video frame interpolation method or the neural network training method for video frame interpolation. The specific process and principle of the video frame interpolation method or the neural network training method for video frame interpolation are the same as those of the above embodiments, and are not described herein again.
The number of the processors 51 may be one or more, and the processors 51 and the memory 52 may be connected by a bus 53 or in other ways. The memory 52 is a non-transitory computer readable storage medium and can be used for storing non-transitory software programs, non-transitory computer executable programs and modules, such as the program instructions/modules corresponding to the data processing device in the embodiments of the present application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory, that is, implements the method in any of the above method embodiments. The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store necessary data and the like. Further, the memory may include a high speed random access memory, and may also include a non-transitory memory, such as at least one disk storage device, a flash memory device, or another non-transitory solid state storage device. The electronic device may be, for example, a server, a computer, a mobile phone, or another electronic product.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line) or wirelessly (e.g., infrared, wireless, or microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk), among others.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, and c" may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A method for video frame insertion, comprising:
acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a next video frame;
calculating an optical flow between the two video frames;
transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
mapping the two video frames through the initial optical flow to obtain an initial mapping map;
correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and obtaining a target interpolation frame between the two video frames according to the corrected optical flow.
2. The method of claim 1,
the calculating optical flow between the two video frames comprises: computing an optical flow between the two video frames based on a computer vision algorithm;
the modifying optical flow between the two video frames based on the initial map comprises: correcting the optical flow between the two video frames based on a neural network using the initial map as an input.
3. The method of claim 1, wherein the modifying the optical flow between the two video frames based on the initial map comprises:
and inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
4. The method of claim 3,
the transforming the optical flow between the two video frames to an initial optical flow based on a preset scale comprises:
transforming a first inverse optical flow, belonging to an optical flow between the two video frames, into a first initial optical flow on the basis of a preset ratio, the first inverse optical flow being an inverse optical flow from the previous video frame to the subsequent video frame;
transforming a second inverse optical flow belonging to an optical flow between the two video frames into a second initial optical flow based on a preset ratio, the second inverse optical flow being an inverse optical flow from the subsequent video frame to the previous video frame;
the mapping the two video frames through the initial optical flow to obtain an initial mapping map comprises:
mapping the previous video frame by the first initial optical flow to obtain a first mapping map, wherein the first mapping map belongs to the initial mapping map;
mapping the next video frame through the second initial optical flow to obtain a second mapping map, wherein the second mapping map belongs to the initial mapping map;
the process of inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow modified neural network, and modifying the initial optical flow through the optical flow modified neural network to obtain the modified optical flow output by the optical flow modified neural network includes:
inputting the previous video frame, the next video frame, the first initial optical flow, the second initial optical flow, the first mapping map and the second mapping map into an optical flow modification neural network, and obtaining a third inverse optical flow and a fourth inverse optical flow output by the optical flow modification neural network, wherein the third inverse optical flow and the fourth inverse optical flow belong to the modified optical flow, the third inverse optical flow is a modified inverse optical flow from the previous video frame to the target frame insertion, and the fourth inverse optical flow is a modified inverse optical flow from the next video frame to the target frame insertion.
5. The method of claim 1,
the obtaining of the target interpolated frame between the two video frames according to the modified optical flow comprises:
mapping the two video frames through the corrected optical flow to obtain a corrected mapping map;
inputting the two video frames, the corrected optical flow and the corrected mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the corrected mapping map based on the fusion parameter map to obtain the target interpolation frame.
6. The method of claim 4,
the obtaining of the target interpolation frame between the two video frames according to the modified optical flow comprises:
mapping the previous video frame through the third reverse optical flow to obtain a third mapping map;
mapping the next video frame through the fourth reverse optical flow to obtain a fourth mapping map;
inputting the previous video frame, the next video frame, the third reverse optical flow, the fourth reverse optical flow, the third mapping map and the fourth mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame.
7. The method of claim 6,
the process of performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame includes:
multiplying the third mapping map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map are in one-to-one correspondence with a plurality of pixel values of the third mapping map, each pixel value range of the fusion parameter map is 0-1, the plurality of pixel values of the first fusion map are in one-to-one correspondence with the plurality of pixel values of the third mapping map, and a plurality of product values obtained by multiplying the plurality of pixel values of the third mapping map and the plurality of pixel values of the fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the first fusion map;
subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond to a plurality of pixel values of the fusion parameter map one by one, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are a plurality of pixel values of the difference fusion parameter map respectively;
multiplying the fourth mapping map and the difference value fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by one-to-one corresponding multiplication of a plurality of pixel values of the fourth mapping map and a plurality of pixel values of the difference value fusion parameter map are a plurality of pixel values of the second fusion map respectively;
adding the first fusion map and the second fusion map to obtain the target interpolation frame, wherein a plurality of pixel values obtained by adding the plurality of pixel values of the first fusion map and the plurality of pixel values of the second fusion map in a one-to-one correspondence manner are a plurality of pixel values of the target interpolation frame respectively.
8. The method according to claim 4 or 6 or 7,
the transforming the first inverse optical flow into a first initial optical flow based on a preset proportion comprises:
multiplying the first reverse optical flow by a preset proportional value to obtain the first initial optical flow, wherein the range of the preset proportional value is 0.4-0.6;
the transforming the second inverse optical flow into a second initial optical flow based on a preset proportion comprises:
and multiplying the second reverse optical flow by the preset proportional value to obtain the second initial optical flow.
9. The method of claim 8,
the preset proportional value is 0.5.
10. A neural network training method for video frame interpolation, comprising:
acquiring a group of training data, wherein the group of training data comprises three continuous video frames which are a first training video frame, a second training video frame and a third training video frame in sequence;
obtaining a first reference inverse optical flow, the first reference inverse optical flow being an inverse optical flow from the first training video frame to the second training video frame;
obtaining a second reference inverse optical flow, the second reference inverse optical flow being an inverse optical flow from the third training video frame to the second training video frame;
calculating a first training inverse optical flow, the first training inverse optical flow being an inverse optical flow from the first training video frame to the third training video frame;
calculating a second training inverse optical flow, the second training inverse optical flow being an inverse optical flow from the third training video frame to the first training video frame;
transforming the first training inverse optical flow into a first initial training optical flow based on a preset proportion;
transforming the second training inverse optical flow into a second initial training optical flow based on the preset proportion;
mapping the first training video frame through the first initial training optical flow to obtain a first training map;
mapping the third training video frame through the second initial training optical flow to obtain a second training map;
inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map and the second training map into an optical flow modification neural network to obtain a third training inverse optical flow and a fourth training inverse optical flow output by the optical flow modification neural network, wherein the third training inverse optical flow is a modified inverse optical flow from the first training video frame to the second training video frame, and the fourth training inverse optical flow is a modified inverse optical flow from the third training video frame to the second training video frame;
mapping the first training video frame through the third training inverse optical flow to obtain a third training map;
mapping the third training video frame through the fourth training inverse optical flow to obtain a fourth training map;
inputting the first training video frame, the third training video frame, the third training inverse optical flow, the fourth training inverse optical flow, the third training map and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
based on the fusion parameter map, performing fusion calculation on the third training map and the fourth training map to obtain a target interpolated frame;
adjusting network parameters of the optical-flow-modified neural network and the fused neural network based on a difference between the target interpolated frame and the second training video frame, a difference between the third training inverse optical flow and the first reference inverse optical flow, and a difference between the fourth training inverse optical flow and the second reference inverse optical flow.
11. The method of claim 10,
said calculating a first trained inverse optical flow comprises: computing the first trained inverse optical flow based on a computer vision algorithm;
said calculating a second trained inverse optical flow comprises: calculating the second trained inverse optical flow based on a computer vision algorithm.
12. The method of claim 10,
the process of performing fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain the target interpolated frame includes:
multiplying the third training map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map correspond to a plurality of pixel values of the third training map one by one, each pixel value range of the fusion parameter map is 0-1, the plurality of pixel values of the first fusion map correspond to the plurality of pixel values of the third training map one by one, and a plurality of product values obtained by multiplying the plurality of pixel values of the third training map and the plurality of pixel values of the fusion parameter map one by one are respectively the plurality of pixel values of the first fusion map;
subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond to a plurality of pixel values of the fusion parameter map one by one, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are respectively a plurality of pixel values of the difference fusion parameter map;
multiplying the fourth training map and the difference value fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by one-to-one corresponding multiplication of a plurality of pixel values of the fourth training map and a plurality of pixel values of the difference value fusion parameter map are a plurality of pixel values of the second fusion map respectively;
and adding the first fusion map and the second fusion map to obtain the target interpolated frame, wherein a plurality of values obtained by adding a plurality of pixel values of the first fusion map and a plurality of pixel values of the second fusion map in a one-to-one correspondence manner are respectively a plurality of pixel values of the target interpolated frame.
13. The method of claim 10,
the transforming the first training inverse optical flow to a first initial training optical flow based on a preset scale comprises:
multiplying the first training reverse optical flow by a preset proportional value to obtain the first initial training optical flow, wherein the range of the preset proportional value is 0.4-0.6;
the transforming the second training inverse optical flow into a second initial training optical flow based on a preset ratio comprises:
and multiplying the second training inverse optical flow by the preset proportional value to obtain the second initial training optical flow.
14. The method of claim 13,
the preset proportional value is 0.5.
15. A video frame interpolation apparatus, comprising:
the acquisition module is used for acquiring two adjacent video frames in the video;
the acquisition module is further used for calculating optical flow between the two video frames;
the acquisition module is further used for transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
the acquisition module is further configured to map the two video frames through the initial optical flow to obtain an initial mapping map;
the correction module is used for correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and the frame interpolation module is used for obtaining a target frame interpolation between the two video frames according to the corrected optical flow.
16. An electronic device, comprising:
a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method of any one of claims 1 to 14.
17. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method of any one of claims 1 to 14.
CN202210171767.5A 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment Active CN114640885B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210171767.5A CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment
PCT/CN2023/075807 WO2023160426A1 (en) 2022-02-24 2023-02-14 Video frame interpolation method and apparatus, training method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210171767.5A CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114640885A true CN114640885A (en) 2022-06-17
CN114640885B CN114640885B (en) 2023-12-22

Family

ID=81948635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210171767.5A Active CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114640885B (en)
WO (1) WO2023160426A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160426A1 (en) * 2022-02-24 2023-08-31 影石创新科技股份有限公司 Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN117115210A (en) * 2023-10-23 2023-11-24 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978728A (en) * 2014-04-08 2015-10-14 南京理工大学 Image matching system of optical flow method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN112104830A (en) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame insertion method, model training method and corresponding device
CN113365110A (en) * 2021-07-14 2021-09-07 北京百度网讯科技有限公司 Model training method, video frame interpolation method, device, equipment and storage medium
CN114007135A (en) * 2021-10-29 2022-02-01 广州华多网络科技有限公司 Video frame insertion method and device, equipment, medium and product thereof

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776688B2 (en) * 2017-11-06 2020-09-15 Nvidia Corporation Multi-frame video interpolation using optical flow
CN109949221B (en) * 2019-01-30 2022-05-17 深圳大学 Image processing method and electronic equipment
CN110191299B (en) * 2019-04-15 2020-08-04 浙江大学 Multi-frame interpolation method based on convolutional neural network
CN113727141B (en) * 2020-05-20 2023-05-12 富士通株式会社 Interpolation device and method for video frames
CN113949926B (en) * 2020-07-17 2024-07-30 武汉Tcl集团工业研究院有限公司 Video frame inserting method, storage medium and terminal equipment
CN112995715B (en) * 2021-04-20 2021-09-03 腾讯科技(深圳)有限公司 Video frame insertion processing method and device, electronic equipment and storage medium
CN114066730B (en) * 2021-11-04 2022-10-28 西北工业大学 Video frame interpolation method based on unsupervised dual learning
CN114640885B (en) * 2022-02-24 2023-12-22 影石创新科技股份有限公司 Video frame inserting method, training device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978728A (en) * 2014-04-08 2015-10-14 南京理工大学 Image matching system of optical flow method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
US20180176574A1 (en) * 2015-05-25 2018-06-21 Peking University Shenzhen Graduate School Method and system for video frame interpolation based on optical flow method
CN112104830A (en) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame insertion method, model training method and corresponding device
CN113365110A (en) * 2021-07-14 2021-09-07 北京百度网讯科技有限公司 Model training method, video frame interpolation method, device, equipment and storage medium
CN114007135A (en) * 2021-10-29 2022-02-01 广州华多网络科技有限公司 Video frame insertion method and device, equipment, medium and product thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160426A1 (en) * 2022-02-24 2023-08-31 影石创新科技股份有限公司 Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN117115210A (en) * 2023-10-23 2023-11-24 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things
CN117115210B (en) * 2023-10-23 2024-01-26 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things

Also Published As

Publication number Publication date
WO2023160426A1 (en) 2023-08-31
CN114640885B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN108304755B (en) Training method and device of neural network model for image processing
Zeng et al. Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time
WO2021088473A1 (en) Image super-resolution reconstruction method, image super-resolution reconstruction apparatus, and computer-readable storage medium
WO2023160426A1 (en) Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN111372087B (en) Panoramic video frame insertion method and device and corresponding storage medium
CN106780336B (en) Image reduction method and device
CN108271022A (en) A kind of method and device of estimation
CN113935934A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP2018106316A (en) Image correction processing method and image correction processing apparatus
US20050088531A1 (en) Automatic stabilization control apparatus, automatic stabilization control method, and computer readable recording medium having automatic stabilization control program recorded thereon
WO2020215263A1 (en) Image processing method and device
CN110830848B (en) Image interpolation method, image interpolation device, computer equipment and storage medium
CN115564655A (en) Video super-resolution reconstruction method, system and medium based on deep learning
US20230196721A1 (en) Low-light video processing method, device and storage medium
CN111093045B (en) Method and device for scaling video sequence resolution
JP2015197818A (en) Image processing apparatus and method of the same
CN103618904A (en) Motion estimation method and device based on pixels
CN115809959A (en) Image processing method and device
CN113469880A (en) Image splicing method and device, storage medium and electronic equipment
CN112802079A (en) Disparity map acquisition method, device, terminal and storage medium
CN114257759B (en) System for image completion
CN113658321B (en) Three-dimensional reconstruction method, system and related equipment
JP2018084997A (en) Image processing device, and image processing method
RU2576490C1 (en) Background hybrid retouch method for 2d to 3d conversion
CN113556581A (en) Method and device for generating video interpolation frame and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant