CN114640885A - Video frame insertion method, training method, device and electronic equipment - Google Patents

Video frame insertion method, training method, device and electronic equipment

Info

Publication number
CN114640885A
Authority
CN
China
Prior art keywords: optical flow, training, map, fusion, initial
Prior art date
Legal status
Granted
Application number
CN202210171767.5A
Other languages
Chinese (zh)
Other versions
CN114640885B (en)
Inventor
吕朋伟
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202210171767.5A priority Critical patent/CN114640885B/en
Publication of CN114640885A publication Critical patent/CN114640885A/en
Priority to PCT/CN2023/075807 priority patent/WO2023160426A1/en
Application granted granted Critical
Publication of CN114640885B publication Critical patent/CN114640885B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

The embodiments of the present application provide a video frame interpolation method, a training method, an apparatus, and an electronic device, relate to the technical field of image processing, and can improve the accuracy of frame interpolation results. The video frame interpolation method comprises the following steps: acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.

Description

Video frame insertion method, training method, device and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video frame interpolation method, a training method, an apparatus, and an electronic device.
Background
Video frame interpolation refers to generating intermediate video frames with an algorithm, in order to increase the video frame rate or to produce a slow-motion special-effect video. However, conventional video frame interpolation methods yield frame interpolation results of low accuracy.
Disclosure of Invention
Provided are a video frame interpolation method, a training method, an apparatus, and an electronic device, which can improve the accuracy of frame interpolation results.
In a first aspect, a video frame interpolation method is provided, including: acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a subsequent video frame; calculating an optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain an initial map; correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
In a second aspect, a neural network training method for video frame interpolation is provided, including: acquiring a group of training data, wherein the group of training data comprises three consecutive video frames, which are, in order, a first training video frame, a second training video frame, and a third training video frame; acquiring a first reference backward optical flow, which is the backward optical flow from the first training video frame to the second training video frame; acquiring a second reference backward optical flow, which is the backward optical flow from the third training video frame to the second training video frame; calculating a first training backward optical flow, which is the backward optical flow from the first training video frame to the third training video frame; calculating a second training backward optical flow, which is the backward optical flow from the third training video frame to the first training video frame; transforming the first training backward optical flow into a first initial training optical flow based on a preset ratio; transforming the second training backward optical flow into a second initial training optical flow based on the preset ratio; mapping the first training video frame through the first initial training optical flow to obtain a first training map; mapping the third training video frame through the second initial training optical flow to obtain a second training map; inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map, and the second training map into an optical flow correction neural network to obtain a third training backward optical flow and a fourth training backward optical flow output by the optical flow correction neural network, wherein the third training backward optical flow is the corrected backward optical flow from the first training video frame to the second training video frame, and the fourth training backward optical flow is the corrected backward optical flow from the third training video frame to the second training video frame; mapping the first training video frame through the third training backward optical flow to obtain a third training map; mapping the third training video frame through the fourth training backward optical flow to obtain a fourth training map; inputting the first training video frame, the third training video frame, the third training backward optical flow, the fourth training backward optical flow, the third training map, and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; performing fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain a target interpolated frame; and adjusting network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame and the second training video frame, the difference between the third training backward optical flow and the first reference backward optical flow, and the difference between the fourth training backward optical flow and the second reference backward optical flow.
In a third aspect, a video frame interpolation apparatus is provided, including: an acquisition module, configured to acquire two adjacent video frames in a video; the acquisition module is further configured to calculate an optical flow between the two video frames; the acquisition module is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module is further configured to map the two video frames through the initial optical flow to obtain an initial map; a correction module, configured to correct the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and a frame interpolation module, configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow.
In a fourth aspect, an electronic device is provided, comprising: a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method described above.
In a fifth aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when run on a computer, causes the computer to perform the above-mentioned method.
The video frame interpolation method, training method, apparatus, and electronic device of the embodiments of the present application calculate the optical flow between two adjacent video frames in a video, correct the optical flow, and obtain an interpolated frame based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains information about the motion of the target and expresses the change of the image, so the interpolated frame between two video frames can be obtained using the optical flow between the two adjacent video frames. In addition, by transforming the optical flow according to the preset ratio, an initial optical flow corresponding to a position between the two video frames can be obtained; the video frames are mapped according to the transformed initial optical flow to obtain an initial map corresponding to that position, and the optical flow is corrected based on the initial map, so that the optical flow reflects the change between the two video frames more accurately and the accuracy of the frame interpolation result is improved.
Drawings
Fig. 1 is a schematic flowchart illustrating a video frame interpolation method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating another video frame interpolation method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another video frame interpolation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure of an optical flow modified neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model structure of a converged neural network according to an embodiment of the present application;
FIG. 6 is a block diagram illustrating an exemplary video frame interpolation apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a neural network training device according to an embodiment of the present disclosure;
fig. 8 is a block diagram of an electronic device in an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
It should be noted that the flow charts shown in the drawings are only exemplary and do not necessarily include all the contents and operations/steps, nor do they necessarily have to be executed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present application provides a video frame interpolation method, including:
step 101, two adjacent video frames in a video are obtained;
The video is the video into which frames are to be inserted; the two video frames may be any two adjacent video frames and comprise a previous video frame I1 and a subsequent video frame I3.
Step 102, calculating an optical flow between the two video frames;
Step 103, transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio, wherein the optical flow between the two video frames is calculated from the two video frames and can be transformed, according to the preset ratio, into the optical flow at a preset position between the two video frames, namely the initial optical flow;
Step 104, mapping the two video frames through the initial optical flow to obtain an initial map;
Step 105, correcting the optical flow between the two video frames based on the initial map to obtain a corrected optical flow.
Step 106, obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
After the target interpolated frame between the two video frames is obtained in step 106, target interpolated frames between other pairs of video frames may be obtained by repeating steps 101 to 106. For example, after the target interpolated frame between the first frame and the second frame of the video is obtained, the process may be repeated to obtain the target interpolated frame between the next pair of adjacent frames after a preset frame interval, and so on, so as to interpolate the whole video.
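The overall flow of steps 101 to 106 can be summarized with the minimal Python sketch below. It is only an illustration of the order of operations described above; compute_flow, warp, correct_net, and fuse are hypothetical callables standing in for the computer vision optical flow calculation, the optical flow mapping, the optical flow correction neural network, and the fusion step, and are not part of the original disclosure.

```python
# Minimal sketch of steps 101-106; the callables passed in are placeholders
# (assumed to be implemented elsewhere), not the patent's actual modules.
def interpolate_pair(I1, I3, compute_flow, warp, correct_net, fuse, t=0.5):
    F31 = compute_flow(I1, I3)                      # step 102: backward flow for I1 -> I3
    F13 = compute_flow(I3, I1)                      # step 102: backward flow for I3 -> I1
    FCV21, FCV23 = t * F31, t * F13                 # step 103: scale by preset ratio t
    WF12, WF32 = warp(I1, FCV21), warp(I3, FCV23)   # step 104: initial maps
    FCVU21, FCVU23 = correct_net(I1, I3, FCV21, FCV23, WF12, WF32)  # step 105
    WM12, WM32 = warp(I1, FCVU21), warp(I3, FCVU23) # step 106: corrected maps
    return fuse(I1, I3, FCVU21, FCVU23, WM12, WM32) # step 106: target frame IN2

def interpolate_video(frames, **helpers):
    out = []
    for prev_frame, next_frame in zip(frames, frames[1:]):  # adjacent pairs
        out += [prev_frame, interpolate_pair(prev_frame, next_frame, **helpers)]
    return out + [frames[-1]]
```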
The video frame interpolation method of the embodiment of the present application calculates the optical flow between two adjacent video frames in a video, corrects the optical flow, and obtains an interpolated frame based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains information about the motion of the target and expresses the change of the image, so the interpolated frame between two video frames can be obtained using the optical flow between the two adjacent video frames. In addition, by transforming the optical flow according to the preset ratio, an initial optical flow corresponding to a position between the two video frames can be obtained; the video frames are mapped according to the transformed initial optical flow to obtain an initial map corresponding to that position, and the optical flow is corrected based on the initial map, so that the optical flow reflects the change between the two video frames more accurately and the accuracy of the frame interpolation result is improved.
In one possible implementation, the step 102 of calculating optical flow between two video frames includes: calculating optical flow between two video frames based on a computer vision algorithm, wherein the computer vision algorithm refers to a traditional image processing method and is not a method based on neural network prediction; step 105, modifying the optical flow between two video frames based on the initial map comprises: based on the neural network, the optical flow between two video frames is modified using the initial map as an input. In step 105, the optical flow calculated in step 102 is corrected based on a neural network trained in advance. In this step, since a substantially accurate optical flow has been calculated by the computer vision algorithm, the neural network only needs to correct the optical flow, and thus the calculation amount of the neural network is small.
The conventional video frame interpolation method calculates an optical flow with a computer vision algorithm and then performs optical flow mapping with the calculated optical flow to obtain a target interpolated frame. However, the accuracy of the interpolation result obtained from an optical flow calculated this way is low. To improve the accuracy, the optical flow could instead be predicted entirely by a neural network and the target interpolated frame obtained from it, but such a method involves a large amount of calculation.
According to the video frame interpolation method of the present application, the optical flow is calculated based on a computer vision algorithm, then corrected based on a neural network, and the interpolated frame is obtained based on the corrected optical flow. Because the optical flow obtained in this way has been corrected by neural network prediction, the accuracy of the frame interpolation result is high; for example, edge artifacts around object contours can be reduced, improving the user experience for slow-motion video. In addition, since the neural network only needs to correct the already computed optical flow, its amount of calculation is reduced. That is, the amount of calculation is reduced while the accuracy of the frame interpolation result is improved.
In one possible embodiment, as shown in fig. 2, the process of step 105 of correcting the optical flow between the two video frames based on the initial map to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
In one possible embodiment, the process of step 106 of obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes:
Step 1061, mapping the two video frames through the corrected optical flow to obtain a corrected map;
Step 1062, inputting the two video frames, the corrected optical flow, and the corrected map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
Step 1063, performing fusion calculation on the corrected map based on the fusion parameter map to obtain the target interpolated frame.
In one possible implementation, as shown in fig. 3, the optical flow between the two video frames comprises a first backward optical flow F3-1 and a second backward optical flow F1-3, where the first backward optical flow F3-1 is the backward optical flow from the previous video frame I1 to the subsequent video frame I3, and the second backward optical flow F1-3 is the backward optical flow from the subsequent video frame I3 to the previous video frame I1; that is, step 101 obtains the adjacent previous video frame I1 and subsequent video frame I3 in the video. Accordingly, step 102 includes:
Step 1021, calculating the first backward optical flow F3-1 based on a computer vision algorithm, i.e. the backward optical flow from the previous video frame I1 to the subsequent video frame I3;
Step 1022, calculating the second backward optical flow F1-3 based on a computer vision algorithm, i.e. the backward optical flow from the subsequent video frame I3 to the previous video frame I1.
Here, the backward optical flow is also called the reverse optical flow. The optical flow in the embodiments of the present application can be expressed as an optical flow map. For example, for two images A and B, the resolution of the optical flow map is identical to that of image A and image B. The optical flow map records the 'offset' of each pixel point of one of the images; the offset has two components, an offset x in the left-right direction and an offset y in the up-down direction, and its value can simply be understood as the distance (number of pixels) to be moved. 'Applying the optical flow to image A', or 'mapping image A by the optical flow', means that each pixel point of image A is shifted according to the offset value (up-down plus left-right) at the corresponding position of the optical flow map; after the optical flow mapping is completed, a new image, called a map, is obtained. The optical flow calculated from image A to image B is the forward optical flow for image A and the reverse optical flow for image B. Therefore, for the two images A and B, image A is mapped by the forward optical flow, or image B is mapped by the backward optical flow, where the forward optical flow refers to the optical flow calculated from image A to image B and the reverse/backward optical flow refers to the optical flow calculated from image B to image A.
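As an illustration of 'mapping an image by an optical flow' as described above, the following is a minimal NumPy sketch of backward warping; the function name and the use of nearest-neighbour (rather than bilinear) sampling are simplifying assumptions for brevity and are not taken from the original disclosure.

```python
import numpy as np

def backward_warp(image, flow):
    """Map `image` by a backward optical flow: each output pixel (x, y) is taken
    from image at (x + flow_x, y + flow_y), i.e. shifted by the per-pixel offset
    recorded in the optical flow map."""
    h, w = flow.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(grid_x + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(grid_y + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]
```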
Step 103, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio, includes:
Step 1031, transforming the first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on the preset ratio, where the first initial optical flow FCV2-1 serves as the backward optical flow from the previous video frame I1 to the target interpolated frame IN2. Since the target interpolated frame IN2 is located at a position between the two video frames I1 and I3, the optical flow between the two video frames can be approximately transformed based on the preset ratio; for example, if the preset ratio is set to 0.5, computing F3-1 × 0.5 approximately gives the optical flow of an intermediate frame halfway between the two video frames;
Step 1032, transforming the second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, where the second initial optical flow FCV2-3 serves as the backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2.
Step 104, mapping the two video frames through the initial optical flow to obtain the initial map, includes:
Step 1041, mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2;
that is, optical flow mapping (backward warp) is performed on the image I1 using the first initial optical flow FCV2-1, and the map obtained by this mapping (WarpMask, also called optical flow map WarpFlow) is the first map WF1-2.
Step 1042, mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2; that is, the initial map obtained in step 104 includes the first map WF1-2 and the second map WF3-2.
Step 105, correcting the initial optical flow through an optical flow correction neural network based on the two video frames, the initial optical flow, and the initial map, to obtain the corrected optical flow, includes the following process:
inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2 and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2; that is, the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 constitute the corrected optical flow in step 105.
The neural network model structure of the optical flow correction neural network can be as shown in fig. 4. The model may include a convolution Conv + activation function Relu downsampling module, a convolution Conv + activation function Relu feature extraction module, and a deconvolution ConvTranspose + activation function Relu upsampling module. The input of the model is the above I1, I3, FCV2-1, FCV2-3, WF1-2, and WF3-2. The downsampling module reduces the input size, which speeds up prediction and inference while extracting network features; the feature extraction module extracts and transforms features within the network, where the extracted features are the features obtained after the convolution layer operations and may represent edges, contours, light and shade, and similar characteristics of the frame picture within the network; the upsampling module enlarges the reduced features back to the original input size. The output of the model is the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3, i.e. the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2 and the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2. In other words, the neural network corrects the first initial optical flow FCV2-1 into the third backward optical flow FCVU2-1 and corrects the second initial optical flow FCV2-3 into the fourth backward optical flow FCVU2-3. Modules of the same kind in the figure are reused: for example, the same feature extraction module is reused within the model, which reduces the complexity of the network structure and enhances the representational capability of the feature extraction. The training process of the neural network model will be described later.
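A compact PyTorch sketch of such a structure is given below, only to make the downsample / shared feature extraction / upsample arrangement concrete. The channel widths, the number of layers, and the linear (non-ReLU) final flow head are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class FlowCorrectionNet(nn.Module):
    """Sketch of Fig. 4: Conv+ReLU downsampling, a reused Conv+ReLU feature
    extraction module, ConvTranspose+ReLU upsampling, and a flow output head."""
    def __init__(self, ch=32):
        super().__init__()
        # input: I1(3) + I3(3) + FCV2-1(2) + FCV2-3(2) + WF1-2(3) + WF3-2(3) = 16 channels
        self.down = nn.Sequential(nn.Conv2d(16, ch, 3, stride=2, padding=1), nn.ReLU())
        self.feat = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU())
        self.head = nn.Conv2d(ch, 4, 3, padding=1)   # 2 channels per corrected flow

    def forward(self, I1, I3, FCV21, FCV23, WF12, WF32):
        x = torch.cat([I1, I3, FCV21, FCV23, WF12, WF32], dim=1)
        x = self.down(x)
        x = self.feat(self.feat(x))     # the same feature module reused (shared weights)
        out = self.head(self.up(x))
        return out[:, :2], out[:, 2:]   # FCVU2-1, FCVU2-3
```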
In one possible embodiment, as shown in fig. 3, the process of step 106 of obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes:
Step 10611, mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2;
Step 10612, mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2;
Step 1062, inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network;
the neural network model structure of the converged neural network may be as shown in fig. 5, and the neural network model may include a convolution Conv + activation function Relu downsampling module and a deconvolution Conv + activation function Relu upsampling module. Wherein, the input of the neural network model is the above I1、I3、FCVU2-1、FCVU2-3、WM1-2And WM3-2(ii) a The neural network model outputs a fusion parameter map m which is used for participating IN calculation IN the subsequent process to obtain a target interpolation frame IN2. The training process of the neural network model will be described later.
Step 1063, performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.
In a possible embodiment, the process of step 1063 of performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2 includes: multiplying the pixel values of the third map WM1-2 by the corresponding pixel values of the fusion parameter map m to obtain a first fusion map WM1-2 × m, where the resolution of the fusion parameter map m is the same as the resolution of any video frame, the pixel values of the fusion parameter map m correspond one to one to the pixel values of the third map WM1-2, each pixel value of the fusion parameter map m ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third map WM1-2, and the products obtained by multiplying the pixel values of the third map WM1-2 and the pixel values of the fusion parameter map m one to one are the pixel values of the first fusion map WM1-2 × m; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map (1-m) correspond one to one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting the pixel values of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map (1-m); multiplying the fourth map WM3-2 by the difference fusion parameter map (1-m) to obtain a second fusion map WM3-2 × (1-m), where the products obtained by multiplying the pixel values of the fourth map WM3-2 and the pixel values of the difference fusion parameter map (1-m) one to one are the pixel values of the second fusion map; and adding the first fusion map WM1-2 × m and the second fusion map WM3-2 × (1-m) to obtain the target interpolated frame IN2, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame IN2. Expressed as a formula, the target interpolated frame IN2 = WM1-2 × m + WM3-2 × (1-m); that is, the third map WM1-2 is multiplied point by point by the fusion parameter map m to obtain one intermediate result, the fourth map WM3-2 is multiplied point by point by (1-m), the result of subtracting the fusion parameter map m from 1 point by point, to obtain another intermediate result, and the two intermediate results are added point by point. For example, Table 1 below compares the target interpolated frame IN2, the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m.
TABLE 1

Example | WM1-2 pixel values | WM3-2 pixel values | m pixel values | IN2 pixel values
1       | 2                  | 4                  | 0              | 4
2       | 2                  | 4                  | 1              | 2
3       | 2                  | 4                  | 0.5            | 3
Assume that the third map WM1-2, the fourth map WM3-2, and the fusion parameter map m are all images of 2 × 2 resolution and that the numerical values in Table 1 are pixel values. In all three examples, the pixel values of the third map WM1-2 are all 2 and the pixel values of the fourth map WM3-2 are all 4. The difference lies in the fusion parameter map m. In example 1, the pixel values of the fusion parameter map m are all 0, and each pixel value of the target interpolated frame IN2 calculated by the formula WM1-2 × m + WM3-2 × (1-m) is 4, since 2 × 0 + 4 × (1-0) = 4. In example 2, the pixel values of the fusion parameter map m are all 1, and each pixel value of the target interpolated frame IN2 is 2, since 2 × 1 + 4 × (1-1) = 2. In example 3, the pixel values of the fusion parameter map m are all 0.5, and each pixel value of the target interpolated frame IN2 is 3, since 2 × 0.5 + 4 × (1-0.5) = 3.
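The three examples in Table 1 can be checked with a few lines of NumPy implementing the point-wise fusion formula; the variable names are illustrative only.

```python
import numpy as np

WM12 = np.full((2, 2), 2.0)   # third map WM1-2, all pixel values 2
WM32 = np.full((2, 2), 4.0)   # fourth map WM3-2, all pixel values 4

for m_val in (0.0, 1.0, 0.5):                 # examples 1, 2, 3
    m = np.full((2, 2), m_val)                # fusion parameter map m
    IN2 = WM12 * m + WM32 * (1.0 - m)         # point-wise fusion calculation
    print(m_val, IN2[0, 0])                   # prints 4.0, 2.0 and 3.0 respectively
```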
In one possible embodiment, the process of step 1031 of transforming the first backward optical flow F3-1 into the first initial optical flow FCV2-1 based on the preset ratio includes: multiplying the first backward optical flow F3-1 by a preset ratio value t to obtain the first initial optical flow FCV2-1, i.e. transforming F3-1 into FCV2-1 based on the formula FCV2-1 = t × F3-1, where the preset ratio value ranges from 0.4 to 0.6. The process of step 1032 of transforming the second backward optical flow F1-3 into the second initial optical flow FCV2-3 based on the preset ratio includes: multiplying the second backward optical flow F1-3 by the preset ratio value t to obtain the second initial optical flow FCV2-3, i.e. transforming F1-3 into FCV2-3 based on the formula FCV2-3 = t × F1-3. In other words, by transforming the optical flow according to the preset ratio, the optical flow of an intermediate frame at the corresponding position between the two video frames can be obtained, which facilitates determining the target interpolated frame IN2 based on the optical flow in the subsequent calculation. The preset ratio value t may be 0.5; if t is 0.5, the transformed optical flow is the optical flow at the halfway position between the two video frames.
An embodiment of the present application further provides a neural network training method for video frame interpolation, which may be used to train the optical flow correction neural network and the fusion neural network. Before training, about 100,000 groups of data may be extracted in advance as training data from 1,000 video segments covering various scenes and motion forms; for example, with 100 groups taken from each video segment, 100,000 groups of training data can be extracted from the 1,000 videos in total. Each group of training data includes three consecutive video frames, and all data are normalized to a uniform resolution, for example to 768 × 768, by cropping or scaling. The neural network training method includes the following steps:
Step 201, obtaining a group of training data, where the group of training data includes three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2, and a third training video frame i3; in this step, a group of training data may be randomly selected from the training data;
Step 202, obtaining a first reference backward optical flow fg2-1, which is the backward optical flow from the first training video frame i1 to the second training video frame i2;
Step 203, obtaining a second reference backward optical flow fg2-3, which is the backward optical flow from the third training video frame i3 to the second training video frame i2;
In steps 202 and 203, the first reference backward optical flow fg2-1 and the second reference backward optical flow fg2-3 may be obtained by a state-of-the-art third-party optical flow acquisition method; fg2-1 and fg2-3 serve as reference optical flows, which facilitates comparing them with the results output by the neural network and, in turn, adjusting the network parameters.
Step 204, calculating a first training backward optical flow f3-1, which is the backward optical flow from the first training video frame i1 to the third training video frame i3;
Step 205, calculating a second training backward optical flow f1-3, which is the backward optical flow from the third training video frame i3 to the first training video frame i1;
Step 206, transforming the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on the preset ratio, where the first initial training optical flow fcv2-1 serves as the backward optical flow from the first training video frame i1 to the second training video frame i2;
For example, step 206 of transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio includes: transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, with t = 0.5;
Step 207, transforming the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio, where the second initial training optical flow fcv2-3 serves as the backward optical flow from the third training video frame i3 to the second training video frame i2;
For example, step 207 of transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio includes: transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, with t = 0.5.
Step 208, mapping the first training video frame i1 through the first initial training optical flow fcv2-1 to obtain a first training map wf1-2;
Step 209, mapping the third training video frame i3 through the second initial training optical flow fcv2-3 to obtain a second training map wf3-2;
Step 210, the first training video frame i1A third training video frame i3First initial training light flow fcv2-1Second initial training light flow fcv2-3A first training map wf1-2And a second training map wf3-2Inputting the third training inverse optical flow fcvu to the optical flow correction neural network to obtain the output of the optical flow correction neural network2-1And a fourth training inverse optical flow fcvu2-3Third training inverse light flow fcvu2-1For modified slave first training video frames i1To a second training video frame i2Of the fourth training inverse optical flow fcvu2-3For modified secondary training video frames i3To the second training video frame i2The backward light flow of (2);
step 211, inverse optical flow fcvu through third training2-1For the first training video frame i1Mapping to obtain a third training mapping map wm1-2
Step 212, inverse optical flow fcvu by fourth training2-3For the third training video frame i3Mapping to obtain a fourth training mapping wm3-2
Step 213, the first training video frame i1And the third trainingVideo frame i3Third training inverse light flow fcvu2-1Fourth training inverse light flow fcvu2-3First training map wm1-2And a second training map wm3-2Inputting the fusion neural network to obtain a fusion parameter graph m output by the fusion neural network;
Step 214, performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2;
For example, the resolution of the fusion parameter map m is the same as that of any video frame, each pixel value of the fusion parameter map m ranges from 0 to 1, and the target interpolated frame in2 = wm1-2 × m + wm3-2 × (1-m).
Step 215, adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3.
During neural network training, the second training video frame i2 is known, while the target interpolated frame in2 is predicted by the neural networks; therefore, the network parameters can be adjusted based on the difference between in2 and i2 so that the prediction of the neural networks becomes more accurate, and for similar reasons the network parameters can be adjusted based on the difference between fcvu2-1 and fg2-1 and the difference between fcvu2-3 and fg2-3. Steps 201 to 215 described above constitute one training pass, and the neural networks may be trained for multiple rounds based on the training data. In step 215, specifically, for example, the L1 loss between in2 and i2, the L1 loss between fcvu2-1 and fg2-1, and the L1 loss between fcvu2-3 and fg2-3 are calculated and back-propagated iteratively until the optical flow correction neural network and the fusion neural network converge; that is, over multiple rounds of training, the network parameters of the optical flow correction neural network and the fusion neural network are adjusted according to the L1 losses and continuously optimized until the L1 losses no longer decrease, which indicates that the network training is finished and the prediction effect of the neural networks is best. After the network training is completed, video frame interpolation can be realized with the above video frame interpolation method based on the trained optical flow correction neural network and the trained fusion neural network.
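One training pass over a batch, with the three L1 losses of step 215, might look like the following PyTorch sketch. The data loader, the warp function, the two networks (for example as sketched above), the optimizer choice, and the learning rate are all assumptions made for illustration; the patent itself only specifies the L1 losses and the iterative back-propagation.

```python
import torch
import torch.nn.functional as F

# flow_net, fusion_net, warp and loader are assumed to be defined elsewhere
# (e.g. as in the sketches above); each batch supplies the three frames, the
# reference backward flows fg2-1/fg2-3 and the CV-computed flows f3-1/f1-3.
opt = torch.optim.Adam(list(flow_net.parameters()) + list(fusion_net.parameters()), lr=1e-4)

for i1, i2, i3, fg21, fg23, f31, f13 in loader:
    fcv21, fcv23 = 0.5 * f31, 0.5 * f13                 # steps 206-207, t = 0.5
    wf12, wf32 = warp(i1, fcv21), warp(i3, fcv23)       # steps 208-209
    fcvu21, fcvu23 = flow_net(i1, i3, fcv21, fcv23, wf12, wf32)   # step 210
    wm12, wm32 = warp(i1, fcvu21), warp(i3, fcvu23)     # steps 211-212
    m = fusion_net(i1, i3, fcvu21, fcvu23, wm12, wm32)  # step 213
    in2 = wm12 * m + wm32 * (1.0 - m)                   # step 214
    loss = (F.l1_loss(in2, i2)                          # step 215: three L1 losses
            + F.l1_loss(fcvu21, fg21)
            + F.l1_loss(fcvu23, fg23))
    opt.zero_grad()
    loss.backward()
    opt.step()
```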
In one possible embodiment, step 204 of calculating the first training backward optical flow f3-1 includes: calculating the first training backward optical flow f3-1 based on a computer vision algorithm; and step 205 of calculating the second training backward optical flow f1-3 includes: calculating the second training backward optical flow f1-3 based on a computer vision algorithm.
In one possible embodiment, the process of step 214 of performing fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain the target interpolated frame in2 includes: multiplying the pixel values of the third training map wm1-2 by the corresponding pixel values of the fusion parameter map m to obtain a first fusion map wm1-2 × m, where the resolution of the fusion parameter map m is the same as the resolution of any video frame, the pixel values of the fusion parameter map m correspond one to one to the pixel values of the third training map wm1-2, each pixel value of the fusion parameter map m ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third training map wm1-2, and the products obtained by multiplying the pixel values of the third training map wm1-2 and the pixel values of the fusion parameter map m one to one are the pixel values of the first fusion map; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map correspond one to one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting the pixel values of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth training map wm3-2 by the difference fusion parameter map (1-m) to obtain a second fusion map wm3-2 × (1-m), where the products obtained by multiplying the pixel values of the fourth training map wm3-2 and the pixel values of the difference fusion parameter map one to one are the pixel values of the second fusion map; and adding the first fusion map wm1-2 × m and the second fusion map wm3-2 × (1-m) to obtain the target interpolated frame in2, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame in2, which is expressed by the formula in2 = wm1-2 × m + wm3-2 × (1-m).
In one possible embodiment, step 206 of transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio includes: multiplying the first training backward optical flow f3-1 by the preset ratio value t to obtain the first initial training optical flow fcv2-1, i.e. transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, where the preset ratio value ranges from 0.4 to 0.6;
Step 207 of transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio includes: multiplying the second training backward optical flow f1-3 by the preset ratio value t to obtain the second initial training optical flow fcv2-3, i.e. transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, where the preset ratio value t may be 0.5.
As shown in fig. 6, an embodiment of the present application further provides a video frame interpolation apparatus 3, including: an acquisition module 31, configured to acquire two adjacent video frames in a video, where the two video frames comprise a previous video frame I1 and a subsequent video frame I3; the acquisition module 31 is further configured to calculate an optical flow between the two video frames; the acquisition module 31 is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module 31 is further configured to map the two video frames through the initial optical flow to obtain an initial map; a correction module 32, configured to correct the optical flow between the two video frames based on the initial map to obtain a corrected optical flow; and a frame interpolation module 33, configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow. The video frame interpolation apparatus may apply the video frame interpolation method of any of the above embodiments, and the specific process and principle are not repeated here.
In one possible implementation, calculating optical flow between two video frames includes: calculating an optical flow between two video frames based on a computer vision algorithm; modifying the optical flow between two video frames based on the initial map comprises: based on the neural network, the optical flow between two video frames is modified using the initial map as an input.
In one possible embodiment, the process of correcting the optical flow between the two video frames based on the initial map to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
In one possible implementation, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio includes: transforming the first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on the preset ratio, where the first backward optical flow F3-1 belongs to the optical flow between the two video frames and is the backward optical flow from the previous video frame I1 to the subsequent video frame I3; and transforming the second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, where the second backward optical flow F1-3 belongs to the optical flow between the two video frames and is the backward optical flow from the subsequent video frame I3 to the previous video frame I1. Mapping the two video frames through the initial optical flow to obtain the initial map includes: mapping the previous video frame I1 through the first initial optical flow FCV2-1 to obtain a first map WF1-2, where the first map WF1-2 belongs to the initial map; and mapping the subsequent video frame I3 through the second initial optical flow FCV2-3 to obtain a second map WF3-2, where the second map WF3-2 belongs to the initial map. Inputting the two video frames, the initial optical flow, and the initial map into the optical flow correction neural network, correcting the initial optical flow through the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network includes: inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first map WF1-2, and the second map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 belong to the corrected optical flow, the third backward optical flow FCVU2-1 is the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2, and the fourth backward optical flow FCVU2-3 is the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2.
In one possible embodiment, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: mapping the two video frames through the corrected optical flow to obtain a corrected map; inputting the two video frames, the corrected optical flow, and the corrected map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; and performing fusion calculation on the corrected map based on the fusion parameter map to obtain the target interpolated frame.
In one possible embodiment, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: mapping the previous video frame I1 through the third backward optical flow FCVU2-1 to obtain a third map WM1-2; mapping the subsequent video frame I3 through the fourth backward optical flow FCVU2-3 to obtain a fourth map WM3-2; inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third map WM1-2, and the fourth map WM3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; and performing fusion calculation on the third map WM1-2 and the fourth map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.
In a possible implementation, the process of performing fusion calculation on the third map and the fourth map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third map by the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one to one to the pixel values of the third map, each pixel value of the fusion parameter map ranges from 0 to 1, the pixel values of the first fusion map correspond one to one to the pixel values of the third map, and the products obtained by multiplying the pixel values of the third map and the pixel values of the fusion parameter map one to one are the pixel values of the first fusion map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one to one to the pixel values of the fusion parameter map, and the differences obtained by subtracting the pixel values of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth map by the difference fusion parameter map to obtain a second fusion map, where the products obtained by multiplying the pixel values of the fourth map and the pixel values of the difference fusion parameter map one to one are the pixel values of the second fusion map; and adding the first fusion map and the second fusion map to obtain the target interpolated frame, where the sums obtained by adding the pixel values of the first fusion map and the pixel values of the second fusion map one to one are the pixel values of the target interpolated frame.
In one possible embodiment, transforming the first backward optical flow into the first initial optical flow based on the preset ratio includes: multiplying the first backward optical flow by a preset ratio value to obtain the first initial optical flow, where the preset ratio value ranges from 0.4 to 0.6; and transforming the second backward optical flow into the second initial optical flow based on the preset ratio includes: multiplying the second backward optical flow by the preset ratio value to obtain the second initial optical flow.
In one possible embodiment, the preset ratio value is 0.5.
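As a hedged illustration of this scaling step, assuming the interpolated frame is intended to sit at the temporal midpoint of the two input frames, the transformation is a simple scalar multiplication of the flow field:

    # Illustrative only: scaling the frame-to-frame reverse optical flow to obtain an
    # initial estimate of the flow toward the interpolated frame. The value 0.5 targets
    # the temporal midpoint; the description allows 0.4-0.6.
    PRESET_RATIO = 0.5

    def to_initial_flow(reverse_flow, ratio=PRESET_RATIO):
        return reverse_flow * ratio   # e.g. FCV2-1 = F3-1 * 0.5, FCV2-3 = F1-3 * 0.5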
As shown in fig. 7, an embodiment of the present application further provides a neural network training device 4, including: an obtaining module 41, configured to: acquire a group of training data, wherein the group of training data comprises three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2 and a third training video frame i3; obtain a first reference reverse optical flow fg2-1, which is a reference reverse optical flow from the first training video frame i1 to the second training video frame i2; obtain a second reference reverse optical flow fg2-3, which is a reference reverse optical flow from the third training video frame i3 to the second training video frame i2; calculate a first training reverse optical flow f3-1, which is a reverse optical flow from the first training video frame i1 to the third training video frame i3; calculate a second training reverse optical flow f1-3, which is a reverse optical flow from the third training video frame i3 to the first training video frame i1; transform the first training reverse optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio; transform the second training reverse optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio; map the first training video frame i1 by the first initial training optical flow fcv2-1 to obtain a first training map wf1-2; and map the third training video frame i3 by the second initial training optical flow fcv2-3 to obtain a second training map wf3-2; a correction module 42, configured to: input the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training map wf1-2 and the second training map wf3-2 into the optical flow correction neural network to obtain a third training reverse optical flow fcvu2-1 and a fourth training reverse optical flow fcvu2-3 output by the optical flow correction neural network, the third training reverse optical flow fcvu2-1 being a corrected reverse optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training reverse optical flow fcvu2-3 being a corrected reverse optical flow from the third training video frame i3 to the second training video frame i2; a frame interpolation module 43, configured to: map the first training video frame i1 by the third training reverse optical flow fcvu2-1 to obtain a third training map wm1-2; map the third training video frame i3 by the fourth training reverse optical flow fcvu2-3 to obtain a fourth training map wm3-2; and input the first training video frame i1, the third training video frame i3, the third training reverse optical flow fcvu2-1, the fourth training reverse optical flow fcvu2-3, the third training map wm1-2 and the fourth training map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; the frame interpolation module 43 being further configured to perform fusion calculation on the third training map wm1-2 and the fourth training map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2; and an adjustment module 44, configured to adjust network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training reverse optical flow fcvu2-1 and the first reference reverse optical flow fg2-1, and the difference between the fourth training reverse optical flow fcvu2-3 and the second reference reverse optical flow fg2-3. The neural network training device may apply the neural network training method for video frame interpolation in any of the above embodiments; the specific process and principle are the same as those of the above embodiments and are not repeated here.
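A minimal, hypothetical sketch of the adjustment module's parameter update is given below, assuming PyTorch, L1 losses and equal loss weights; none of these choices (nor the optimizer) are specified by this description, so they are assumptions for illustration only.

    # Hypothetical training step combining the frame difference and the two flow
    # differences into one loss, then backpropagating into the correction and fusion
    # networks via a user-supplied optimizer. Loss type and weights are assumptions.
    import torch

    def training_step(in2, i2, fcvu2_1, fg2_1, fcvu2_3, fg2_3, optimizer):
        frame_loss = torch.nn.functional.l1_loss(in2, i2)
        flow_loss = (torch.nn.functional.l1_loss(fcvu2_1, fg2_1) +
                     torch.nn.functional.l1_loss(fcvu2_3, fg2_3))
        loss = frame_loss + flow_loss        # joint supervision on frame and flows
        optimizer.zero_grad()
        loss.backward()                      # adjusts both networks' parameters
        optimizer.step()
        return loss.item()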
In a possible embodiment, calculating the first training reverse optical flow f3-1 comprises: computing the first training reverse optical flow f3-1 based on a computer vision algorithm; and calculating the second training reverse optical flow f1-3 comprises: computing the second training reverse optical flow f1-3 based on a computer vision algorithm.
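The description does not name a particular computer vision algorithm. As one illustrative possibility (an assumption, not the patent's method), OpenCV's Farneback estimator can produce a dense optical flow between two grayscale frames:

    # Illustrative only: dense optical flow via OpenCV's Farneback algorithm.
    # Which frame is passed first determines the flow direction, so the reverse-flow
    # convention used in this document should be respected by the caller.
    import cv2

    def farneback_flow(frame_a_gray, frame_b_gray):
        # Returns an (H, W, 2) array of per-pixel displacements from frame_a to frame_b.
        return cv2.calcOpticalFlowFarneback(frame_a_gray, frame_b_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)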
In a possible implementation, the fusion calculation performed on the third training map and the fourth training map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third training map by the fusion parameter map to obtain a first fusion map, wherein the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third training map, each pixel value of the fusion parameter map lies in the range 0-1, and each pixel value of the first fusion map is the product of the corresponding pixel values of the third training map and the fusion parameter map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein each pixel value of the difference fusion parameter map is 1 minus the corresponding pixel value of the fusion parameter map; multiplying the fourth training map by the difference fusion parameter map to obtain a second fusion map, wherein each pixel value of the second fusion map is the product of the corresponding pixel values of the fourth training map and the difference fusion parameter map; and adding the first fusion map and the second fusion map to obtain the target interpolated frame, wherein each pixel value of the target interpolated frame is the sum of the corresponding pixel values of the first fusion map and the second fusion map.
In one possible implementation, transforming the first training reverse optical flow into the first initial training optical flow based on the preset ratio comprises: multiplying the first training reverse optical flow by a preset proportional value to obtain the first initial training optical flow, wherein the preset proportional value ranges from 0.4 to 0.6; and transforming the second training reverse optical flow into the second initial training optical flow based on the preset ratio comprises: multiplying the second training reverse optical flow by the preset proportional value to obtain the second initial training optical flow.
In one possible embodiment, the preset ratio value is 0.5.
It should be understood that the above division of the video frame interpolation apparatus or the neural network training apparatus into modules is only a division of logical functions; in an actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, may all be implemented in hardware, or some may be implemented as software invoked by a processing element while others are implemented in hardware. For example, any one of the obtaining module, the correction module and the frame interpolation module may be a separately arranged processing element, may be integrated into a chip of the video frame interpolation apparatus, or may be stored in a memory of the video frame interpolation apparatus in the form of a program that a processing element of the apparatus calls to execute the functions of the module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, the steps of the above methods or the above modules may be completed by integrated logic circuits of hardware in a processor element or by instructions in the form of software. In addition, the video frame interpolation apparatus and the neural network training apparatus may be the same apparatus or different apparatuses.
For example, the video frame interpolation apparatus or the neural network training apparatus may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented by a processing element scheduling a program, the processing element may be a general purpose processor, such as a central processing unit (CPU) or another processor capable of invoking the program. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
As shown in fig. 8, an embodiment of the present application further provides an electronic device, including: a processor 51 and a memory 52, the memory 52 being configured to store at least one instruction which is loaded and executed by the processor 51 to implement the method in any of the embodiments described above, including the video frame interpolation method or the neural network training method for video frame interpolation. The specific process and principle of the video frame interpolation method or the neural network training method for video frame interpolation are the same as those of the above embodiments, and are not described herein again.
The number of the processors 51 may be one or more, and the processors 51 and the memory 52 may be connected by a bus 53 or in other ways. The memory 52 is a non-transitory computer readable storage medium and can be used for storing non-transitory software programs, non-transitory computer executable programs and modules, such as the program instructions/modules corresponding to the data processing device in the embodiments of the present application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory, that is, implements the method in any of the above method embodiments. The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store necessary data and the like. Further, the memory may include a high speed random access memory, and may also include a non-transitory memory, such as at least one disk storage device, a flash memory device, or another non-transitory solid state storage device. The electronic device may be, for example, a server, a computer, a mobile phone, or another electronic product.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line) or wirelessly (e.g., infrared, wireless, or microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk), among others.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, and c" may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A method for video frame insertion, comprising:
acquiring two adjacent video frames in a video, wherein the two video frames comprise a previous video frame and a next video frame;
calculating an optical flow between the two video frames;
transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
mapping the two video frames through the initial optical flow to obtain an initial mapping map;
correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and obtaining a target interpolation frame between the two video frames according to the corrected optical flow.
2. The method of claim 1,
the calculating optical flow between the two video frames comprises: computing an optical flow between the two video frames based on a computer vision algorithm;
the modifying optical flow between the two video frames based on the initial map comprises: correcting the optical flow between the two video frames based on a neural network using the initial map as an input.
3. The method of claim 1, wherein the modifying the optical flow between the two video frames based on the initial map comprises:
and inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
4. The method of claim 3,
the transforming the optical flow between the two video frames to an initial optical flow based on a preset scale comprises:
transforming a first inverse optical flow, belonging to an optical flow between the two video frames, into a first initial optical flow on the basis of a preset ratio, the first inverse optical flow being an inverse optical flow from the previous video frame to the subsequent video frame;
transforming a second inverse optical flow belonging to an optical flow between the two video frames into a second initial optical flow based on a preset ratio, the second inverse optical flow being an inverse optical flow from the subsequent video frame to the previous video frame;
the mapping the two video frames through the initial optical flow to obtain an initial mapping map comprises:
mapping the previous video frame by the first initial optical flow to obtain a first mapping map, wherein the first mapping map belongs to the initial mapping map;
mapping the next video frame through the second initial optical flow to obtain a second mapping map, wherein the second mapping map belongs to the initial mapping map;
the process of inputting the two video frames, the initial optical flow and the initial mapping map into an optical flow modified neural network, and modifying the initial optical flow through the optical flow modified neural network to obtain the modified optical flow output by the optical flow modified neural network includes:
inputting the previous video frame, the next video frame, the first initial optical flow, the second initial optical flow, the first mapping map and the second mapping map into an optical flow modification neural network, and obtaining a third inverse optical flow and a fourth inverse optical flow output by the optical flow modification neural network, wherein the third inverse optical flow and the fourth inverse optical flow belong to the modified optical flow, the third inverse optical flow is a modified inverse optical flow from the previous video frame to the target frame insertion, and the fourth inverse optical flow is a modified inverse optical flow from the next video frame to the target frame insertion.
5. The method of claim 1,
the obtaining of the target interpolated frame between the two video frames according to the modified optical flow comprises:
mapping the two video frames through the corrected optical flow to obtain a corrected mapping map;
inputting the two video frames, the corrected optical flow and the corrected mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the corrected mapping map based on the fusion parameter map to obtain the target interpolation frame.
6. The method of claim 4,
the obtaining of the target interpolation frame between the two video frames according to the modified optical flow comprises:
mapping the previous video frame through the third reverse optical flow to obtain a third mapping map;
mapping the next video frame through the fourth reverse optical flow to obtain a fourth mapping map;
inputting the previous video frame, the next video frame, the third reverse optical flow, the fourth reverse optical flow, the third mapping map and the fourth mapping map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
and performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame.
7. The method of claim 6,
the process of performing fusion calculation on the third mapping map and the fourth mapping map based on the fusion parameter map to obtain the target interpolation frame includes:
multiplying the third mapping map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map are in one-to-one correspondence with a plurality of pixel values of the third mapping map, each pixel value range of the fusion parameter map is 0-1, the plurality of pixel values of the first fusion map are in one-to-one correspondence with the plurality of pixel values of the third mapping map, and a plurality of product values obtained by multiplying the plurality of pixel values of the third mapping map and the plurality of pixel values of the fusion parameter map in one-to-one correspondence are respectively the plurality of pixel values of the first fusion map;
subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond to a plurality of pixel values of the fusion parameter map one by one, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are a plurality of pixel values of the difference fusion parameter map respectively;
multiplying the fourth mapping map and the difference value fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by one-to-one corresponding multiplication of a plurality of pixel values of the fourth mapping map and a plurality of pixel values of the difference value fusion parameter map are a plurality of pixel values of the second fusion map respectively;
adding the first fusion map and the second fusion map to obtain the target interpolation frame, wherein a plurality of pixel values obtained by adding the plurality of pixel values of the first fusion map and the plurality of pixel values of the second fusion map in a one-to-one correspondence manner are a plurality of pixel values of the target interpolation frame respectively.
8. The method according to claim 4 or 6 or 7,
the transforming the first inverse optical flow into a first initial optical flow based on a preset proportion comprises:
multiplying the first reverse optical flow by a preset proportional value to obtain the first initial optical flow, wherein the range of the preset proportional value is 0.4-0.6;
the transforming the second inverse optical flow into a second initial optical flow based on a preset proportion comprises:
and multiplying the second reverse optical flow by the preset proportional value to obtain the second initial optical flow.
9. The method of claim 8,
the preset proportional value is 0.5.
10. A neural network training method for video frame interpolation, comprising:
acquiring a group of training data, wherein the group of training data comprises three continuous video frames which are a first training video frame, a second training video frame and a third training video frame in sequence;
obtaining a first reference inverse optical flow, the first reference inverse optical flow being an inverse optical flow from the first training video frame to the second training video frame;
obtaining a second reference inverse optical flow, the second reference inverse optical flow being an inverse optical flow from the third training video frame to the second training video frame;
calculating a first training inverse optical flow, the first training inverse optical flow being an inverse optical flow from the first training video frame to the third training video frame;
calculating a second training inverse optical flow, the second training inverse optical flow being an inverse optical flow from the third training video frame to the first training video frame;
transforming the first training inverse optical flow into a first initial training optical flow based on a preset proportion;
transforming the second training inverse optical flow into a second initial training optical flow based on the preset proportion;
mapping the first training video frame through the first initial training optical flow to obtain a first training map;
mapping the third training video frame through the second initial training optical flow to obtain a second training map;
inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map and the second training map into an optical flow modification neural network to obtain a third training inverse optical flow and a fourth training inverse optical flow output by the optical flow modification neural network, wherein the third training inverse optical flow is a modified inverse optical flow from the first training video frame to the second training video frame, and the fourth training inverse optical flow is a modified inverse optical flow from the third training video frame to the second training video frame;
mapping the first training video frame through the third training inverse optical flow to obtain a third training map;
mapping the third training video frame through the fourth training inverse optical flow to obtain a fourth training map;
inputting the first training video frame, the third training video frame, the third training inverse optical flow, the fourth training inverse optical flow, the third training map and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
based on the fusion parameter map, performing fusion calculation on the third training map and the fourth training map to obtain a target interpolated frame;
adjusting network parameters of the optical-flow-modified neural network and the fused neural network based on a difference between the target interpolated frame and the second training video frame, a difference between the third training inverse optical flow and the first reference inverse optical flow, and a difference between the fourth training inverse optical flow and the second reference inverse optical flow.
11. The method of claim 10,
said calculating a first trained inverse optical flow comprises: computing the first trained inverse optical flow based on a computer vision algorithm;
said calculating a second trained inverse optical flow comprises: calculating the second trained inverse optical flow based on a computer vision algorithm.
12. The method of claim 10,
the process of performing fusion calculation on the third training map and the fourth training map based on the fusion parameter map to obtain the target interpolated frame includes:
multiplying the third training map and the fusion parameter map to obtain a first fusion map, wherein a plurality of pixel values of the fusion parameter map correspond to a plurality of pixel values of the third training map one by one, each pixel value range of the fusion parameter map is 0-1, the plurality of pixel values of the first fusion map correspond to the plurality of pixel values of the third training map one by one, and a plurality of product values obtained by multiplying the plurality of pixel values of the third training map and the plurality of pixel values of the fusion parameter map one by one are respectively the plurality of pixel values of the first fusion map;
subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein a plurality of pixel values of the difference fusion parameter map correspond to a plurality of pixel values of the fusion parameter map one by one, and a plurality of differences obtained by subtracting the plurality of pixel values of the fusion parameter map from 1 are respectively a plurality of pixel values of the difference fusion parameter map;
multiplying the fourth training map and the difference value fusion parameter map to obtain a second fusion map, wherein a plurality of product values obtained by one-to-one corresponding multiplication of a plurality of pixel values of the fourth training map and a plurality of pixel values of the difference value fusion parameter map are a plurality of pixel values of the second fusion map respectively;
and adding the first fusion map and the second fusion map to obtain the target interpolated frame, wherein a plurality of values obtained by adding a plurality of pixel values of the first fusion map and a plurality of pixel values of the second fusion map in a one-to-one correspondence manner are respectively a plurality of pixel values of the target interpolated frame.
13. The method of claim 10,
the transforming the first training inverse optical flow to a first initial training optical flow based on a preset scale comprises:
multiplying the first training reverse optical flow by a preset proportional value to obtain the first initial training optical flow, wherein the range of the preset proportional value is 0.4-0.6;
the transforming the second training inverse optical flow into a second initial training optical flow based on a preset ratio comprises:
and multiplying the second training inverse optical flow by the preset proportional value to obtain the second initial training optical flow.
14. The method of claim 13,
the preset proportional value is 0.5.
15. A video frame interpolation apparatus, comprising:
the acquisition module is used for acquiring two adjacent video frames in the video;
the acquisition module is further used for calculating optical flow between the two video frames;
the acquisition module is further used for transforming the optical flow between the two video frames into an initial optical flow based on a preset proportion;
the acquisition module is further configured to map the two video frames through the initial optical flow to obtain an initial mapping map;
the correction module is used for correcting the optical flow between the two video frames based on the initial mapping map to obtain a corrected optical flow;
and the frame interpolation module is used for obtaining a target frame interpolation between the two video frames according to the corrected optical flow.
16. An electronic device, comprising:
a processor and a memory for storing at least one instruction which is loaded and executed by the processor to implement the method of any one of claims 1 to 14.
17. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method of any one of claims 1 to 14.
CN202210171767.5A 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment Active CN114640885B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210171767.5A CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment
PCT/CN2023/075807 WO2023160426A1 (en) 2022-02-24 2023-02-14 Video frame interpolation method and apparatus, training method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210171767.5A CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114640885A true CN114640885A (en) 2022-06-17
CN114640885B CN114640885B (en) 2023-12-22

Family

ID=81948635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210171767.5A Active CN114640885B (en) 2022-02-24 2022-02-24 Video frame inserting method, training device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114640885B (en)
WO (1) WO2023160426A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160426A1 (en) * 2022-02-24 2023-08-31 影石创新科技股份有限公司 Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN117115210A (en) * 2023-10-23 2023-11-24 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978728A (en) * 2014-04-08 2015-10-14 南京理工大学 Image matching system of optical flow method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
CN112104830A (en) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame insertion method, model training method and corresponding device
CN113365110A (en) * 2021-07-14 2021-09-07 北京百度网讯科技有限公司 Model training method, video frame interpolation method, device, equipment and storage medium
CN114007135A (en) * 2021-10-29 2022-02-01 广州华多网络科技有限公司 Video frame insertion method and device, equipment, medium and product thereof

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776688B2 (en) * 2017-11-06 2020-09-15 Nvidia Corporation Multi-frame video interpolation using optical flow
CN109949221B (en) * 2019-01-30 2022-05-17 深圳大学 Image processing method and electronic equipment
CN110191299B (en) * 2019-04-15 2020-08-04 浙江大学 Multi-frame interpolation method based on convolutional neural network
CN113727141B (en) * 2020-05-20 2023-05-12 富士通株式会社 Interpolation device and method for video frames
CN113949926B (en) * 2020-07-17 2024-07-30 武汉Tcl集团工业研究院有限公司 Video frame inserting method, storage medium and terminal equipment
CN112995715B (en) * 2021-04-20 2021-09-03 腾讯科技(深圳)有限公司 Video frame insertion processing method and device, electronic equipment and storage medium
CN114066730B (en) * 2021-11-04 2022-10-28 西北工业大学 Video frame interpolation method based on unsupervised dual learning
CN114640885B (en) * 2022-02-24 2023-12-22 影石创新科技股份有限公司 Video frame inserting method, training device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978728A (en) * 2014-04-08 2015-10-14 南京理工大学 Image matching system of optical flow method
WO2016187776A1 (en) * 2015-05-25 2016-12-01 北京大学深圳研究生院 Video frame interpolation method and system based on optical flow method
US20180176574A1 (en) * 2015-05-25 2018-06-21 Peking University Shenzhen Graduate School Method and system for video frame interpolation based on optical flow method
CN112104830A (en) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame insertion method, model training method and corresponding device
CN113365110A (en) * 2021-07-14 2021-09-07 北京百度网讯科技有限公司 Model training method, video frame interpolation method, device, equipment and storage medium
CN114007135A (en) * 2021-10-29 2022-02-01 广州华多网络科技有限公司 Video frame insertion method and device, equipment, medium and product thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160426A1 (en) * 2022-02-24 2023-08-31 影石创新科技股份有限公司 Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN117115210A (en) * 2023-10-23 2023-11-24 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things
CN117115210B (en) * 2023-10-23 2024-01-26 黑龙江省农业科学院农业遥感与信息研究所 Intelligent agricultural monitoring and adjusting method based on Internet of things

Also Published As

Publication number Publication date
WO2023160426A1 (en) 2023-08-31
CN114640885B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN108304755B (en) Training method and device of neural network model for image processing
Zeng et al. Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time
WO2021088473A1 (en) Image super-resolution reconstruction method, image super-resolution reconstruction apparatus, and computer-readable storage medium
WO2023160426A1 (en) Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN111372087B (en) Panoramic video frame insertion method and device and corresponding storage medium
CN106780336B (en) Image reduction method and device
CN108271022A (en) A kind of method and device of estimation
CN113935934A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP2018106316A (en) Image correction processing method and image correction processing apparatus
US20050088531A1 (en) Automatic stabilization control apparatus, automatic stabilization control method, and computer readable recording medium having automatic stabilization control program recorded thereon
WO2020215263A1 (en) Image processing method and device
CN110830848B (en) Image interpolation method, image interpolation device, computer equipment and storage medium
CN115564655A (en) Video super-resolution reconstruction method, system and medium based on deep learning
US20230196721A1 (en) Low-light video processing method, device and storage medium
CN111093045B (en) Method and device for scaling video sequence resolution
JP2015197818A (en) Image processing apparatus and method of the same
CN103618904A (en) Motion estimation method and device based on pixels
CN115809959A (en) Image processing method and device
CN113469880A (en) Image splicing method and device, storage medium and electronic equipment
CN112802079A (en) Disparity map acquisition method, device, terminal and storage medium
CN114257759B (en) System for image completion
CN113658321B (en) Three-dimensional reconstruction method, system and related equipment
JP2018084997A (en) Image processing device, and image processing method
RU2576490C1 (en) Background hybrid retouch method for 2d to 3d conversion
CN113556581A (en) Method and device for generating video interpolation frame and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant