CN112040311B - Video image frame supplementing method, device and equipment and storage medium - Google Patents

Video image frame supplementing method, device and equipment and storage medium

Info

Publication number
CN112040311B
CN112040311B (application CN202010720883.9A)
Authority
CN
China
Prior art keywords
optical flow
coarse
frame
grained
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010720883.9A
Other languages
Chinese (zh)
Other versions
CN112040311A (en)
Inventor
李甲
许豪
马中行
赵沁平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202010720883.9A
Publication of CN112040311A
Application granted
Publication of CN112040311B
Legal status: Active (current)
Anticipated expiration: legal status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/149Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a video image frame supplementing method, device, equipment and storage medium. The method comprises the following steps: extracting two adjacent frames (a previous frame and a subsequent frame) from a target video and inputting them into a coarse-grained optical flow generation model trained to convergence, so as to output coarse-grained optical flow data corresponding to the two adjacent frames; inputting pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to convergence, so as to output intermediate frame optical flow data; and generating a target intermediate frame image from the two adjacent frames and the intermediate frame optical flow data. Because the intermediate frame optical flow generation model trained to convergence fuses the time information with the motion information, the generated intermediate frame optical flow data is more strongly correlated with the coarse-grained optical flow data, which improves the correlation between the target intermediate frame image and the preceding and following frames and thus the continuity of the whole video.

Description

Video image frame supplementing method, device and equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of video image processing, in particular to a method, a device and equipment for supplementing a frame of a video image and a storage medium.
Background
In recent years video applications have proliferated and the amount of video data on the network has grown explosively, bringing new opportunities and challenges to video optimization research. The video frame rate is an important attribute of a video and is highly significant in video content optimization. First, for high-frame-rate video, the modern film industry largely relies on persistence of vision to create the viewing experience: static pictures are played in succession, and when the playback speed exceeds what the human eye can resolve, the viewer perceives motion. The frame rate of a film therefore determines the viewing experience, and raising the frame rate, which increases the amount of image information and smooths fast-moving shots so as to enhance the audio-visual impact, is of great significance to the film and television industry. Second, in the field of slow-motion video, industries built around video content have grown with the development of the internet, and slow-motion playback is an important product feature with great application value.
How to increase the frame rate has therefore become a key problem in video optimization. The frame rate is generally increased by frame-supplementing the video, but the intermediate frames obtained with prior-art methods correlate poorly with the preceding and following frames, which results in poor continuity of the whole video.
Disclosure of Invention
The invention provides a video image frame supplementing method, device, equipment and storage medium, which solve the problem that an intermediate frame obtained by existing video image frame supplementing approaches correlates poorly with the preceding and following frames, leading to poor continuity of the whole video.
The first aspect of the embodiments of the present invention provides a method for frame interpolation of a video image, where the method is applied to an electronic device, and the method includes:
extracting front and rear adjacent frames of images in a target video, and respectively inputting the front and rear adjacent frames of images into a coarse-grained optical flow generation model trained to be convergent so as to output coarse-grained optical flow data corresponding to the front and rear adjacent frames of images;
inputting the pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged to output intermediate frame optical flow data;
and generating a target intermediate frame image according to the front and back adjacent two frame images and the intermediate frame optical flow data.
Further, the method for extracting two adjacent frames of images in front and back of the target video and inputting the two adjacent frames of images into the coarse-grained optical flow generation model trained to converge respectively to output the coarse-grained optical flow data corresponding to the two adjacent frames of images comprises:
extracting front and back adjacent two frames of images in a target video, and respectively inputting the images into a coarse-grained optical flow generation model trained to be convergent;
and extracting corresponding image characteristic parameters from the front and rear adjacent two frames of images through the coarse-grained optical flow generation model, and outputting coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images according to the image characteristic parameters.
Further, the method as described above, wherein the coarse-grained optical flow generation model comprises a codec network and a reversed convolution structure;
the extracting, by the coarse-grained optical flow generation model, corresponding image feature parameters from the front and rear two frames of images, and outputting coarse-grained optical flow data corresponding to the front and rear two adjacent frames of images according to the image feature parameters, includes:
extracting the image characteristic parameters from the front and back adjacent two frames of images through a coding network and coding to obtain corresponding coding results;
inputting the image characteristic parameters into the turning convolution structure to obtain the alignment characteristic graphs of the front and rear adjacent frames of images;
and inputting the alignment feature map and the coding result into a decoding network to output coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images.
Further, the method as described above, the coarse-grained optical flow data comprising coarse-grained bi-directional optical flow data; the intermediate frame optical flow generation model comprises a fusion function and an object motion track fitting function;
the inputting the pre-configured frame-complementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to converge to output intermediate frame optical flow data comprises:
fusing pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data through the fusion function to output frame supplementing time bidirectional optical flow data corresponding to the frame supplementing time data;
and inputting the frame supplementing time bidirectional optical flow data into the object motion track fitting function to output intermediate frame optical flow data.
Further, the method as described above, before the input into the coarse-grained optical flow generation model trained to converge to output the coarse-grained optical flow data corresponding to the two adjacent frames of images, further includes:
obtaining a first training sample, wherein the first training sample is a training sample corresponding to a coarse-grained optical flow generation model, and the first training sample comprises: a previous frame image and a subsequent frame image;
inputting the first training sample into a preset coarse-grained optical flow generation model to train the preset coarse-grained optical flow generation model;
judging whether the preset coarse-grained optical flow generation model meets a convergence condition or not by adopting a reconstruction loss function;
and if the preset coarse-grained optical flow generation model meets the convergence condition, determining the coarse-grained optical flow generation model meeting the convergence condition as a coarse-grained optical flow generation model trained to be converged.
Further, the method as described above, before inputting the pre-configured frame-complementing time data and the coarse-grained optical flow data into the intermediate-frame optical flow generation model trained to converge to output intermediate-frame optical flow data, further comprising:
acquiring a second training sample, wherein the second training sample is a training sample corresponding to the intermediate frame optical flow generation model, and the second training sample comprises: a first standard inter-frame image and a first actual inter-frame image;
inputting the second training sample into a preset intermediate frame optical flow generation model so as to train the preset intermediate frame optical flow generation model;
judging whether the preset intermediate frame optical flow generation model meets a convergence condition or not by adopting a perception loss function;
and if the preset intermediate frame optical flow generation model meets the convergence condition, determining the intermediate frame optical flow generation model meeting the convergence condition as the intermediate frame optical flow generation model trained to be converged.
Further, the method as described above, the generating a target intermediate frame image from the two adjacent front and back frame images and the intermediate frame optical flow data includes:
acquiring a weight of the proportion of the intermediate frame optical flow data occupied by the front and rear adjacent frames of images according to the intermediate frame optical flow data;
and generating a target intermediate frame image through mapping operation according to the weight and the two adjacent frames of images.
A second aspect of the embodiments of the present invention provides a video image frame complementing apparatus, where the apparatus is located in an electronic device, and includes:
the coarse-grained optical flow generation module is used for extracting front and back adjacent two frames of images in the target video, and respectively inputting the front and back adjacent two frames of images into a coarse-grained optical flow generation model trained to be converged so as to output coarse-grained optical flow data corresponding to the front and back adjacent two frames of images;
the intermediate frame optical flow generation module is used for inputting pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged so as to output intermediate frame optical flow data;
and the intermediate frame image generation module is used for generating a target intermediate frame image according to the front and back adjacent two frame images and the intermediate frame optical flow data.
Further, in the apparatus described above, the coarse-grained optical flow generation module is specifically configured to:
extracting front and back adjacent two frames of images in a target video, and respectively inputting the images into a coarse-grained optical flow generation model trained to be convergent; and extracting corresponding image characteristic parameters from the front and rear adjacent two frames of images through the coarse-grained optical flow generation model, and outputting coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images according to the image characteristic parameters.
Further, the apparatus as described above, the coarse-grained optical flow generation model comprises a codec network and a reversed convolution structure;
the coarse-grained optical flow generation module is specifically configured to, when extracting corresponding image feature parameters from the two previous and subsequent frames of images through the coarse-grained optical flow generation model and outputting coarse-grained optical flow data corresponding to the two previous and subsequent adjacent frames of images according to the image feature parameters:
extracting the image characteristic parameters from the front and back adjacent two frames of images through a coding network and coding to obtain corresponding coding results; inputting the image characteristic parameters into the turning convolution structure to obtain the alignment characteristic graphs of the front and rear adjacent frames of images; and inputting the alignment feature map and the coding result into a decoding network to output coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images.
Further, the apparatus as described above, the coarse-grained optical flow data comprising coarse-grained bi-directional optical flow data; the intermediate frame optical flow generation model comprises a fusion function and an object motion track fitting function;
the intermediate-frame optical flow generation module is specifically configured to:
fusing pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data through the fusion function to output frame supplementing time bidirectional optical flow data corresponding to the frame supplementing time data; and inputting the frame supplementing time bidirectional optical flow data into the object motion track fitting function to output intermediate frame optical flow data.
Further, the apparatus as described above, the apparatus further comprising: a first training module;
the first training module is configured to obtain a first training sample, where the first training sample is a training sample corresponding to a coarse-grained optical flow generation model, and the first training sample includes: a previous frame image and a subsequent frame image; inputting the first training sample into a preset coarse-grained optical flow generation model to train the preset coarse-grained optical flow generation model; judging whether the preset coarse-grained light stream generation model meets a convergence condition or not by adopting a reconstruction loss function; and if the preset coarse-grained optical flow generation model meets the convergence condition, determining the coarse-grained optical flow generation model meeting the convergence condition as a coarse-grained optical flow generation model trained to be converged.
Further, the apparatus as described above, the apparatus further comprising: a second training module;
the second training module is to: acquiring a second training sample, wherein the second training sample is a training sample corresponding to the intermediate frame optical flow generation model, and the second training sample comprises: a first standard inter-frame image and a first actual inter-frame image; inputting the second training sample into a preset intermediate frame optical flow generation model so as to train the preset intermediate frame optical flow generation model; judging whether the preset intermediate frame optical flow generation model meets a convergence condition or not by adopting a perception loss function; and if the preset intermediate frame optical flow generation model meets the convergence condition, determining the intermediate frame optical flow generation model meeting the convergence condition as the intermediate frame optical flow generation model trained to be converged.
Further, in the apparatus as described above, the intermediate frame image generating module is specifically configured to:
acquiring a weight of the proportion of the intermediate frame optical flow data occupied by the front and rear adjacent frames of images according to the intermediate frame optical flow data;
and generating a target intermediate frame image through mapping operation according to the weight and the two adjacent frames of images.
A third aspect of the embodiments of the present invention provides a video image frame supplementing device, including: a memory and a processor;
the memory is configured to store instructions executable by the processor;
wherein the processor is configured to perform the video image frame supplementing method of any one of the first aspect.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the method for complementing video frames according to any one of the first aspect is implemented.
Embodiments of the invention provide a video image frame supplementing method, device, equipment and storage medium. The method is applied to an electronic device and comprises: extracting two adjacent frames (a previous frame and a subsequent frame) from a target video and inputting them into a coarse-grained optical flow generation model trained to convergence, so as to output coarse-grained optical flow data corresponding to the two adjacent frames; inputting pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to convergence, so as to output intermediate frame optical flow data; and generating a target intermediate frame image from the two adjacent frames and the intermediate frame optical flow data. The coarse-grained optical flow generation model trained to convergence outputs the corresponding coarse-grained optical flow data from the two adjacent frames of the target video, and the intermediate frame optical flow generation model trained to convergence outputs the intermediate frame optical flow data from the pre-configured frame supplementing time data and the coarse-grained optical flow data; the target intermediate frame image is then generated from the two adjacent frames and the intermediate frame optical flow data. Because the intermediate frame optical flow generation model trained to convergence fuses the time information of the frame supplementing time data with the motion information contained in the coarse-grained optical flow data, the generated intermediate frame optical flow data is more strongly correlated with the coarse-grained optical flow data, which improves the correlation between the target intermediate frame image and the preceding and following frames, and thus the continuity of the whole video.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a scene diagram of a video image frame interpolation method that can implement an embodiment of the present invention;
fig. 2 is a flowchart illustrating a video image frame interpolation method according to a first embodiment of the present invention;
fig. 3 is a flowchart illustrating a video image frame interpolation method according to a second embodiment of the present invention;
fig. 4 is a flowchart illustrating step 202 of a video image frame interpolation method according to a second embodiment of the present invention;
fig. 5 is a schematic training flow chart of a video image frame interpolation method according to a fourth embodiment of the present invention;
fig. 6 is a schematic training flow chart of a video image frame interpolation method according to a fifth embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video image frame interpolation apparatus according to a sixth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention.
The above drawings illustrate specific embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but to illustrate it to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
First, terms related to embodiments of the present invention are explained:
frame: the single image picture is the minimum unit in the image animation, which is equivalent to each frame of lens on the motion picture film, one frame is a static picture, and continuous frames form the animation.
Frame supplementing: adding at least one frame image between two adjacent frames of images.
Optical flow: which refers to the apparent motion of the luminance pattern, the optical flow contains information of the motion of the object.
An application scenario of the video image frame interpolation method provided by the embodiment of the present invention is described below. As shown in fig. 1, 1 is a first electronic device, 2 is an adjacent subsequent frame image, 3 is an adjacent previous frame image, 4 is a second electronic device, and 5 is a third electronic device. The network architecture of the application scene corresponding to the video image frame supplementing method provided by the embodiment of the invention comprises the following steps: a first electronic device 1, a second electronic device 4 and a third electronic device 5. The second electronic device 4 stores a target video which needs to be subjected to frame complementing. The first electronic device 1 acquires the adjacent previous frame image 3 and the adjacent next frame image 2 of the target video from the second electronic device 4, and inputs the two adjacent previous and next frame images into a coarse-grained optical flow generation model trained to be convergent respectively so as to output coarse-grained optical flow data corresponding to the two adjacent previous and next frame images. And inputting the pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged to output the intermediate frame optical flow data. And generating a target intermediate frame image according to the front and back adjacent two frame images and the intermediate frame optical flow data. After the first electronic device 1 generates the target inter-frame image, the target inter-frame image may be output to the third electronic device 5. And acquiring the adjacent previous frame image 3 and the adjacent next frame image 2 of the target video in the second electronic equipment 4 by the third electronic equipment 5, and combining the target intermediate frame images to generate a video after frame supplement. Alternatively, the first electronic device 1 may combine the target inter-frame image with the target video to generate a video after frame interpolation.
In the video image frame supplementing method provided by the embodiment of the invention, the intermediate frame optical flow generation model trained to convergence fuses the time information of the frame supplementing time data with the motion information in the optical flow data, so that the generated intermediate frame optical flow data is more strongly correlated with the coarse-grained optical flow data; this improves the correlation between the target intermediate frame image and the preceding and following frames, and thus the continuity of the whole video.
The embodiments of the present invention will be described with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating a video image frame complementing method according to a first embodiment of the present invention, and as shown in fig. 2, an implementation subject of the embodiment of the present invention is a video image frame complementing device, which can be integrated in an electronic device. The video image frame interpolation method provided by the embodiment includes the following steps:
step S101, two adjacent frames of images in the target video are extracted and input into a coarse-grained optical flow generation model trained to be converged respectively, so as to output coarse-grained optical flow data corresponding to the two adjacent frames of images.
First, in this embodiment, the two adjacent frames refer to a previous frame image and the subsequent frame image adjacent to it, for example the first frame image (the previous frame) and the second frame image (the adjacent subsequent frame) of the target video. The target video is the video that needs frame supplementing.
The target video may be obtained from a database, obtained from another electronic device, or input manually, which is not limited in this embodiment. In this embodiment, the coarse-grained optical flow generation model trained to convergence is a trained model used to generate corresponding coarse-grained optical flow data from two adjacent frames. Coarse-grained optical flow data refers to optical flow data at a coarser logical granularity, i.e., flow that captures the larger-scale motion between the two frames.
Wherein, the coarse-grained optical flow generation model can be a network structure model, such as a U-net network structure. Wherein the U-net network structure is a convolutional network structure. Meanwhile, the network structure model can adopt artificial intelligence to carry out deep learning so as to obtain the learned network structure model.
Step S102, inputting the pre-configured frame-complementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged to output intermediate frame optical flow data.
In this embodiment, the pre-configured frame supplementing time data depends on the frame supplementing requirement of the video, so it can be configured in advance according to that requirement; its value range is [0, 1]. For example, if the target video has 30 frames and 60 frames are required after frame supplementing, the pre-configured frame supplementing time data can be set to one half; if the target video needs to be supplemented to 90 frames, the pre-configured frame supplementing time data can be set to one third.
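For illustration only, the following Python sketch (not part of the patent; the function name is hypothetical) enumerates the evenly spaced frame supplementing times in [0, 1] for an N-fold frame-rate increase, matching the one-half and one-third examples above.

```python
# Hypothetical helper: frame supplementing times for an N-fold frame-rate increase.
def frame_supplement_times(multiplier: int) -> list:
    """For an N-fold increase, insert (N - 1) intermediate frames between each
    pair of adjacent frames, at evenly spaced times in (0, 1)."""
    return [k / multiplier for k in range(1, multiplier)]

print(frame_supplement_times(2))  # [0.5]               e.g. 30 frames -> 60 frames
print(frame_supplement_times(3))  # approx. [1/3, 2/3]  e.g. 30 frames -> 90 frames
```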
In this embodiment, the intermediate frame optical flow generation model trained to converge is a trained model, and is used to generate intermediate frame optical flow data according to the pre-configured frame-complementing time data and coarse-grained optical flow data.
Wherein, the intermediate frame optical flow generation model can be a network structure model. The intermediate frame optical flow generation model can adopt a convolution network structure with spatial and temporal characteristics.
In this embodiment, the intermediate-frame optical flow data is obtained by fusing time information in the pre-configured frame-complementing time data and motion information of the object motion trajectory in the coarse-grained optical flow data, which may also be referred to as spatial information of the object motion trajectory.
In step S103, a target intermediate frame image is generated from the front and rear adjacent two frame images and the intermediate frame optical flow data.
In this embodiment, the intermediate frame optical flow data is combined with the two adjacent frames; since the intermediate frame optical flow data carries the motion information of the intermediate frame, the target intermediate frame image can be generated by combining it with the two adjacent frames. The operation may specifically be a mapping (warping) operation applied to the two adjacent frames together with the intermediate frame optical flow data to generate the target intermediate frame image.
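A minimal end-to-end sketch of steps S101 to S103 is shown below; the callables coarse_flow_model, mid_flow_model and synthesize are hypothetical stand-ins for the two models trained to convergence and the mapping operation described above.

```python
# Hypothetical wiring of steps S101-S103; the three callables are assumptions.
def supplement_frame(frame0, frame1, t, coarse_flow_model, mid_flow_model, synthesize):
    f_0to1, f_1to0 = coarse_flow_model(frame0, frame1)        # step S101: coarse-grained optical flow
    f_t_to_0, f_t_to_1 = mid_flow_model(t, f_0to1, f_1to0)    # step S102: intermediate frame optical flow
    return synthesize(frame0, frame1, f_t_to_0, f_t_to_1, t)  # step S103: target intermediate frame image
```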
The embodiment of the invention provides a video image frame supplementing method, applied to an electronic device, comprising: extracting two adjacent frames (a previous frame and a subsequent frame) from a target video and inputting them into a coarse-grained optical flow generation model trained to convergence, so as to output coarse-grained optical flow data corresponding to the two adjacent frames; inputting pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to convergence, so as to output intermediate frame optical flow data; and generating a target intermediate frame image from the two adjacent frames and the intermediate frame optical flow data. The coarse-grained optical flow generation model trained to convergence outputs the corresponding coarse-grained optical flow data from the two adjacent frames of the target video, and the intermediate frame optical flow generation model trained to convergence outputs the intermediate frame optical flow data from the pre-configured frame supplementing time data and the coarse-grained optical flow data; the target intermediate frame image is then generated from the two adjacent frames and the intermediate frame optical flow data. In the method provided by this embodiment, the intermediate frame optical flow generation model trained to convergence fuses the time information of the frame supplementing time data with the motion information contained in the coarse-grained optical flow data, so that the generated intermediate frame optical flow data is more strongly correlated with the coarse-grained optical flow data; this improves the correlation between the target intermediate frame image and the preceding and following frames, and thus the continuity of the whole video.
Fig. 3 is a schematic flow chart of a video image frame complementing method according to a second embodiment of the present invention, and as shown in fig. 3, the video image frame complementing method according to the present embodiment is further refined in each step based on the video image frame complementing method according to the first embodiment of the present invention. The video image frame interpolation method provided by the embodiment includes the following steps.
Wherein step 201-202 is a further refinement of step 101.
Step S201, two adjacent frames of images in the target video are extracted and input into a coarse-grained optical flow generation model trained to be converged respectively.
In this embodiment, the implementation manner of step 201 is similar to that of step 101 in the first embodiment of the present invention, and is not described in detail here.
Step S202, extracting corresponding image characteristic parameters from the two adjacent frames of images through the coarse-grained optical flow generation model, and outputting coarse-grained optical flow data corresponding to the two adjacent frames of images according to the image characteristic parameters.
In this embodiment, the image characteristic parameters include object motion vectors, pixel coordinates, and the like. The corresponding image characteristic parameters are extracted from the two adjacent frames so that the parameters of the two frames can be compared and the coarse-grained optical flow data corresponding to the two adjacent frames can be output. Therefore, in this embodiment, outputting the coarse-grained optical flow data corresponding to the two adjacent frames from the image feature parameters may specifically be: outputting the coarse-grained optical flow data corresponding to the two adjacent frames from the object motion vectors and the pixel coordinates.
Outputting the coarse-grained optical flow data from the image characteristic parameters makes the obtained coarse-grained optical flow data correspond more closely to the two adjacent frames.
It should be noted that step 203-204 is a further refinement of step 102. Also, optionally, the coarse-grained optical flow data comprises coarse-grained bi-directional optical flow data. The intermediate frame optical flow generation model comprises a fusion function and an object motion track fitting function.
Step S203, fusing the pre-configured frame-complementing time data and the coarse-grained bidirectional optical flow data through a fusion function to output frame-complementing time bidirectional optical flow data corresponding to the frame-complementing time data.
In this embodiment, the pre-configured frame-complementing time data is similar to step 102 in the first embodiment of the present invention, and is not described in detail here.
In this embodiment, the fusion function consists of two formulas (shown as figures in the original publication) that combine the pre-configured frame supplementing time data with the coarse-grained bidirectional optical flow data, where:
$\hat{F}_{t\to 0}$ denotes the optical flow from the adjacent subsequent frame image to the adjacent previous frame image in the frame-supplementing-time bidirectional optical flow data;
$\hat{F}_{t\to 1}$ denotes the optical flow from the adjacent previous frame image to the adjacent subsequent frame image in the frame-supplementing-time bidirectional optical flow data;
$t$ denotes the pre-configured frame supplementing time data;
$F_{0\to 1}$ denotes the optical flow from the adjacent previous frame image to the adjacent subsequent frame image in the coarse-grained bidirectional optical flow data; and
$F_{1\to 0}$ denotes the optical flow from the adjacent subsequent frame image to the adjacent previous frame image in the coarse-grained bidirectional optical flow data.
In this embodiment, the pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data are fused by the fusion function to output frame-supplementing-time bidirectional optical flow data corresponding to the frame supplementing time data. The frame-supplementing-time bidirectional optical flow data therefore carries time information, which later provides the basis for determining the weights when the target intermediate frame image is generated.
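The exact fusion formulas appear as figures in the original publication; as a sketch only, the linear combination below (a commonly used approximation, assumed here rather than taken from the patent) shows how the frame supplementing time $t$ can be fused with the coarse-grained bidirectional optical flow to obtain the frame-supplementing-time bidirectional optical flow.

```python
import torch

# Assumed linear fusion of the coarse-grained bidirectional flows with time t;
# the patent's own fusion formulas are given in its figures and may differ.
def fuse_flows(f_0to1: torch.Tensor, f_1to0: torch.Tensor, t: float):
    f_t_to_0 = -(1.0 - t) * t * f_0to1 + t * t * f_1to0           # flow pointing toward the previous frame
    f_t_to_1 = (1.0 - t) ** 2 * f_0to1 - t * (1.0 - t) * f_1to0   # flow pointing toward the subsequent frame
    return f_t_to_0, f_t_to_1
```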
Step S204, inputting the frame-complementing time bidirectional optical flow data into an object motion track fitting function to output intermediate frame optical flow data.
In this embodiment, the object motion trajectory fitting function may adopt an object motion trajectory fitting function in a Conv-LSTM network or another object motion trajectory fitting function, which is not limited in this embodiment. The Conv-LSTM network is a new network structure formed by adding convolution operation capable of extracting spatial features to an LSTM network capable of extracting time sequence features, and the network structure takes an object motion track fitting function as a core.
In this embodiment, the intermediate frame optical flow data is optimized by the object motion trajectory fitting function, so that the relevance between the subsequently generated target intermediate frame image and the two adjacent frames of images is higher.
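The patent only states that a Conv-LSTM-style fitting function may be used; the following sketch, with assumed layer sizes and wiring, illustrates how a convolutional recurrent cell can refine the frame-supplementing-time bidirectional optical flow into intermediate frame optical flow data.

```python
import torch
import torch.nn as nn

# Minimal ConvLSTM cell (sizes and surrounding wiring are assumptions).
class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        # A single convolution produces the input, forget, output and candidate gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, (h, c)

# Usage sketch: feed the fused bidirectional flows (4 channels) for each frame
# supplementing time step and read out a refined 4-channel flow field.
cell = ConvLSTMCell(in_ch=4, hid_ch=16)
to_flow = nn.Conv2d(16, 4, kernel_size=3, padding=1)
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros(1, 16, 64, 64)
for flows_t in [torch.randn(1, 4, 64, 64)]:        # one entry per frame supplementing time
    h, (h, c) = cell(flows_t, (h, c))
refined_flows = to_flow(h)                         # intermediate frame optical flow data
```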
It should be noted that the steps 205-206 are further detailed for the step 103.
And step S205, acquiring the weight of the proportion of the intermediate frame optical flow data occupied by the front and rear adjacent frames of images according to the intermediate frame optical flow data.
In this embodiment, the weight of the proportion of the intermediate frame optical flow data occupied by the two adjacent frames of images in the intermediate frame optical flow data can be confirmed according to the time information included in the intermediate frame optical flow data. The mapping relationship between the time information and the weight value can be preset. For example, as the time information is closer to 0, the weight representing the adjacent previous frame image is higher. In contrast, as the time information is closer to 1, the weight representing the adjacent subsequent frame image is higher.
And step S206, generating a target intermediate frame image through mapping operation according to the weight and the two adjacent frames of images.
In this embodiment, the target intermediate frame image may be generated as follows: corresponding target previous and target subsequent frames are generated from the two adjacent frames and the intermediate frame optical flow data through the mapping operation, and the target intermediate frame image is obtained by fusing the target previous frame and the target subsequent frame according to the weights. The fusion specifically comprises: obtaining a weighted previous frame from the weight and the spatial information of the target previous frame, obtaining a weighted subsequent frame from the weight and the spatial information of the target subsequent frame, and fusing the weighted previous frame and the weighted subsequent frame to obtain the target intermediate frame image.
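A sketch of this warp-and-fuse step is given below. The grid_sample-based backward warping and the (1 - t, t) weighting are assumptions consistent with the weighting rule described in step S205, not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def backward_warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Sample img (N,C,H,W) at positions displaced by flow (N,2,H,W)."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs[None] + flow[:, 0]) / (w - 1) * 2 - 1   # normalise x to [-1, 1]
    grid_y = (ys[None] + flow[:, 1]) / (h - 1) * 2 - 1   # normalise y to [-1, 1]
    grid = torch.stack([grid_x, grid_y], dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

def synthesize_intermediate(frame0, frame1, f_t_to_0, f_t_to_1, t: float):
    warped0 = backward_warp(frame0, f_t_to_0)   # target previous-frame candidate
    warped1 = backward_warp(frame1, f_t_to_1)   # target subsequent-frame candidate
    w0, w1 = 1.0 - t, t                         # t closer to 0 -> previous frame weighted higher
    return (w0 * warped0 + w1 * warped1) / (w0 + w1)
```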
The embodiment of the invention provides a video image frame supplementing method, which extracts corresponding image characteristic parameters from two adjacent frames of images through a coarse-grained optical flow generation model and outputs coarse-grained optical flow data corresponding to the two adjacent frames of images according to the image characteristic parameters. And fusing the pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data through a fusion function to output frame supplementing time bidirectional optical flow data corresponding to the frame supplementing time data. Meanwhile, the frame-complementing time bidirectional optical flow data is input to an object motion trajectory fitting function to output intermediate frame optical flow data. And then, acquiring a weight value of the proportion of the front and rear adjacent frames of images in the optical flow data of the intermediate frame according to the optical flow data of the intermediate frame, and generating a target intermediate frame image through mapping operation according to the weight value and the front and rear adjacent frames of images.
The method provided by the embodiment of the invention fuses pre-configured frame supplementing time data and coarse-grained bidirectional optical flow data through a fusion function included in an intermediate frame optical flow generation model trained to be convergent, generates corresponding frame supplementing time bidirectional optical flow data by fusing time information and motion information, and then correspondingly optimizes the frame supplementing time bidirectional optical flow data through an object motion track fitting function so as to output the intermediate frame optical flow data. Because the intermediate frame optical flow data comprises the time information, the weight of the proportion of the front and rear adjacent frame images in the intermediate frame optical flow data can be obtained according to the intermediate frame optical flow data, so that the generated target intermediate frame image is combined with the content of the front and rear adjacent frame images according to the weight, the accuracy of the generated target intermediate frame image is higher, the relevance between the target intermediate frame image and the front and rear frames is improved, and the continuity of the whole video image is improved.
Fig. 4 is a flowchart illustrating step 202 of a video image frame interpolation method according to a second embodiment of the present invention. As shown in fig. 4, the video image frame interpolation method provided in this embodiment is a further refinement of step 202 on the basis of the video image frame interpolation method provided in the second embodiment of the present invention. The video image frame interpolation method provided by the embodiment includes the following steps.
Step S2021, extracting image feature parameters from two adjacent frames of images and encoding the image feature parameters to obtain corresponding encoding results. The coarse-grained optical flow generation model comprises a coding and decoding network and a reversed convolution structure.
In this embodiment, the image feature parameters are similar to those in step 202 in the second embodiment of the present invention, and are not described in detail herein.
The encoding network may be a network structure that converts image characteristic parameters and the like in an image into code information data. The flipped convolutional structure is a convolutional network structure.
Step S2022, inputting the image feature parameters into the flipped convolution structure to obtain the alignment feature map of the two adjacent frames.
In this embodiment, the flipped convolution structure is mainly used to obtain spatially related information of the two adjacent frames, such as motion information, from the image characteristic parameters, and to express it in the form of an alignment feature map. The specific process is as follows: the flipped convolution structure generates the alignment feature map from the spatial information of the previous frame image and the corresponding spatial information of the subsequent frame image. For example, the pixel coordinates of the previous frame image are aligned one by one with those of the subsequent frame image, and the object motion vectors of the previous frame image are aligned with the corresponding object motion vectors of the subsequent frame image, so as to generate the alignment feature map.
Step S2023, inputting the alignment feature map and the encoding result into a decoding network to output coarse-grained optical flow data corresponding to two adjacent frames of images.
The decoding network is a network structure for converting code information data into optical flow data.
In this embodiment, the alignment feature map and the encoding result are input to the decoding network, which converts this information into the coarse-grained optical flow data corresponding to the two adjacent frames. The alignment feature map generated by the flipped convolution structure makes the obtained change of image features between the previous frame image and the subsequent frame image more accurate, so that the subsequently generated coarse-grained optical flow data is more accurate.
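The structural sketch below illustrates this coding network / flipped-convolution / decoding network arrangement. Channel counts, layer depths, and the interpretation of the flipped convolution as a transposed convolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CoarseFlowNet(nn.Module):
    """Sketch of the coarse-grained optical flow generation model (assumed sizes)."""
    def __init__(self):
        super().__init__()
        # Encoding network: extracts and encodes image feature parameters.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Flipped-convolution branch: produces the alignment feature map.
        self.align = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoding network: consumes the encoding result and the alignment
        # feature map, and outputs 4-channel coarse-grained bidirectional flow.
        self.decode_up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.flow_head = nn.Conv2d(16 + 16, 4, 3, padding=1)

    def forward(self, frame0, frame1):
        x = torch.cat([frame0, frame1], dim=1)    # two adjacent frames, 3 + 3 channels
        code = self.encoder(x)                    # encoding result
        aligned = self.align(code)                # alignment feature map
        decoded = self.decode_up(code)
        flows = self.flow_head(torch.cat([decoded, aligned], dim=1))
        return flows[:, :2], flows[:, 2:]         # F_0->1 and F_1->0
```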
Fig. 5 is a schematic diagram of a training flow of a video image frame interpolation method according to a fourth embodiment of the present invention, and as shown in fig. 5, the video image frame interpolation method according to the present embodiment is a training flow that adds a coarse-grained optical flow generation model to the video image frame interpolation methods according to the first to third embodiments of the present invention. The video image frame interpolation method provided by the embodiment includes the following steps.
Step S301, a first training sample is obtained, wherein the first training sample is a training sample corresponding to the coarse-grained optical flow generation model. The first training sample comprises: a previous frame image and a subsequent frame image.
In this embodiment, the previous frame image may be, for example, the 1st frame of the target video, with the corresponding subsequent frame image being the 3rd frame of the target video; alternatively, the previous frame image may be the 1st frame and the corresponding subsequent frame image the 5th frame. This ensures that the actual intermediate frame image obtained later has a standard intermediate frame image in the target video to compare against. For example, when the previous frame image is the 1st frame of the target video and the corresponding subsequent frame image is the 3rd frame, the standard intermediate frame image is the 2nd frame. The generated actual intermediate frame image can thus be compared with the standard intermediate frame image to determine their similarity, which improves the accuracy of the generated actual intermediate frame image.
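A minimal sketch of assembling such first training samples is shown below (the function name and the fixed gap are assumptions); with gap=2 it pairs frame 1 with frame 3 and keeps frame 2 as the standard intermediate frame.

```python
# Hypothetical helper for building (previous frame, subsequent frame, standard
# intermediate frame) triplets from a decoded video.
def make_training_triplets(frames, gap: int = 2):
    triplets = []
    for i in range(len(frames) - gap):
        prev_frame = frames[i]
        next_frame = frames[i + gap]
        standard_mid = frames[i + gap // 2]   # ground-truth intermediate frame for comparison
        triplets.append((prev_frame, next_frame, standard_mid))
    return triplets
```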
Step S302, inputting the first training sample into a preset coarse-grained optical flow generation model so as to train the preset coarse-grained optical flow generation model.
In this embodiment, the first training sample is input into a preset coarse-grained optical flow generation model that needs to be trained, and the generated coarse-grained optical flow data is used to determine whether a convergence condition is satisfied through a reconstruction loss function.
Step S303, adopting a reconstruction loss function to judge whether the preset coarse-grained optical flow generation model meets a convergence condition.
In this embodiment, the reconstruction loss function is:
$L_{r1} = \left| \mathrm{warp\_op}(\text{frame}\,1,\, -F_{0\to 1}) - \text{frame}\,0 \right|$
$L_{r2} = \left| \mathrm{warp\_op}(\text{frame}\,0,\, -F_{1\to 0}) - \text{frame}\,1 \right|$
$L_{r} = \frac{1}{N} \sum_{x,y} \left( L_{r1}(x, y) + L_{r2}(x, y) \right)$
where $L_{r1}$ denotes the intermediate function for the subsequent frame, $L_{r2}$ denotes the intermediate function for the previous frame, $L_{r}$ denotes the reconstruction loss function, frame 1 denotes the subsequent frame image, frame 0 denotes the previous frame image, $F_{0\to 1}$ denotes the optical flow from the previous frame image to the subsequent frame image in the coarse-grained optical flow data, $F_{1\to 0}$ denotes the optical flow from the subsequent frame image to the previous frame image in the coarse-grained bidirectional optical flow data, $\mathrm{warp\_op}$ denotes the operation that generates an output frame image from optical flow data and an input frame image, $x$ and $y$ denote the pixel coordinates in the image, and $N$ denotes the number of image pixels.
In this embodiment, the reconstruction loss function is mainly used to monitor the generation effect of the coarse-grained optical flow data.
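As a sketch of the reconstruction loss above (the per-pixel normalisation follows the symbol definitions; warp_op is passed in rather than re-implemented here):

```python
import torch

def reconstruction_loss(frame0, frame1, f_0to1, f_1to0, warp_op):
    # L_r1 and L_r2: absolute reconstruction errors of the two adjacent frames.
    l_r1 = torch.abs(warp_op(frame1, -f_0to1) - frame0)
    l_r2 = torch.abs(warp_op(frame0, -f_1to0) - frame1)
    n = frame0.shape[-1] * frame0.shape[-2]               # number of image pixels N
    # Sum over pixel coordinates, average over batch and channels, divide by N.
    return (l_r1 + l_r2).sum(dim=(-1, -2)).mean() / n
```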
In step S304, if the preset coarse-grained optical flow generation model satisfies the convergence condition, the coarse-grained optical flow generation model satisfying the convergence condition is determined as the coarse-grained optical flow generation model trained to converge.
The convergence condition corresponding to the preset coarse-grained optical flow generation model is when the reconstruction loss function reaches the minimum value which can be optimized. At this time, the preset coarse-grained optical flow generation model meets the convergence condition, and the generated coarse-grained optical flow has high accuracy.
In this embodiment, by using the trained coarse-grained optical flow generation model, the coarse-grained optical flow data generated by the coarse-grained optical flow generation model can be made more accurate.
Fig. 6 is a schematic diagram of a training flow of a video image frame interpolation method according to a fifth embodiment of the present invention. As shown in fig. 6, the method of this embodiment adds a training flow for the intermediate frame optical flow generation model on the basis of the fourth embodiment of the present invention. The video image frame interpolation method provided by this embodiment includes the following steps.
Step S401, a second training sample is obtained, wherein the second training sample is a training sample corresponding to the intermediate frame optical flow generation model. The second training sample comprises: a first standard intermediate frame image and a first actual intermediate frame image.
In this embodiment, the first standard intermediate frame image is a standard intermediate frame image in the target video used for comparison. The first actual intermediate frame image is an actual intermediate frame image generated by the intermediate frame optical flow generation model during training.
Step S402, inputting a second training sample into the preset intermediate frame optical flow generation model to train the preset intermediate frame optical flow generation model.
In this embodiment, the training process is similar to that of step S302 in the fourth embodiment of the present invention, and is not described in detail herein.
And step S403, judging whether the preset intermediate frame optical flow generation model meets a convergence condition by adopting a perception loss function.
In this embodiment, the perceptual loss function is:
Lp = (1/N) Σ ‖φ(I) − φ(gt)‖²
wherein Lp represents the perceptual loss function, φ(I) represents the feature output of the first actual intermediate frame image, φ(gt) represents the feature output of the first standard intermediate frame image, and N represents the number of image pixels.
In this embodiment, the perceptual loss function mainly evaluates the two images at a higher semantic level, and the loss calculation generally includes two steps: high-level feature extraction and feature difference calculation. The high-level features are generally taken from a pre-trained deep neural network, for example, the output of the conv4_3 convolutional layer in a VGG16 network pre-trained on ImageNet, wherein ImageNet is a large-scale visual database used for visual object recognition research, the VGG16 network is a network structure, and conv4_3 is a convolutional layer in the VGG16 network.
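For illustration only, the following is a hedged PyTorch sketch of this perceptual loss, using the conv4_3 output of a VGG16 network pre-trained on ImageNet as the high-level feature. The torchvision layer index used to truncate the network at conv4_3, and the assumption that the inputs are already ImageNet-normalised RGB tensors, are assumptions of this sketch rather than details given in this embodiment.

```python
import torch
import torchvision

# Feature extractor: VGG16 pre-trained on ImageNet, truncated at conv4_3
# (index 22 of vgg16.features is an assumption of this sketch).
vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
phi = torch.nn.Sequential(*list(vgg.features[:22])).eval()
for p in phi.parameters():
    p.requires_grad_(False)  # the feature extractor is fixed during training


def perceptual_loss(actual_mid, standard_mid):
    """Mean squared difference between conv4_3 features of the two intermediate frames."""
    feat_actual = phi(actual_mid)      # phi(I): features of the first actual intermediate frame image
    feat_standard = phi(standard_mid)  # phi(gt): features of the first standard intermediate frame image
    return torch.mean((feat_actual - feat_standard) ** 2)
```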
In step S404, if the preset intermediate frame optical flow generation model satisfies the convergence condition, the intermediate frame optical flow generation model satisfying the convergence condition is determined as the intermediate frame optical flow generation model trained to converge.
The convergence condition of the preset intermediate frame optical flow generation model is that the perceptual loss function reaches the minimum value. When this condition is met, the intermediate frame optical flow generated by the model has high accuracy.
Optionally, in this embodiment, an inter-frame loss function may also be used to judge whether the preset intermediate frame optical flow generation model meets the convergence condition, so as to train the intermediate frame optical flow generation model. In this case, the training samples include: a second standard intermediate frame image and a second actual intermediate frame image.
The interframe loss function is:
Lf = (1/N) Σx,y |I(x,y) − gt(x,y)|
wherein Lf represents the inter-frame loss function, I represents the second actual intermediate frame image, gt represents the second standard intermediate frame image, x and y represent the coordinates of pixel points in the image, and N represents the number of image pixels.
The convergence condition of the preset intermediate frame optical flow generation model is that the inter-frame loss function reaches the minimum value. When this condition is met, the preset intermediate frame optical flow generation model is determined as the intermediate frame optical flow generation model trained to converge.
Meanwhile, in this embodiment, the inter-frame loss function also makes the intermediate frame optical flow generated by the trained intermediate frame optical flow generation model more accurate.
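For illustration only, a minimal sketch of this inter-frame loss under the same assumptions as the sketches above is:

```python
import torch


def interframe_loss(actual_mid, standard_mid):
    # |I(x, y) - gt(x, y)| averaged over the N pixels (and channels).
    return torch.mean(torch.abs(actual_mid - standard_mid))
```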
Fig. 7 is a schematic structural diagram of a video image frame complementing apparatus according to a sixth embodiment of the present invention, as shown in fig. 7, in this embodiment, the apparatus is located in an electronic device, and the video image frame complementing apparatus 500 includes:
the coarse-grained optical flow generation module 501 is configured to extract two adjacent frames of images in the target video, and input the two adjacent frames of images into a coarse-grained optical flow generation model trained to be convergent to output coarse-grained optical flow data corresponding to the two adjacent frames of images.
An intermediate frame optical flow generating module 502, configured to input the pre-configured frame-complementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generating model trained to converge, so as to output the intermediate frame optical flow data.
An intermediate frame image generating module 503, configured to generate a target intermediate frame image according to the two adjacent front and back frame images and the intermediate frame optical flow data.
The video image frame interpolation apparatus provided in this embodiment may implement the technical solution of the method embodiment shown in fig. 2, and the implementation principle and technical effect thereof are similar to those of the method embodiment shown in fig. 2, and are not described in detail herein.
Meanwhile, another embodiment of the video image frame complementing apparatus provided by the present invention further refines the video image frame complementing apparatus 500 on the basis of the previous embodiment.
Optionally, in this embodiment, the coarse-grained optical flow generating module 501 is specifically configured to:
and extracting front and back adjacent two frames of images in the target video, and respectively inputting the images into a coarse-grained optical flow generation model trained to be convergent.
Meanwhile, corresponding image characteristic parameters are extracted from the front and the back adjacent two frames of images through the coarse-grained optical flow generation model, and coarse-grained optical flow data corresponding to the front and the back adjacent two frames of images are output according to the image characteristic parameters.
Optionally, in this embodiment, the coarse-grained optical flow generation model includes a coding and decoding network and a turning convolution structure.
The coarse-grained optical flow generation module 501 is specifically configured to, when extracting corresponding image feature parameters from two previous and next frames of images through the coarse-grained optical flow generation model and outputting coarse-grained optical flow data corresponding to two previous and next frames of images according to the image feature parameters:
and extracting image characteristic parameters from the front and back adjacent two frames of images through a coding network and coding to obtain a corresponding coding result.
Meanwhile, inputting the image characteristic parameters into a turning convolution structure to obtain the alignment characteristic graphs of the two adjacent frames of images.
And inputting the alignment feature map and the coding result into a decoding network to output coarse-grained optical flow data corresponding to two adjacent frames of images.
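For illustration only, the following sketch shows one possible overall shape of such a coarse-grained optical flow generation model. The layer sizes are arbitrary, and the turning convolution structure is stood in for by a transposed convolution; both are assumptions of this sketch rather than the structure defined in this embodiment.

```python
import torch
import torch.nn as nn


class CoarseFlowNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Coding network: extracts image characteristic parameters from the
        # two concatenated adjacent frames (2 x 3 = 6 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Stand-in for the turning convolution structure that yields the
        # alignment feature map of the two adjacent frames.
        self.align = nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1)
        # Decoding network: combines the coding result and the alignment
        # features and outputs 4 channels of coarse-grained bidirectional
        # optical flow (F0->1 and F1->0, two channels each).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64 + 64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 4, 3, padding=1),
        )

    def forward(self, frame0, frame1):
        # Frame height and width are assumed divisible by 4 in this sketch.
        x = torch.cat([frame0, frame1], dim=1)
        code = self.encoder(x)                       # coding result
        aligned = self.align(code)                   # alignment feature map
        # Bring the coding result to the alignment resolution before fusing.
        code_up = nn.functional.interpolate(code, size=aligned.shape[-2:])
        flows = self.decoder(torch.cat([code_up, aligned], dim=1))
        return flows[:, :2], flows[:, 2:]            # F0->1, F1->0
```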
Optionally, in this embodiment, the coarse-grained optical flow data includes coarse-grained bidirectional optical flow data. The intermediate frame optical flow generation model comprises a fusion function and an object motion track fitting function.
The intermediate-frame optical flow generation module 502 is specifically configured to:
and fusing the pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data through a fusion function to output frame supplementing time bidirectional optical flow data corresponding to the frame supplementing time data.
Meanwhile, the frame-complementing time bidirectional optical flow data is input to an object motion trajectory fitting function to output intermediate frame optical flow data.
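For illustration only, the following sketch collapses the fusion function and the object motion trajectory fitting function into the linear-motion approximation commonly used in optical-flow-based frame interpolation; the actual functions of this embodiment may differ.

```python
def fuse_time_and_flow(t, flow_0to1, flow_1to0):
    """Combine the frame-supplementing time t in (0, 1) with the coarse-grained
    bidirectional optical flow to approximate the intermediate frame optical flow.

    Assumes objects move approximately linearly between the two adjacent frames."""
    flow_t_to_0 = -(1.0 - t) * t * flow_0to1 + t * t * flow_1to0
    flow_t_to_1 = (1.0 - t) ** 2 * flow_0to1 - t * (1.0 - t) * flow_1to0
    return flow_t_to_0, flow_t_to_1
```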
Optionally, in this embodiment, the video image frame interpolation apparatus 500 further includes: a first training module.
The first training module is used for acquiring a first training sample, the first training sample is a training sample corresponding to the coarse-grained optical flow generation model, and the first training sample comprises: a previous frame image and a subsequent frame image.
Meanwhile, inputting the first training sample into a preset coarse-grained optical flow generation model so as to train the preset coarse-grained optical flow generation model.
And then, judging whether the preset coarse-grained optical flow generation model meets the convergence condition or not by adopting a reconstruction loss function.
And if the preset coarse-grained optical flow generation model meets the convergence condition, determining the coarse-grained optical flow generation model meeting the convergence condition as the coarse-grained optical flow generation model trained to be converged.
Optionally, in this embodiment, the video image frame interpolation apparatus 500 further includes: a second training module.
The second training module is used for obtaining a second training sample, the second training sample is a training sample corresponding to the intermediate frame optical flow generation model, and the second training sample comprises: a first standard intermediate frame image and a first actual intermediate frame image.
Meanwhile, inputting a second training sample into the preset intermediate frame optical flow generation model to train the preset intermediate frame optical flow generation model.
And then, judging whether the preset intermediate frame optical flow generation model meets the convergence condition or not by adopting a perception loss function.
And if the preset intermediate frame optical flow generation model meets the convergence condition, determining the intermediate frame optical flow generation model meeting the convergence condition as the intermediate frame optical flow generation model trained to be converged.
Optionally, in this embodiment, the intermediate frame image generating module 503 is specifically configured to:
and acquiring the weight of the proportion of the front and rear adjacent frames of images in the intermediate frame optical flow data according to the intermediate frame optical flow data.
And generating a target intermediate frame image through mapping operation according to the weight and the two adjacent frames of images.
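For illustration only, the following sketch generates the target intermediate frame image by mapping (warping) the two adjacent frames with the intermediate frame optical flow data and blending them with time-based weights; the weighting scheme is an assumption of this sketch, and backward_warp refers to the reconstruction-loss sketch above.

```python
def synthesize_intermediate_frame(frame0, frame1, flow_t_to_0, flow_t_to_1, t):
    warped0 = backward_warp(frame0, flow_t_to_0)  # preceding frame mapped to time t
    warped1 = backward_warp(frame1, flow_t_to_1)  # following frame mapped to time t
    w0, w1 = (1.0 - t), t                         # assumed proportion weights of the two frames
    return (w0 * warped0 + w1 * warped1) / (w0 + w1)
```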
The video image frame interpolation apparatus provided in this embodiment may implement the technical solutions of the method embodiments shown in fig. 2 to 6, and the implementation principles and technical effects thereof are similar to those of the method embodiments shown in fig. 2 to 6, and are not described in detail herein.
The invention also provides an electronic device and a computer-readable storage medium according to the embodiments of the invention.
Fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the invention. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are exemplary only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 8, the electronic apparatus includes: a processor 601, a memory 602. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device.
The memory 602 is a non-transitory computer readable storage medium provided by the present invention. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the video image frame complementing method provided by the invention. The non-transitory computer-readable storage medium of the present invention stores computer instructions for causing a computer to execute the video image frame interpolation method provided by the present invention.
The memory 602, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the video image frame interpolation method in the embodiment of the present invention (for example, the coarse-grained optical flow generation module 501, the intermediate frame optical flow generation module 502, and the intermediate frame image generation module 503 shown in fig. 7). The processor 601 executes various functional applications and data processing of the server by running non-transitory software programs, instructions and modules stored in the memory 602, that is, implementing the video image frame complementing method in the above method embodiment.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the embodiments of the invention following, in general, the principles of the embodiments of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the embodiments of the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of embodiments of the invention being indicated by the following claims.
It is to be understood that the embodiments of the present invention are not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of embodiments of the invention is limited only by the appended claims.

Claims (8)

1. A video image frame complementing method is applied to an electronic device, and comprises the following steps:
extracting front and rear adjacent frames of images in a target video, and respectively inputting the front and rear adjacent frames of images into a coarse-grained optical flow generation model trained to be convergent so as to output coarse-grained optical flow data corresponding to the front and rear adjacent frames of images;
inputting the pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged to output intermediate frame optical flow data;
generating a target intermediate frame image according to the front and back adjacent two frame images and the intermediate frame optical flow data;
the method for extracting front and rear adjacent frames of images in a target video and respectively inputting the front and rear adjacent frames of images into a coarse-grained optical flow generation model trained to be convergent so as to output coarse-grained optical flow data corresponding to the front and rear adjacent frames of images comprises the following steps:
extracting front and back adjacent two frames of images in a target video, and respectively inputting the images into a coarse-grained optical flow generation model trained to be convergent;
extracting corresponding image characteristic parameters from the front and rear adjacent two frames of images through the coarse-grained optical flow generation model, and outputting coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images according to the image characteristic parameters;
the coarse-grained optical flow generation model comprises a coding and decoding network and a turning convolution structure;
the extracting, by the coarse-grained optical flow generation model, corresponding image feature parameters from the front and rear two frames of images, and outputting coarse-grained optical flow data corresponding to the front and rear two adjacent frames of images according to the image feature parameters, includes:
extracting the image characteristic parameters from the front and back adjacent two frames of images through a coding network and coding to obtain corresponding coding results;
inputting the image characteristic parameters into the turning convolution structure to obtain the alignment characteristic graphs of the front and rear adjacent frames of images;
and inputting the alignment feature map and the coding result into a decoding network to output coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images.
2. The method of claim 1, wherein the coarse-grained optical flow data comprises coarse-grained bi-directional optical flow data; the intermediate frame optical flow generation model comprises a fusion function and an object motion track fitting function;
the inputting the pre-configured frame-complementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to converge to output intermediate frame optical flow data comprises:
fusing pre-configured frame supplementing time data and the coarse-grained bidirectional optical flow data through the fusion function to output frame supplementing time bidirectional optical flow data corresponding to the frame supplementing time data;
and inputting the frame supplementing time bidirectional optical flow data into the object motion track fitting function to output intermediate frame optical flow data.
3. The method according to claim 1, wherein before the input into the coarse-grained optical flow generation model trained to converge respectively to output the coarse-grained optical flow data corresponding to the two adjacent frames of images, the method further comprises:
obtaining a first training sample, wherein the first training sample is a training sample corresponding to a coarse-grained optical flow generation model, and the first training sample comprises: a previous frame image and a subsequent frame image;
inputting the first training sample into a preset coarse-grained optical flow generation model to train the preset coarse-grained optical flow generation model;
judging whether the preset coarse-grained light stream generation model meets a convergence condition or not by adopting a reconstruction loss function;
and if the preset coarse-grained optical flow generation model meets the convergence condition, determining the coarse-grained optical flow generation model meeting the convergence condition as a coarse-grained optical flow generation model trained to be converged.
4. The method of claim 1, wherein before inputting the preconfigured complement temporal data and the coarse-grained optical flow data into an inter-frame optical flow generation model trained to converge to output inter-frame optical flow data, further comprising:
acquiring a second training sample, wherein the second training sample is a training sample corresponding to the intermediate frame optical flow generation model, and the second training sample comprises: a first standard intermediate frame image and a first actual intermediate frame image;
inputting the second training sample into a preset intermediate frame optical flow generation model so as to train the preset intermediate frame optical flow generation model;
judging whether the preset intermediate frame optical flow generation model meets a convergence condition or not by adopting a perception loss function;
and if the preset intermediate frame optical flow generation model meets the convergence condition, determining the intermediate frame optical flow generation model meeting the convergence condition as the intermediate frame optical flow generation model trained to be converged.
5. The method of claim 1, wherein generating a target inter-frame image from the two consecutive frame images and the inter-frame optical flow data comprises:
acquiring a weight of the proportion of the intermediate frame optical flow data occupied by the front and rear adjacent frames of images according to the intermediate frame optical flow data;
and generating a target intermediate frame image through mapping operation according to the weight and the two adjacent frames of images.
6. An apparatus for frame-filling a video image, the apparatus being located in an electronic device, comprising:
the coarse-grained optical flow generation module is used for extracting front and back adjacent two frames of images in the target video, and respectively inputting the front and back adjacent two frames of images into a coarse-grained optical flow generation model trained to be converged so as to output coarse-grained optical flow data corresponding to the front and back adjacent two frames of images;
the intermediate frame optical flow generation module is used for inputting pre-configured frame supplementing time data and the coarse-grained optical flow data into an intermediate frame optical flow generation model trained to be converged so as to output intermediate frame optical flow data;
the intermediate frame image generation module is used for generating a target intermediate frame image according to the front and back adjacent two frame images and the intermediate frame optical flow data;
the coarse-grained optical flow generation module is specifically configured to:
extracting front and back adjacent two frames of images in a target video, and respectively inputting the images into a coarse-grained optical flow generation model trained to be convergent; extracting corresponding image characteristic parameters from the front and rear adjacent two frames of images through the coarse-grained optical flow generation model, and outputting coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images according to the image characteristic parameters;
the coarse-grained optical flow generation model comprises a coding and decoding network and a turning convolution structure;
the coarse-grained optical flow generation module is specifically configured to, when extracting corresponding image feature parameters from the two previous and subsequent frames of images through the coarse-grained optical flow generation model and outputting coarse-grained optical flow data corresponding to the two previous and subsequent adjacent frames of images according to the image feature parameters:
extracting the image characteristic parameters from the front and back adjacent two frames of images through a coding network and coding to obtain corresponding coding results; inputting the image characteristic parameters into the turning convolution structure to obtain the alignment characteristic graphs of the front and rear adjacent frames of images; and inputting the alignment feature map and the coding result into a decoding network to output coarse-grained optical flow data corresponding to the front and rear adjacent two frames of images.
7. A video image frame complementing apparatus, comprising: a memory, a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the video image frame interpolation method of any one of claims 1 to 5.
8. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, are configured to implement the video image frame complementing method according to any one of claims 1 to 5.
CN202010720883.9A 2020-07-24 2020-07-24 Video image frame supplementing method, device and equipment and storage medium Active CN112040311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010720883.9A CN112040311B (en) 2020-07-24 2020-07-24 Video image frame supplementing method, device and equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010720883.9A CN112040311B (en) 2020-07-24 2020-07-24 Video image frame supplementing method, device and equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112040311A CN112040311A (en) 2020-12-04
CN112040311B true CN112040311B (en) 2021-10-26

Family

ID=73582988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010720883.9A Active CN112040311B (en) 2020-07-24 2020-07-24 Video image frame supplementing method, device and equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112040311B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066478A (en) * 2020-12-07 2021-07-02 泰州市朗嘉馨网络科技有限公司 Dialect recognition system based on model training
CN112584234B (en) * 2020-12-09 2023-06-16 广州虎牙科技有限公司 Frame supplementing method and related device for video image
CN112804561A (en) * 2020-12-29 2021-05-14 广州华多网络科技有限公司 Video frame insertion method and device, computer equipment and storage medium
CN114066730B (en) * 2021-11-04 2022-10-28 西北工业大学 Video frame interpolation method based on unsupervised dual learning
CN116033225A (en) * 2023-03-16 2023-04-28 深圳市微浦技术有限公司 Digital signal processing method, device, equipment and storage medium based on set top box

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109068174A (en) * 2018-09-12 2018-12-21 上海交通大学 Video frame rate upconversion method and system based on cyclic convolution neural network
CN109672886A (en) * 2019-01-11 2019-04-23 京东方科技集团股份有限公司 A kind of picture frame prediction technique, device and head show equipment
CN110267098A (en) * 2019-06-28 2019-09-20 连尚(新昌)网络科技有限公司 A kind of method for processing video frequency and terminal
CN110351511A (en) * 2019-06-28 2019-10-18 上海交通大学 Video frame rate upconversion system and method based on scene depth estimation
US20200160495A1 (en) * 2017-05-01 2020-05-21 Gopro, Inc. Apparatus and methods for artifact detection and removal using frame interpolation techniques
CN111277826A (en) * 2020-01-22 2020-06-12 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
CN111405316A (en) * 2020-03-12 2020-07-10 北京奇艺世纪科技有限公司 Frame insertion method, electronic device and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110324664B (en) * 2019-07-11 2021-06-04 南开大学 Video frame supplementing method based on neural network and training method of model thereof
CN110798630B (en) * 2019-10-30 2020-12-29 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110933497B (en) * 2019-12-10 2022-03-22 Oppo广东移动通信有限公司 Video image data frame insertion processing method and related equipment
CN111311490B (en) * 2020-01-20 2023-03-21 陕西师范大学 Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN111327926B (en) * 2020-02-12 2022-06-28 北京百度网讯科技有限公司 Video frame insertion method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112040311A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112040311B (en) Video image frame supplementing method, device and equipment and storage medium
US11610122B2 (en) Generative adversarial neural network assisted reconstruction
US11625613B2 (en) Generative adversarial neural network assisted compression and broadcast
CN112995652B (en) Video quality evaluation method and device
US11641446B2 (en) Method for video frame interpolation, and electronic device
US20230281833A1 (en) Facial image processing method and apparatus, device, and storage medium
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN115471658A (en) Action migration method and device, terminal equipment and storage medium
US20230237713A1 (en) Method, device, and computer program product for generating virtual image
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN112785669B (en) Virtual image synthesis method, device, equipment and storage medium
CN117576248B (en) Image generation method and device based on gesture guidance
CN116757923B (en) Image generation method and device, electronic equipment and storage medium
CN111488886B (en) Panoramic image significance prediction method, system and terminal for arranging attention features
CN113822114A (en) Image processing method, related equipment and computer readable storage medium
CN113542758A (en) Generating antagonistic neural network assisted video compression and broadcast
CN116977392A (en) Image generation method, device, electronic equipment and storage medium
CN115035219A (en) Expression generation method and device and expression generation model training method and device
Jiang et al. Analyzing and Optimizing Virtual Reality Classroom Scenarios: A Deep Learning Approach.
CN115761565B (en) Video generation method, device, equipment and computer readable storage medium
CN115984094B (en) Face safety generation method and equipment based on multi-loss constraint visual angle consistency
US20240169701A1 (en) Affordance-based reposing of an object in a scene
Shah et al. Generative Adversarial Networks for Inpainting Occluded Face Images
Qin Single-Image Animation by Cumulative Training and Motion Residual Prediction
Chen et al. LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant