CN115588153B - Video frame generation method based on 3D-DoubleU-Net - Google Patents

Video frame generation method based on 3D-DoubleU-Net

Info

Publication number
CN115588153B
CN115588153B CN202211234067.2A CN202211234067A
Authority
CN
China
Prior art keywords
dimensional
net
frame
network
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211234067.2A
Other languages
Chinese (zh)
Other versions
CN115588153A (en)
Inventor
蹇木伟
张昊然
王芮
举雅琨
杨成东
武玉增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jiude Intelligent Technology Co ltd
Shandong University of Finance and Economics
Original Assignee
Shandong Jiude Intelligent Technology Co ltd
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jiude Intelligent Technology Co ltd, Shandong University of Finance and Economics filed Critical Shandong Jiude Intelligent Technology Co ltd
Priority to CN202211234067.2A priority Critical patent/CN115588153B/en
Publication of CN115588153A publication Critical patent/CN115588153A/en
Application granted granted Critical
Publication of CN115588153B publication Critical patent/CN115588153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention provides a video frame generation method based on 3D-DoubleU-Net. To address the difficulty that frame generation methods have in accurately obtaining the spatio-temporal features between video frames when the video scene is complex, objects move rapidly, or objects are occluded, the method combines a three-dimensional convolutional neural network with a double U-Net architecture to extract richer spatio-temporal features from the video and to generate intermediate frames closer to the real frames. With this technical scheme, the 3D-DoubleU-Net network simultaneously exploits the ability of the three-dimensional convolutional neural network to explore the spatio-temporal dimensions and the ability of the double U-Net to capture inter-frame context information, so that more accurate motion information and richer spatio-temporal features between frames can be captured in extreme scenes and finer intermediate frame results can be generated.

Description

Video frame generation method based on 3D-DoubleU-Net
Technical Field
The invention relates to the technical field of computer vision, in particular to a video frame generation method based on 3D-DoubleU-Net.
Background
With the upgrading of video display equipment and the improvement of video transmission bandwidth, people's requirements for video visual quality keep increasing. The frame rate is one of the important indicators of video quality and represents the number of frames played per second. A video with a low frame rate can exhibit picture delay and jitter during playback, degrading the viewing experience.
Video frame generation is a technique that uses video/image processing to generate and insert one or more frames between two consecutive frames, taking the original frames as reference, thereby converting a low frame rate into a higher one. As one of the key technologies in the video processing field, it has attracted wide attention from researchers and is broadly applied in video enhancement, data compression, video special-effect processing, and other fields.
With the development of deep learning technology in recent years, a large number of video frame generation methods based on deep learning have been proposed, mainly including methods based on optical flow estimation, methods based on kernel estimation, and methods combining optical flow estimation with kernel estimation.
The most widely used methods are based on estimating the optical flow between the input frames, but under challenging conditions the optical flow cannot be estimated accurately, which produces blurry results. Kernel-estimation-based methods generally estimate a kernel adaptively for each pixel and then convolve the estimated kernel with the input frame images to obtain the intermediate frame; however, such kernels cannot point to arbitrary positions and therefore cannot handle object motion beyond the kernel size. Methods combining optical flow estimation with kernel estimation use optical flow to estimate motion for the input frames and sample pixel information around the reference points, but the number of available reference points remains small, and the drawbacks of optical flow estimation and kernel estimation are not significantly alleviated.
Actually captured video commonly suffers from complex scenes, rapid object motion, occlusion, drastic illumination changes, and similar problems, which pose great challenges to video frame generation research. Video frame generation therefore remains one of the difficult problems in computer vision, and research into robust and accurate frame generation methods has important theoretical significance and application value.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a video frame generation method based on 3D-DoubleU-Net.
The invention is realized by the following technical scheme: a video frame generation method based on 3D-DoubleU-Net, characterized by comprising the following steps:
S1, constructing a data set: both the training and test data sets contain a plurality of triplets, where one triplet consists of three consecutive frames in the time domain, denoted as (I_0, I_GT, I_1), in which I_0 is the previous frame, I_GT is the real intermediate frame, and I_1 is the latter frame;
S2, designing the 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA), wherein each three-dimensional U-Net network consists of a three-dimensional Encoder (3D-Encoder), atrous spatial pyramid pooling (ASPP), and a three-dimensional Decoder (3D-Decoder);
The spliced adjacent frames (I_0, I_1) are fed sequentially through the two three-dimensional U-Net networks. The result obtained through the first three-dimensional U-Net network is I_out1; subsequently, I_out1 and the spliced input (I_0, I_1) are fed together into the second three-dimensional U-Net network, whose result is I_out2; finally, I_out1 and I_out2 are spliced and input into a two-dimensional convolution to obtain the final result I_final;
S3, training the model: the invention trains the optimal model by minimizing the differences between the initial results I_out1 and I_out2, as well as the final result I_final, and the true intermediate frame I_GT; the loss functions used are as follows:
L = λ1·L1 + λC·LC + λp·Lp (1);
L1 = ||I_out1 − I_GT||_1 + ||I_out2 − I_GT||_1 (2);
LC = √((I_final − I_GT)² + ε²) (3);
Lp = ||φ(I_final) − φ(I_GT)|| (4);
wherein the invention uses L to train the network, with λ1, λC, and λp as the weights of the three terms; L1 uses the l1 norm to measure the differences between I_out1, I_out2, and the true intermediate frame I_GT; LC uses the Charbonnier function as a smooth approximation of the l1 norm to measure the difference between I_final and I_GT, with ε a small constant; Lp is the perceptual loss, for which the conv4_3 convolutional layer of an ImageNet-pretrained VGG-16 network is used as the feature extractor φ to obtain the perceptual loss between I_final and I_GT;
S4, testing the model: the previous and latter frames from the test set are input into the trained model, and the intermediate frame results are generated directly;
s5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
Preferably, the step S1 specifically includes the following steps:
S1-1, model training uses the Vimeo-90K data set containing 51,312 triplets, where the first and third frames I_0 and I_1 of each triplet are the adjacent frames serving as the input to the network, and the second frame I_GT is the real frame used to supervise the training of the network;
S1-2, the UCF101 and DAVIS data sets are selected for testing the model.
Preferably, the step S2 specifically includes the following steps:
S2-1, designing the 3D-DoubleU-Net network model: the model contains two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA);
S2-2, the first three-dimensional U-Net is composed of an encoder E_1 and a decoder D_1, where the encoder E_1 is a pretrained ResNet18-3D (R3D-18) three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from R3D-18, and three-dimensional convolutions with a spatial stride of 2 are used;
S2-3: the first generated result I_out1 and the spliced input (I_0, I_1) are fused using a pixel-level multiplication operation, and the fused result I_fuse is input to the second three-dimensional U-Net network; after feature extraction and upsampling, the result output2, denoted I_out2, is obtained;
S2-4: the second three-dimensional U-Net has the same structure as the first three-dimensional U-Net and is composed of an encoder E_2, atrous spatial pyramid pooling (ASPP), and a decoder D_2;
S2-5, the fused result I_fuse is input into the encoder E_2 to obtain the extracted features F_2, which are subsequently input into ASPP to obtain the multi-scale context information F_ASPP2;
S2-6, the decoder D_2 also contains four decoding blocks; however, unlike the decoder D_1, which uses skip connections only from the encoder E_1, D_2 uses skip connections from both encoders; F_ASPP2 is input into the decoder D_2 to obtain the second generated frame result I_out2;
S2-7, in a second three-dimensional U-Net, the last layer of each coding block and decoding block uses a double-span view space attention mechanism (VISTA) on the features;
S2-8, I_out1 and I_out2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result I_final.
Further, the step S2-3 specifically comprises the following steps:
S2-3-1, the cascaded input frames (I_0, I_1) are input to the encoder E_1 to extract features, obtaining F_1;
S2-3-2, atrous spatial pyramid pooling (ASPP) is adopted to capture multi-scale context, obtaining the features F_ASPP1;
S2-3-3, the decoder D_1, comprising four decoding blocks, is used to reconstruct the preliminary intermediate frame result output1, i.e., I_out1.
Further, the decoder D_1 in step S2-3-3 uses three-dimensional transposed convolution layers (3DTransConv) with a stride of 2, and a three-dimensional convolution layer is added after the last three-dimensional transposed convolution layer; the last layer of each decoding block applies the dual cross-view spatial attention mechanism (VISTA) to the features.
Further, the ResNet18-3D (R3D-18) three-dimensional convolutional neural network in step S2-4 is the backbone structure of the encoder E_2; unlike the encoder E_1, the encoder E_2 is trained from scratch and comprises four coding blocks.
The invention adopts the above technical scheme and, compared with the prior art, has the following beneficial effects. First, intermediate frames are generated without an explicit motion estimation step, so the inaccuracy that motion estimation suffers in extreme scenes with conventional methods is avoided. Second, a three-dimensional convolutional neural network and a double U-Net architecture are combined for frame generation for the first time, yielding the 3D-DoubleU-Net network. The 3D-DoubleU-Net network simultaneously exploits the ability of the three-dimensional convolutional neural network to explore the spatio-temporal dimensions and the ability of the double U-Net to capture inter-frame context information, so that more accurate motion information and richer spatio-temporal features between frames are captured in extreme scenes and finer intermediate frame results are generated.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a data set format;
FIG. 2 is a flow chart of the method of the present invention;
fig. 3 is a schematic diagram of a network structure according to the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The 3D-DoubleU-Net based video frame generation method according to the embodiment of the present invention is specifically described below with reference to fig. 1 to 3. To address the difficulty that frame generation methods have in accurately obtaining the spatio-temporal features between video frames when the video scene is complex, objects move rapidly, or objects are occluded, the invention combines a three-dimensional convolutional neural network with a double U-Net architecture to extract richer spatio-temporal features from the video and to generate intermediate frames closer to the real frames. In a specific implementation, the technical scheme of the invention can be realized as an automatic processing pipeline using computer software.
As shown in fig. 2, the invention provides a video frame generation method based on 3D-DoubleU-Net, which specifically includes the following steps:
S1, constructing a data set: both the training and test data sets contain a plurality of triplets, where one triplet consists of three consecutive frames in the time domain, denoted as (I_0, I_GT, I_1), in which I_0 is the previous frame, I_GT is the real intermediate frame, and I_1 is the latter frame; this step specifically comprises the following steps:
S1-1, model training uses a Vimeo-90K data set containing 51,312 triplets, where the first and third frames I_0 and I_1 of each triplet are the adjacent frames serving as the input to the network, and the second frame I_GT is the real frame used to supervise the training of the network;
S1-2, the invention selects the UCF101 and DAVIS data sets to test the model.
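For illustration, the following is a minimal sketch of how the training triplets could be wrapped as a dataset; the sequences/&lt;folder&gt;/&lt;clip&gt;/im1.png–im3.png layout, the list-file format, and all class and variable names are assumptions made for the example and are not prescribed by the invention.

```python
# A minimal sketch (not part of the patent) of wrapping the Vimeo-90K triplets
# for training; the directory layout and file names below are assumptions based
# on the public release of the data set.
import os
from PIL import Image
import torch
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class TripletDataset(Dataset):
    def __init__(self, root, list_file):
        # list_file holds one "<folder>/<clip>" entry per line, e.g. "00001/0001"
        with open(list_file) as f:
            self.clips = [line.strip() for line in f if line.strip()]
        self.root = root

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        clip = os.path.join(self.root, "sequences", self.clips[idx])
        # im1 = previous frame I_0, im2 = real intermediate frame I_GT, im3 = latter frame I_1
        i0, i_gt, i1 = [TF.to_tensor(Image.open(os.path.join(clip, f"im{k}.png")))
                        for k in (1, 2, 3)]
        inputs = torch.stack([i0, i1], dim=1)   # shape (C, T=2, H, W) for the 3D network
        return inputs, i_gt
```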
S2, designing the 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA), wherein each three-dimensional U-Net network consists of a three-dimensional Encoder (3D-Encoder), atrous spatial pyramid pooling (ASPP), and a three-dimensional Decoder (3D-Decoder);
The spliced adjacent frames (I_0, I_1) are fed sequentially through the two three-dimensional U-Net networks. The result obtained through the first three-dimensional U-Net network is I_out1; subsequently, I_out1 and the spliced input (I_0, I_1) are fed together into the second three-dimensional U-Net network, whose result is I_out2; finally, I_out1 and I_out2 are spliced and input into a two-dimensional convolution to obtain the final result I_final.
The method specifically comprises the following steps:
S2-1, designing the 3D-DoubleU-Net network model: the model contains two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA);
S2-2, the first three-dimensional U-Net is composed of an encoder E_1 and a decoder D_1, where the encoder E_1 is a pretrained ResNet18-3D (R3D-18) three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from R3D-18, and three-dimensional convolutions with a spatial stride of 2 are used;
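A hedged sketch of how the encoder E_1 could be built from torchvision's pretrained R3D-18: the stem and the four residual stages are kept, while the global pooling and the classification layer are dropped. The weight-loading argument depends on the torchvision version, and any adjustment of the temporal stride (the invention only specifies a spatial stride of 2) is omitted here.

```python
import torch.nn as nn
from torchvision.models.video import r3d_18

def build_encoder_e1():
    # Older torchvision versions use r3d_18(pretrained=True) instead of `weights=`.
    backbone = r3d_18(weights="KINETICS400_V1")
    # Keep the stem and the four residual stages; backbone.avgpool and backbone.fc
    # (the pooling operation and the classification layer) are simply not used.
    return nn.ModuleList([
        backbone.stem,      # initial 3D convolution block
        backbone.layer1,    # coding block 1
        backbone.layer2,    # coding block 2 (downsampling)
        backbone.layer3,    # coding block 3 (downsampling)
        backbone.layer4,    # coding block 4 (downsampling)
    ])
```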
S2-3: the first generated result I_out1 and the spliced input (I_0, I_1) are fused using a pixel-level multiplication operation, and the fused result I_fuse is input to the second three-dimensional U-Net network; after feature extraction and upsampling, the result output2, denoted I_out2, is obtained; this step specifically comprises the following steps:
S2-3-1, the cascaded input frames (I_0, I_1) are input to the encoder E_1 to extract features, obtaining F_1;
S2-3-2, atrous spatial pyramid pooling (ASPP) is adopted to capture multi-scale context, obtaining the features F_ASPP1;
S2-3-3, the decoder D_1, comprising four decoding blocks, is used to reconstruct the preliminary intermediate frame result output1, i.e., I_out1; the decoder D_1 uses three-dimensional transposed convolution layers (3DTransConv) with a stride of 2, and, to handle the common checkerboard artifacts, a three-dimensional convolution layer is added after the last three-dimensional transposed convolution layer; the last layer of each decoding block applies the dual cross-view spatial attention mechanism (VISTA) to the features.
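The two building blocks used in steps S2-3-2 and S2-3-3 could look roughly as follows; this is a sketch only, with assumed channel counts and dilation rates, and it omits the VISTA attention layer that the invention applies to the features of each block.

```python
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    """Atrous spatial pyramid pooling with 3D dilated convolutions (rates are assumed)."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm3d(out_ch),
                nn.ReLU(inplace=True))
            for r in rates
        ])
        self.project = nn.Conv3d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the multi-scale context from every dilation rate, then project.
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))

class DecodeBlock3D(nn.Module):
    """One decoding block: stride-2 3D transposed convolution, optional skip connection,
    and an ordinary 3D convolution afterwards (a common remedy for checkerboard artifacts)."""
    def __init__(self, in_ch, out_ch, skip_ch=0):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=(1, 4, 4),
                                     stride=(1, 2, 2), padding=(0, 1, 1))
        self.refine = nn.Conv3d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip=None):
        x = self.up(x)                           # doubles the spatial resolution
        if skip is not None:
            x = torch.cat([x, skip], dim=1)      # skip connection from the encoder(s)
        return self.refine(x)
```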
S2-4: the second three-dimensional U-Net has the same structure as the first three-dimensional U-Net and is composed of an encoder E_2, atrous spatial pyramid pooling (ASPP), and a decoder D_2; the ResNet18-3D (R3D-18) three-dimensional convolutional neural network is the backbone structure of the encoder E_2, but unlike the encoder E_1, the encoder E_2 is trained from scratch and comprises four coding blocks.
S2-5, the fused result I_fuse is input into the encoder E_2 to obtain the extracted features F_2, which are subsequently input into ASPP to obtain the multi-scale context information F_ASPP2;
S2-6, the decoder D_2 also contains four decoding blocks; however, unlike the decoder D_1, which uses skip connections only from the encoder E_1, D_2 uses skip connections from both encoders; F_ASPP2 is input into the decoder D_2 to obtain the second generated frame result I_out2;
S2-7, in a second three-dimensional U-Net, the last layer of each coding block and decoding block uses a double-span view space attention mechanism (VISTA) on the features;
S2-8, I_out1 and I_out2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result I_final.
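The overall wiring of steps S2-3 through S2-8 can be sketched as below, assuming each three-dimensional U-Net is a module that maps the spliced (B, C, T=2, H, W) input to a single generated frame of shape (B, C, H, W); the pixel-level multiplication broadcasts I_out1 over the two input frames, and the final two-dimensional convolution fuses the spliced I_out1 and I_out2. The module and argument names are illustrative only.

```python
import torch
import torch.nn as nn

class DoubleUNet3D(nn.Module):
    """Top-level wiring only; unet1 and unet2 stand for the two 3D U-Nets described above."""
    def __init__(self, unet1, unet2, channels=3):
        super().__init__()
        self.unet1, self.unet2 = unet1, unet2
        # spliced (I_out1, I_out2) -> two-dimensional convolution -> final result I_final
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, frames):                  # frames: (B, C, T=2, H, W), spliced I_0 and I_1
        out1 = self.unet1(frames)               # first generated result I_out1, (B, C, H, W)
        fused = frames * out1.unsqueeze(2)      # pixel-level multiplication -> I_fuse
        out2 = self.unet2(fused)                # second generated result I_out2, (B, C, H, W)
        final = self.fuse(torch.cat([out1, out2], dim=1))
        return final, out1, out2
```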
S3, training the model: the invention trains the optimal model by minimizing the differences between the initial results I_out1 and I_out2, as well as the final result I_final, and the true intermediate frame I_GT; the loss functions used in the invention are as follows:
L = λ1·L1 + λC·LC + λp·Lp (1);
L1 = ||I_out1 − I_GT||_1 + ||I_out2 − I_GT||_1 (2);
LC = √((I_final − I_GT)² + ε²) (3);
Lp = ||φ(I_final) − φ(I_GT)|| (4);
wherein the invention uses L to train the network, with λ1, λC, and λp as the weights of the three terms; L1 uses the l1 norm to measure the differences between I_out1, I_out2, and the true intermediate frame I_GT; LC uses the Charbonnier function as a smooth approximation of the l1 norm to measure the difference between I_final and I_GT, with ε a small constant; Lp is the perceptual loss, which helps the network effectively produce more visually realistic results; the conv4_3 convolutional layer of an ImageNet-pretrained VGG-16 network is used as the feature extractor φ to obtain the perceptual loss Lp between I_final and I_GT;
S4, testing the model: the previous and latter frames from the test set are input into the trained model, and the intermediate frame results are generated directly; the invention evaluates the generated intermediate frame results using the objective metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) together with subjective evaluation.
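For the objective part of the evaluation, PSNR and SSIM can be computed with off-the-shelf implementations, for example scikit-image; the `channel_axis` argument assumes a recent scikit-image version.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_frame(pred, gt):
    """pred, gt: H x W x 3 uint8 arrays holding a generated and the real intermediate frame."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return psnr, ssim
```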
S5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
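As an illustration of step S5, the sketch below interleaves each original frame with the frame generated for every consecutive pair, roughly doubling the frame rate; it assumes the model returns the final result first (as in the wiring sketch above) and that frames are float tensors in [0, 1].

```python
import torch

@torch.no_grad()
def double_frame_rate(model, frames):
    """frames: list of (C, H, W) tensors in temporal order; returns the interleaved sequence."""
    model.eval()
    out = []
    for f0, f1 in zip(frames[:-1], frames[1:]):
        pair = torch.stack([f0, f1], dim=1).unsqueeze(0)   # (1, C, T=2, H, W)
        mid, _, _ = model(pair)                            # final result from the network
        out.extend([f0, mid.squeeze(0).clamp(0, 1)])
    out.append(frames[-1])
    return out
```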
In the description of the present invention, the term "plurality" means two or more unless explicitly defined otherwise; the orientation or positional relationship indicated by terms such as "upper" and "lower" is based on the orientation or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention; the terms "coupled", "mounted", "secured", and the like are to be construed broadly and may mean, for example, fixedly coupled, detachably coupled, or integrally connected, and may be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A video frame generation method based on 3D-DoubleU-Net, characterized by comprising the following steps:
S1, constructing a data set: both the training and test data sets contain a plurality of triplets, where one triplet consists of three consecutive frames in the time domain, denoted as (I_0, I_GT, I_1), in which I_0 is the previous frame, I_GT is the real intermediate frame, and I_1 is the latter frame;
S2, designing the 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism VISTA, wherein each three-dimensional U-Net network consists of a three-dimensional Encoder 3D-Encoder, atrous spatial pyramid pooling ASPP, and a three-dimensional Decoder 3D-Decoder;
The spliced adjacent frames (I_0, I_1) are fed sequentially through the two three-dimensional U-Net networks. The result obtained through the first three-dimensional U-Net network is I_out1; subsequently, I_out1 and the spliced input (I_0, I_1) are fed together into the second three-dimensional U-Net network, whose result is I_out2; finally, I_out1 and I_out2 are spliced and input into a two-dimensional convolution to obtain the final result I_final;
S3, training the model: the optimal model is trained by minimizing the differences between the initial results I_out1 and I_out2, as well as the final result I_final, and the true intermediate frame I_GT; the loss functions used are as follows:
L = λ1·L1 + λC·LC + λp·Lp (1);
L1 = ||I_out1 − I_GT||_1 + ||I_out2 − I_GT||_1 (2);
LC = √((I_final − I_GT)² + ε²) (3);
Lp = ||φ(I_final) − φ(I_GT)|| (4);
wherein L is used to train the network, with λ1, λC, and λp as the weights of the three terms; L1 uses the l1 norm to measure the differences between I_out1, I_out2, and the true intermediate frame I_GT; LC uses the Charbonnier function as a smooth approximation of the l1 norm to measure the difference between I_final and I_GT, with ε a small constant; Lp is the perceptual loss, for which the conv4_3 convolutional layer of an ImageNet-pretrained VGG-16 network is used as the feature extractor φ to obtain the perceptual loss between I_final and I_GT;
S4, testing the model: the previous and latter frames from the test set are input into the trained model, and the intermediate frame results are generated directly;
s5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
2. The video frame generation method based on 3D-DoubleU-Net according to claim 1, wherein said step S1 specifically comprises the following steps:
S1-1, model training uses a Vimeo-90K data set containing 51,312 triplets, where the first and third frames I_0 and I_1 of each triplet are the adjacent frames serving as the input to the network, and the second frame I_GT is the real frame used to supervise the training of the network;
S1-2, the UCF101 and DAVIS data sets are selected for testing the model.
3. The video frame generation method based on 3D-DoubleU-Net according to claim 1, wherein said step S2 specifically comprises the following steps:
S2-1, designing the 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism VISTA;
S2-2, the first three-dimensional U-Net is composed of an encoder E_1 and a decoder D_1, where the encoder E_1 is a pretrained ResNet18-3D three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from ResNet18-3D, and three-dimensional convolutions with a spatial stride of 2 are used;
S2-3: the first generated result I_out1 and the spliced input (I_0, I_1) are fused using a pixel-level multiplication operation, and the fused result I_fuse is input to the second three-dimensional U-Net network; after feature extraction and upsampling, the result output2, denoted I_out2, is obtained;
S2-4: the second three-dimensional U-Net has the same structure as the first three-dimensional U-Net and is composed of an encoder E_2, atrous spatial pyramid pooling ASPP, and a decoder D_2;
S2-5, the fused result I_fuse is input into the encoder E_2 to obtain the extracted features F_2, which are subsequently input into ASPP to obtain the multi-scale context information F_ASPP2;
S2-6, the decoder D_2 also contains four decoding blocks; however, unlike the decoder D_1, which uses skip connections only from the encoder E_1, D_2 uses skip connections from both encoders; F_ASPP2 is input into the decoder D_2 to obtain the second generated frame result I_out2;
S2-7, in a second three-dimensional U-Net, the last layer of each coding block and decoding block uses a double-span visual angle space attention mechanism VISTA for the characteristics;
S2-8, I_out1 and I_out2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result I_final.
4. The video frame generation method based on 3D-DoubleU-Net according to claim 3, wherein said step S2-3 specifically comprises the following steps:
S2-3-1, the cascaded input frames (I_0, I_1) are input to the encoder E_1 to extract features, obtaining F_1;
S2-3-2, atrous spatial pyramid pooling ASPP is adopted to capture multi-scale context, obtaining the features F_ASPP1;
S2-3-3, the decoder D_1, comprising four decoding blocks, is used to reconstruct the preliminary intermediate frame result output1, i.e., I_out1.
5. The video frame generation method based on 3D-DoubleU-Net according to claim 4, wherein the decoder D_1 in step S2-3-3 uses three-dimensional transposed convolution layers (3DTransConv) with a stride of 2, and a three-dimensional convolution layer is added after the last three-dimensional transposed convolution layer; the last layer of each decoding block applies the dual cross-view spatial attention mechanism VISTA to the features.
6. The video frame generation method based on 3D-DoubleU-Net according to claim 3, wherein said ResNet18-3D three-dimensional convolutional neural network in step S2-4 is the backbone structure of the encoder E_2; the encoder E_2 is trained from scratch and comprises four coding blocks.
CN202211234067.2A 2022-10-10 2022-10-10 Video frame generation method based on 3D-DoubleU-Net Active CN115588153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211234067.2A CN115588153B (en) 2022-10-10 2022-10-10 Video frame generation method based on 3D-DoubleU-Net

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211234067.2A CN115588153B (en) 2022-10-10 2022-10-10 Video frame generation method based on 3D-DoubleU-Net

Publications (2)

Publication Number Publication Date
CN115588153A CN115588153A (en) 2023-01-10
CN115588153B true CN115588153B (en) 2024-02-02

Family

ID=84779524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211234067.2A Active CN115588153B (en) 2022-10-10 2022-10-10 Video frame generation method based on 3D-DoubleU-Net

Country Status (1)

Country Link
CN (1) CN115588153B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379550A (en) * 2018-09-12 2019-02-22 上海交通大学 Video frame rate upconversion method and system based on convolutional neural networks
CN109905624A (en) * 2019-03-01 2019-06-18 北京大学深圳研究生院 A kind of video frame interpolation method, device and equipment
CN111489372A (en) * 2020-03-11 2020-08-04 天津大学 Video foreground and background separation method based on cascade convolution neural network
CN113542651A (en) * 2021-05-28 2021-10-22 北京迈格威科技有限公司 Model training method, video frame interpolation method and corresponding device
CN113808106A (en) * 2021-09-17 2021-12-17 浙江大学 Ultra-low dose PET image reconstruction system and method based on deep learning
CN114842400A (en) * 2022-05-23 2022-08-02 山东海量信息技术研究院 Video frame generation method and system based on residual block and feature pyramid

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379550A (en) * 2018-09-12 2019-02-22 上海交通大学 Video frame rate upconversion method and system based on convolutional neural networks
CN109905624A (en) * 2019-03-01 2019-06-18 北京大学深圳研究生院 A kind of video frame interpolation method, device and equipment
CN111489372A (en) * 2020-03-11 2020-08-04 天津大学 Video foreground and background separation method based on cascade convolution neural network
CN113542651A (en) * 2021-05-28 2021-10-22 北京迈格威科技有限公司 Model training method, video frame interpolation method and corresponding device
CN113808106A (en) * 2021-09-17 2021-12-17 浙江大学 Ultra-low dose PET image reconstruction system and method based on deep learning
CN114842400A (en) * 2022-05-23 2022-08-02 山东海量信息技术研究院 Video frame generation method and system based on residual block and feature pyramid

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Avinash Paliwal et al.; Deep Slow Motion Video Reconstruction With Hybrid Imaging System; IEEE Transactions on Pattern Analysis and Machine Intelligence; Vol. 42, No. 7; pp. 1557-1569 *
Debesh Jha et al.; DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation; arXiv:2006.04868v2 [eess.IV]; pp. 1-7 *
Shengheng Deng et al.; VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention; arXiv:2203.09704v1 [cs.CV]; pp. 1-12 *
Long Gucan et al.; Deep convolutional neural network for inter-frame motion compensation of video images; Journal of National University of Defense Technology; Vol. 38, No. 5; pp. 143-148 *

Also Published As

Publication number Publication date
CN115588153A (en) 2023-01-10

Similar Documents

Publication Publication Date Title
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
CN109671023B (en) Face image super-resolution secondary reconstruction method
CN111260560B (en) Multi-frame video super-resolution method fused with attention mechanism
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
WO2022267641A1 (en) Image defogging method and system based on cyclic generative adversarial network
Cao et al. Semi-automatic 2D-to-3D conversion using disparity propagation
CN112149459B (en) Video saliency object detection model and system based on cross attention mechanism
CN111739082B (en) Stereo vision unsupervised depth estimation method based on convolutional neural network
CN111489372B (en) Video foreground and background separation method based on cascade convolution neural network
CN113139898B (en) Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning
CN114463218B (en) Video deblurring method based on event data driving
CN110751649A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN110225260B (en) Three-dimensional high dynamic range imaging method based on generation countermeasure network
WO2023231535A1 (en) Monochrome image-guided joint denoising and demosaicing method for color raw image
CN114170286A (en) Monocular depth estimation method based on unsupervised depth learning
CN116248955A (en) VR cloud rendering image enhancement method based on AI frame extraction and frame supplement
CN115496663A (en) Video super-resolution reconstruction method based on D3D convolution intra-group fusion network
CN114494050A (en) Self-supervision video deblurring and image frame inserting method based on event camera
CN115588153B (en) Video frame generation method based on 3D-DoubleU-Net
CN112862675A (en) Video enhancement method and system for space-time super-resolution
CN116402908A (en) Dense light field image reconstruction method based on heterogeneous imaging
CN117011357A (en) Human body depth estimation method and system based on 3D motion flow and normal map constraint
CN116208812A (en) Video frame inserting method and system based on stereo event and intensity camera
CN116563155A (en) Method for converting priori semantic image into picture
CN114612305B (en) Event-driven video super-resolution method based on stereogram modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant