CN115588153B - Video frame generation method based on 3D-DoubleU-Net - Google Patents
Video frame generation method based on 3D-DoubleU-Net
- Publication number: CN115588153B (application CN202211234067.2A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides a video frame generation method based on 3D-DoubleU-Net. Aiming at the problem that frame generation methods struggle to accurately capture the spatio-temporal characteristics between video frames when the video scene is complex, objects move rapidly, or objects are occluded, the method combines a three-dimensional convolutional neural network with a double U-Net architecture to extract richer spatio-temporal features from the video and to generate an intermediate frame closer to the real frame. With this technical scheme, the 3D-DoubleU-Net network simultaneously exploits the ability of the three-dimensional convolutional neural network to explore the spatio-temporal dimensions and the ability of the double U-Net network to capture inter-frame context information, so that more accurate motion information and richer spatio-temporal features between frames can be captured in extreme scenes, and finer intermediate frame results can be generated.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a video frame generation method based on 3D-DoubleU-Net.
Background
With the upgrading of video display equipment and the improvement of video transmission bandwidth, people's requirements on video visual quality keep increasing. The frame rate is one of the important indicators of video quality, representing the number of frame images played per second. A video with a low frame rate can exhibit picture delay and jitter when played, degrading the viewing experience of users.
Video frame generation is a technology that, using video/image processing techniques and taking the original video frame images as reference, generates and inserts one or more frames between two consecutive frames, thereby converting the video frame rate from low to high. Video frame generation is one of the key technologies in the video processing field; it has attracted wide attention from researchers and is widely applied in video enhancement, data compression, video special-effect processing, and other fields.
With the development of deep learning technology in recent years, a large number of video frame generation methods based on deep learning have been proposed, mainly including methods based on optical flow estimation, methods based on kernel estimation, and methods combining optical flow estimation with kernel estimation.
The most widely used methods are based on estimating the optical flow between input frames, but under challenging conditions such algorithms cannot estimate the optical flow accurately, producing blurry results. Kernel-estimation-based methods generally estimate a kernel adaptively for each pixel and then convolve the estimated kernel with the input frame image to obtain the intermediate frame; however, such methods cannot attend to arbitrary positions and therefore cannot handle object motion beyond the kernel size. Methods combining optical flow estimation with kernel estimation use optical flow to perform motion estimation on the input frames and sample pixel information around a reference point. But the available reference points of this type of method are still few, and the disadvantages of the optical-flow-estimation and kernel-estimation methods are not significantly improved.
Video scenes collected in practice commonly suffer from complex scenes, rapid object movement, object occlusion, severe illumination changes, and similar problems, which pose great challenges to video frame generation research. Therefore, video frame generation remains one of the difficult problems in the current computer vision field, and research on robust and accurate video frame generation methods has important theoretical significance and application value.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a video frame generation method based on 3D-DoubleU-Net.
The invention is realized by the following technical scheme: the video frame generation method based on the 3D-DoubleU-Net is characterized by comprising the following steps of:
s1, constructing a data set: both the training and test data sets contain a plurality of triplets, one triplet consisting of three consecutive frames in the time domain, denoted as (I0, It, I1), where I0 is the previous frame, It is the real intermediate frame, and I1 is the latter frame;
s2, designing a 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA), wherein each three-dimensional U-Net network consists of a three-dimensional Encoder (3D-Encoder), atrous spatial pyramid pooling (ASPP), and a three-dimensional Decoder (3D-Decoder);
the spliced adjacent frames (I0, I1) are input to the two three-dimensional U-Net networks in sequence; the first three-dimensional U-Net network yields the result Ît1; subsequently, Ît1 and (I0, I1) are input together into the second three-dimensional U-Net network, yielding the result Ît2; finally, Ît1 and Ît2 are spliced and input into a two-dimensional convolution to obtain the final result Ît;
S3, training a model: the invention trains the optimal model by minimizing the difference between the initial results Ît1 and Ît2, as well as the final result Ît, and the true intermediate frame It; the loss function used is as follows:

L = λ1·l1 + λ2·l2 + λ3·lp (1);

l1 = ‖Ît1 − It‖1 + ‖Ît2 − It‖1 (2);

l2 = ρ(Ît − It), ρ(x) = √(x² + ε²) (3);

lp = ‖φ(Ît) − φ(It)‖2 (4);

wherein the invention uses the total loss L to train the network, with weights λ1, λ2, λ3; l1 uses the L1 norm to measure the differences between Ît1, Ît2 and It; l2 measures the difference between Ît and It with the L1 norm optimized by the Charbonnier function ρ, where ε is a small constant; lp is the perceptual loss: the conv4_3 convolutional layer of a VGG-16 network pre-trained on ImageNet is used as the feature extractor φ to obtain the perceptual loss between Ît and It;
s4, testing a model: inputting the front frame and the rear frame of the test set into a trained model, and directly generating an intermediate frame result;
s5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
Preferably, the step S1 specifically includes the following steps:
s1-1, model training uses the Vimeo-90K data set containing 51312 triplets, where I0 and I1 are the adjacent frames serving as input to the network, and the second frame It is the real frame used to supervise the training of the network;
s1-2, the UCF101 and the DAVIS data set are selected for testing the model.
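The triplet construction described in step S1 can be sketched as follows; a minimal illustration only, not the patent's actual Vimeo-90K loader (the overlapping grouping and the `make_triplets` name are assumptions for exposition):

```python
def make_triplets(frames):
    """Group a decoded frame sequence into triplets
    (previous frame, real intermediate frame, latter frame).
    The middle frame supervises training; the outer two are network input."""
    return [(frames[i], frames[i + 1], frames[i + 2])
            for i in range(len(frames) - 2)]

# Example: a sequence of 5 frames yields 3 overlapping triplets.
triplets = make_triplets(["f0", "f1", "f2", "f3", "f4"])
```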
Preferably, the step S2 specifically includes the following steps:
s2-1, designing a 3D-DoubleU-Net network model: the model contains two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA);
s2-2, the first three-dimensional U-Net consists of an encoder E1 and a decoder D1, where encoder E1 is a pretrained ResNet18-3D (R3D-18) three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from R3D-18, and a three-dimensional convolution with a spatial stride of 2 is used;
s2-3: the first generated frame Ît1 is fused with the input frames (I0, I1) using a pixel-level multiplication operation, and the fused result is input to the second three-dimensional U-Net network; feature extraction and upsampling yield the result output 2, denoted Ît2;
S2-4: the second three-dimensional U-Net has the same structure as the first and consists of an encoder E2, atrous spatial pyramid pooling (ASPP), and a decoder D2;
s2-5, the fused result is input to encoder E2 to obtain the extracted feature F2, which is then input to ASPP to obtain multi-scale context information;
S2-6, decoder D2 also contains four decoding blocks; but unlike decoder D1, which uses only skip connections from its own encoder, D2 uses skip connections from both encoders. The multi-scale context information is input to decoder D2 to obtain the second generated frame result Ît2;
S2-7, in the second three-dimensional U-Net, the last layer of each encoding block and decoding block applies a dual cross-view spatial attention mechanism (VISTA) to the features;
s2-8, Ît1 and Ît2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result Ît.
Further, the step S2-3 specifically comprises the following steps:
s2-3-1, the cascaded input frames (I0, I1) are input to encoder E1 for feature extraction, obtaining the feature F1.
S2-3-2, atrous spatial pyramid pooling (ASPP) is adopted to capture multi-scale context, obtaining the feature A1.
S2-3-3, a decoder D1 containing four decoding blocks is used to reconstruct the preliminary intermediate frame result output 1, i.e. Ît1.
Further, the decoder D1 in step S2-3-3 uses three-dimensional transposed convolution layers (3D TransConv) with a stride of 2, and a three-dimensional convolution layer is added after the last transposed convolution layer to handle the common checkerboard artifacts; the last layer of each decoding block applies a dual cross-view spatial attention mechanism (VISTA) to the features.
Further, the ResNet18-3D (R3D-18) three-dimensional convolutional neural network in step S2-4 is the backbone structure of encoder E2; unlike encoder E1, encoder E2 is trained from scratch and comprises four encoding blocks.
By adopting the above technical scheme, compared with the prior art, the invention has the following beneficial effects. The method mainly comprises two parts: first, intermediate frames are generated without an intermediate motion estimation step, so that the inaccuracy of motion estimation in conventional methods under extreme scenes is avoided; second, a three-dimensional convolutional neural network and a double U-Net network are combined for frame generation for the first time, and a 3D-DoubleU-Net network is proposed. The 3D-DoubleU-Net network simultaneously exploits the ability of the three-dimensional convolutional neural network to explore the spatio-temporal dimensions and the ability of the double U-Net network to capture inter-frame context information, capturing more accurate motion information and richer spatio-temporal features between frames in extreme scenes and generating finer intermediate frame results.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a data set format;
FIG. 2 is a flow chart of the method of the present invention;
fig. 3 is a schematic diagram of a network structure according to the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The 3D-DoubleU-Net based video frame generation method according to the embodiment of the present invention will now be specifically described with reference to figs. 1 to 3. Aiming at the problem that frame generation methods struggle to accurately obtain the spatio-temporal characteristics between video frames when the video scene is complex, objects move rapidly, and objects are occluded, the invention combines a three-dimensional convolutional neural network with a double U-Net architecture to extract richer spatio-temporal features from the video and generate an intermediate frame closer to the real frame. In specific implementation, the technical scheme of the invention can be realized as an automatic operation flow using computer software technology.
As shown in fig. 2, the invention provides a video frame generation method based on 3D-DoubleU-Net, which specifically includes the following steps:
s1, constructing a data set: both the training and test data sets contain a plurality of triplets, one triplet consisting of three consecutive frames in the time domain, denoted as (I0, It, I1), where I0 is the previous frame, It is the real intermediate frame, and I1 is the latter frame; the method specifically comprises the following steps:
s1-1, model training uses the Vimeo-90K data set containing 51312 triplets, where I0 and I1 are the adjacent frames serving as input to the network, and the second frame It is the real frame used to supervise the training of the network;
s1-2, the invention selects UCF101 and DAVIS data sets to test the model.
S2, designing a 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA), wherein each three-dimensional U-Net network consists of a three-dimensional Encoder (3D-Encoder), atrous spatial pyramid pooling (ASPP), and a three-dimensional Decoder (3D-Decoder);
the spliced adjacent frames (I0, I1) are input to the two three-dimensional U-Net networks in sequence; the first three-dimensional U-Net network yields the result Ît1; subsequently, Ît1 and (I0, I1) are input together into the second three-dimensional U-Net network, yielding the result Ît2; finally, Ît1 and Ît2 are spliced and input into a two-dimensional convolution to obtain the final result Ît;
The method specifically comprises the following steps:
s2-1, designing a 3D-DoubleU-Net network model: the model contains two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism (VISTA);
s2-2, the first three-dimensional U-Net consists of an encoder E1 and a decoder D1, where encoder E1 is a pretrained ResNet18-3D (R3D-18) three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from R3D-18, and a three-dimensional convolution with a spatial stride of 2 is used;
s2-3: the first generated frame Ît1 is fused with the input frames (I0, I1) using a pixel-level multiplication operation, and the fused result is input to the second three-dimensional U-Net network; feature extraction and upsampling yield the result output 2, denoted Ît2; the method specifically comprises the following steps:
s2-3-1, the cascaded input frames (I0, I1) are input to encoder E1 for feature extraction, obtaining the feature F1.
S2-3-2, atrous spatial pyramid pooling (ASPP) is adopted to capture multi-scale context, obtaining the feature A1.
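The ASPP step above relies on atrous (dilated) convolution, whose kernel taps are spaced apart to enlarge the receptive field without adding parameters. A one-dimensional toy sketch of the idea only, not the patent's actual ASPP module (the averaging kernel and the dilation rates 1, 2, 4 are illustrative assumptions):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D atrous convolution: kernel taps are spaced
    `dilation` samples apart, enlarging the receptive field."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([sum(kernel[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def aspp_1d(x, rates=(1, 2, 4)):
    """Toy ASPP: the same averaging kernel applied at several dilation
    rates; the stacked rows form multi-scale context (one row per rate)."""
    kernel = np.array([1 / 3, 1 / 3, 1 / 3])
    return np.stack([dilated_conv1d(x, kernel, r) for r in rates])

ctx = aspp_1d(np.arange(9.0))  # shape (3, 9): three scales over 9 samples
```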
S2-3-3, a decoder D1 containing four decoding blocks is used to reconstruct the preliminary intermediate frame result output 1, i.e. Ît1; decoder D1 uses three-dimensional transposed convolution layers (3D TransConv) with a stride of 2, and, in order to handle the common checkerboard artifacts, a three-dimensional convolution layer is added after the last transposed convolution layer; the last layer of each decoding block applies a dual cross-view spatial attention mechanism (VISTA) to the features.
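The checkerboard artifacts mentioned in step S2-3-3 arise because a strided transposed convolution covers output positions unevenly. A one-dimensional sketch of the effect (illustrative only; the patent's layers are three-dimensional and learned):

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Stride-2 transposed convolution implemented directly: each input
    sample scatters a scaled copy of the kernel into the output."""
    k = len(kernel)
    out = np.zeros(stride * (len(x) - 1) + k)
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * kernel
    return out

# A constant input through a constant kernel still yields an alternating
# output: with stride 2 and kernel size 3, interior even positions are
# covered by two kernel taps and odd positions by one -- the 1-D analogue
# of the checkerboard artifact that the convolution layer added after the
# last transposed convolution is meant to smooth out.
y = transposed_conv1d(np.ones(5), np.ones(3), stride=2)
```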
S2-4: the second three-dimensional U-Net has the same structure as the first and consists of an encoder E2, atrous spatial pyramid pooling (ASPP), and a decoder D2; the ResNet18-3D (R3D-18) three-dimensional convolutional neural network is the backbone structure of encoder E2; unlike encoder E1, encoder E2 is trained from scratch and comprises four encoding blocks.
S2-5, the fused result is input to encoder E2 to obtain the extracted feature F2, which is then input to ASPP to obtain multi-scale context information;
S2-6, decoder D2 also contains four decoding blocks; but unlike decoder D1, which uses only skip connections from its own encoder, D2 uses skip connections from both encoders. The multi-scale context information is input to decoder D2 to obtain the second generated frame result Ît2;
S2-7, in the second three-dimensional U-Net, the last layer of each encoding block and decoding block applies a dual cross-view spatial attention mechanism (VISTA) to the features;
s2-8, Ît1 and Ît2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result Ît.
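The cascade described in step S2 can be traced end to end with shape-preserving stand-ins. This is a sketch of the data flow only: the three averaging functions below are placeholders for the learned three-dimensional U-Nets and the final two-dimensional convolution, and exist solely so the shapes and composition can be checked.

```python
import numpy as np

def unet3d_1(stacked):    # (2, H, W) -> (H, W): first intermediate estimate
    return stacked.mean(axis=0)

def unet3d_2(stacked):    # (3, H, W) -> (H, W): second intermediate estimate
    return stacked.mean(axis=0)

def conv2d_head(stacked): # (2, H, W) -> (H, W): final fusion
    return stacked.mean(axis=0)

def generate_intermediate(i0, i1):
    """Trace the cascade of step S2: the first network on the spliced
    inputs, the second network on its output plus the inputs, then a
    2-D convolution fusing both intermediate results."""
    out1 = unet3d_1(np.stack([i0, i1]))        # first result
    out2 = unet3d_2(np.stack([out1, i0, i1]))  # second result
    return conv2d_head(np.stack([out1, out2])) # final frame

i0, i1 = np.zeros((4, 4)), np.ones((4, 4))
it = generate_intermediate(i0, i1)             # same spatial shape as inputs
```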
S3, training a model: the invention trains the optimal model by minimizing the difference between the initial results Ît1 and Ît2, as well as the final result Ît, and the true intermediate frame It; the loss function used in the invention is as follows:

L = λ1·l1 + λ2·l2 + λ3·lp (1);

l1 = ‖Ît1 − It‖1 + ‖Ît2 − It‖1 (2);

l2 = ρ(Ît − It), ρ(x) = √(x² + ε²) (3);

lp = ‖φ(Ît) − φ(It)‖2 (4);

wherein the invention uses the total loss L to train the network, with weights λ1, λ2, λ3; l1 uses the L1 norm to measure the differences between Ît1, Ît2 and It; l2 measures the difference between Ît and It with the L1 norm optimized by the Charbonnier function ρ, where ε is a small constant; lp is the perceptual loss, which helps the network effectively produce a more visually realistic result: the conv4_3 convolutional layer of a VGG-16 network pre-trained on ImageNet is used as the feature extractor φ to obtain the perceptual loss between Ît and It;
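The loss terms described above can be sketched on toy arrays as follows. This is a hedged illustration, not the patent's trained configuration: the unit weights, the value of ε, and the identity function standing in for the VGG-16 conv4_3 feature extractor are all assumptions.

```python
import numpy as np

def l1_term(out1, out2, gt):
    """L1-norm differences between both intermediate results and ground truth."""
    return np.abs(out1 - gt).mean() + np.abs(out2 - gt).mean()

def charbonnier_term(final, gt, eps=1e-6):
    """Charbonnier-smoothed L1 difference between the final result and GT."""
    return np.sqrt((final - gt) ** 2 + eps ** 2).mean()

def perceptual_term(final, gt, features=lambda x: x):
    """Distance in a feature space; `features` stands in for VGG-16 conv4_3."""
    return ((features(final) - features(gt)) ** 2).mean()

def total_loss(out1, out2, final, gt, w=(1.0, 1.0, 1.0)):
    return (w[0] * l1_term(out1, out2, gt)
            + w[1] * charbonnier_term(final, gt)
            + w[2] * perceptual_term(final, gt))

gt = np.zeros((2, 2))
loss = total_loss(np.full((2, 2), 0.1), np.full((2, 2), 0.2),
                  np.full((2, 2), 0.1), gt)   # 0.3 + ~0.1 + 0.01 ≈ 0.41
```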
s4, testing a model: the front and rear frames of the test set are input into the trained model, and the intermediate frame result is generated directly; the invention evaluates the generated intermediate frame result using the objective metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as a subjective method.
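Peak signal-to-noise ratio, one of the objective metrics named in step S4, can be computed as follows (a standard formulation, assuming 8-bit frames with a peak value of 255; not code from the patent):

```python
import numpy as np

def psnr(ref, gen, peak=255.0):
    """Peak signal-to-noise ratio between a real intermediate frame and a
    generated one, in dB; higher means closer to the ground truth."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 16.0)   # constant error of 16 -> MSE = 256
val = psnr(a, b)            # 10*log10(255^2 / 256) ≈ 24.05 dB
```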
S5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
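The low-to-high frame-rate conversion of step S5 amounts to inserting one generated frame between every pair of consecutive frames, so N input frames become 2N − 1 output frames. A sketch with a numeric midpoint standing in for the trained network (the `interpolate` callable is a placeholder, not the patent's model):

```python
def double_frame_rate(frames, interpolate):
    """Insert one generated frame between each pair of consecutive frames:
    N input frames become 2N - 1 output frames."""
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(interpolate(prev, nxt))  # model-generated intermediate frame
        out.append(nxt)
    return out

# Toy stand-in for the trained model: the numeric midpoint of two "frames".
video = double_frame_rate([0.0, 1.0, 2.0], lambda a, b: (a + b) / 2)
```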
In the description of the present invention, the term "plurality" means two or more, unless explicitly defined otherwise, the orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention; the terms "coupled," "mounted," "secured," and the like are to be construed broadly, and may be fixedly coupled, detachably coupled, or integrally connected, for example; can be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. The video frame generation method based on the 3D-DoubleU-Net is characterized by comprising the following steps of:
s1, constructing a data set: both the training and test data sets contain a plurality of triplets, one triplet consisting of three consecutive frames in the time domain, denoted as (I0, It, I1), where I0 is the previous frame, It is the real intermediate frame, and I1 is the latter frame;
s2, designing a 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism VISTA, wherein each three-dimensional U-Net network consists of a three-dimensional Encoder 3D-Encoder, atrous spatial pyramid pooling ASPP, and a three-dimensional Decoder 3D-Decoder;
the spliced adjacent frames (I0, I1) are input to the two three-dimensional U-Net networks in sequence; the first three-dimensional U-Net network yields the result Ît1; subsequently, Ît1 and (I0, I1) are input together into the second three-dimensional U-Net network, yielding the result Ît2; finally, Ît1 and Ît2 are spliced and input into a two-dimensional convolution to obtain the final result Ît;
S3, training a model: the optimal model is trained by minimizing the difference between the initial results Ît1 and Ît2, as well as the final result Ît, and the true intermediate frame It; the loss function used is as follows:

L = λ1·l1 + λ2·l2 + λ3·lp (1);

l1 = ‖Ît1 − It‖1 + ‖Ît2 − It‖1 (2);

l2 = ρ(Ît − It), ρ(x) = √(x² + ε²) (3);

lp = ‖φ(Ît) − φ(It)‖2 (4);

wherein the total loss L is used to train the network, with weights λ1, λ2, λ3; l1 uses the L1 norm to measure the differences between Ît1, Ît2 and It; l2 measures the difference between Ît and It with the L1 norm optimized by the Charbonnier function ρ, where ε is a small constant; lp is the perceptual loss: the conv4_3 convolutional layer of a VGG-16 network pre-trained on ImageNet is used as the feature extractor φ to obtain the perceptual loss between Ît and It;
s4, testing a model: inputting the front frame and the rear frame of the test set into a trained model, and directly generating an intermediate frame result;
s5, using a model: the real video is input into a trained network model, and a high-frame-rate video can be obtained.
2. The method for generating a video frame based on 3D-double u-Net according to claim 1, wherein said step S1 specifically comprises the steps of:
s1-1, model training uses the Vimeo-90K data set containing 51312 triplets, where I0 and I1 are the adjacent frames serving as input to the network, and the second frame It is the real frame used to supervise the training of the network;
s1-2, the UCF101 and the DAVIS data set are selected for testing the model.
3. The method for generating a video frame based on 3D-double u-Net according to claim 1, wherein said step S2 specifically comprises the steps of:
s2-1, designing a 3D-DoubleU-Net network model: the model comprises two three-dimensional U-Net networks with a dual cross-view spatial attention mechanism VISTA;
s2-2, the first three-dimensional U-Net consists of an encoder E1 and a decoder D1, where encoder E1 is a pretrained ResNet18-3D three-dimensional convolutional neural network; the pooling operation and the last classification layer are removed from ResNet18-3D, and a three-dimensional convolution with a spatial stride of 2 is used;
s2-3: the first generated frame Ît1 is fused with the input frames (I0, I1) using a pixel-level multiplication operation, and the fused result is input to the second three-dimensional U-Net network; feature extraction and upsampling yield the result output 2, denoted Ît2;
S2-4: the second three-dimensional U-Net has the same structure as the first and consists of an encoder E2, atrous spatial pyramid pooling ASPP, and a decoder D2;
s2-5, the fused result is input to encoder E2 to obtain the extracted feature F2, which is then input to ASPP to obtain multi-scale context information;
S2-6, decoder D2 also contains four decoding blocks; but unlike decoder D1, which uses only skip connections from its own encoder, D2 uses skip connections from both encoders; the multi-scale context information is input to decoder D2 to obtain the second generated frame result Ît2;
S2-7, in the second three-dimensional U-Net, the last layer of each encoding block and decoding block applies a dual cross-view spatial attention mechanism VISTA to the features;
s2-8, Ît1 and Ît2 are spliced, and the spliced result is input into a two-dimensional convolution to obtain the final result Ît.
4. A method for generating a video frame based on 3D-double u-Net according to claim 3, wherein said step S2-3 specifically comprises the steps of:
s2-3-1, the cascaded input frames (I0, I1) are input to encoder E1 for feature extraction, obtaining the feature F1;
S2-3-2, atrous spatial pyramid pooling ASPP is adopted to capture multi-scale contexts, obtaining the feature A1;
S2-3-3, a decoder D1 containing four decoding blocks is used to reconstruct the preliminary intermediate frame result output 1, i.e. Ît1.
5. The method for generating video frames based on 3D-DoubleU-Net as recited in claim 4, wherein said decoder D1 in step S2-3-3 uses three-dimensional transposed convolution layers (3D TransConv) with a stride of 2, and a three-dimensional convolution layer is added after the last transposed convolution layer; the last layer of each decoding block applies a dual cross-view spatial attention mechanism VISTA to the features.
6. The method for generating 3D-DoubleU-Net based video frames according to claim 3, wherein said ResNet18-3D three-dimensional convolutional neural network in step S2-4 is the backbone structure of encoder E2; unlike encoder E1, encoder E2 is trained from scratch and comprises four encoding blocks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211234067.2A CN115588153B (en) | 2022-10-10 | 2022-10-10 | Video frame generation method based on 3D-DoubleU-Net |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115588153A CN115588153A (en) | 2023-01-10 |
CN115588153B true CN115588153B (en) | 2024-02-02 |
Family
ID=84779524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211234067.2A Active CN115588153B (en) | 2022-10-10 | 2022-10-10 | Video frame generation method based on 3D-DoubleU-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115588153B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109379550A (en) * | 2018-09-12 | 2019-02-22 | 上海交通大学 | Video frame rate upconversion method and system based on convolutional neural networks |
CN109905624A (en) * | 2019-03-01 | 2019-06-18 | 北京大学深圳研究生院 | A kind of video frame interpolation method, device and equipment |
CN111489372A (en) * | 2020-03-11 | 2020-08-04 | 天津大学 | Video foreground and background separation method based on cascade convolution neural network |
CN113542651A (en) * | 2021-05-28 | 2021-10-22 | 北京迈格威科技有限公司 | Model training method, video frame interpolation method and corresponding device |
CN113808106A (en) * | 2021-09-17 | 2021-12-17 | 浙江大学 | Ultra-low dose PET image reconstruction system and method based on deep learning |
CN114842400A (en) * | 2022-05-23 | 2022-08-02 | 山东海量信息技术研究院 | Video frame generation method and system based on residual block and feature pyramid |
Non-Patent Citations (4)
Title |
---|
Avinash Paliwal et al., "Deep Slow Motion Video Reconstruction With Hybrid Imaging System," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 7, pp. 1557-1569 * |
Debesh Jha et al., "DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation," arXiv:2006.04868v2 [eess.IV], pp. 1-7 * |
Shengheng Deng et al., "VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention," arXiv:2203.09704v1 [cs.CV], pp. 1-12 * |
Long Gucan et al., "Deep convolutional neural network for inter-frame motion compensation in video images," Journal of National University of Defense Technology, vol. 38, no. 5, pp. 143-148 * |
Also Published As
Publication number | Publication date |
---|---|
CN115588153A (en) | 2023-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109064507B (en) | Multi-motion-stream deep convolution network model method for video prediction | |
CN109671023B (en) | Face image super-resolution secondary reconstruction method | |
CN111260560B (en) | Multi-frame video super-resolution method fused with attention mechanism | |
CN111028150B (en) | Rapid space-time residual attention video super-resolution reconstruction method | |
WO2022267641A1 (en) | Image defogging method and system based on cyclic generative adversarial network | |
Cao et al. | Semi-automatic 2D-to-3D conversion using disparity propagation | |
CN112149459B (en) | Video saliency object detection model and system based on cross attention mechanism | |
CN111739082B (en) | Stereo vision unsupervised depth estimation method based on convolutional neural network | |
CN111489372B (en) | Video foreground and background separation method based on cascaded convolutional neural networks | |
CN113139898B (en) | Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning | |
CN114463218B (en) | Video deblurring method based on event data driving | |
CN110751649A (en) | Video quality evaluation method and device, electronic equipment and storage medium | |
CN110225260B (en) | Three-dimensional high dynamic range imaging method based on generation countermeasure network | |
WO2023231535A1 (en) | Monochrome image-guided joint denoising and demosaicing method for color raw image | |
CN114170286A (en) | Monocular depth estimation method based on unsupervised deep learning | |
CN116248955A (en) | VR cloud rendering image enhancement method based on AI frame extraction and frame supplement | |
CN115496663A (en) | Video super-resolution reconstruction method based on D3D convolution intra-group fusion network | |
CN114494050A (en) | Self-supervision video deblurring and image frame inserting method based on event camera | |
CN115588153B (en) | Video frame generation method based on 3D-DoubleU-Net | |
CN112862675A (en) | Video enhancement method and system for space-time super-resolution | |
CN116402908A (en) | Dense light field image reconstruction method based on heterogeneous imaging | |
CN117011357A (en) | Human body depth estimation method and system based on 3D motion flow and normal map constraint | |
CN116208812A (en) | Video frame inserting method and system based on stereo event and intensity camera | |
CN116563155A (en) | Method for converting priori semantic image into picture | |
CN114612305B (en) | Event-driven video super-resolution method based on stereogram modeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||