CN112884636A - Style migration method for automatically generating stylized video - Google Patents

Style migration method for automatically generating stylized video

Info

Publication number
CN112884636A
Authority
CN
China
Prior art keywords
encoder
migration
style
video
lightweight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110117964.4A
Other languages
Chinese (zh)
Other versions
CN112884636B (en)
Inventor
霍静
孔美豪
***
高阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110117964.4A
Publication of CN112884636A
Application granted
Publication of CN112884636B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a style migration method for automatically generating stylized videos. The method constructs a style migration model comprising a high-compression self-encoder based on knowledge distillation and a feature migration module based on semantic alignment. The self-encoder is divided into an encoder and a decoder: the encoder encodes original video content frames and style images into feature maps; the feature migration module fuses the content and style features produced by the encoder under semantic alignment, yielding semantically aligned fused migration features; and the decoder decodes these features into stylized video frames. The method guarantees the stability of the migrated video, can stylize any video with any style, runs very fast during style migration, and therefore has high practicability.

Description

Style migration method for automatically generating stylized video
Technical Field
The invention belongs to the field of computer application, and particularly relates to a style migration method for automatically generating a stylized video.
Background
With the development and popularization of the internet and the mobile internet, more and more short-video platforms have emerged, and people's artistic demands on video have grown accordingly. Relying on professional artists or professional editors is both inconvenient and costly. Automatically generating videos of arbitrary artistic style from ordinary videos by computer is therefore attracting increasing attention.
Given a content image and a target style image, style migration aims to produce a stylized image that has both the structure of the content image and the texture of the style image. Style migration on single images has already been studied extensively, and much attention has now turned to video style migration because of its very broad application prospects (including artistic conversion of short videos). Clearly, style migration of video is both more practical and more challenging than style migration of single images.
Compared with traditional image style migration, video style migration is harder because stylization quality, stability and computational efficiency must be addressed simultaneously. Existing video style migration methods can be roughly divided into two categories according to whether they use optical flow.
The first category uses optical flow: a temporal-consistency loss is introduced to obtain stability between adjacent frames through the supervised constraint of optical flow. An optimization-based method with optical-flow constraints can obtain stable migrated video, but stylizing each frame of the video takes nearly three minutes, an unacceptably slow migration speed. Later work proposed video style migration methods based on feed-forward networks, but because optical-flow constraints are still used in both the training and testing stages, these methods cannot achieve real-time performance. To solve this problem, some methods use optical flow only in the training stage and avoid it at test time; the speed improves, but the final migration results are far less stable than those of methods that also use optical flow at test time.
The second category does not use optical flow. LST, for example, implements an affine transformation of features and can thus produce stable stylized video. Later studies proposed combining an Avatar-Net-based decoration module with a compound normalization method to ensure video stability. However, all existing methods that avoid optical flow use the original VGG network to encode content and style features, and the VGG network is very bulky: storing the VGG model requires a very large amount of memory, which greatly limits the application of these methods on small terminal devices.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a style migration method for automatically generating stylized videos that achieves real-time, stable style migration of arbitrary videos.
The technical scheme is as follows: the invention provides a style migration method for automatically generating a stylized video, which specifically comprises the following steps:
(1) constructing a video style migration network model comprising a high-compression self-encoder module based on knowledge distillation and a feature migration module based on semantic alignment; the self-encoder module comprises a lightweight encoder and a lightweight decoder;
(2) encoding content video frames and style images with the encoder: knowledge distillation is performed on the lightweight encoder with the VGG network as teacher, so that the encoder learns the encoding capability of the teacher VGG encoder while remaining sufficiently lightweight, and encodes original video content frames and style images into feature maps;
(3) fusing features with the semantic-alignment-based feature migration module: the content and style features produced by the encoder are fused to obtain semantically aligned fused migration features;
(4) performing knowledge distillation on the lightweight decoder with the VGG network as teacher: the decoder learns the decoding capability of the teacher VGG decoder while remaining sufficiently lightweight, decodes the fused migration features into stylized video frames, and finally synthesizes the video.
Further, implementing step (2) requires optimizing the following loss function:

$$\mathcal{L}_{\tilde{E}} = \gamma\,\| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - \tilde{E}_k(I) \|_2^2$$

wherein I is the original image, E is the encoder of the VGG network, Ẽ is the lightweight encoder, I′ is the reconstructed image, E_k(I) is the k-th layer output feature map of the original VGG encoder, Ẽ_k(I) is the k-th layer output feature map of the lightweight encoder, and λ and γ are both hyper-parameters.
Further, the step (3) is realized as follows:
the characteristic diagram of the content image output by the encoder is Fc∈RCx(WxH)The output obtained from the stylized image is Fs∈RCx(WxH)Wherein C is the number of the channels of the feature map, and W and H are the width and the height of the feature map respectively; the feature migration module based on semantic alignment aims at finding a feature migration which enables semantic alignment of content graphs of different video frames through conversion, and supposing that the conversion process can be parameterized into a projection matrix P e RCxCThen the optimized objective function is:
Figure BDA0002921029310000031
wherein ,
Figure BDA0002921029310000032
denotes from FcIn the operation of selecting the i-th position feature vector, AijTo represent
Figure BDA0002921029310000033
And
Figure BDA0002921029310000034
k neighbor matrix of (1);
solving for P as:
Figure BDA0002921029310000035
wherein A is the affine matrix defined above, U is the diagonal matrix, and
Figure BDA0002921029310000036
Figure BDA0002921029310000037
is a matrix with characteristic alignment function, and the projection matrix P is formed as P ═ g (F (F)c)f(Fs)T) In the linear conversion process, g (x) MX and f (x) XTT(ii) a The (x) process chooses to fit with three convolutional layers, and the g () process uses a fully-connected layer fit.
Further, implementing step (4) requires optimizing the following loss function:

$$\mathcal{L}_{\tilde{D}} = \| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - E_k(I') \|_2^2$$

wherein I is the original image, E_k(I) is the k-th layer output feature map of the original VGG encoder, Ẽ_k(I) is the k-th layer output feature map of the lightweight encoder, I′ = D̃(Ẽ(I)) is the image reconstructed by the lightweight decoder D̃, and λ is a hyper-parameter. The distillation aims to make D̃ retain the information of the original E while being able to reconstruct images well from the output features produced by Ẽ.
Beneficial effects: compared with the prior art, the invention has the following advantages: 1. high stylization quality of video frames is achieved while stability and temporal consistency between adjacent frames are taken into account, so the stability of the migrated video is guaranteed; 2. the stylization is richly diverse, and any video can be stylized with any style; 3. video style migration runs in real time, i.e., the style migration process is very fast, and the whole model is lightweight, giving the method high practicability.
Drawings
FIG. 1 is a flow chart of the invention;
FIG. 2 is a block diagram of the knowledge-distillation-based high-compression self-encoder of the present invention;
FIG. 3 is a schematic diagram of a video style migration network structure constructed by the present invention;
FIG. 4 is an exemplary diagram of a video style migration effect according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides a style migration method for automatically generating stylized video. The video style migration process consists of three main stages: first, the encoder encodes the content video frames and the style image; second, the encoded content and style features undergo feature style migration and fusion; third, the decoder decodes the migrated and fused features into stylized video frames, from which the final video is synthesized. The sizes of the encoder and decoder largely determine whether the model is lightweight, while the design of the feature migration part directly determines whether the migrated stylized video is stable, whether migration can run in real time, and whether arbitrary styles can be migrated. As shown in fig. 1, the method specifically comprises the following steps:
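To make the three-stage data flow concrete, a minimal sketch follows (in PyTorch, which the patent does not specify; the class and attribute names are illustrative assumptions, not identifiers from the patent):

```python
import torch
import torch.nn as nn

class VideoStyleTransferModel(nn.Module):
    """Three-stage pipeline: lightweight encoder -> FTM -> lightweight decoder."""
    def __init__(self, encoder: nn.Module, ftm: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder   # stage 1: distilled lightweight encoder
        self.ftm = ftm           # stage 2: semantic-alignment feature migration
        self.decoder = decoder   # stage 3: distilled lightweight decoder

    def forward(self, content_frame: torch.Tensor, style_image: torch.Tensor) -> torch.Tensor:
        f_c = self.encoder(content_frame)   # content feature map
        f_s = self.encoder(style_image)     # style feature map
        f_cs = self.ftm(f_c, f_s)           # semantically aligned fused features
        return self.decoder(f_cs)           # stylized frame

# Per-frame usage over a clip (the style image stays fixed):
#   stylized_frames = [model(frame, style) for frame in frames]
```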
step 1: and constructing a video style migration network model, as shown in FIG. 3, which comprises a high-compression self-encoder module based on knowledge distillation and a feature migration module based on semantic alignment.
The self-encoder is divided into an encoder and a decoder, the encoder can encode original video content frames and style images into feature maps, the feature migration module can fuse content features and style features obtained by encoding of the encoder based on semantic alignment, finally fusion migration features based on semantic alignment are obtained, and finally stylized video frames are obtained through the decoder.
A feature style migration module (FTM) based on semantic alignment, which can ensure the stability between adjacent frames in the video style migration process; the size of the video style migration model is only 2.67MB, and the speed of executing the video style migration can reach 166.67 fps.
Step 2: encode content video frames and style images with the encoder: knowledge distillation is performed on the lightweight encoder with the VGG network as teacher, so that the encoder learns the encoding capability of the teacher VGG encoder while remaining sufficiently lightweight, and encodes original video content frames and style images into feature maps.
As shown in fig. 2, the network structure of the lightweight encoder and decoder is as follows: the lightweight encoder network consists of four symmetric groups of convolutional layers for downsampling (mirrored by upsampling groups in the decoder), max pooling layers and ReLU activation functions, and feature-encodes the input video frames and arbitrary style images. The VGG network is a network structure widely used in style migration; the lightweight encoder network Ẽ is a student network obtained by knowledge distillation from the VGG teacher network, so it realizes the image encoding process with very few parameters. As shown in the network architecture of fig. 2, the encoder network must learn the encoding capability of the teacher VGG encoder while being sufficiently lightweight; the loss function to optimize is

$$\mathcal{L}_{\tilde{E}} = \gamma\,\| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - \tilde{E}_k(I) \|_2^2$$

wherein I is the original image, E is the encoder of the original VGG network, Ẽ is the lightweight encoder, I′ is the image reconstructed by the decoder, E_k(I) is the k-th layer output feature map of the original VGG encoder, Ẽ_k(I) is the k-th layer output feature map of the lightweight encoder, and λ and γ are both hyper-parameters.
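A minimal sketch of such a lightweight encoder and its distillation loss is given below (PyTorch is an assumption; the channel widths, function names, and the requirement that teacher and student feature shapes match per stage are illustrative, not values from the patent):

```python
import torch.nn as nn
import torch.nn.functional as F

class LightEncoder(nn.Module):
    """Sketch of the student encoder: four groups of conv + ReLU,
    with max-pool downsampling between groups (widths are assumptions)."""
    def __init__(self, widths=(16, 32, 64, 128)):
        super().__init__()
        groups, in_ch = [], 3
        for i, w in enumerate(widths):
            layers = ([nn.MaxPool2d(2)] if i > 0 else []) + [
                nn.Conv2d(in_ch, w, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ]
            groups.append(nn.Sequential(*layers))
            in_ch = w
        self.groups = nn.ModuleList(groups)

    def forward(self, x):
        feats = []
        for g in self.groups:
            x = g(x)
            feats.append(x)   # per-group feature maps, playing the role of E~_k(I)
        return feats

def encoder_distill_loss(img, recon, teacher_feats, student_feats, lam=1.0, gamma=1.0):
    """gamma * ||I - I'||^2 + lam * sum_k ||E_k(I) - E~_k(I)||^2.
    Assumes matching feature shapes per stage; in practice a 1x1
    projection would reconcile differing channel counts."""
    loss = gamma * F.mse_loss(recon, img)
    for t, s in zip(teacher_feats, student_feats):
        loss = loss + lam * F.mse_loss(s, t)
    return loss
```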
Step 3: knowledge distillation is performed on the lightweight decoder with the VGG network as teacher, so that the decoder learns the decoding capability of the teacher VGG decoder while being sufficiently lightweight.
As shown in the network architecture of fig. 2, the lightweight decoder network D̃, which feature-decodes the migrated features, is distilled with the VGG network as the teacher network; the decoder network must learn the decoding capability of the teacher VGG decoder while being sufficiently lightweight. The loss function to optimize is

$$\mathcal{L}_{\tilde{D}} = \| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - E_k(I') \|_2^2$$

wherein I′ = D̃(Ẽ(I)) is the image reconstructed by the lightweight decoder D̃ and λ is a hyper-parameter. The distillation aims to make D̃ retain the information of the original E while reconstructing images well from the output features produced by Ẽ.
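A sketch of this decoder-distillation loss follows (PyTorch assumed; the function and argument names are illustrative, and the encoders are assumed to return lists of per-stage feature maps as in the previous sketch):

```python
import torch.nn.functional as F

def decoder_distill_loss(img, teacher_vgg, light_enc, light_dec, lam=1.0):
    """Reconstruction plus perceptual terms for distilling the lightweight
    decoder, following the loss form reconstructed in the text above."""
    feats = light_enc(img)            # E~_k(I), list of per-stage feature maps
    recon = light_dec(feats[-1])      # I' = D~(E~(I)), decode the deepest features
    loss = F.mse_loss(recon, img)     # || I - I' ||^2
    for a, b in zip(teacher_vgg(img), teacher_vgg(recon)):
        loss = loss + lam * F.mse_loss(b, a)   # || E_k(I) - E_k(I') ||^2
    return loss
```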
Step 4: the semantic-alignment-based feature migration module fuses the content and style features produced by the encoder to obtain semantically aligned fused migration features.
The semantic-alignment-based feature migration module is the key to real-time, stable video style migration: it must complete style feature migration efficiently while keeping features semantically aligned. To achieve this, the idea of manifold alignment is adopted. Assume the feature map of the content image output by the encoder is F_c ∈ R^{C×(W×H)} and the output of the style image from the lightweight encoding network is F_s ∈ R^{C×(W×H)}, where C is the number of channels of the feature map and W and H are its width and height. The FTM module is designed to output the semantically aligned migrated features F_cs and pass them to the decoder to obtain the migrated result image. The goal of the FTM module is to find a transformation that yields semantically aligned feature migration for the content maps of different video frames; assuming the transformation can be parameterized as a projection matrix P ∈ R^{C×C}, the objective function to optimize is

$$\min_{P} \sum_{i,j} A_{ij}\,\bigl\| P F_c^{(i)} - F_s^{(j)} \bigr\|_2^2$$

wherein F_c^{(i)} denotes the feature vector at the i-th position of F_c, and A_{ij} is the k-nearest-neighbour affinity between F_c^{(i)} and F_s^{(j)}. The objective therefore makes the transformed content features similar to their k-nearest-neighbour features in the style feature space. During video style migration there may be moving objects and lighting changes that would otherwise cause jitter after migration; under the above affinity-preserving transformation, stable consistency is kept between adjacent frames, producing a stable video style migration result.
Solving the above equation amounts to computing a closed-form solution for P, obtained by taking the derivative with respect to P and setting it to 0:

$$P = F_s A^{\top} F_c^{\top}\,\bigl(F_c U F_c^{\top}\bigr)^{-1}$$

wherein A is the affinity matrix defined above and U is the diagonal matrix with U_{ii} = Σ_j A_{ij}. Since U is a diagonal matrix, it can be decomposed as T^⊤T, so the projection matrix P can be formalized as P = g(f(F_c) f(F_s)^⊤), where in the above linear transformation g(X) = MX and f(X) = X^⊤T. Even though P can be solved in closed form, the matrix inversion is very time-consuming, so an FTM network module is designed to fit the solution process: the f(·) map is fitted with three convolutional layers and the g(·) map with a fully connected layer.
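To make the closed-form computation concrete, the sketch below implements it directly (PyTorch assumed; the binary kNN affinity, the ε-regularized inverse, and all names are illustrative assumptions). In the patent's design, this exact computation is what the learned f(·) and g(·) modules are trained to approximate:

```python
import torch

def ftm_closed_form(f_c: torch.Tensor, f_s: torch.Tensor, k: int = 5, eps: float = 1e-5):
    """f_c, f_s: C x N feature maps flattened over the W*H spatial positions.
    Returns the semantically aligned content features F_cs = P F_c."""
    # k-nearest-neighbour affinity A between content and style positions
    d = torch.cdist(f_c.t(), f_s.t())               # N_c x N_s pairwise distances
    knn = d.topk(k, dim=1, largest=False).indices   # k closest style positions per content position
    A = torch.zeros_like(d)
    A.scatter_(1, knn, 1.0)                         # binary affinity (an assumption)
    U = torch.diag(A.sum(dim=1))                    # diagonal U with U_ii = sum_j A_ij (dense here)
    C = f_c.shape[0]
    # P = F_s A^T F_c^T (F_c U F_c^T + eps*I)^{-1}; eps added for numerical stability
    gram = f_c @ U @ f_c.t() + eps * torch.eye(C, device=f_c.device, dtype=f_c.dtype)
    P = f_s @ A.t() @ f_c.t() @ torch.linalg.inv(gram)
    return P @ f_c
```

As the text notes, the inversion makes this slow for large feature maps, which is exactly why the patent fits the process with three convolutional layers for f(·) and a fully connected layer for g(·).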
The content images used to train the self-encoder are preprocessed, with each image uniformly resized to 256 × 256 pixels. Each content image is fed into both the student self-encoder network and the teacher self-encoder network; each network comprises an encoding part, which encodes the image, and a decoding part, which reconstructs the input image from the feature codes produced by the encoder. Through the feature perceptual loss and the reconstruction loss, the knowledge-distillation training method shown in fig. 2 ensures that the distilled lightweight self-encoder network has the abilities of multi-level feature extraction and feature-based image reconstruction. The content image and the style image are then fed into the style migration network with the semantic-alignment feature migration module added, as shown in fig. 3; the intermediate feature migration module is trained (with the distilled lightweight self-encoder network fixed) based on the designed content loss Lc and style loss Ls.
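Since the text names the content loss Lc and style loss Ls without spelling them out, the sketch below shows one common perceptual formulation consistent with this training setup (a hedged assumption, not the patent's exact losses); features are taken from the fixed teacher VGG encoder:

```python
import torch.nn.functional as F

def ftm_training_losses(feats_out, feats_content, feats_style, style_weight=1.0):
    """Lc + Ls sketch. Each argument is a list of VGG feature maps
    (B, C, H, W) for the stylized output, the content frame, and the
    style image respectively."""
    # Content loss Lc: deep features of the output match the content frame
    l_c = F.mse_loss(feats_out[-1], feats_content[-1])
    # Style loss Ls: per-layer channel statistics match the style image
    # (mean/std matching is an AdaIN-style assumption)
    l_s = 0.0
    for o, s in zip(feats_out, feats_style):
        l_s = l_s + F.mse_loss(o.mean(dim=(2, 3)), s.mean(dim=(2, 3)))
        l_s = l_s + F.mse_loss(o.std(dim=(2, 3)), s.std(dim=(2, 3)))
    return l_c + style_weight * l_s
```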
In the testing stage, the video frames and the selected style image are fed directly into the trained lightweight style migration model, which automatically and efficiently outputs the stylized results, and a stable stylized video is finally synthesized in real time. Fig. 4 shows style migration results taken every 10 frames of one video; it can be seen that semantically aligned style migration is performed on both foreground and background, producing stable video frame results.

Claims (4)

1. A style migration method for automatically generating stylized video, characterized by comprising the following steps:
(1) constructing a video style migration network model comprising a high-compression self-encoder module based on knowledge distillation and a feature migration module based on semantic alignment; the self-encoder module comprises a lightweight encoder and a lightweight decoder;
(2) encoding content video frames and style images with the encoder: knowledge distillation is performed on the lightweight encoder with the VGG network as teacher, so that the encoder learns the encoding capability of the teacher VGG encoder while remaining sufficiently lightweight, and encodes original video content frames and style images into feature maps;
(3) fusing features with the semantic-alignment-based feature migration module: the content and style features produced by the encoder are fused to obtain semantically aligned fused migration features;
(4) performing knowledge distillation on the lightweight decoder with the VGG network as teacher: the decoder learns the decoding capability of the teacher VGG decoder while remaining sufficiently lightweight, decodes the fused migration features into stylized video frames, and finally synthesizes the video.
2. The style migration method for automatically generating stylized video according to claim 1, wherein step (2) is implemented by optimizing the following loss function:

$$\mathcal{L}_{\tilde{E}} = \gamma\,\| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - \tilde{E}_k(I) \|_2^2$$

wherein I is the original image, E is the encoder of the VGG network, Ẽ is the lightweight encoder, I′ is the reconstructed image, E_k(I) is the k-th layer output feature map of the original VGG encoder, Ẽ_k(I) is the k-th layer output feature map of the lightweight encoder, and λ and γ are both hyper-parameters.
3. The style migration method for automatically generating stylized video according to claim 1, wherein step (3) is implemented as follows:
the feature map of the content image output by the encoder is F_c ∈ R^{C×(W×H)} and that of the style image is F_s ∈ R^{C×(W×H)}, where C is the number of channels of the feature map and W and H are its width and height; the semantic-alignment-based feature migration module aims to find a transformation under which the content features of different video frames are semantically aligned with the style features; assuming the transformation can be parameterized as a projection matrix P ∈ R^{C×C}, the objective function to optimize is

$$\min_{P} \sum_{i,j} A_{ij}\,\bigl\| P F_c^{(i)} - F_s^{(j)} \bigr\|_2^2$$

wherein F_c^{(i)} denotes the feature vector at the i-th position of F_c, and A_{ij} is the k-nearest-neighbour affinity between F_c^{(i)} and F_s^{(j)};

solving for P gives

$$P = F_s A^{\top} F_c^{\top}\,\bigl(F_c U F_c^{\top}\bigr)^{-1}$$

wherein A is the affinity matrix defined above and U is the diagonal matrix with U_{ii} = Σ_j A_{ij}, a matrix with a feature-alignment function; the projection matrix P is formalized as P = g(f(F_c) f(F_s)^⊤), where in the linear transformation g(X) = MX and f(X) = X^⊤T; the f(·) map is fitted with three convolutional layers and the g(·) map with a fully connected layer.
4. The style migration method for automatically generating stylized video according to claim 1, wherein step (4) is implemented by optimizing the following loss function:

$$\mathcal{L}_{\tilde{D}} = \| I - I' \|_2^2 + \lambda \sum_{k} \| E_k(I) - E_k(I') \|_2^2$$

wherein I is the original image, E_k(I) is the k-th layer output feature map of the original VGG encoder, Ẽ_k(I) is the k-th layer output feature map of the lightweight encoder, I′ = D̃(Ẽ(I)) is the image reconstructed by the lightweight decoder D̃, and λ is a hyper-parameter; the distillation aims to make D̃ retain the information of the original E while reconstructing images well from the output features produced by Ẽ.
CN202110117964.4A 2021-01-28 2021-01-28 Style migration method for automatically generating stylized video Active CN112884636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110117964.4A CN112884636B (en) 2021-01-28 2021-01-28 Style migration method for automatically generating stylized video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110117964.4A CN112884636B (en) 2021-01-28 2021-01-28 Style migration method for automatically generating stylized video

Publications (2)

Publication Number | Publication Date
CN112884636A (en) | 2021-06-01
CN112884636B (en) | 2023-09-26

Family

ID=76052976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110117964.4A Active CN112884636B (en) 2021-01-28 2021-01-28 Style migration method for automatically generating stylized video

Country Status (1)

Country Link
CN (1) CN112884636B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236814A1 (en) * 2016-10-21 2019-08-01 Google Llc Stylizing input images
CN110706151A (en) * 2018-09-13 2020-01-17 南京大学 Video-oriented non-uniform style migration method
US20200151938A1 (en) * 2018-11-08 2020-05-14 Adobe Inc. Generating stylized-stroke images from source images utilizing style-transfer-neural networks with non-photorealistic-rendering
US20200167658A1 (en) * 2018-11-24 2020-05-28 Jessica Du System of Portable Real Time Neurofeedback Training
CN110175951A (en) * 2019-05-16 2019-08-27 西安电子科技大学 Video Style Transfer method based on time domain consistency constraint
CN110310221A (en) * 2019-06-14 2019-10-08 大连理工大学 A kind of multiple domain image Style Transfer method based on generation confrontation network
CN111325681A (en) * 2020-01-20 2020-06-23 南京邮电大学 Image style migration method combining meta-learning mechanism and feature fusion
CN111932445A (en) * 2020-07-27 2020-11-13 广州市百果园信息技术有限公司 Compression method for style migration network and style migration method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BAO Yuxuan; LU Tianliang; DU Yanhui: "Survey of Deepfake Video Detection Technology" (深度伪造视频检测技术综述), Computer Science (计算机科学), no. 09, pages 289-298 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989102A (en) * 2021-10-19 2022-01-28 复旦大学 Rapid style migration method with high shape-preserving property
CN114331827A (en) * 2022-03-07 2022-04-12 深圳市其域创新科技有限公司 Style migration method, device, equipment and storage medium
CN118283201A (en) * 2024-06-03 2024-07-02 上海蜜度科技股份有限公司 Video synthesis method, system, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112884636B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN112884636A (en) Style migration method for automatically generating stylized video
CN113762322B (en) Video classification method, device and equipment based on multi-modal representation and storage medium
CN111862294B (en) Hand-painted 3D building automatic coloring network device and method based on ArcGAN network
CN111242844B (en) Image processing method, device, server and storage medium
CN112819833B (en) Large scene point cloud semantic segmentation method
CN113344188A (en) Lightweight neural network model based on channel attention module
WO2023151529A1 (en) Facial image processing method and related device
CN110930342A (en) Depth map super-resolution reconstruction network construction method based on color map guidance
CN111626968B (en) Pixel enhancement design method based on global information and local information
CN114332482A (en) Lightweight target detection method based on feature fusion
CN112381716A (en) Image enhancement method based on generation type countermeasure network
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
CN112837212B (en) Image arbitrary style migration method based on manifold alignment
CN117994447A (en) Auxiliary generation method and system for 3D image of vehicle model design oriented to sheet
Yu et al. Stacked generative adversarial networks for image compositing
CN112257464A (en) Machine translation decoding acceleration method based on small intelligent mobile device
CN116311455A (en) Expression recognition method based on improved Mobile-former
CN114118415B (en) Deep learning method of lightweight bottleneck attention mechanism
CN113706572B (en) End-to-end panoramic image segmentation method based on query vector
Wang et al. Boosting light field image super resolution learnt from single-image prior
CN110245677A (en) A kind of image descriptor dimension reduction method based on convolution self-encoding encoder
Lin Virtual reality and its application for producing TV programs
Zhang et al. Fusing Temporally Distributed Multi-Modal Semantic Clues for Video Question Answering
CN114513684B (en) Method for constructing video image quality enhancement model, video image quality enhancement method and device

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant