WO2019214381A1 - Video deblurring method and apparatus, storage medium and electronic apparatus - Google Patents

Video deblurring method and apparatus, storage medium and electronic apparatus

Info

Publication number
WO2019214381A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
network model
sample
convolution
blurred image
Prior art date
Application number
PCT/CN2019/081702
Other languages
English (en)
French (fr)
Inventor
张凯皓
罗文寒
马林
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP19799523.6A (EP3792869A4)
Publication of WO2019214381A1
Priority to US16/993,922 (US11688043B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20172 Image enhancement details
    • G06T 2207/20182 Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20172 Image enhancement details
    • G06T 2207/20201 Motion blur correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Definitions

  • The embodiments of the present invention relate to the field of computer technologies, and in particular, to a video deblurring method, an apparatus, a storage medium, and an electronic device.
  • In the related art, multiple image frames are fed into a convolutional neural network model together, a 2D convolution kernel is used to extract the spatial information within each single image frame, and a reconstruction loss function is used as supervision information to deblur the blurred video.
  • The embodiments of the present application provide a video deblurring method, apparatus, storage medium, and electronic device for improving the effect of video deblurring.
  • To solve the foregoing technical problem, the embodiments of the present application provide the following technical solutions:
  • In one aspect, an embodiment of the present application provides a video deblurring method, including:
  • an electronic device acquires N consecutive image frames from a video segment, where N is a positive integer and the N image frames include a blurred image frame to be processed;
  • the electronic device performs three-dimensional (3D) convolution processing on the N image frames by using a generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames;
  • the electronic device uses the spatio-temporal information corresponding to the blurred image frame to perform deblurring processing on the blurred image frame through the generative adversarial network model, and outputs a clear image frame.
  • In another aspect, an embodiment of the present application further provides a video deblurring apparatus, including one or more processors and one or more memories storing program units, where the program units are executed by the processor and include:
  • an obtaining module, configured to obtain N consecutive image frames from a video segment, where N is a positive integer and the N image frames include a blurred image frame to be processed;
  • a spatio-temporal information extraction module, configured to perform three-dimensional (3D) convolution processing on the N image frames by using a generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames;
  • a deblurring processing module, configured to use the spatio-temporal information corresponding to the blurred image frame to perform deblurring processing on the blurred image frame through the generative adversarial network model and output a clear image frame.
  • The constituent modules of the video deblurring apparatus may also perform the steps described in the foregoing aspect and its various possible implementations; for details, see the foregoing descriptions of the aspect and its various possible implementations.
  • In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where a computer program is stored in the memory and the processor is configured to execute the foregoing video deblurring method by running the computer program.
  • In another aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
  • In another aspect, an embodiment of the present application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
  • It can be seen from the foregoing technical solutions that the embodiments of the present application have the following advantages:
  • In the embodiments of the present application, N consecutive image frames are first obtained from a video segment, the N image frames including a blurred image frame to be processed, and 3D convolution processing is then performed on the N image frames by using a generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame.
  • The spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames.
  • Finally, using the spatio-temporal information corresponding to the blurred image frame, the blurred image frame is deblurred through the generative adversarial network model, and a clear image frame is output.
  • Because the generative adversarial network model can use a 3D convolution operation to extract the spatio-temporal information implicit between consecutive image frames, the deblurring of the blurred image frame is completed by the generative adversarial network model using that spatio-temporal information, so a more realistic and clear image can be obtained and the effect of video deblurring is improved.
  • FIG. 1 is a schematic flow block diagram of a video deblurring method according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a process of deblurring a blurred image frame through a generative network model according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a training process of a generative network model and a discriminative network model according to an embodiment of the present application;
  • FIG. 4-a is a schematic diagram of the composition structure of a video deblurring apparatus according to an embodiment of the present application;
  • FIG. 4-b is a schematic diagram of the composition structure of another video deblurring apparatus according to an embodiment of the present application;
  • FIG. 4-c is a schematic diagram of the composition structure of a spatio-temporal information extraction module according to an embodiment of the present application;
  • FIG. 4-d is a schematic diagram of the composition structure of a model training module according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the composition structure of a terminal to which a video deblurring method is applied according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of the composition structure of a server to which a video deblurring method is applied according to an embodiment of the present application.
  • The embodiments of the present application provide a video deblurring method and apparatus for improving the effect of video deblurring.
  • The embodiments of the present application mainly provide a video deblurring method based on deep learning.
  • The embodiments of the present application can complete the restoration of a blurred video by using a deep neural network, and can be applied to the deblurring of video captured by a camera.
  • In the embodiments of the present application, a video deblurring apparatus is also provided; the video deblurring apparatus may be deployed in a terminal in the form of video processing software, and the video deblurring apparatus may also be a server that stores video.
  • The video deblurring method provided by the embodiments of the present application uses a deep learning method to train a Generative Adversarial Nets (GAN) model, which may be implemented by a convolutional neural network model.
  • For each image frame, the multiple frames before and after it are sent together into the generative adversarial network model, which extracts and integrates information from the multiple video frames; a three-dimensional (3D) convolution kernel in the generative adversarial network model performs a 3D convolution operation to extract the spatio-temporal information implicit between consecutive image frames, and a fully convolutional operation is used to restore the blurred image frames to clear frames at the same scale, thereby obtaining a more realistic and clear image.
  • The generative adversarial network model adopted in the embodiments of the present application can effectively extract spatio-temporal information to process blurred image frames, thereby automatically restoring a blurred video.
  • FIG. 1 is a schematic block diagram of a video deblurring method according to an embodiment of the present disclosure.
  • the video deblurring method may be performed by an electronic device, where the electronic device may be a terminal or a server.
  • The following description takes the electronic device performing the video deblurring method as an example. Referring to FIG. 1, the method may include the following steps:
  • 101. The electronic device acquires N consecutive image frames from a video segment, where N is a positive integer and the N image frames include a blurred image frame to be processed.
  • The video segment may be a video recorded by a terminal through a camera, or may be a video downloaded by the terminal from a network; as long as at least one blurred image frame exists in the video segment, the video deblurring method provided by the embodiments of the present application can be used to restore a clear image.
  • N consecutive image frames are obtained from the video segment, and at least one of the N image frames is a blurred image frame to be processed, where the blur may be caused by camera shake or by motion of the photographed object.
  • That is, the plurality of image frames first acquired in the embodiments of the present application may contain a blurred image frame to be processed.
  • The blurred image frame may be the intermediate image frame of the N consecutive image frames.
  • For example, the value of N may be 3 and the blurred image frame may be the second image frame, or the value of N may be 5 and the blurred image frame may be the third image frame.
  • The value of N is a positive integer and is not limited here.
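  • For illustration, a minimal Python sketch of how a window of N consecutive frames centred on the frame to be deblurred might be assembled (the frame list, the index, and border clamping are hypothetical choices, not taken from the application):

```python
from typing import List

N = 5                      # window size; the blurred frame sits in the middle
HALF = N // 2

def frame_window(frames: List, t: int) -> List:
    """Return N consecutive frames centred on index t, clamping at the clip borders."""
    idx = [min(max(i, 0), len(frames) - 1) for i in range(t - HALF, t + HALF + 1)]
    return [frames[i] for i in idx]

# Usage: if `frames` is a list of decoded video frames, the window for frame 7
# contains frames 5..9, with frame 7 (the blurred one) in the middle.
# window = frame_window(frames, 7)
```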
  • 102. The electronic device performs 3D convolution processing on the N image frames by using the generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames.
  • In the embodiments of the present application, a trained generative adversarial network model may be used for the deblurring of the video; after the N consecutive image frames are acquired, they are input into the generative adversarial network model.
  • A 3D convolution operation is performed by using a 3D convolution kernel in the generative adversarial network model to extract the spatio-temporal information implicit between the consecutive image frames.
  • The spatio-temporal information includes the spatial feature information of the blurred image frame, that is, the feature information hidden within the single blurred frame, and the temporal feature information, that is, the temporal information between the blurred image frame and its adjacent image frames; for example, the 3D convolution operation can extract temporal feature information between the blurred image frame, the two image frames before it, and the two image frames after it.
  • In other words, the 3D convolution kernel in the embodiments of the present application can extract spatio-temporal information, that is, both temporal feature information and spatial feature information; therefore, the feature information hidden between consecutive images in a video can be used effectively, and, combined with the trained generative adversarial network model, the deblurring effect on the blurred image frame can be improved. For details, see the description of video deblurring in subsequent embodiments.
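  • As a rough illustration of how a 3D convolution kernel mixes information across the temporal and spatial axes of a frame stack, a small PyTorch sketch follows (the channel count and kernel size are illustrative assumptions, not values from the application):

```python
import torch
import torch.nn as nn

# Five consecutive RGB frames stacked along a temporal axis:
# (batch, channels, time, height, width)
frames = torch.randn(1, 3, 5, 128, 128)

# A single 3D convolution layer; the 3x3x3 kernel slides over time as well as
# space, so each output value mixes information from adjacent frames.
conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

spatiotemporal = conv3d(frames)
print(spatiotemporal.shape)   # torch.Size([1, 16, 5, 128, 128])
```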
  • 103. The electronic device uses the spatio-temporal information corresponding to the blurred image frame to perform deblurring processing on the blurred image frame through the generative adversarial network model, and outputs a clear image frame.
  • In the embodiments of the present application, the spatio-temporal information corresponding to the blurred image frame may be used as the image feature, and the generative adversarial network model performs prediction on this image feature; the output of the generative adversarial network model is the clear image frame obtained by deblurring the blurred image frame. Because the generative adversarial network model in the embodiments of the present application uses a 3D convolution operation, temporal feature information and spatial feature information can be extracted, and this feature information can be used to predict the clear image frame corresponding to the blurred image frame.
  • In the embodiments of the present application, the 3D convolution kernel is mainly used to process consecutive video frames, so the spatio-temporal information implicit in consecutive video frames can be extracted more effectively, and the generative adversarial network model can better ensure that the restored clear video is more realistic.
  • In some embodiments of the present application, the generative adversarial network model includes: a generative network model and a discriminative network model.
  • That is, the generative adversarial network model in the embodiments of the present application includes at least two network models, one being the generative network model and the other being the discriminative network model; through the mutual learning of the generative network model and the discriminative network model, the generative adversarial network model produces a fairly good output.
  • In some embodiments of the present application, the video deblurring method further includes:
  • A1. The electronic device acquires, from a video sample library, N consecutive sample image frames and a real clear image frame used for discrimination, where the N sample image frames include a sample blurred image frame used for training, and the real clear image frame corresponds to the sample blurred image frame;
  • A2. The electronic device extracts spatio-temporal information corresponding to the sample blurred image frame from the N sample image frames by using a 3D convolution kernel in the generative network model;
  • A3. The electronic device uses the spatio-temporal information corresponding to the sample blurred image frame to perform deblurring processing on the sample blurred image frame through the generative network model, and outputs a sample clear image frame;
  • A4. The electronic device alternately trains the generative network model and the discriminative network model according to the sample clear image frame and the real clear image frame.
  • In the embodiments of the present application, a video sample library may be set up for the training and discrimination of the models; for example, N consecutive sample image frames are used for model training, where a "sample image frame" is distinguished from the image frames in step 101 in that the sample image frame is a sample image in the video sample library, and the N sample image frames include a sample blurred image frame.
  • In the video sample library, a real clear image frame is also provided, and the real clear image frame is the true clear image frame corresponding to the sample blurred image frame.
  • The spatio-temporal information corresponding to the sample blurred image frame is extracted from the N sample image frames by using the 3D convolution kernel in the generative network model; the spatio-temporal information may include spatial feature information of the sample blurred image frame and temporal feature information between the sample blurred image frame and its adjacent image frames among the N sample image frames, and the generative network model may be a convolutional neural network model.
  • The spatio-temporal information corresponding to the sample blurred image frame is then used to deblur the sample blurred image frame through the generative network model; the training process of the generative network model is described in detail in subsequent embodiments. After the deblurring processing of the generative network model, a sample clear image frame can be output.
  • The sample clear image frame is the result obtained by the generative network model deblurring the sample blurred image frame.
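  • A minimal sketch of how such training pairs might be organised, assuming each training example consists of N consecutive sample frames plus the real clear frame aligned with the middle sample frame (the class name, tensor shapes, and dummy data are illustrative):

```python
import torch
from torch.utils.data import Dataset

class DeblurSampleSet(Dataset):
    """Pairs N consecutive sample frames with the real clear frame of the middle one."""

    def __init__(self, clips, sharp_frames):
        # clips: list of tensors shaped (N, C, H, W); sharp_frames: list of (C, H, W)
        self.clips = clips
        self.sharp_frames = sharp_frames

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, i):
        blurred_stack = self.clips[i].permute(1, 0, 2, 3)   # -> (C, N, H, W) for Conv3d
        return blurred_stack, self.sharp_frames[i]

# Usage with dummy data: 10 clips of 5 RGB frames at 64x64.
clips = [torch.randn(5, 3, 64, 64) for _ in range(10)]
sharp = [torch.randn(3, 64, 64) for _ in range(10)]
dataset = DeblurSampleSet(clips, sharp)
```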
  • In some embodiments of the present application, the generative network model includes: a first 3D convolution kernel and a second 3D convolution kernel.
  • In step A2, extracting, by using the 3D convolution kernel in the generative network model, the spatio-temporal information corresponding to the sample blurred image frame from the N sample image frames includes:
  • A11. performing convolution processing on the N sample image frames by using the first 3D convolution kernel to obtain low-level spatio-temporal features corresponding to the sample blurred image frame;
  • performing convolution processing on the low-level spatio-temporal features by using the second 3D convolution kernel to obtain high-level spatio-temporal features corresponding to the sample blurred image frame;
  • fusing the high-level spatio-temporal features corresponding to the sample blurred image frame to obtain the spatio-temporal information corresponding to the sample blurred image frame.
  • In the embodiments of the present application, the generative network model first sets up two 3D convolution layers, and each 3D convolution layer may use a different 3D convolution kernel; for example, the first 3D convolution kernel and the second 3D convolution kernel have different weight parameters.
  • Convolution processing is performed on the N sample image frames by using the first 3D convolution kernel to obtain low-level spatio-temporal features corresponding to the sample blurred image frame, where the low-level spatio-temporal features refer to less salient feature information, such as lines.
  • The low-level spatio-temporal features are then used as the input of the next 3D convolution layer, and convolution processing is performed to obtain high-level spatio-temporal features corresponding to the sample blurred image frame.
  • The high-level spatio-temporal features refer to feature information across the preceding and following image frames.
  • Finally, the spatio-temporal information corresponding to the sample blurred image frame is obtained, and this spatio-temporal information can be used as a feature map for the training of the generative network model.
  • For example, the first 3D convolution kernel is used to convolve five sample image frames to obtain three low-level spatio-temporal features of different dimensions; the second 3D convolution kernel is then used to convolve the low-level spatio-temporal features to obtain the high-level spatio-temporal features corresponding to the sample blurred image frame, and the high-level spatio-temporal features are fused to obtain the spatio-temporal information corresponding to the sample blurred image frame.
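  • The two-stage 3D feature extraction described above could be sketched as follows; the channel widths are assumptions, but the shrinking of the temporal axis from 5 to 3 to 1 mirrors the description of fusing five frames:

```python
import torch
import torch.nn as nn

frames = torch.randn(1, 3, 5, 128, 128)        # (batch, C, time=5, H, W)

# First 3D kernel: no temporal padding, so the time axis shrinks 5 -> 3
# ("three low-level spatio-temporal features").
conv3d_1 = nn.Conv3d(3, 32, kernel_size=(3, 3, 3), padding=(0, 1, 1))
# Second 3D kernel: time axis shrinks 3 -> 1, fusing the five frames into one map.
conv3d_2 = nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1))

low_level = torch.relu(conv3d_1(frames))       # (1, 32, 3, 128, 128)
high_level = torch.relu(conv3d_2(low_level))   # (1, 64, 1, 128, 128)

# Squeeze out the fused time axis; the result is the spatio-temporal feature map
# that the 2D part of the generator consumes.
spatiotemporal = high_level.squeeze(2)         # (1, 64, 128, 128)
print(spatiotemporal.shape)
```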
  • In some embodiments of the present application, the generative network model further includes: M 2D convolution kernels, where M is a positive integer.
  • In step A3, using, by the electronic device, the spatio-temporal information corresponding to the sample blurred image frame to perform deblurring processing on the sample blurred image frame through the generative network model and output the sample clear image frame includes:
  • performing convolution processing on the spatio-temporal information corresponding to the sample blurred image frame sequentially through the M 2D convolution kernels, and obtaining the sample clear image frame after the convolution processing of the last 2D convolution kernel of the M 2D convolution kernels.
  • That is, the generative network model in the embodiments of the present application not only has two 3D convolution kernels, but may also have multiple 2D convolution kernels; for the detailed implementation of the 2D convolution kernels, refer to the description in subsequent embodiments.
  • In some embodiments of the present application, the odd-numbered 2D convolution kernels among the M 2D convolution kernels include: a convolution layer, a normalization layer, and an activation function; the even-numbered 2D convolution kernels among the M 2D convolution kernels include: a convolution layer and an activation function.
  • The composition of each 2D convolution kernel may be determined in combination with the application scenario, where the odd-numbered 2D convolution kernels refer to the 1st 2D convolution kernel, the 3rd 2D convolution kernel, and so on among the M 2D convolution kernels, and an odd-numbered 2D convolution kernel may include: a convolution layer, a normalization layer, and an activation function (Rectified Linear Unit, ReLU).
  • The even-numbered 2D convolution kernels among the M 2D convolution kernels refer to the 2nd 2D convolution kernel, the 4th 2D convolution kernel, and so on, and an even-numbered 2D convolution kernel includes: a convolution layer and an activation function.
  • For the detailed calculation processes of the convolution layer, the normalization layer, and the activation function, refer to the description of convolutional neural networks, which is not repeated here.
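  • A minimal sketch of such a 2D convolution stack, following the odd/even composition just described (the channel width, kernel size, and number of blocks are illustrative):

```python
import torch.nn as nn

def conv2d_block(channels: int, index: int) -> nn.Sequential:
    """Build the index-th 2D convolution block of the generator.

    Odd-numbered blocks: convolution + normalization + ReLU;
    even-numbered blocks: convolution + ReLU (per the description above).
    """
    layers = [nn.Conv2d(channels, channels, kernel_size=3, padding=1)]
    if index % 2 == 1:                       # odd block gets a normalization layer
        layers.append(nn.BatchNorm2d(channels))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# A stack of M such blocks operating on the fused spatio-temporal feature map.
M, channels = 8, 64                          # illustrative values
blocks = nn.Sequential(*[conv2d_block(channels, i + 1) for i in range(M)])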
  • In some embodiments of the present application, step A4, in which the electronic device alternately trains the generative network model and the discriminative network model according to the sample clear image frame and the real clear image frame, includes:
  • A41. acquiring a reconstruction loss function according to the sample clear image frame and the real clear image frame;
  • A42. training the generative network model through the reconstruction loss function;
  • A43. training the discriminative network model by using the real clear image frame and the sample clear image frame, to obtain an adversarial loss function output by the discriminative network model;
  • A44. continuing to train the generative network model through the adversarial loss function.
  • In the embodiments of the present application, besides the generative network model, a discriminative network model is also introduced.
  • The generative network model is trained first: the sample blurred image frame is sent into the generative network model, the generated sample clear image frame is compared with the real clear image frame to obtain the reconstruction loss function, and the weight parameters of the generative network model are adjusted through the reconstruction loss function.
  • The discriminative network model is then trained: the real clear video and the generated sample clear video are sent into the discriminative network model to obtain the adversarial loss function, and the generative network model is adjusted through the adversarial loss function, so that the discriminative network model gains the ability to distinguish real clear images from clear images generated from blurred image frames, thereby completing the structure of alternately training the two network models.
  • In some embodiments of the present application, step A4 of alternately training the generative network model and the discriminative network model according to the sample clear image frame and the real clear image frame includes, in addition to the foregoing steps A41 to A44, the following steps:
  • A45. after continuing to train the generative network model through the adversarial loss function, re-acquiring the reconstruction loss function through the generative network model, and re-acquiring the adversarial loss function through the discriminative network model;
  • A46. performing weighted fusion on the re-acquired reconstruction loss function and the re-acquired adversarial loss function to obtain a fused loss function;
  • A47. continuing to train the generative network model through the fused loss function.
  • Steps A45 to A47 are executed on the basis of the two network models obtained after the initial training; in this stage, the two loss functions are used together to adjust the generative network model, so that the generated image can be similar to the real clear image at the pixel level while looking more like a clear image overall.
  • The two loss functions may be combined through a weight parameter, and the weight parameter may be used to control the degree to which each of the two loss functions acts on the feedback adjustment.
  • The purpose of the generative network model is to generate a clear video from the blurred video, and the role of the discriminative network model is to distinguish whether an incoming video frame is a real clear image or a generated clear image.
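  • A compact sketch of the alternating training described above; the generator and discriminator below are hypothetical stand-ins, and the loss choices (MSE reconstruction, binary cross-entropy adversarial loss) and the fusion weight are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two models described above; real architectures
# would follow the 3D + 2D generator and a CNN discriminator.
generator = nn.Sequential(nn.Conv3d(3, 3, (5, 3, 3), padding=(0, 1, 1)), nn.Flatten(1, 2))
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

recon_criterion = nn.MSELoss()              # reconstruction loss (pixel level)
adv_criterion = nn.BCELoss()                # adversarial loss
a = 1e-3                                    # fusion weight for the two losses

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

# Dummy training pair: a 5-frame blurred stack and the real clear middle frame.
loader = [(torch.randn(2, 3, 5, 64, 64), torch.randn(2, 3, 64, 64))]

for blurred_stack, real_sharp in loader:
    fake_sharp = generator(blurred_stack)                 # (2, 3, 64, 64)

    # Train the discriminator: real clear frames vs. generated frames.
    opt_d.zero_grad()
    d_real = discriminator(real_sharp)
    d_fake = discriminator(fake_sharp.detach())
    d_loss = adv_criterion(d_real, torch.ones_like(d_real)) + \
             adv_criterion(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    opt_d.step()

    # Train the generator with the fused reconstruction + adversarial loss.
    opt_g.zero_grad()
    d_fake = discriminator(fake_sharp)
    g_loss = recon_criterion(fake_sharp, real_sharp) + \
             a * adv_criterion(d_fake, torch.ones_like(d_fake))
    g_loss.backward()
    opt_g.step()
```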
  • In the embodiments of the present application, N consecutive image frames are obtained from a video segment, the N image frames include a blurred image frame to be processed, and then 3D convolution processing is performed on the N image frames by using the generative adversarial network model to obtain the spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames.
  • Finally, using the spatio-temporal information corresponding to the blurred image frame, the blurred image frame is deblurred through the generative adversarial network model, and a clear image frame is output.
  • Because the generative adversarial network model can use a 3D convolution operation to extract the spatio-temporal information implicit between consecutive image frames, the deblurring of the blurred image frame is completed through the generative adversarial network model using that spatio-temporal information, so a more realistic and clear image can be obtained and the effect of video deblurring is improved.
  • The video deblurring method provided by the embodiments of the present application may provide a video deblurring service.
  • When the video deblurring method provided by the embodiments of the present application is applied to a mobile phone or a digital camera, a captured video that has become blurred due to device shake or motion of the photographed object can be made clearer.
  • The video deblurring method provided by the embodiments of the present application may also be deployed in a background server; when a user uploads blurred videos that he or she has captured, the video deblurring method provided by the embodiments of the present application is used to make the user's videos clearer.
  • The video deblurring method provided by the embodiments of the present application adopts an end-to-end video processing approach, including pre-processing video frames, extracting low-level spatio-temporal information of the video frames, extracting high-level spatio-temporal information of the video frames, performing model training with two loss functions, and finally using the resulting model to reconstruct a clear video.
  • The specific flow of the method is shown in FIG. 1.
  • FIG. 2 is a schematic diagram of the process of deblurring a blurred image frame through the generative network model according to an embodiment of the present application.
  • The specific scheme is as follows: for a video of length T seconds, 5 adjacent frames are selected as input, and the first two convolution layers use a 3D convolution operation to extract the temporal and spatial information present in the adjacent video frames. Because 5 frames are fed in, after the two 3D convolution operations the features can be fused together well, so the number of channels along the temporal axis changes from 5 to 1; then 33 2D convolution kernels are used to perform feature extraction and image reconstruction operations on the image.
  • At this point the temporal information has been merged into the spatial information to obtain the spatio-temporal information, so 2D convolution operations are used, and after each convolution a normalization layer and a ReLU activation function are used to process the output. From layer 3 to layer 32, each odd-numbered convolution operation is followed by a BN normalization layer and a ReLU activation function, and each even-numbered convolution operation is followed by a BN normalization layer; the convolution operations of layers 33 to 34 are followed by the ReLU function, and after the convolution operation of layer 35, the final clear video frame is obtained. The whole network uses fully convolutional operations; that is, "fully convolutional" means that no fully connected layer is used, because the image does not need to be upsampled.
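  • For illustration, a fully convolutional generator following the layer-by-layer description above might be assembled as follows (the channel widths are assumptions; only the layer layout is taken from the description):

```python
import torch
import torch.nn as nn

class DeblurGenerator(nn.Module):
    """Illustrative fully convolutional generator following the layer layout above.

    Layers 1-2: 3D convolutions that fuse the 5 input frames (time 5 -> 3 -> 1).
    Layers 3-32: 2D convolutions, alternately conv+BN+ReLU (odd) and conv+BN (even).
    Layers 33-34: 2D convolutions followed by ReLU.
    Layer 35: 2D convolution producing the restored RGB frame.
    """

    def __init__(self, width: int = 64):
        super().__init__()
        self.conv3d_1 = nn.Conv3d(3, width, (3, 3, 3), padding=(0, 1, 1))
        self.conv3d_2 = nn.Conv3d(width, width, (3, 3, 3), padding=(0, 1, 1))

        body = []
        for i in range(3, 33):                      # layers 3..32
            body.append(nn.Conv2d(width, width, 3, padding=1))
            body.append(nn.BatchNorm2d(width))
            if i % 2 == 1:                          # odd layers also get ReLU
                body.append(nn.ReLU(inplace=True))
        for _ in (33, 34):                          # layers 33-34: conv + ReLU
            body += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        body.append(nn.Conv2d(width, 3, 3, padding=1))   # layer 35: output frame
        self.body = nn.Sequential(*body)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, 5, H, W)
        x = torch.relu(self.conv3d_1(frames))
        x = torch.relu(self.conv3d_2(x)).squeeze(2)      # (batch, width, H, W)
        return self.body(x)

# print(DeblurGenerator()(torch.randn(2, 3, 5, 64, 64)).shape)  # (2, 3, 64, 64)
```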
  • In the embodiments of the present application, an adversarial network structure can also be introduced.
  • The embodiments of the present application use both the reconstruction loss function and the adversarial loss function, and the generative network adjusts itself adaptively to make the image clear, so the obtained video is more realistic.
  • The video deblurring scheme used here is mainly based on a convolutional neural network method; it takes advantage of the ability of 3D convolution to extract spatio-temporal features in order to extract the spatio-temporal information contained in adjacent video frames, and then reconstructs the blurred video to obtain a clear video.
  • In some embodiments, the 3D convolution operation can be written as

$$v_{ij}^{xyz} = b_{ij} + \sum_{m}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1}\sum_{r=0}^{R_i-1} g_{ijm}^{pqr}\, v_{(i-1)m}^{(x+p)(y+q)(z+r)}$$

  where v_{ij}^{xyz} is the feature value at position (x, y, z) of the j-th feature map in the i-th layer, b is the offset (bias) function, g is the network weight, v_{(i-1)m} is the feature value in the m-th feature map that is fed in, m ranges over the total number of input images (feature maps), and P_i, Q_i, R_i are the kernel sizes along the three dimensions.
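  • As a worked check of the summation above, the following sketch evaluates it with explicit loops and compares the result with a library 3D convolution (the sizes are arbitrary; the cross-correlation form, stride 1, and no padding are assumptions):

```python
import torch
import torch.nn.functional as F

v = torch.randn(1, 2, 5, 8, 8)       # feature maps: (batch, m=2, time, height, width)
g = torch.randn(1, 2, 3, 3, 3)       # weights for one output map over m input maps
b = torch.zeros(1)                   # offset (bias)

ref = F.conv3d(v, g, b)              # library 3D convolution, stride 1, no padding

out = torch.zeros_like(ref)
for z in range(ref.shape[2]):
    for y in range(ref.shape[3]):
        for x in range(ref.shape[4]):
            acc = b[0].clone()
            for m in range(g.shape[1]):
                for r in range(3):
                    for q in range(3):
                        for p in range(3):
                            acc = acc + g[0, m, r, q, p] * v[0, m, z + r, y + q, x + p]
            out[0, 0, z, y, x] = acc

print(torch.allclose(out, ref, atol=1e-3))   # True: the summation matches the library op
```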
  • FIG. 3 is a schematic diagram of the training process of the generative network model and the discriminative network model according to an embodiment of the present application.
  • The discriminative network model (the discriminative network for short) and the generative network model (the generative network for short) form an adversarial network in which the two contend with each other.
  • In the embodiments of the present application, an adversarial network structure is introduced: the network structure of FIG. 2 is used as the generator (that is, the generative network model), and a discriminator (that is, the discriminative network model) is added.
  • The generative network is trained first: a blurred video frame is sent into the generative network, the generated clear video frame is obtained and compared with the real video frame to obtain the reconstruction loss function (that is, loss function 1 in FIG. 3), and the weight parameters of the generative network are adjusted through this loss.
  • Then the real clear video and the generated clear video are sent into the discriminative network to obtain the adversarial loss function (that is, loss function 2 in FIG. 3), and the generative network structure is adjusted through the adversarial loss function, so that the discriminative network gains the ability to distinguish real clear frames from generated ones.
  • The embodiments of the present application use two loss functions: a content loss function based on pixel differences (that is, the reconstruction loss function) and an adversarial loss function.
  • The loss function based on pixel differences is

$$L_{content} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\left(I^{sharp}_{x,y} - G(I_{blurry})_{x,y}\right)^{2}$$

  where W and H represent the width and height of the video frame, I^{sharp}_{x,y} is the pixel value of the true clear video frame at position (x, y), and G(I_{blurry})_{x,y} is the value of the generated video frame at the corresponding position.
  • The adversarial loss function is

$$L_{adversarial} = \sum -\log D\big(G(I_{blurry})\big)$$

  where D(G(I_{blurry})) is the probability, as judged by the discriminative network D, that the generated video frame is a real clear video frame.
  • The two loss functions are fused as L = L_{content} + a · L_{adversarial}, where a represents the weight balancing the two.
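  • A minimal numerical sketch of the fused objective under the definitions above (the tensor shapes, dummy discriminator output, and weight value are illustrative):

```python
import torch

def content_loss(real_sharp: torch.Tensor, generated: torch.Tensor) -> torch.Tensor:
    """Pixel-difference (reconstruction) loss averaged over the W x H frame."""
    return ((real_sharp - generated) ** 2).mean()

def adversarial_loss(d_of_generated: torch.Tensor) -> torch.Tensor:
    """-log D(G(I_blurry)), summed over the generated frames."""
    return (-torch.log(d_of_generated + 1e-8)).sum()

a = 1e-3                                   # illustrative fusion weight
real_sharp = torch.rand(1, 3, 64, 64)
generated = torch.rand(1, 3, 64, 64)
d_of_generated = torch.rand(1)             # discriminator output in (0, 1)

total = content_loss(real_sharp, generated) + a * adversarial_loss(d_of_generated)
print(total.item())
```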
  • Through the fused loss, the generative network can adjust its parameters to obtain a better generative network.
  • The method provided by the embodiments of the present application can improve existing video deblurring capabilities and can automatically perform the deblurring operation on a video; it can be used for post-processing after a video is captured by a device such as a mobile phone or a digital camera, and it can also be used by a network back-end server to deblur videos uploaded by users to obtain clearer videos.
  • FIG. 4-a is a schematic diagram of the composition structure of a video deblurring apparatus according to an embodiment of the present application.
  • As shown in FIG. 4-a, an embodiment of the present application provides a video deblurring apparatus 400, which includes one or more processors and one or more memories storing program units, where the program units are executed by the processor.
  • The program units may include: an obtaining module 401, a spatio-temporal information extraction module 402, and a deblurring processing module 403, where
  • the obtaining module 401 is configured to obtain N consecutive image frames from a video segment, where N is a positive integer and the N image frames include a blurred image frame to be processed;
  • the spatio-temporal information extraction module 402 is configured to perform three-dimensional (3D) convolution processing on the N image frames by using a generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames;
  • the deblurring processing module 403 is configured to use the spatio-temporal information corresponding to the blurred image frame to perform deblurring processing on the blurred image frame through the generative adversarial network model and output a clear image frame.
  • In some embodiments of the present application, the generative adversarial network model includes: a generative network model and a discriminative network model.
  • FIG. 4-b is a schematic diagram of the composition structure of another video deblurring apparatus according to an embodiment of the present application. As shown in FIG. 4-b, the program units further include a model training module 404, where
  • the obtaining module 401 is further configured to acquire, from a video sample library, N consecutive sample image frames and a real clear image frame used for discrimination, where the N sample image frames include a sample blurred image frame used for training, and the real clear image frame corresponds to the sample blurred image frame;
  • the spatio-temporal information extraction module 402 is further configured to extract spatio-temporal information corresponding to the sample blurred image frame from the N sample image frames by using a 3D convolution kernel in the generative network model;
  • the deblurring processing module 403 is further configured to use the spatio-temporal information corresponding to the sample blurred image frame to perform deblurring processing on the sample blurred image frame through the generative network model and output a sample clear image frame;
  • the model training module 404 is configured to alternately train the generative network model and the discriminative network model according to the sample clear image frame and the real clear image frame.
  • In some embodiments of the present application, the generative network model includes: a first 3D convolution kernel and a second 3D convolution kernel.
  • FIG. 4-c is a schematic diagram of the composition structure of a spatio-temporal information extraction module according to an embodiment of the present application. As shown in FIG. 4-c, the spatio-temporal information extraction module 402 includes:
  • a first convolution unit 4021, configured to perform convolution processing on the N sample image frames by using the first 3D convolution kernel to obtain low-level spatio-temporal features corresponding to the sample blurred image frame;
  • a second convolution unit 4022, configured to perform convolution processing on the low-level spatio-temporal features by using the second 3D convolution kernel to obtain high-level spatio-temporal features corresponding to the sample blurred image frame;
  • a spatio-temporal feature fusion unit 4023, configured to fuse the high-level spatio-temporal features corresponding to the sample blurred image frame to obtain the spatio-temporal information corresponding to the sample blurred image frame.
  • In some embodiments of the present application, the generative network model further includes: M 2D convolution kernels, where M is a positive integer.
  • The deblurring processing module 403 is specifically configured to perform convolution processing on the spatio-temporal information corresponding to the sample blurred image frame sequentially through the M 2D convolution kernels, and to obtain the sample clear image frame after the convolution processing of the last 2D convolution kernel of the M 2D convolution kernels.
  • In some embodiments of the present application, the odd-numbered 2D convolution kernels among the M 2D convolution kernels include: a convolution layer, a normalization layer, and an activation function; the even-numbered 2D convolution kernels among the M 2D convolution kernels include: a convolution layer and an activation function.
  • FIG. 4-d is a schematic diagram of the composition structure of a model training module according to an embodiment of the present application.
  • As shown in FIG. 4-d, the model training module 404 includes:
  • a loss function obtaining unit 4041, configured to acquire a reconstruction loss function according to the sample clear image frame and the real clear image frame;
  • a generative network model training unit 4042, configured to train the generative network model through the reconstruction loss function;
  • a discriminative network model training unit 4043, configured to train the discriminative network model by using the real clear image frame and the sample clear image frame, to obtain an adversarial loss function output by the discriminative network model;
  • the generative network model training unit 4042 is further configured to continue training the generative network model through the adversarial loss function.
  • In some embodiments of the present application, the loss function obtaining unit 4041 is further configured to re-acquire the reconstruction loss function through the generative network model after the generative network model continues to be trained through the adversarial loss function;
  • the discriminative network model training unit 4043 is further configured to re-acquire the adversarial loss function through the discriminative network model;
  • the loss function obtaining unit 4041 is further configured to perform weighted fusion on the re-acquired reconstruction loss function and the re-acquired adversarial loss function to obtain a fused loss function;
  • the generative network model training unit 4042 is further configured to continue training the generative network model through the fused loss function.
  • In the embodiments of the present application, N consecutive image frames are obtained from a video segment, the N image frames include a blurred image frame to be processed, and 3D convolution processing is performed on the N image frames by using the generative adversarial network model to obtain spatio-temporal information corresponding to the blurred image frame, where the spatio-temporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and its adjacent image frames among the N image frames.
  • Finally, using the spatio-temporal information corresponding to the blurred image frame, the blurred image frame is deblurred through the generative adversarial network model, and a clear image frame is output.
  • Because the generative adversarial network model can use a 3D convolution operation to extract the spatio-temporal information implicit between consecutive image frames, the deblurring of the blurred image frame is completed through the generative adversarial network model using that spatio-temporal information, so a more realistic and clear image can be obtained and the effect of video deblurring is improved.
  • FIG. 5 is a schematic diagram of the composition structure of a terminal to which the video deblurring method according to the embodiments of the present application is applied.
  • The terminal may be a mobile phone. As shown in FIG. 5, for convenience of description, only the parts related to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present application.
  • The terminal may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, an in-vehicle computer, and the like; the following takes the terminal being a mobile phone as an example.
  • The mobile phone includes components such as a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power supply 1090.
  • A person skilled in the art may understand that the structure of the mobile phone shown in FIG. 5 does not constitute a limitation on the mobile phone; the mobile phone may include more or fewer components than those illustrated, some components may be combined, or a different component arrangement may be used.
  • The RF circuit 1010 may be configured to receive and send signals during information transmission and reception or during a call; in particular, after receiving downlink information from a base station, the RF circuit 1010 delivers it to the processor 1080 for processing, and sends designed uplink data to the base station.
  • Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • In addition, the RF circuit 1010 may also communicate with a network and other devices via wireless communication.
  • The foregoing wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • The memory 1020 may be configured to store software programs and modules, and the processor 1080 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1020.
  • The memory 1020 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book).
  • In addition, the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
  • The input unit 1030 may be configured to receive input digital or character information and to generate key signal inputs related to user settings and function control of the mobile phone.
  • Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032.
  • The touch panel 1031, also referred to as a touch screen, can collect touch operations performed by the user on or near it (such as operations performed by the user on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1080, and can receive commands sent by the processor 1080 and execute them.
  • In addition, the touch panel 1031 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave types.
  • Besides the touch panel 1031, the input unit 1030 may further include other input devices 1032.
  • Specifically, the other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
  • The display unit 1040 may be configured to display information input by the user or information provided to the user, as well as various menus of the mobile phone.
  • The display unit 1040 may include a display panel 1041.
  • Optionally, the display panel 1041 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • Further, the touch panel 1031 may cover the display panel 1041; when the touch panel 1031 detects a touch operation on or near it, the touch panel 1031 transmits it to the processor 1080 to determine the type of the touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of the touch event.
  • Although in FIG. 5 the touch panel 1031 and the display panel 1041 are used as two independent components to implement the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.
  • The mobile phone may also include at least one type of sensor 1050, such as a light sensor, a motion sensor, and other sensors.
  • Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear.
  • As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that identify the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer attitude calibration), vibration-recognition related functions (such as a pedometer and tapping), and the like; other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured on the mobile phone, and details are not described herein.
  • The audio circuit 1060, a speaker 1061, and a microphone 1062 can provide an audio interface between the user and the mobile phone.
  • The audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the speaker 1061 converts it into a sound signal for output; on the other hand, the microphone 1062 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1060 and converted into audio data. The audio data is then output to the processor 1080 for processing and subsequently sent, for example, to another mobile phone via the RF circuit 1010, or output to the memory 1020 for further processing.
  • WiFi is a short-range wireless transmission technology.
  • Through the WiFi module 1070, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access.
  • Although FIG. 5 shows the WiFi module 1070, it can be understood that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
  • The processor 1080 is the control center of the mobile phone; it connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1020 and invoking data stored in the memory 1020, thereby monitoring the mobile phone as a whole.
  • Optionally, the processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1080.
  • The mobile phone also includes a power supply 1090 (such as a battery) that supplies power to the various components.
  • Preferably, the power supply may be logically connected to the processor 1080 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein.
  • In the embodiments of the present application, the processor 1080 included in the terminal also controls the execution of the video deblurring method procedure performed by the terminal.
  • FIG. 6 is a schematic diagram of the composition structure of a server to which the video deblurring method according to an embodiment of the present application is applied.
  • The server 1100 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing an application program 1142 or data 1144.
  • The memory 1132 and the storage medium 1130 may provide temporary or persistent storage.
  • The program stored in the storage medium 1130 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130, and execute, on the server 1100, the series of instruction operations in the storage medium 1130.
  • The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141 such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • The steps of the video deblurring method performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 6.
  • The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the connection relationship between modules indicates that they have a communication connection with each other, which may be specifically implemented as one or more communication buses or signal lines.
  • The technical solutions of the embodiments of the present application may, in essence or in part, be embodied in the form of a software product, which is stored in a computer-readable storage medium.
  • a continuous N image frames are first obtained from a video segment, a blurred image frame to be processed is included in the N image frames, and then 3D convolution processing is performed on the N image frames using the generated confrontation network model.
  • the spatiotemporal information includes: spatial feature information of the blurred image frame, and temporal feature information between the blurred image frame and the adjacent image frame in the N image frames.
  • the fuzzy image frame is deblurred by generating the anti-network model, and the clear image frame is output.
  • because the generative adversarial network model can use 3D convolution operations to extract the spatiotemporal information implicit between successive image frames, the spatiotemporal information corresponding to the blurred image frame is used and the deblurring of the blurred image frame is completed by the generative adversarial network model, so that a more realistic sharp image can be obtained and the effect of video deblurring is improved (a rough code sketch follows this list).
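
As a rough, non-authoritative illustration of the pipeline summarized in the bullets above, the following PyTorch sketch applies an already trained generator to five consecutive frames to recover the sharp middle frame. The `generator` module and the tensor layout (batch, channels, time, height, width) are assumptions made for illustration, not details taken from the patent.

```python
import torch

def deblur_middle_frame(generator, frames):
    """Apply a trained deblurring generator to 5 consecutive frames.

    frames: list of 5 tensors shaped (3, H, W) with values in [0, 1];
    returns the restored sharp (middle) frame shaped (3, H, W).
    """
    clip = torch.stack(frames, dim=1).unsqueeze(0)  # (1, 3, 5, H, W): batch, channels, time, H, W
    with torch.no_grad():
        sharp = generator(clip)                     # (1, 3, H, W)
    return sharp.squeeze(0)

# Example with random tensors standing in for decoded video frames:
# dummy_frames = [torch.rand(3, 128, 128) for _ in range(5)]
# restored = deblur_middle_frame(trained_generator, dummy_frames)
```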

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a video deblurring method and apparatus, a storage medium, and an electronic apparatus, used to improve the effect of video deblurring. An embodiment of the present application provides a video deblurring method, including: an electronic device obtains N consecutive image frames from a video clip, where N is a positive integer and the N image frames include a blurred image frame to be processed; the electronic device performs three-dimensional (3D) convolution processing on the N image frames by using a generative adversarial network model to obtain spatiotemporal information corresponding to the blurred image frame, the spatiotemporal information including spatial feature information of the blurred image frame and temporal feature information between the blurred image frame and adjacent image frames among the N image frames; and the electronic device performs deblurring processing on the blurred image frame by using the spatiotemporal information corresponding to the blurred image frame through the generative adversarial network model, and outputs a sharp image frame.

Description

一种视频去模糊方法、装置、存储介质和电子装置
本申请要求于2018年05月09日提交中国专利局、申请号为201810438831.5、发明名称“一种视频去模糊方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及计算机技术领域,尤其涉及一种视频去模糊方法、装置存储介质和电子装置。
背景技术
当用户使用手机或者数码相机进行视频拍摄的时候,常常因为设备抖动以及拍摄物体的运动而导致模糊,因此用户有将拍摄的模糊视频变得更加清晰的现实需求。
目前提出基于深度学习方法完成视频图像的去模糊处理。相关技术中主要是将多帧图像一起送入卷积神经网络模型,使用2D卷积核来提取单帧图像中的空间信息,使用重构损失函数作为监督信息,来对模糊的视频进行去模糊处理。
在相关技术的视频去模糊方案中,由于使用的是2D卷积核,只能提取到单帧图像中的空间信息,无法提取视频内的图像之间的信息,使得相关技术中卷积神经网络模型利用空间信息的能力有限。同时因为仅仅使用基于像素的重构损失函数作为监督信息,因此去模糊之后的视频显得不够真实,降低了视频去模糊的效果。
发明内容
本申请实施例提供了一种视频去模糊方法、装置、存储介质和电子装置,用于提高视频去模糊的效果。
为解决上述技术问题,本申请实施例提供以下技术方案:
一方面,本申请实施例提供一种视频去模糊方法,包括:
电子设备从视频片段中获取连续的N个图像帧,所述N为正整数,所述N 个图像帧包括:待处理的模糊图像帧;
电子设备使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理,得到所述模糊图像帧对应的时空信息,所述时空信息包括:所述模糊图像帧的空间特征信息,以及在所述N个图像帧中所述模糊图像帧与相邻图像帧之间的时间特征信息;
电子设备使用所述模糊图像帧对应的时空信息,通过所述生成对抗网络模型对所述模糊图像帧进行去模糊处理,输出清晰图像帧。
另一方面,本申请实施例还提供一种视频去模糊装置,包括一个或多个处理器,以及一个或多个存储程序单元的存储器,其中,程序单元由处理器执行,程序单元包括:
获取模块,被设置为从视频片段中获取连续的N个图像帧,所述N为正整数,所述N个图像帧包括:待处理的模糊图像帧;
时空信息提取模块,被设置为使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理,得到所述模糊图像帧对应的时空信息,所述时空信息包括:所述模糊图像帧的空间特征信息,以及在所述N个图像帧中所述模糊图像帧与相邻图像帧之间的时间特征信息;
去模糊处理模块,被设置为使用所述模糊图像帧对应的时空信息,通过所述生成对抗网络模型对所述模糊图像帧进行去模糊处理,输出清晰图像帧。
在前述方面中,视频去模糊装置的组成模块还可以执行前述一方面以及各种可能的实现方式中所描述的步骤,详见前述对一方面以及各种可能的实现方式中的说明。
另一方面,本申请实施例提供一种电子装置,该电子装置包括:处理器和存储器;存储器中存储有计算机程序,处理器被设置为通过计算机程序执行上述视频去模糊方法。
另一方面,本申请实施例提供了一种非暂态计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
另一方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
从以上技术方案可以看出,本申请实施例具有以下优点:
在本申请实施例中,首先从视频片段中获取连续的N个图像帧,在N个图像帧中包括有待处理的模糊图像帧,然后使用生成对抗网络模型对N个图像帧进行3D卷积处理,得到模糊图像帧对应的时空信息,时空信息包括:模糊图像帧的空间特征信息,以及在N个图像帧中模糊图像帧与相邻图像帧之间的时间特征信息。最后使用模糊图像帧对应的时空信息,通过生成对抗网络模型对模糊图像帧进行去模糊处理,输出清晰图像帧。本申请实施例中由于生成对抗网络模型可以采用3D卷积操作,提取隐含在连续的图像帧之间的时空信息,因此使用模糊图像帧对应的时空信息,通过生成对抗网络模型完成了对模糊图像帧的去模糊处理,因此可以得到更加真实的清晰图像,提高了视频去模糊的效果。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域的技术人员来讲,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的一种视频去模糊方法的流程方框示意图;
图2为本申请实施例提供通过生成网络模型对模糊图像帧进行去模糊处理的过程示意图;
图3为本申请实施例提供的生成网络模型和对抗网络模型的训练过程示意图;
图4-a为本申请实施例提供的一种视频去模糊装置的组成结构示意图;
图4-b为本申请实施例提供的另一种视频去模糊装置的组成结构示意图;
图4-c为本申请实施例提供的一种时空信息提取模块的组成结构示意图;
图4-d为本申请实施例提供的一种模型训练模块的组成结构示意图;
图5为本申请实施例提供的视频去模糊方法应用于终端的组成结构示意图;
图6为本申请实施例提供的视频去模糊方法应用于服务器的组成结构示 意图。
具体实施方式
本申请实施例提供了一种视频去模糊方法和装置,用于提高视频去模糊的效果。
为使得本申请的发明目的、特征、优点能够更加的明显和易懂,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,下面所描述的实施例仅仅是本申请一部分实施例,而非全部实施例。基于本申请中的实施例,本领域的技术人员所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、***、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
以下分别进行详细说明。
本申请实施例主要提供一种基于深度学习的视频去模糊方法。本申请实施例通过深度神经网络可以完成对模糊视频的恢复,应用于对相机拍摄的视频进行去模糊处理,本申请实施例中还提供视频去模糊装置,该视频去模糊装置可以通过视频处理软件的方式部署在终端中,该视频去模糊装置也可以是存储视频的服务器。本申请实施例提供的视频去模糊方法中采用深度学习的方式来训练出生成对抗网络(Generative Adversarial Nets,GAN)模型,该生成对抗网络模型可以是通过卷积神经网络模型来实现。具体的,将每一帧图像的前后多帧图像一起送入生成对抗网络模型中,利用该生成对抗网络模型对多帧的视频进行特征提取和整合,使用生成对抗网络模型中的三维(three-dimensional,3D)卷积核进行3D卷积操作,提取隐含在连续图像帧之间的时空信息,利用全卷积操作对模糊图像帧进行等比例的清晰恢复,从而可以得到更加真实的清晰图像。本申请实施例采用的生成对抗网络模型能够有效的提取时空信息来对模糊图像帧进行处理,从而能够自动地对模糊视频进行恢复。
图1为本申请实施例提供的一种视频去模糊方法的流程方框示意图,该视频去模糊方法可以由电子设备执行,该电子设备可以是终端或服务器。下面以电子设备执行该视频去模糊方法为例进行说明,请参阅图1所示,可以包括如下步骤:
101、电子设备从视频片段中获取连续的N个图像帧,N为正整数,N个图像帧包括:待处理的模糊图像帧。
在本申请实施例中,视频片段可以是终端通过摄像头录制的一段视频,也可以是终端从网络上下载的一段视频,只要该视频片段中至少存在一帧的模糊图像,都可以通过本申请实施例提供的视频去模糊方法恢复出清晰图像。首先从该视频片段中获取连续的N个图像帧,N个图像帧中至少包括待处理的一个模糊图像帧,该模糊图像帧可以是因为拍摄设备抖动或者拍摄对象的运动而导致模糊。本申请实施例中首先获取到的连续N个图像帧可以存在待处理的一个模糊图像帧,例如该模糊图像帧可以是这连续N个图像帧的中间图像帧,例如N的取值可以为3,则该模糊图像帧可以是第2个图像帧,或者N的取值为5,则该模糊图像帧可以是第3个图像帧。N的取值为正整数,此处不做限定。
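
A minimal helper, with names of my own choosing, that collects the window of N consecutive frames centred on the frame to be deblurred (as in the N = 3 and N = 5 examples above) might look like this:

```python
def frame_window(frames, center, n=5):
    """Return n consecutive frames centred on index `center`, clamped at clip boundaries.

    frames: a sequence of decoded video frames; center: index of the blurred frame.
    """
    half = n // 2
    start = max(0, min(center - half, len(frames) - n))
    return list(frames[start:start + n])
```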
102、电子设备使用生成对抗网络模型对N个图像帧进行3D卷积处理,得到模糊图像帧对应的时空信息,时空信息包括:模糊图像帧的空间特征信息,以及在N个图像帧中模糊图像帧与相邻图像帧之间的时间特征信息。
在本申请实施例中,可以使用训练好的生成对抗网络模型来用于视频的去模糊处理,在获取到连续的N个图像帧之后,将连续的N个图像帧输入到生成对抗网络模型中,使用生成对抗网络模型中的3D卷积核进行3D卷积操作,提取隐含在连续图像帧之间的时空信息。其中,时空信息包括:模糊图像帧的空间特征信息,即空间特征信息隐藏在单帧的模糊图像中,时间特征信息是一个模糊图像帧与相邻图像帧之间的时间信息,例如通过3D卷积操作可以提取出一个模糊图像帧与该模糊图像帧之前的两个图像帧、与该模糊图像帧之后的两个图像帧的时间特征信息,本申请实施例中通过3D卷积核可以提取到时空信息,即时间特征信息和空间特征信息。因此可以有效的利用一段视频中的连续图像之间隐藏的特征信息,再结合训练好的生成对抗网络模 型,可以提高对模糊图像帧的去模糊处理效果,详见后续实施例中对视频去模糊的说明。
103、电子设备使用模糊图像帧对应的时空信息,通过生成对抗网络模型对模糊图像帧进行去模糊处理,输出清晰图像帧。
在本申请实施例中,通过生成对抗网络模型中的3D卷积核进行3D卷积操作,提取到模糊图像帧对应的时空信息之后,可以使用模糊图像帧对应的时空信息作为图像特征,通过生成对抗网络模型进行预测输出,该生成对抗网络模型的输出结果即为对模糊图像帧进行去模糊后得到的清晰图像帧。由于本申请实施例中生成对抗网络模型采用的是3D卷积操作,因此可以提取到时间特征信息以及空间特征信息,这种特征信息可以用于预测出模糊图像帧对应的清晰图像帧。
本申请实施例中主要利用3D卷积核来处理连续的视频帧,这样可以更加有效的提取隐含在连续视频帧中的时空信息,同时使用生成对抗网络模型,可以更好的保证恢复的清晰视频更加真实。
接下来对本申请实施例中生成对抗网络模型的训练过程进行举例说明。具体的,本申请实施例提供的生成对抗网络模型,包括:生成网络模型和对抗网络模型。其中,本申请实施例中生成对抗网络模型至少包括两个网络模型,其中一个是生成网络模型,另一个是判别网络模型,通过生成网络模型和判别网络模型的互相博弈学习,从而通过生成对抗网络模型产生相当好的输出。
在本申请的一些实施例中,电子设备使用生成对抗网络模型对N个图像帧进行三维3D卷积处理之前,本申请实施例提供的视频去模糊方法还包括:
A1、电子设备从视频样本库中获取连续的N个样本图像帧以及用于判别的真实清晰图像帧,N个样本图像帧包括:用于训练的样本模糊图像帧,真实清晰图像帧与样本模糊图像帧相对应;
A2、电子设备使用生成网络模型中的3D卷积核从N个样本图像帧中提取出样本模糊图像帧对应的时空信息;
A3、电子设备使用样本模糊图像帧对应的时空信息,通过生成网络模型对样本模糊图像帧进行去模糊处理,输出样本清晰图像帧;
A4、电子设备根据样本清晰图像帧和真实清晰图像帧,对生成网络模型和判别网络模型进行交替训练。
其中,本申请实施例中可以设置视频样本库用于模型的训练与判别,例如采用一段连续的N个样本图像帧用于模型训练,这里的“样本图像帧”有别于步骤101中的图像帧,该样本图像帧是视频样本库中的样本图像,在N个样本图像帧包括一个样本模糊图像帧,为了判别生成网络模型的输出效果,还提供一个真实清晰图像帧,该真实清晰图像帧与样本模糊图像帧相对应,即真实清晰图像帧是样本模糊图像帧对应的真实的清晰图像帧。
接下来显示使用生成网络模型中的3D卷积核从N个样本图像帧中提取出样本模糊图像帧对应的时空信息,该时空信息可以包括:样本模糊图像帧的空间特征信息,以及在N个样本图像帧中样本模糊图像帧与相邻图像帧之间的时间特征信息,该生成网络模型可以是卷积神经网络模型。获取到样本模糊图像帧对应的时空信息之后,接下来使用样本模糊图像帧对应的时空信息,通过生成网络模型对样本模糊图像帧进行去模糊处理,后续实施例对生成网络模型的训练过程进行详细说明,通过生成网络模型的去模糊处理可以输出样本清晰图像帧。该样本清晰图像帧是生成网络模型对样本模糊图像帧进行去模糊后输出的结果。
在生成网络模型输出样本清晰图像帧之后,根据样本清晰图像帧和真实清晰图像帧,再使用判别网络模型来判别输出的样本清晰图像帧是模糊的或者清晰的,使用判别网络模型,引入对抗损失函数,从而对生成网络模型和判别网络模型进行交替的多次训练,从而可以更好的保证恢复的清晰视频更加真实。
进一步的,在本申请的一些实施例中,生成网络模型,包括:第一3D卷积核和第二3D卷积核。在这种实现场景下,步骤A2电子设备使用生成网络模型中的3D卷积核从N个样本图像帧中提取出样本模糊图像帧对应的时空信息,包括:
A11、使用第一3D卷积核对N个样本图像帧进行卷积处理,得到样本模糊图像帧对应的低级别时空特征;
A12、使用第二3D卷积核对低级别时空特征进行卷积处理,得到样本模 糊图像帧对应的高级别时空特征;
A13、将样本模糊图像帧对应的高级别时空特征融合在一起,得到样本模糊图像帧对应的时空信息。
其中,生成网络模型中首先设置两个3D卷积层,在每个3D卷积层可以使用不同的3D卷积核,例如第一3D卷积核和第二3D卷积核具有不同的权重参数,首先使用第一3D卷积核对N个样本图像帧进行卷积处理,得到样本模糊图像帧对应的低级别时空特征,其中,低级别时空特征指不明显的特征信息,比如线条之类的特征。然后以低级别时空特征为输入条件,在下一个3D卷积层进行卷积处理,得到样本模糊图像帧对应的高级别时空特征,高级别时空特征指的是前后不同图像帧的特征信息。最后再通过这些高级别时空特征融合在一起,得到样本模糊图像帧对应的时空信息,该时空信息可以作为特征图用于生成网络模型的训练。举例说明如下,首先使用第一3D卷积核对5个样本图像帧进行卷积处理,得到3个不同维度的低级别时空特征,然后使用第二3D卷积核对低级别时空特征进行卷积处理,得到样本模糊图像帧对应的高级别时空特征,高级别时空特征融合在一起,得到样本模糊图像帧对应的时空信息,由于送入生成网络模型的是5帧图像,然后进行2次的3D卷积,此时就会输出一帧的特征图,即经过两次的3D卷积,时间序列的通道数由5变成1。
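
A minimal sketch of this two-stage 3D convolution, assuming RGB input, a five-frame clip, and illustrative kernel and channel sizes (the paragraph above does not fix them): two Conv3d layers with temporal kernel size 3 and no temporal padding shrink the time dimension from 5 to 1, so the low-level and then high-level spatiotemporal features are fused into a single feature map.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: 3 input channels (RGB), 64 feature channels.
low_level = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1))    # first 3D kernel
high_level = nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1))  # second 3D kernel
relu = nn.ReLU(inplace=True)

clip = torch.rand(1, 3, 5, 128, 128)   # (batch, channels, time=5, H, W)
feat = relu(low_level(clip))           # time: 5 -> 3, low-level spatiotemporal features
feat = relu(high_level(feat))          # time: 3 -> 1, high-level spatiotemporal features
fused = feat.squeeze(2)                # (1, 64, 128, 128): temporal channels fused to 1
print(fused.shape)
```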
进一步的,在本申请的一些实施例中,生成网络模型,还包括:M个2D卷积核,M为正整数。步骤A3电子设备使用样本模糊图像帧对应的时空信息,通过生成网络模型对样本模糊图像帧进行去模糊处理,输出样本清晰图像帧包括:
A31、使用M个2D卷积核中的各个2D卷积核依次对样本模糊图像帧对应的时空信息进行卷积处理,经过M个2D卷积核中的最后一个2D卷积核进行卷积处理之后,得到样本清晰图像帧。
其中,本申请实施例中生成网路模型不仅具有两个3D卷积核,还可以有多个2D卷积核,通过多个2D卷积核依次对样本模糊图像帧对应的时空信息进行卷积处理,经过M个2D卷积核中的最后一个2D卷积核进行卷积处理之后,得到样本清晰图像帧。2D卷积核的详细实现过程可以参阅后续实施例中 的说明。
在本申请的一些实施例中,M个2D卷积核中的奇数2D卷积核包括:卷积层、归一化层和激活函数,M个2D卷积核中的偶数2D卷积核包括:卷积层和激活函数。
其中,对于每个2D卷积核的实现方式,可以结合应用场景,其中,奇数2D卷积核指的是M个2D卷积核中的第1个2D卷积核、第3个2D卷积核等,奇数2D卷积核可以包括:卷积层、归一化层和激活函数(Rectified Linear Units,ReLu)。M个2D卷积核中的偶数2D卷积核指的是M个卷积核中的第2个2D卷积核、第4个2D卷积核等,偶数2D卷积核包括:卷积层和激活函数。对于归一化层和激活函数的详细计算过程,可以参阅卷积神经网络中的说明,此处不再赘述。
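
To make the odd/even layout concrete, the sketch below assembles 2D blocks under the assumption of 64 feature channels and 3x3 kernels (illustrative only): odd-numbered 2D kernels carry a convolution, a normalization layer, and a ReLU activation, while even-numbered ones carry only a convolution and a ReLU.

```python
import torch.nn as nn

def conv2d_block(index, channels=64):
    """Odd-indexed 2D kernels: Conv + BatchNorm + ReLU; even-indexed: Conv + ReLU."""
    layers = [nn.Conv2d(channels, channels, kernel_size=3, padding=1)]
    if index % 2 == 1:                     # odd 2D kernel (1st, 3rd, ...)
        layers.append(nn.BatchNorm2d(channels))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# For example, an M-kernel 2D stage (M chosen arbitrarily here):
M = 8
stage_2d = nn.Sequential(*[conv2d_block(i) for i in range(1, M + 1)])
```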
在本申请的一些实施例中,步骤A4电子设备根据样本清晰图像帧和真实清晰图像帧,对生成网络模型和判别网络模型进行交替训练,包括:
A41、根据样本清晰图像帧和真实清晰图像帧获取重构损失函数;
A42、通过重构损失函数训练生成网络模型;
A43、使用真实清晰图像帧和样本清晰图像帧训练判别网络模型,得到判别网络模型输出的对抗损失函数;
A44、通过对抗损失函数继续训练生成网络模型。
其中,为了得到更加真实的去模糊视频,在训练生成网络模型的时候,还可以引入判别网络模型,在训练的过程中,首先训练生成网络模型,将样本模糊图像帧送入生成网络模型中,得到生成的样本清晰图像帧,与真实清晰图像帧进行比较,得到重构损失函数,通过重构损失函数调整生成网络模型的权重参数。之后训练判别网络模型,将真实清晰视频与生成的样本清晰视频进行送入判别网络模型,得到对抗损失函数,通过对抗损失函数调整生成网络模型,使判别网络模型具有判断真实清晰图像与从模糊图像帧生成的清晰图像的能力,以此完成交替训练两个网络模型的结构。
进一步的,在本申请的一些实施例中,步骤A4根据样本清晰图像帧和真实清晰图像帧,对生成网络模型和判别网络模型进行交替训练,除了包括前述步骤A41至步骤A44之外,还可以包括如下步骤:
A45、通过对抗损失函数继续训练生成网络模型之后,通过生成网络模型重新获取重构损失函数,以及通过判别网络模型重新获取对抗损失函数;
A46、对重新获取的重构损失函数和重新获取的对抗损失函数进行加权融合,得到融合后的损失函数;
A47、通过融合后的损失函数继续训练生成网络模型。
其中,通过前述步骤A41至步骤A44训练生成网络模型和判别网络模型之后,基于初次训练后的两个网络模型,执行步骤A45至步骤A47,再训练生成网络模型的时候,使用两种损失函数一起调整生成网络模型的结构,使图像既可以在像素层面上与真实清晰图像相似,同时在整体上看起来更像是清晰的图像。两个损失函数之间可以通过一个权重参数来进行联合,该权重可以用于控制两种损失函数作用于反馈调节的作用大小。生成网络模型的作用是为了通过模糊的视频生成清晰的视频,而判别网络模型的作用是为了分辨送入的视频帧是真实的清晰图像还是生成的清晰图像。通过本申请实施例提供的对抗学习,判别网络模型的判别能力越来越强,同时,生成网络模型的生成视频也越来越真实。
通过以上实施例对本申请实施例的描述可知,首先从视频片段中获取连续的N个图像帧,在N个图像帧中包括有待处理的模糊图像帧,然后使用生成对抗网络模型对N个图像帧进行3D卷积处理,得到模糊图像帧对应的时空信息,时空信息包括:模糊图像帧的空间特征信息,以及在N个图像帧中模糊图像帧与相邻图像帧之间的时间特征信息。最后使用模糊图像帧对应的时空信息,通过生成对抗网络模型对模糊图像帧进行去模糊处理,输出清晰图像帧。本申请实施例中由于生成对抗网络模型可以采用3D卷积操作,提取隐含在连续的图像帧之间的时空信息,因此使用模糊图像帧对应的时空信息,通过生成对抗网络模型完成了对模糊图像帧的去模糊处理,因此可以得到更加真实的清晰图像,提高了视频去模糊的效果。
为便于更好的理解和实施本申请实施例的上述方案,下面举例相应的应用场景来进行具体说明。
本申请实施例提供的视频去模糊方法可以提供视频去模糊服务。当手机或者数码相机进行视频拍摄的时候,因为设备抖动以及拍摄物体的运动而导 致模糊,将本申请实施例提供的视频去模糊方法应用于手机和数码相机中,可以使拍摄的模糊视频变得更加清晰。此外,本申请实施例提供的视频去模糊方法可部署在后台服务器中,当用户上传一些自己拍摄的存在模糊的视频的时候,使用本申请实施例提供的视频去模糊方法,将用户的视频变得更加清晰。
本申请实施例提供的视频去模糊方法采用端到端的视频处理方法,包含对视频帧的预处理,提取视频帧的低级别时空信息,再提取视频帧的高级别时空信息,使用两种损失函数进行模型训练,最终使用得到的模型重构出清晰的视频。本方法的具体流程图见图1。
如图2所示,为本申请实施例提供通过生成网络模型对模糊图像帧进行去模糊处理的过程示意图。具体方案如下:对于长度为T秒的视频,选择相邻的5帧图像作为输入,前面两个卷积层使用3D卷积操作,提取相邻视频帧中存在的时空信息,当进行两次3D卷积操作之后,因为送入的是5帧,进行两次3D卷积操作之后,可以把特征更好的融合在一起,因此时间序列的通道数由5变成1,之后使用33个2D卷积核对图像进行特征提取和图像重构操作。通过前述的3D卷积操作,时间信息已经融合到空间信息里面即得到了时空信息,因此再使用2D卷积操作,每次卷积之后,使用归一化层以及ReLU激活函数对输出进行处理,从第3层到第32层,奇数层的卷积操作之后紧跟BN归一化层和ReLU激活函数,偶数层的卷积操作之后跟着BN归一化层,第33到34层卷积操作后再使用ReLU函数,在经过第35层的卷积操作之后,得到最后的清晰视频帧,整个操作使用全卷积操作,即全卷积是指没有使用全链接层,因为图像无需进行上采样和下采样操作。在训练的时候,本申请实施例中还可以引入对抗网络结构,本申请实施例使用了重构损失函数和对抗损失函数,生成网络会自适应调节,使图像变得清晰,因此得到的视频更加真实。
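
The architecture walk-through above can be approximated by the following PyTorch sketch. It is a sketch under stated assumptions rather than a faithful reproduction: the feature width (64 channels), the 3x3 spatial kernels, and the exact placement of the batch-normalization layers are choices of mine, while the overall shape (two 3D convolutions fusing five frames into one feature map, followed by a fully convolutional stack of 33 2D kernels whose last layer reconstructs the sharp frame, with no fully connected layers and no resampling) follows the paragraph above.

```python
import torch
import torch.nn as nn

class DeblurGenerator(nn.Module):
    """Sketch of the fully convolutional generator: 2 x 3D conv + 33 x 2D conv."""
    def __init__(self, channels=64):
        super().__init__()
        # Two 3D convolutions fuse the 5-frame clip into one feature map (time 5 -> 3 -> 1).
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
        )
        blocks = []
        # Layers 3-34: alternating Conv+BN+ReLU / Conv+ReLU blocks (the pattern is simplified here).
        for i in range(32):
            blocks.append(nn.Conv2d(channels, channels, kernel_size=3, padding=1))
            if i % 2 == 0:
                blocks.append(nn.BatchNorm2d(channels))
            blocks.append(nn.ReLU(inplace=True))
        # Final (35th) convolution reconstructs the sharp RGB frame; no FC layers, no resampling.
        blocks.append(nn.Conv2d(channels, 3, kernel_size=3, padding=1))
        self.conv2d = nn.Sequential(*blocks)

    def forward(self, clip):                 # clip: (B, 3, 5, H, W)
        feat = self.conv3d(clip).squeeze(2)  # (B, channels, H, W) after temporal fusion
        return self.conv2d(feat)             # (B, 3, H, W) restored frame

# g = DeblurGenerator()
# print(g(torch.rand(1, 3, 5, 64, 64)).shape)   # torch.Size([1, 3, 64, 64])
```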
接下来对本申请实施例中3D卷积的计算过程进行举例说明,本方案中使用的视频去模糊操作的方案主要是基于卷积神经网络的方法,利用3D卷积可以提取时空特征的优势提取隐含在相邻视频中的时空信息,进而对模糊视频进行重构而得到清晰视频。3D卷积的操作为:
$$v_{ij}^{xyz}=\sigma\left(b_{ij}+\sum_{m}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_j-1}\sum_{r=0}^{R_r-1}g_{ijm}^{pqr}\,v_{(i-1)m}^{(x+p)(y+q)(z+r)}\right)$$
其中，$v_{ij}^{xyz}$ 是第i层的第j个特征层在位置(x,y,z)上的时空特征值，$(P_i,Q_j,R_r)$ 是3D卷积核的尺寸，$Q_j$ 代表时间维度，$\sigma(\cdot)$ 代表ReLU函数，$b_{ij}$ 是偏置项，$g_{ijm}^{pqr}$ 是网络权重，$v_{(i-1)m}$ 是送入的特征图里面的特征值，$m$ 对应一共送入的图像（特征图）的个数。整个卷积网络操作过程如图2所示。
如图3所示,为本申请实施例提供的生成网络模型和对抗网络模型的训练过程示意图。判别网络模型(简称为判别网络)和生成网络模型(简称为生成网络)在一起构成对抗网络。两者之间进行对抗。为了得到更加真实的去模糊视频,在训练图2所示的生成网络模型的时候,引入了对抗网络结构,将图2的网络结构作为生成器(即生成网络模型),同时增加一个判别器(即判别网络模型)。在训练的过程中,首先训练生成网络,将模糊的视频帧送入生成网络中,得到生成的清晰视频帧,与真实的视频帧进行比较,得到重构损失函数(即图3中的损失函数1),通过该损失调整生成网络的权重参数。之后训练判别网络,将真实的清晰视频与生成的清晰视频进行送入判别网络,得到对抗损失函数(即图3中的损失函数2),通过对抗损失函数调整生成网络结构,使判别网络具有判断真实清晰图像与从模糊图像生成的清晰图像的能力。交替训练两个网络结构,在之后训练生成网络的时候,使用两种损失一起调整网络结构,使图像既可以在像素层面上与真实清晰图像相似,同时在整体上看起来更像是清晰的图像。两个损失函数之间通过一个权重参数来进行联合,该权重可以用于控制两种损失函数作用于反馈调节的作用大小。生成网络的作用是为了通过模糊的视频生成清晰的视频,而判别网络的作用是为了分辨送入的视频帧是真实的清晰图像还是生成的清晰视频帧。通过对抗学习,判别网络的判别能力越来越强,同时,生成网络的生成视频也越来越真实。
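
A compact sketch of one alternating training step described above, assuming a `generator` that maps a five-frame clip to one frame and a `discriminator` that outputs a probability in (0, 1); the Adam-style optimizers, the mean-squared reconstruction term, and the numerical epsilon are my own choices for illustration (the loss formulas themselves are written out after the equations below).

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt, blurry_clip, sharp_frame, a=0.0002):
    """One alternating update: discriminator first, then generator with the fused loss.

    blurry_clip: (B, 3, 5, H, W); sharp_frame: (B, 3, H, W).
    The discriminator is assumed to output a probability in (0, 1) per image.
    """
    bce = torch.nn.BCELoss()
    mse = torch.nn.MSELoss()

    # Train the discriminator to tell real sharp frames from generated ones.
    fake = generator(blurry_clip).detach()
    d_real = discriminator(sharp_frame)
    d_fake = discriminator(fake)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator with the reconstruction loss plus the weighted adversarial loss.
    restored = generator(blurry_clip)
    content = mse(restored, sharp_frame)                                  # pixel reconstruction term
    adversarial = torch.log(1.0 - discriminator(restored) + 1e-8).mean()  # log(1 - D(G(I_blurry)))
    g_loss = content + a * adversarial
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```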
接下来对两种不同损失函数的加权融合进行举例说明,由于本申请实施例中使用两个网络,即生成网络和判别网络,所以本申请实施例使用了两个损失函数,即基于像素差值的损失(content loss)函数(即重构损失函数)以及对抗损失(adversarial loss)函数。
首先,基于像素差值的损失函数为:
$$L_{content}=\frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\left(I_{sharp}^{x,y}-G(I_{blurry})_{x,y}\right)^{2}$$
其中，$W$ 和 $H$ 代表视频帧的长和宽，$I_{sharp}^{x,y}$ 是真实清晰视频帧在位置 $(x,y)$ 上的像素值，$G(I_{blurry})_{x,y}$ 是生成视频帧在对应位置上的值。
对抗损失函数为:
$$L_{adversarial}=\log\left(1-D(G(I_{blurry}))\right)$$
其中，$G(I_{blurry})$ 是生成网络根据模糊视频帧生成的视频帧，$D(G(I_{blurry}))$ 是判别网络认为该生成视频帧是真实清晰视频的可能性，$D$ 表示判别网络。
两个损失函数通过如下的公式进行结合:
$$L=L_{content}+a\cdot L_{adversarial}$$
其中,a代表两者的权重,在实验过程中,本申请实施例中可以将a设置成0.0002的时候,效果比较好。通过这个公式,生成网络可以进行参数调节,得到更好的生成网络。
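
Written out directly, with the squared pixel difference assumed as the concrete form of the reconstruction term, the two losses and their weighted fusion with a = 0.0002 could be expressed as:

```python
import torch

A = 0.0002  # weight a balancing the two losses, per the value reported above

def content_loss(generated, sharp):
    """Pixel-difference (reconstruction) loss averaged over the W x H frame."""
    return torch.mean((sharp - generated) ** 2)

def adversarial_loss(d_of_generated):
    """d_of_generated = D(G(I_blurry)); the epsilon keeps the logarithm finite."""
    return torch.mean(torch.log(1.0 - d_of_generated + 1e-8))

def total_loss(generated, sharp, d_of_generated):
    """L = L_content + a * L_adversarial."""
    return content_loss(generated, sharp) + A * adversarial_loss(d_of_generated)
```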
本申请实施例提供的方法能够提高现有的视频去模糊能力,能够自动地对视频进行去模糊操作,可以用于手机或者数码相机等设备拍摄视频之后的后续处理,也可以用于网络后台服务器对用户上传的视频进行去模糊处理,从而得到更为清晰的视频。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请实施例所必须的。
为便于更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关装置。
图4-a为本申请实施例提供的一种视频去模糊装置的组成结构示意图。请参阅图4-a所示,本申请实施例提供的一种视频去模糊装置400,该装置包括一个或多个处理器,以及一个或多个存储程序单元的存储器,其中,程序 单元由处理器执行,程序单元可以包括:获取模块401、时空信息提取模块402、去模糊处理模块403,其中,
获取模块401,被设置为从视频片段中获取连续的N个图像帧,所述N为正整数,所述N个图像帧包括:待处理的模糊图像帧;
时空信息提取模块402,被设置为使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理,得到所述模糊图像帧对应的时空信息,所述时空信息包括:所述模糊图像帧的空间特征信息,以及在所述N个图像帧中所述模糊图像帧与相邻图像帧之间的时间特征信息;
去模糊处理模块403,被设置为使用所述模糊图像帧对应的时空信息,通过所述生成对抗网络模型对所述模糊图像帧进行去模糊处理,输出清晰图像帧。
在本申请的一些实施例中,所述生成对抗网络模型,包括:生成网络模型和对抗网络模型。图4-b为本申请实施例提供的另一种视频去模糊装置的组成结构示意图。如图4-b所示,所述程序单元还包括:模型训练模块404,其中,
所述获取模块401,还被设置为从视频样本库中获取连续的N个样本图像帧以及用于判别的真实清晰图像帧,所述N个样本图像帧包括:用于训练的样本模糊图像帧,所述真实清晰图像帧与所述样本模糊图像帧相对应;
所述时空信息提取模块402,还被设置为使用所述生成网络模型中的3D卷积核从所述N个样本图像帧中提取出所述样本模糊图像帧对应的时空信息;
所述去模糊处理模块403,还被设置为使用所述样本模糊图像帧对应的时空信息,通过所述生成网络模型对所述样本模糊图像帧进行去模糊处理,输出样本清晰图像帧;
所述模型训练模块404,被设置为根据所述样本清晰图像帧和所述真实清晰图像帧,对所述生成网络模型和所述判别网络模型进行交替训练。
在本申请的一些实施例中,所述生成网络模型,包括:第一3D卷积核和第二3D卷积核。图4-c为本申请实施例提供的一种时空信息提取模块的组成结构示意图。如图4-c所示,所述时空信息提取模块402,包括:
第一卷积单元4021,被设置为使用所述第一3D卷积核对所述N个样本图 像帧进行卷积处理,得到所述样本模糊图像帧对应的低级别时空特征;
第二卷积单元4022,被设置为使用所述第二3D卷积核对所述低级别时空特征进行卷积处理,得到所述样本模糊图像帧对应的高级别时空特征;
时空特征融合单元4023,被设置为将所述样本模糊图像帧对应的高级别时空特征融合在一起,得到所述样本模糊图像帧对应的时空信息。
在本申请的一些实施例中,所述生成网络模型,还包括:M个2D卷积核,所述M为正整数。所述去模糊处理模块403,具体被设置为使用所述M个2D卷积核中的各个2D卷积核依次对所述样本模糊图像帧对应的时空信息进行卷积处理,经过所述M个2D卷积核中的最后一个2D卷积核进行卷积处理之后,得到所述样本清晰图像帧。
在本申请的一些实施例中,所述M个2D卷积核中的奇数2D卷积核包括:卷积层、归一化层和激活函数,所述M个2D卷积核中的偶数2D卷积核包括:卷积层和激活函数。
图4-d为本申请实施例提供的一种模型训练模块的组成结构示意图。在本申请的一些实施例中,如图4-d所示,所述模型训练模块404,包括:
损失函数获取单元4041,被设置为根据所述样本清晰图像帧和所述真实清晰图像帧获取重构损失函数;
生成网络模型训练单元4042,被设置为通过所述重构损失函数训练所述生成网络模型;
判别网络模型训练单元4043,被设置为使用所述真实清晰图像帧和所述样本清晰图像帧训练所述判别网络模型,得到所述判别网络模型输出的对抗损失函数;
所述生成网络模型训练单元4042,还被设置为通过所述对抗损失函数继续训练所述生成网络模型。
进一步的,在本申请的一些实施例中,所述损失函数获取单元4041,还被设置为通过所述对抗损失函数继续训练所述生成网络模型之后,通过所述生成网络模型重新获取重构损失函数;
所述判别网络模型训练单元4043,还被设置为通过所述判别网络模型重新获取对抗损失函数;
所述损失函数获取单元4041,还被设置为对重新获取的重构损失函数和重新获取的对抗损失函数进行加权融合,得到融合后的损失函数;
所述生成网络模型训练单元4042,还被设置为通过所述融合后的损失函数继续训练所述生成网络模型。
通过以上对本申请实施例的描述可知,首先从视频片段中获取连续的N个图像帧,在N个图像帧中包括有待处理的模糊图像帧,然后使用生成对抗网络模型对N个图像帧进行3D卷积处理,得到模糊图像帧对应的时空信息,时空信息包括:模糊图像帧的空间特征信息,以及在N个图像帧中模糊图像帧与相邻图像帧之间的时间特征信息。最后使用模糊图像帧对应的时空信息,通过生成对抗网络模型对模糊图像帧进行去模糊处理,输出清晰图像帧。本申请实施例中由于生成对抗网络模型可以采用3D卷积操作,提取隐含在连续的图像帧之间的时空信息,因此使用模糊图像帧对应的时空信息,通过生成对抗网络模型完成了对模糊图像帧的去模糊处理,因此可以得到更加真实的清晰图像,提高了视频去模糊的效果。
本申请实施例还提供了另一种终端,图5示出的是本申请实施例提供的视频去模糊方法应用于终端的组成结构示意图,该终端可以是手机。如图5所示,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请实施例方法部分。该终端可以为包括手机、平板电脑、个人数字助理(Personal Digital Assistant,PDA)、销售终端(Point of Sales,POS)、车载电脑等任意终端设备,以终端为手机为例:
参考图5,手机包括:射频(Radio Frequency,RF)电路1010、存储器1020、输入单元1030、显示单元1040、传感器1050、音频电路1060、无线保真(wireless fidelity,WiFi)模块1070、处理器1080、以及电源1090等部件。本领域技术人员可以理解,图5中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图5对手机的各个构成部件进行具体的介绍:
RF电路1010可被设置为收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器1080处理;另外,将设计上行的 数据发送给基站。通常,RF电路1010包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,LNA)、双工器等。此外,RF电路1010还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯***(Global System of Mobile communication,GSM)、通用分组无线服务(General Packet Radio Service,GPRS)、码分多址(Code Division Multiple Access,CDMA)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、长期演进(Long Term Evolution,LTE)、电子邮件、短消息服务(Short Messaging Service,SMS)等。
存储器1020可被设置为存储软件程序以及模块,处理器1080通过运行存储在存储器1020的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1020可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1020可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元1030可被设置为接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元1030可包括触控面板1031以及其他输入设备1032。触控面板1031,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1031上或在触控面板1031附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板1031可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1080,并能接收处理器1080发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1031。除了触控面板1031,输入单元1030还可以包括其他输入设备1032。具体地,其他输入设备1032可以 包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元1040可被设置为显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元1040可包括显示面板1041,可选的,可以采用液晶显示器(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板1041。进一步的,触控面板1031可覆盖显示面板1041,当触控面板1031检测到在其上或附近的触摸操作后,传送给处理器1080以确定触摸事件的类型,随后处理器1080根据触摸事件的类型在显示面板1041上提供相应的视觉输出。虽然在图5中,触控面板1031与显示面板1041是作为两个独立的部件来实现手机的输入和输入功能,但是在某些实施例中,可以将触控面板1031与显示面板1041集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器1050,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1041的亮度,接近传感器可在手机移动到耳边时,关闭显示面板1041和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路1060、扬声器1061，传声器1062可提供用户与手机之间的音频接口。音频电路1060可将接收到的音频数据转换后的电信号，传输到扬声器1061，由扬声器1061转换为声音信号输出；另一方面，传声器1062将收集的声音信号转换为电信号，由音频电路1060接收后转换为音频数据，再将音频数据输出处理器1080处理后，经RF电路1010以发送给比如另一手机，或者将音频数据输出至存储器1020以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块1070可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互 联网访问。虽然图5示出了WiFi模块1070,但是可以理解的是,其并不属于手机的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。
处理器1080是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1020内的软件程序和/或模块,以及调用存储在存储器1020内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1080可包括一个或多个处理单元;优选的,处理器1080可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作***、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1080中。
手机还包括给各个部件供电的电源1090(比如电池),优选的,电源可以通过电源管理***与处理器1080逻辑相连,从而通过电源管理***实现管理充电、放电、以及功耗管理等功能。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本申请实施例中,该终端所包括的处理器1080还具有控制执行以上由终端执行的视频去模糊方法流程。
图6是本申请实施例提供的视频去模糊方法应用于服务器的组成结构示意图,该服务器1100可因配置或性能不同而产生比较大的差异,可以包括一个或一个以***处理器(central processing units,CPU)1122(例如,一个或一个以上处理器)和存储器1132,一个或一个以上存储应用程序1142或数据1144的存储介质1130(例如一个或一个以上海量存储设备)。其中,存储器1132和存储介质1130可以是短暂存储或持久存储。存储在存储介质1130的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1122可以设置为与存储介质1130通信,在服务器1100上执行存储介质1130中的一系列指令操作。
服务器1100还可以包括一个或一个以上电源1126,一个或一个以上有线或无线网络接口1150,一个或一个以上输入输出接口1158,和/或,一个或一个以上操作***1141,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由服务器所执行的视频去模糊方法的步骤可以基于该图6所示的服务器结构。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请实施例可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请实施例而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
综上所述,以上实施例仅用以说明本申请实施例的技术方案,而非对其限制;尽管参照上述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对上述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。
工业实用性
在本申请实施例中,首先从视频片段中获取连续的N个图像帧,在N个图像帧中包括有待处理的模糊图像帧,然后使用生成对抗网络模型对N个图像帧进行3D卷积处理,得到模糊图像帧对应的时空信息,时空信息包括:模糊图像帧的空间特征信息,以及在N个图像帧中模糊图像帧与相邻图像帧之间的时间特征信息。最后使用模糊图像帧对应的时空信息,通过生成对抗网络模型对模糊图像帧进行去模糊处理,输出清晰图像帧。本申请实施例中由于生成对抗网络模型可以采用3D卷积操作,提取隐含在连续的图像帧之间的时空信息,因此使用模糊图像帧对应的时空信息,通过生成对抗网络模型完成了对模糊图像帧的去模糊处理,因此可以得到更加真实的清晰图像,提高了视频去模糊的效果。

Claims (16)

  1. 一种视频去模糊方法,包括:
    电子设备从视频片段中获取连续的N个图像帧,所述N为正整数,所述N个图像帧包括:待处理的模糊图像帧;
    所述电子设备使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理,得到所述模糊图像帧对应的时空信息,所述时空信息包括:所述模糊图像帧的空间特征信息,以及在所述N个图像帧中所述模糊图像帧与相邻图像帧之间的时间特征信息;
    所述电子设备使用所述模糊图像帧对应的时空信息,通过所述生成对抗网络模型对所述模糊图像帧进行去模糊处理,输出清晰图像帧。
  2. 根据权利要求1所述的方法,其中,所述生成对抗网络模型,包括:生成网络模型和对抗网络模型;
    所述电子设备使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理之前,所述方法还包括:
    所述电子设备从视频样本库中获取连续的N个样本图像帧以及用于判别的真实清晰图像帧,所述N个样本图像帧包括:用于训练的样本模糊图像帧,所述真实清晰图像帧与所述样本模糊图像帧相对应;
    所述电子设备使用所述生成网络模型中的3D卷积核从所述N个样本图像帧中提取出所述样本模糊图像帧对应的时空信息;
    所述电子设备使用所述样本模糊图像帧对应的时空信息,通过所述生成网络模型对所述样本模糊图像帧进行去模糊处理,输出样本清晰图像帧;
    所述电子设备根据所述样本清晰图像帧和所述真实清晰图像帧,对所述生成网络模型和所述判别网络模型进行交替训练。
  3. 根据权利要求2所述的方法,其中,所述生成网络模型,包括:第一3D卷积核和第二3D卷积核;
    所述电子设备使用所述生成网络模型中的3D卷积核从所述N个样本图像帧中提取出所述样本模糊图像帧对应的时空信息,包括:
    使用所述第一3D卷积核对所述N个样本图像帧进行卷积处理,得到所述样本模糊图像帧对应的低级别时空特征;
    使用所述第二3D卷积核对所述低级别时空特征进行卷积处理,得到所述 样本模糊图像帧对应的高级别时空特征;
    将所述样本模糊图像帧对应的高级别时空特征融合在一起,得到所述样本模糊图像帧对应的时空信息。
  4. 根据权利要求2所述的方法,其中,所述生成网络模型,还包括:M个2D卷积核,所述M为正整数;
    所述电子设备使用所述样本模糊图像帧对应的时空信息,通过所述生成网络模型对所述样本模糊图像帧进行去模糊处理,输出样本清晰图像帧包括:
    使用所述M个2D卷积核中的各个2D卷积核依次对所述样本模糊图像帧对应的时空信息进行卷积处理,经过所述M个2D卷积核中的最后一个2D卷积核进行卷积处理之后,得到所述样本清晰图像帧。
  5. 根据权利要求4所述的方法,其中,所述M个2D卷积核中的奇数2D卷积核包括:卷积层、归一化层和激活函数,所述M个2D卷积核中的偶数2D卷积核包括:卷积层和激活函数。
  6. 根据权利要求2至5中任一项所述的方法,其中,所述电子设备根据所述样本清晰图像帧和所述真实清晰图像帧,对所述生成网络模型和所述判别网络模型进行交替训练,包括:
    根据所述样本清晰图像帧和所述真实清晰图像帧获取重构损失函数;
    通过所述重构损失函数训练所述生成网络模型;
    使用所述真实清晰图像帧和所述样本清晰图像帧训练所述判别网络模型,得到所述判别网络模型输出的对抗损失函数;
    通过所述对抗损失函数继续训练所述生成网络模型。
  7. 根据权利要求6所述的方法,其中,所述电子设备根据所述样本清晰图像帧和所述真实清晰图像帧,对所述生成网络模型和所述判别网络模型进行交替训练,还包括:
    所述通过所述对抗损失函数继续训练所述生成网络模型之后,通过所述生成网络模型重新获取重构损失函数,以及通过所述判别网络模型重新获取对抗损失函数;
    对重新获取的重构损失函数和重新获取的对抗损失函数进行加权融合,得到融合后的损失函数;
    通过所述融合后的损失函数继续训练所述生成网络模型。
  8. 一种视频去模糊装置,包括一个或多个处理器,以及一个或多个存储程序单元的存储器,其中,所述程序单元由所述处理器执行,所述程序单元包括:
    获取模块,被设置为从视频片段中获取连续的N个图像帧,所述N为正整数,所述N个图像帧包括:待处理的模糊图像帧;
    时空信息提取模块,被设置为使用生成对抗网络模型对所述N个图像帧进行三维3D卷积处理,得到所述模糊图像帧对应的时空信息,所述时空信息包括:所述模糊图像帧的空间特征信息,以及在所述N个图像帧中所述模糊图像帧与相邻图像帧之间的时间特征信息;
    去模糊处理模块,被设置为使用所述模糊图像帧对应的时空信息,通过所述生成对抗网络模型对所述模糊图像帧进行去模糊处理,输出清晰图像帧。
  9. 根据权利要求8所述的装置,其中,所述生成对抗网络模型,包括:生成网络模型和对抗网络模型;
    所述视频去模糊装置还包括:模型训练模块,其中,
    所述获取模块,还被设置为从视频样本库中获取连续的N个样本图像帧以及用于判别的真实清晰图像帧,所述N个样本图像帧包括:用于训练的样本模糊图像帧,所述真实清晰图像帧与所述样本模糊图像帧相对应;
    所述时空信息提取模块,还被设置为使用所述生成网络模型中的3D卷积核从所述N个样本图像帧中提取出所述样本模糊图像帧对应的时空信息;
    所述去模糊处理模块,还被设置为使用所述样本模糊图像帧对应的时空信息,通过所述生成网络模型对所述样本模糊图像帧进行去模糊处理,输出样本清晰图像帧;
    所述模型训练模块,被设置为根据所述样本清晰图像帧和所述真实清晰图像帧,对所述生成网络模型和所述判别网络模型进行交替训练。
  10. 根据权利要求9所述的装置,其中,所述生成网络模型,包括:第一3D卷积核和第二3D卷积核;
    所述时空信息提取模块,包括:
    第一卷积单元,被设置为使用所述第一3D卷积核对所述N个样本图像帧 进行卷积处理,得到所述样本模糊图像帧对应的低级别时空特征;
    第二卷积单元,被设置为使用所述第二3D卷积核对所述低级别时空特征进行卷积处理,得到所述样本模糊图像帧对应的高级别时空特征;
    时空特征融合单元,被设置为将所述样本模糊图像帧对应的高级别时空特征融合在一起,得到所述样本模糊图像帧对应的时空信息。
  11. 根据权利要求9所述的装置,其中,所述生成网络模型,还包括:M个2D卷积核,所述M为正整数;
    所述去模糊处理模块,具体被设置为使用所述M个2D卷积核中的各个2D卷积核依次对所述样本模糊图像帧对应的时空信息进行卷积处理,经过所述M个2D卷积核中的最后一个2D卷积核进行卷积处理之后,得到所述样本清晰图像帧。
  12. 根据权利要求11所述的装置,其中,所述M个2D卷积核中的奇数2D卷积核包括:卷积层、归一化层和激活函数,所述M个2D卷积核中的偶数2D卷积核包括:卷积层和激活函数。
  13. 根据权利要求9至12中任一项所述的装置,其中,所述模型训练模块,包括:
    损失函数获取单元,被设置为根据所述样本清晰图像帧和所述真实清晰图像帧获取重构损失函数;
    生成网络模型训练单元,被设置为通过所述重构损失函数训练所述生成网络模型;
    判别网络模型训练单元,被设置为使用所述真实清晰图像帧和所述样本清晰图像帧训练所述判别网络模型,得到所述判别网络模型输出的对抗损失函数;
    所述生成网络模型训练单元,还被设置为通过所述对抗损失函数继续训练所述生成网络模型。
  14. 根据权利要求13所述的装置,其中,
    所述损失函数获取单元,还被设置为通过所述对抗损失函数继续训练所述生成网络模型之后,通过所述生成网络模型重新获取重构损失函数;
    所述判别网络模型训练单元,还被设置为通过所述判别网络模型重新获 取对抗损失函数;
    所述损失函数获取单元,还被设置为对重新获取的重构损失函数和重新获取的对抗损失函数进行加权融合,得到融合后的损失函数;
    所述生成网络模型训练单元，还被设置为通过所述融合后的损失函数继续训练所述生成网络模型。
  15. 一种非暂态计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至7任意一项所述的方法。
  16. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为通过所述计算机程序执行所述权利要求1至7任一项中所述的方法。
PCT/CN2019/081702 2018-05-09 2019-04-08 一种视频去模糊方法、装置、存储介质和电子装置 WO2019214381A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19799523.6A EP3792869A4 (en) 2018-05-09 2019-04-08 METHOD AND APPARATUS FOR VIDEO BLUR REMOVAL, INFORMATION MEDIA AND ELECTRONIC APPARATUS
US16/993,922 US11688043B2 (en) 2018-05-09 2020-08-14 Video deblurring method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810438831.5 2018-05-09
CN201810438831.5A CN110473147A (zh) 2018-05-09 2018-05-09 一种视频去模糊方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/993,922 Continuation US11688043B2 (en) 2018-05-09 2020-08-14 Video deblurring method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
WO2019214381A1 true WO2019214381A1 (zh) 2019-11-14

Family

ID=68467114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081702 WO2019214381A1 (zh) 2018-05-09 2019-04-08 一种视频去模糊方法、装置、存储介质和电子装置

Country Status (4)

Country Link
US (1) US11688043B2 (zh)
EP (1) EP3792869A4 (zh)
CN (1) CN110473147A (zh)
WO (1) WO2019214381A1 (zh)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368941A (zh) * 2020-04-10 2020-07-03 浙江大华技术股份有限公司 一种图像处理方法、装置以及计算机存储介质
CN111738913A (zh) * 2020-06-30 2020-10-02 北京百度网讯科技有限公司 视频填充方法、装置、设备及存储介质
CN111914785A (zh) * 2020-08-10 2020-11-10 北京小米松果电子有限公司 一种提高人脸图像清晰度的方法、装置及存储介质
CN111951168A (zh) * 2020-08-25 2020-11-17 Oppo(重庆)智能科技有限公司 图像处理方法、图像处理装置、存储介质与电子设备
CN112489198A (zh) * 2020-11-30 2021-03-12 江苏科技大学 一种基于对抗学习的三维重建***及其方法
CN112565628A (zh) * 2020-12-01 2021-03-26 合肥工业大学 一种卡通视频重制方法及***
CN112837344A (zh) * 2019-12-18 2021-05-25 沈阳理工大学 一种基于条件对抗生成孪生网络的目标跟踪方法
CN112837240A (zh) * 2021-02-02 2021-05-25 北京百度网讯科技有限公司 模型训练方法、分数提升方法、装置、设备、介质和产品
CN113658062A (zh) * 2021-07-28 2021-11-16 上海影谱科技有限公司 一种视频去模糊方法、装置及计算设备
CN114820382A (zh) * 2022-05-17 2022-07-29 上海传英信息技术有限公司 图像处理方法、智能终端及存储介质
CN114820389A (zh) * 2022-06-23 2022-07-29 北京科技大学 一种基于无监督解耦表征的人脸图像去模糊方法

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020234449A1 (en) * 2019-05-23 2020-11-26 Deepmind Technologies Limited Generative adversarial networks with temporal and spatial discriminators for efficient video generation
CN111028166B (zh) * 2019-11-30 2022-07-22 温州大学 一种基于迭代神经网络的视频去模糊方法
CN111047532B (zh) * 2019-12-06 2020-12-29 广东启迪图卫科技股份有限公司 一种基于3d卷积神经网络的低照度视频增强方法
CN111178401B (zh) * 2019-12-16 2023-09-12 上海航天控制技术研究所 一种基于多层对抗网络的空间目标分类方法
CN111460939A (zh) * 2020-03-20 2020-07-28 深圳市优必选科技股份有限公司 一种去模糊的人脸识别方法、***和一种巡检机器人
CN111626944B (zh) * 2020-04-21 2023-07-25 温州大学 一种基于时空金字塔网络和对抗自然先验的视频去模糊方法
CN111612703A (zh) * 2020-04-22 2020-09-01 杭州电子科技大学 一种基于生成对抗网络的图像盲去模糊方法
CN111968052B (zh) * 2020-08-11 2024-04-30 北京小米松果电子有限公司 图像处理方法、图像处理装置及存储介质
CN112351196B (zh) * 2020-09-22 2022-03-11 北京迈格威科技有限公司 图像清晰度的确定方法、图像对焦方法及装置
CN112115322B (zh) * 2020-09-25 2024-05-07 平安科技(深圳)有限公司 用户分群方法、装置、电子设备及存储介质
CN112465730A (zh) * 2020-12-18 2021-03-09 辽宁石油化工大学 一种运动视频去模糊的方法
CN112734678B (zh) * 2021-01-22 2022-11-08 西华大学 基于深度残差收缩网络和生成对抗网络的去图像运动模糊方法
CN113129240B (zh) * 2021-05-19 2023-07-25 广西师范大学 一种工业包装字符的去运动模糊方法
CN113570689B (zh) * 2021-07-28 2024-03-01 杭州网易云音乐科技有限公司 人像卡通化方法、装置、介质和计算设备
CN113610713B (zh) * 2021-08-13 2023-11-28 北京达佳互联信息技术有限公司 视频超分辨模型的训练方法、视频超分辨方法及装置
CN113643215B (zh) * 2021-10-12 2022-01-28 北京万里红科技有限公司 生成图像去模糊模型的方法及虹膜图像去模糊方法
CN113822824B (zh) * 2021-11-22 2022-02-25 腾讯科技(深圳)有限公司 视频去模糊方法、装置、设备及存储介质
CN114202474A (zh) * 2021-11-24 2022-03-18 清华大学 面向移动端的循环多路输出高效视频降噪方法及装置
CN116362976A (zh) * 2021-12-22 2023-06-30 北京字跳网络技术有限公司 一种模糊视频修复方法及装置
CN114596219B (zh) * 2022-01-27 2024-04-26 太原理工大学 一种基于条件生成对抗网络的图像去运动模糊方法
CN114140363B (zh) * 2022-02-08 2022-05-24 腾讯科技(深圳)有限公司 视频去模糊方法及装置、视频去模糊模型训练方法及装置
US20240051568A1 (en) * 2022-08-09 2024-02-15 Motional Ad Llc Discriminator network for detecting out of operational design domain scenarios
CN115439375B (zh) * 2022-11-02 2023-03-24 国仪量子(合肥)技术有限公司 图像去模糊模型的训练方法和装置以及应用方法和装置
CN115861099B (zh) * 2022-11-24 2024-02-13 南京信息工程大学 一种引入物理成像先验知识约束的卫星云图图像复原方法
CN116233626B (zh) * 2023-05-05 2023-09-15 荣耀终端有限公司 图像处理方法、装置及电子设备
CN117196985A (zh) * 2023-09-12 2023-12-08 军事科学院军事医学研究院军事兽医研究所 一种基于深度强化学习的视觉去雨雾方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934769A (zh) * 2017-01-23 2017-07-07 武汉理工大学 基于近景遥感的去运动模糊方法
CN107491771A (zh) * 2017-09-21 2017-12-19 百度在线网络技术(北京)有限公司 人脸检测方法和装置
CN107730458A (zh) * 2017-09-05 2018-02-23 北京飞搜科技有限公司 一种基于生成式对抗网络的模糊人脸重建方法及***
US20180114096A1 (en) * 2015-04-30 2018-04-26 The Regents Of The University Of California Machine learning to process monte carlo rendered images

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011049565A1 (en) * 2009-10-21 2011-04-28 Hewlett-Packard Development Company, L.P. Real-time video deblurring
US8345984B2 (en) * 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition
US8768069B2 (en) * 2011-02-24 2014-07-01 Sony Corporation Image enhancement apparatus and method
US9692939B2 (en) * 2013-05-29 2017-06-27 Yeda Research And Development Co. Ltd. Device, system, and method of blind deblurring and blind super-resolution utilizing internal patch recurrence
US9135683B2 (en) * 2013-09-05 2015-09-15 Arecont Vision, Llc. System and method for temporal video image enhancement
KR101671391B1 (ko) * 2015-07-07 2016-11-02 한국과학기술연구원 레이어 블러 모델에 기반한 비디오 디블러링 방법, 이를 수행하기 위한 기록 매체 및 장치
US10607321B2 (en) * 2016-06-22 2020-03-31 Intel Corporation Adaptive sharpness enhancement control
US10387765B2 (en) * 2016-06-23 2019-08-20 Siemens Healthcare Gmbh Image correction using a deep generative machine-learning model
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
US10701394B1 (en) * 2016-11-10 2020-06-30 Twitter, Inc. Real-time video super-resolution with spatio-temporal networks and motion compensation
JP7002729B2 (ja) * 2017-07-31 2022-01-20 株式会社アイシン 画像データ生成装置、画像認識装置、画像データ生成プログラム、及び画像認識プログラム
CN107590774A (zh) * 2017-09-18 2018-01-16 北京邮电大学 一种基于生成对抗网络的车牌清晰化方法及装置
CN109727201A (zh) * 2017-10-30 2019-05-07 富士通株式会社 信息处理设备、图像处理方法以及存储介质
US10733714B2 (en) * 2017-11-09 2020-08-04 Samsung Electronics Co., Ltd Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
CN107968962B (zh) * 2017-12-12 2019-08-09 华中科技大学 一种基于深度学习的两帧不相邻图像的视频生成方法
JP2019152927A (ja) * 2018-02-28 2019-09-12 株式会社エクォス・リサーチ 画像データ生成装置、画像認識装置、画像データ生成プログラム、及び、画像認識プログラム
CN108416752B (zh) * 2018-03-12 2021-09-07 中山大学 一种基于生成式对抗网络进行图像去运动模糊的方法
CN110728626A (zh) * 2018-07-16 2020-01-24 宁波舜宇光电信息有限公司 图像去模糊方法和装置及其训练
CN111861894B (zh) * 2019-04-25 2023-06-20 上海理工大学 基于生成式对抗网络的图像去运动模糊方法
CN111223062B (zh) * 2020-01-08 2023-04-07 西安电子科技大学 基于生成对抗网络的图像去模糊方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114096A1 (en) * 2015-04-30 2018-04-26 The Regents Of The University Of California Machine learning to process monte carlo rendered images
CN106934769A (zh) * 2017-01-23 2017-07-07 武汉理工大学 基于近景遥感的去运动模糊方法
CN107730458A (zh) * 2017-09-05 2018-02-23 北京飞搜科技有限公司 一种基于生成式对抗网络的模糊人脸重建方法及***
CN107491771A (zh) * 2017-09-21 2017-12-19 百度在线网络技术(北京)有限公司 人脸检测方法和装置

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837344A (zh) * 2019-12-18 2021-05-25 沈阳理工大学 一种基于条件对抗生成孪生网络的目标跟踪方法
CN112837344B (zh) * 2019-12-18 2024-03-29 沈阳理工大学 一种基于条件对抗生成孪生网络的目标跟踪方法
CN111368941A (zh) * 2020-04-10 2020-07-03 浙江大华技术股份有限公司 一种图像处理方法、装置以及计算机存储介质
CN111368941B (zh) * 2020-04-10 2023-09-01 浙江大华技术股份有限公司 一种图像处理方法、装置以及计算机存储介质
CN111738913A (zh) * 2020-06-30 2020-10-02 北京百度网讯科技有限公司 视频填充方法、装置、设备及存储介质
CN111914785A (zh) * 2020-08-10 2020-11-10 北京小米松果电子有限公司 一种提高人脸图像清晰度的方法、装置及存储介质
CN111914785B (zh) * 2020-08-10 2023-12-05 北京小米松果电子有限公司 一种提高人脸图像清晰度的方法、装置及存储介质
CN111951168A (zh) * 2020-08-25 2020-11-17 Oppo(重庆)智能科技有限公司 图像处理方法、图像处理装置、存储介质与电子设备
CN112489198A (zh) * 2020-11-30 2021-03-12 江苏科技大学 一种基于对抗学习的三维重建***及其方法
CN112565628A (zh) * 2020-12-01 2021-03-26 合肥工业大学 一种卡通视频重制方法及***
CN112837240B (zh) * 2021-02-02 2023-08-04 北京百度网讯科技有限公司 模型训练方法、分数提升方法、装置、设备、介质和产品
CN112837240A (zh) * 2021-02-02 2021-05-25 北京百度网讯科技有限公司 模型训练方法、分数提升方法、装置、设备、介质和产品
CN113658062A (zh) * 2021-07-28 2021-11-16 上海影谱科技有限公司 一种视频去模糊方法、装置及计算设备
CN114820382A (zh) * 2022-05-17 2022-07-29 上海传英信息技术有限公司 图像处理方法、智能终端及存储介质
CN114820389B (zh) * 2022-06-23 2022-09-23 北京科技大学 一种基于无监督解耦表征的人脸图像去模糊方法
CN114820389A (zh) * 2022-06-23 2022-07-29 北京科技大学 一种基于无监督解耦表征的人脸图像去模糊方法

Also Published As

Publication number Publication date
US20200372618A1 (en) 2020-11-26
EP3792869A1 (en) 2021-03-17
EP3792869A4 (en) 2021-06-23
CN110473147A (zh) 2019-11-19
US11688043B2 (en) 2023-06-27

Similar Documents

Publication Publication Date Title
WO2019214381A1 (zh) 一种视频去模糊方法、装置、存储介质和电子装置
WO2020216054A1 (zh) 视线追踪模型训练的方法、视线追踪的方法及装置
US11356619B2 (en) Video synthesis method, model training method, device, and storage medium
WO2020187153A1 (zh) 目标检测方法、模型训练方法、装置、设备及存储介质
CN110059744B (zh) 训练神经网络的方法、图像处理的方法、设备及存储介质
WO2020029906A1 (zh) 一种多人语音的分离方法和装置
RU2731370C1 (ru) Способ распознавания живого организма и терминальное устройство
WO2020192465A1 (zh) 一种三维对象重建方法和装置
US10701315B2 (en) Video communication device and video communication method
CN108989672B (zh) 一种拍摄方法及移动终端
CN108038825B (zh) 一种图像处理方法及移动终端
WO2019052329A1 (zh) 人脸识别方法及相关产品
EP3594848B1 (en) Queue information acquisition method, device and computer readable storage medium
WO2019011206A1 (zh) 活体检测方法及相关产品
WO2019233216A1 (zh) 一种手势动作的识别方法、装置以及设备
WO2019015575A1 (zh) 解锁控制方法及相关产品
CN107766403B (zh) 一种相册处理方法、移动终端以及计算机可读存储介质
WO2019011098A1 (zh) 解锁控制方法及相关产品
WO2019015418A1 (zh) 解锁控制方法及相关产品
CN110263216B (zh) 一种视频分类的方法、视频分类模型训练的方法及装置
WO2019001254A1 (zh) 虹膜活体检测方法及相关产品
CN112184548A (zh) 图像超分辨率方法、装置、设备及存储介质
CN109618218B (zh) 一种视频处理方法及移动终端
CN110766610A (zh) 一种超分辨率图像的重建方法及电子设备
CN109120858A (zh) 一种图像拍摄方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19799523

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019799523

Country of ref document: EP

Effective date: 20201209