WO2021208122A1 - 基于深度学习的视频盲去噪方法及装置 - Google Patents

基于深度学习的视频盲去噪方法及装置 Download PDF

Info

Publication number
WO2021208122A1
WO2021208122A1 PCT/CN2020/086094 CN2020086094W WO2021208122A1 WO 2021208122 A1 WO2021208122 A1 WO 2021208122A1 CN 2020086094 W CN2020086094 W CN 2020086094W WO 2021208122 A1 WO2021208122 A1 WO 2021208122A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
optical flow
image
denoising
video sequence
Prior art date
Application number
PCT/CN2020/086094
Other languages
English (en)
French (fr)
Inventor
谢翔
邹少锋
李国林
麦宋平
王志华
Original Assignee
清华大学深圳国际研究生院
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学深圳国际研究生院, 清华大学 filed Critical 清华大学深圳国际研究生院
Publication of WO2021208122A1 publication Critical patent/WO2021208122A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38Registration of image sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the invention relates to the technical field of video denoising, in particular to a method and device for blind denoising of video based on deep learning.
  • Denoising is a fundamental problem in image and video processing. Although denoising algorithms and camera sensors have improved over the years, videos shot in dim light or with the short exposure times needed to capture fast-moving objects still contain a lot of noise. Moreover, widely used devices such as surveillance cameras and mobile phones mostly employ low-quality camera sensors, so even images and videos captured under good lighting conditions still contain considerable noise. Denoising is therefore an essential part of video and image processing.
  • General image denoising algorithms often model image noise as additive noise (it is added to the signal and exists whether or not a signal is present) and assume it to be white Gaussian noise; a large amount of training data is then generated by adding white Gaussian noise to clean images, and the denoising model is trained in a data-driven manner.
  • White Gaussian noise is used to model the noise because the observed signal of a CCD/CMOS imaging system can usually be modeled as a joint Poisson-Gaussian distribution, which can in turn be converted into additive white Gaussian noise by a variance-stabilizing transformation (VST).
  • However, in many applications the data to be processed is not the raw sensor output: the output of the imaging sensor has undergone operations such as quantization, demosaicing, gamma correction, and compression, and images and videos produced by devices such as mobile phones may additionally be processed by compression, filters, and so on. In many cases, therefore, the noise in an image or video cannot simply be modeled as additive white Gaussian noise.
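  • As a brief illustration of the variance-stabilization idea mentioned above, the following minimal sketch applies the classical Anscombe transform for Poisson noise (the transform is standard textbook material, not taken from the patent; the Poisson-Gaussian case would use the generalized Anscombe transform instead):

```python
import numpy as np

def anscombe(x):
    """Classical Anscombe transform: maps Poisson-distributed counts x to values
    whose noise is approximately Gaussian with unit variance."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=np.float64) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse (an unbiased inverse is usually preferred in practice)."""
    return (np.asarray(y, dtype=np.float64) / 2.0) ** 2 - 3.0 / 8.0

# Example: Poisson noise on a constant patch becomes approximately unit-variance Gaussian.
rng = np.random.default_rng(0)
noisy = rng.poisson(lam=30.0, size=(256, 256))
print(np.var(anscombe(noisy)))  # close to 1.0
```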
  • VBM3D is an extension of the image denoising algorithm BM3D to video denoising.
  • VBM3D exploits the temporal and spatial correlation of the video sequence to find similar blocks in adjacent frames and in the current frame.
  • The VBM3D algorithm achieves a good trade-off between denoising performance and computational complexity.
  • However, perspective changes and object motion in the video sequence often reduce the accuracy of block matching, so the denoising quality of VBM3D suffers.
  • Moreover, the VBM3D algorithm targets additive white Gaussian noise and requires the noise level of the noisy image to be estimated first; in real scenes the noise level often cannot be obtained directly and the noise distribution is not Gaussian, so the algorithm has certain limitations in application.
  • Ehret et al. proposed an unsupervised video denoising algorithm based on the DnCNN network: the network is first pre-trained on data containing Gaussian white noise and then trained frame by frame on a video with an unknown noise distribution, achieving blind denoising even when the image noise model of the video is unknown.
  • Specifically, a traditional optical flow algorithm estimates the optical flow between consecutive frames, the adjacent frame is then warped to the current frame for registration according to the flow, yielding a pair of noisy images with the same content, and the noise2noise idea is used for frame-by-frame training to achieve blind denoising of videos with arbitrary noise distributions.
  • By taking two adjacent frames and performing motion compensation via optical flow, training on this image pair achieves a denoising effect, but its performance on Gaussian white noise is slightly inferior to directly using a pre-trained DnCNN network.
  • In addition, during online learning the denoising result after multiple iterations on a single image is somewhat unstable, and the denoising quality fluctuates between frames of the sequence, which degrades the visual quality of the video.
  • The embodiment of the present invention provides a method and device for deep-learning-based blind video denoising, which solves the technical problem in the prior art that only two adjacent frames are used and the temporal information of the video sequence is not fully exploited, so that the denoising effect is limited.
  • The embodiment of the present invention provides a deep-learning-based blind video denoising method, the method including: taking a video sequence containing a preset number of frames from the video sequence to be denoised, using the middle frame of the video sequence as the noisy reference frame, and performing optical flow estimation on the noisy reference frame and the image corresponding to every other frame in the video sequence to obtain multiple optical flow fields between pairs of frames;
  • according to the optical flow fields between the pairs of frames, the images corresponding to the other frames in the video sequence are each warped to the noisy reference frame for registration, obtaining multiple frames of noisy registered images;
  • a denoising network is constructed based on a convolutional neural network; the multi-frame noisy registered images are used as the input of the convolutional neural network, the noisy reference frame is used as its reference image, and the noise2noise method is used for frame-by-frame iterative training and denoising to obtain the denoised image corresponding to the noisy reference frame.
  • the embodiment of the present invention also provides a blind video denoising device based on deep learning, which includes:
  • the optical flow estimation module is used to take a video sequence containing a preset number of frames from the video sequence to be denoised, use the middle frame of this sequence as the noisy reference frame, and perform optical flow estimation on the noisy reference frame and the image corresponding to each other frame of the sequence to obtain multiple optical flow fields between pairs of frames;
  • the image transformation module is used to warp the image corresponding to each other frame of the video sequence to the noisy reference frame for registration according to the optical flow fields between the pairs of frames, obtaining multiple frames of noisy registered images;
  • the multi-frame image fusion denoising module is used to build a denoising network based on a convolutional neural network; the multi-frame noisy registered images are used as the input of the convolutional neural network and the noisy reference frame as its reference image, and the noise2noise method is used for frame-by-frame iterative training and denoising to obtain the denoised image corresponding to the noisy reference frame.
  • the embodiment of the present invention also provides a computer device including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the above-mentioned method when the computer program is executed.
  • the embodiment of the present invention also provides a computer-readable storage medium, and the computer-readable storage medium stores a computer program for executing the above-mentioned method.
  • In the embodiment of the present invention, the middle frame of a video sequence of a preset number of frames taken from the video to be denoised is used as the noisy reference frame, and the noisy reference frame is registered with the other frame images; then, through the noise2noise training idea, zero-shot learning can be performed using only a single video, and blind denoising of the video sequence is realized without obtaining a large amount of noisy data and clean data and without an accurate noise distribution model.
  • the multi-frame fusion method can make full use of the time domain information of the video sequence, solve the problem of lack of time domain information, and help to obtain better denoising image quality.
  • FIG. 1 is a flowchart of a method for blind video denoising based on deep learning according to an embodiment of the present invention
  • FIG. 2 is a specific flow chart of a method for blind video denoising based on deep learning provided by an embodiment of the present invention
  • Figure 3 is a schematic diagram of a network structure similar to the DnCNN structure
  • Figure 4 is the noisy image of a frame from the station2 video sequence of the Derf data set provided by an embodiment of the present invention
  • Figure 5 is a denoising image processed by the VBM3D method
  • Figure 6 is a denoising image processed using the unsupervised video denoising method proposed by Ehret et al.
  • Figure 7 is a denoising map processed by the method of the present invention.
  • FIG. 8 is a structural block diagram of a blind video denoising device based on deep learning provided by an embodiment of the present invention.
  • a method for blind video denoising based on deep learning includes:
  • Step 101: Take a video sequence containing a preset number of frames from the video sequence to be denoised, use the middle frame of this sequence as the noisy reference frame, and perform optical flow estimation on the noisy reference frame and the image corresponding to each other frame of the sequence to obtain multiple optical flow fields between pairs of frames.
  • Step 102 According to the optical flow field between a plurality of two-frame images, the images corresponding to each other frame in the video sequence are respectively converted to a noisy reference frame for registration, and a multi-frame noisy registration image is obtained;
  • Step 103: Construct a denoising network based on the convolutional neural network, take the multi-frame noisy registered images as the input of the convolutional neural network and the noisy reference frame as its reference image, and use the noise2noise method for frame-by-frame iterative training and denoising to obtain the denoised image corresponding to the noisy reference frame.
  • For step 101: during denoising, N frames of the video sequence are taken in turn from the video to be denoised, and the middle frame of this sequence is taken as the noisy reference frame. N-1 optical flow estimation networks can be used to estimate the optical flow between the noisy reference frame and the image corresponding to each other frame of the sequence.
  • These optical flow estimation networks share the same network structure and parameters; each takes the image corresponding to one frame of the sequence together with the noisy reference frame as input and outputs the dense optical flow field between the two frames as the motion estimate.
  • Denote the noisy reference frame as I_t, where t indicates that the reference frame is the t-th frame of the video sequence. The other frames of the N-frame sequence are written as I_{t+i} relative to the reference frame, where i > 0 means the frame is i frames after the reference frame and i < 0 means it is |i| frames before it, so i ranges over [-(N-1)/2, (N-1)/2].
  • The clean image corresponding to the noisy reference frame I_t is denoted U_t, and v_{t,t+i} denotes the optical flow field from frame t to frame t+i. Passing I_{t+i} and v_{t,t+i} through a Spatial Transformer Network (STN) yields the warped image Î_{t+i} = w(I_{t+i}, v_{t,t+i}), whose corresponding clean image is Û_{t+i} = w(U_{t+i}, v_{t,t+i}), where w denotes the spatial warping of an image.
  • the optical flow estimation network can adopt optical flow networks such as Flownet2, SpyNet, PWCNet, etc.
  • the embodiment of the present invention does not specifically limit the optical flow estimation network, and only needs to realize the optical flow estimation of the preceding and following frames, and Flownet2 is preferred in this embodiment.
  • the optical flow estimation network is pre-trained to obtain the pre-training model.
  • the Sintel data set can be used for training.
  • the specific training method varies with different networks.
  • the data set used during training is not limited to this, and the published pre-training weights can also be used directly. In this embodiment, the published pre-training weights are preferably used.
  • the optical flow estimation network can be fine-tuned through backpropagation, or the network weights can be frozen, gradient update is not performed, and only the pre-trained network is used for optical flow estimation.
  • the weight of the optical flow estimation network is preferably frozen, and the network weight is not updated by back propagation.
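  • A minimal PyTorch-style sketch of this preferred option of freezing the pre-trained optical flow network (the `build_flow_net` constructor and the checkpoint file name below are placeholders, not the actual Flownet2 API):

```python
import torch

# flow_net is assumed to be a pre-built optical flow model (e.g. a FlowNet2-style module)
# loaded from published pre-trained weights; both names below are placeholders.
flow_net = build_flow_net()                                   # hypothetical constructor
flow_net.load_state_dict(torch.load("flownet2_pretrained.pth"))

# Freeze the weights: no gradients are computed or applied to the flow network,
# so only the denoising network is updated by back-propagation during training.
flow_net.eval()
for p in flow_net.parameters():
    p.requires_grad = False
```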
  • Regarding the choice of N, this example prefers N = 11: a sequence of 11 frames is taken, the 6th frame serves as the noisy reference frame, and each remaining frame together with the reference frame is fed to an optical flow estimation network to obtain the corresponding flow.
  • Accurate optical flow estimation is often difficult because of lighting changes, motion blur, and occlusion in the video sequence; after image transformation, the registered image often has borders that cannot be aligned with the noisy reference frame, and information is missing because of differences between the two frames and changes of viewpoint. Estimating optical flow over a multi-frame video sequence can therefore compensate for the information loss incurred when only the two neighboring frames are used.
  • However, the larger the time interval between the two selected frames, the less accurate the optical flow estimation and the less useful temporal information it provides, while the complexity and computation of the system also increase, so the size of N needs to be weighed.
  • If the parameters of the optical flow network are not updated during denoising, a traditional optical flow estimation algorithm can also be used instead of the optical flow estimation network; for example, the TV-L1 algorithm can be used to compute the optical flow, and an excellent denoising result can still be achieved.
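  • As a sketch of this traditional TV-L1 alternative, OpenCV's dense TV-L1 implementation can be used roughly as follows (the factory function lives in the opencv-contrib `optflow` module; its exact name may differ between OpenCV builds):

```python
import cv2
import numpy as np

def tvl1_flow(ref_gray: np.ndarray, other_gray: np.ndarray) -> np.ndarray:
    """Dense TV-L1 optical flow from the reference frame to another frame.
    Inputs are single-channel uint8 (or float32) images; output is an (H, W, 2) flow field."""
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()  # requires opencv-contrib-python
    return tvl1.calc(ref_gray, other_gray, None)
```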
  • the input frame can be transformed to a reference frame for registration through a spatial transformation network (Spatial Transformer Networks, STN).
  • each spatial transformation network converts the corresponding image in the video sequence to the view of the reference frame, and N-1 frames of images need to use N-1 spatial transformation networks.
  • When warping the input frame to the reference frame according to the optical flow field, bilinear interpolation is required. The spatial transformer network performs this with a differentiable image sampling method, so during denoising training the gradient of the loss function can be back-propagated from the image denoising network to the optical flow estimation step; the optical flow network can thus be fine-tuned for different videos, and the whole video denoising network can be trained end to end.
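  • A minimal sketch of such a differentiable warping step using bilinear sampling (written here with `torch.nn.functional.grid_sample`; the tensor layout and flow convention are illustrative assumptions, not the patent's exact implementation):

```python
import torch
import torch.nn.functional as F

def warp_to_reference(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (B, C, H, W) towards the reference view using a dense flow field
    `flow` (B, 2, H, W) that maps reference-frame pixels to locations in `frame`.
    Bilinear sampling keeps the operation differentiable w.r.t. both inputs."""
    b, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # x coordinate plus x displacement
    grid_y = ys.unsqueeze(0) + flow[:, 1]   # y coordinate plus y displacement
    # Normalize to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid_x / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid_y / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)           # (B, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```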
  • For the image transformation part, if the network does not update the parameters of the optical flow estimation network, or a traditional optical flow estimation algorithm is used, the spatial transformer network is not required either; the spatial transformation of the images can instead be performed with traditional image processing algorithms, e.g. through OpenCV.
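  • When the flow network is frozen or a traditional flow algorithm is used, the same warping can be done outside the training graph; a sketch with `cv2.remap` and bilinear interpolation (the flow convention mirrors the sketch above and is an assumption):

```python
import cv2
import numpy as np

def warp_with_opencv(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp `frame` (H, W, C) to the reference view with cv2.remap and bilinear
    interpolation, given a dense flow field `flow` (H, W, 2) from the reference frame."""
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```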
  • For step 103, a convolutional neural network is used for frame-by-frame iterative training and denoising. The N-1 registered frames are stacked together to form a multi-channel image, which is then fed into the denoising network for training.
  • Following the training idea of the noise2noise algorithm, no clean images are needed as training data; instead, the middle frame of the N-frame sequence is used as the noisy reference frame.
  • An online learning strategy is used for frame-by-frame iterative training, and the network outputs produced during the iterative training are fused and averaged to obtain the final denoised image corresponding to the noisy reference frame.
  • For the convolutional neural network, current mainstream denoising networks such as DnCNN or U-Net can be adopted; referring to Fig. 3, this example preferably uses a network similar to the DnCNN structure as the denoising network.
  • The network contains 17 convolutional layers. The first convolutional layer uses 3x3 kernels with ReLU as the activation function and outputs 64 feature maps; the following 15 convolutional layers also use 64 3x3 kernels each, with batch normalization and ReLU activation; the output layer of the network uses only a single 3x3 convolution.
  • Unlike DnCNN, this example does not use residual learning, i.e. the output of the network is the estimated denoised image rather than the estimated noise, because the input of the network is the image formed by stacking N-1 frames while the output is the denoising estimate of the reference frame.
  • The convolutional layer parameters are initialized with Kaiming initialization, which effectively avoids gradient vanishing or gradient explosion during back-propagation and accelerates network convergence.
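  • A hedged PyTorch sketch of such a DnCNN-like denoiser (the layer counts follow the description above; padding, bias handling, and the channel bookkeeping are implementation assumptions):

```python
import torch
import torch.nn as nn

class MultiFrameDnCNN(nn.Module):
    """DnCNN-like denoiser: input is the stacked (N-1)-frame registration image,
    output is the denoised estimate of the reference frame (no residual learning)."""

    def __init__(self, num_frames: int = 11, channels: int = 1, features: int = 64):
        super().__init__()
        in_ch = (num_frames - 1) * channels
        layers = [nn.Conv2d(in_ch, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(15):  # 15 intermediate conv + BN + ReLU blocks
            layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]  # output layer
        self.net = nn.Sequential(*layers)
        for m in self.modules():  # Kaiming initialization of the convolutional layers
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    def forward(self, stacked: torch.Tensor) -> torch.Tensor:
        return self.net(stacked)
```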
  • For the input and output of the convolutional neural network, the present invention stacks the N-1 STN-transformed images together. If the size of the original image is (H, W, C), where H is the image height, W is the image width, and C is the number of channels, the stacked image has size (H, W, (N-1)×C) and is used as the input to the denoising network.
  • Two assumptions are made: the clean image Û_{t+i} corresponding to the STN-transformed image Î_{t+i} of frame t+i approximately matches, pixel by pixel, the clean image U_t corresponding to the noisy reference frame I_t; and the noise in each of the N frames is independent and identically distributed.
  • Each STN-transformed image Î_{t+i} and the noisy reference frame I_t therefore form a pair of images with approximately the same clean content but independent, identically distributed noise, so Î_{t+i} can be used as the network input and I_t as the noisy reference frame, and training can follow the noise2noise idea without any clean images.
  • The present invention extends this further: the N-1 STN-transformed frames of the sequence, stacked together, are used as the input of the convolutional neural network, with I_t as its noisy reference frame, which still achieves the same denoising purpose and yields a better denoising effect.
  • For the loss function of the convolutional neural network, when training with the noise2noise algorithm the choice of loss depends on the noise distribution. If the distribution is known, the loss can be chosen accordingly: for Gaussian or Poisson noise the L2 loss can be used, and for random impulse noise the L1 loss can be used. In practice the noise model is often unavailable, or the video noise is a mixture of several distributions; in that case the optimal loss function can be determined experimentally. The L2 loss can be expressed as

$$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}\left\|\hat{O}_t(x)-I_t(x)\right\|^2 \qquad (1)$$

  where L_2(·) denotes the L2 loss function; I_t denotes the noisy reference frame and t indicates that it is the t-th frame of the video sequence; Ô_t denotes the denoised image output by the denoising network for the spatially transformed multi-channel input; x denotes the position of a pixel in the video sequence; and I_t(x) and Ô_t(x) denote the pixel values of the noisy reference frame and of the denoised image at position x.
  • In addition, the STN-transformed images often contain occluded regions of the optical flow field: when estimating the flow from I_t to I_{t+i}, a region visible in I_t may not appear in I_{t+i}, yet the computed flow field v_{t,t+i} still assigns values there. Regions where the absolute value of the flow divergence exceeds a set threshold are marked as occluded, which defines a binarized occlusion mask

$$M_{t,t+i}(x)=\begin{cases}1, & \left|\operatorname{div}\left(v_{t,t+i}(x)\right)\right|<\tau\\[2pt] 0, & \text{otherwise}\end{cases} \qquad (2)$$

  where v_{t,t+i} is the optical flow field from the noisy reference frame to frame t+i, M_{t,t+i} is the occlusion mask corresponding to that flow field, τ is the set threshold, and div denotes divergence.
  • The occlusion masks corresponding to the N-1 optical flow fields v_{t,t+i} are summed and averaged to obtain the final occlusion mask M_t, which is used to exclude occluded pixels from the loss computation. Changes of the field of view caused by lens zooming, camera movement, and object motion often prevent the optical flow network from producing a flow field with valid edges, so the resulting occlusion mask is always 0 at the image border; the loss at the border then cannot be computed, which degrades denoising near the image edges. The present invention therefore fills the occlusion mask with 1 within a certain width of the border, avoiding severe distortion at the edges of the denoised image.
  • The masked L2 loss can thus be expressed as

$$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}M_t(x)\left\|\hat{O}_t(x)-I_t(x)\right\|^2 \qquad (3)$$
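  • A small NumPy sketch of the occlusion mask of equation (2), computed from the divergence of the flow field and with the border filled with 1 as described (the gradient-based divergence estimate and the border width are illustrative choices); the N-1 masks obtained this way can then be averaged to form M_t:

```python
import numpy as np

def occlusion_mask(flow: np.ndarray, tau: float = 1.0, border: int = 8) -> np.ndarray:
    """flow: (H, W, 2) optical flow field v_{t,t+i}. Returns a binary mask that is 1
    where |div v| < tau (usable pixels) and 0 in occluded regions, with the image
    border force-filled with 1 so the loss at the edges is not discarded."""
    dudx = np.gradient(flow[..., 0], axis=1)   # d(flow_x)/dx
    dvdy = np.gradient(flow[..., 1], axis=0)   # d(flow_y)/dy
    div = dudx + dvdy
    mask = (np.abs(div) < tau).astype(np.float32)
    mask[:border, :] = 1.0
    mask[-border:, :] = 1.0
    mask[:, :border] = 1.0
    mask[:, -border:] = 1.0
    return mask
```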
  • For the training of the convolutional neural network, an online-learning strategy is adopted and training proceeds frame by frame, i.e. the network iterates multiple times on the same image. The number of iterations for one frame is denoted Epochs. If Epochs is set too large the network may overfit, i.e. the denoising result gradually worsens as the iterations increase; if it is set too small the network underfits and cannot reach its best denoising performance.
  • Video scenes and noise distributions differ, so the optimal Epochs also differs; in this example Epochs is between 25 and 100, and the exact value can be chosen through experimental observation (e.g. of the PSNR, Peak Signal-to-Noise Ratio, of the output at different iteration counts).
  • The present invention sums and averages the output images produced during the iterations to obtain the final denoising result. This balances the underfitting at the beginning of the iterations and the overfitting at the later stage, and also removes the effect of fluctuations of the denoising quality during training, giving a better denoising result and better visual quality than directly taking the denoised image after a fixed number of iterations.
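  • A condensed PyTorch sketch of this per-frame online-learning loop with output averaging, using the masked L2 loss of equation (3) (the denoiser, the stacked input, and the mask are assumed to be prepared as described above; the optimizer and learning rate are illustrative):

```python
import torch

def denoise_one_frame(denoiser, stacked, reference, mask, epochs=50, lr=1e-4):
    """Iterate `epochs` times on a single frame (noise2noise, no clean target) and
    average the intermediate outputs to obtain the final denoised reference frame.
    stacked:   (1, (N-1)*C, H, W) stacked registered frames (network input)
    reference: (1, C, H, W) noisy reference frame I_t (noise2noise target)
    mask:      (1, 1, H, W) final occlusion mask M_t."""
    optimizer = torch.optim.Adam(denoiser.parameters(), lr=lr)
    accumulated = torch.zeros_like(reference)
    for _ in range(epochs):
        optimizer.zero_grad()
        output = denoiser(stacked)
        loss = (mask * (output - reference) ** 2).sum()   # masked L2 loss, eq. (3)
        loss.backward()
        optimizer.step()
        accumulated += output.detach()
    return accumulated / epochs                           # averaged denoised estimate
```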
  • Frame-by-frame training also effectively copes with noise changes caused by changes of environment and weather during video acquisition, realizing lifelong learning.
  • If the parameters of the optical flow network are not updated during denoising, or a traditional optical flow algorithm is used, optical flow estimation and image transformation can be performed frame by frame on the entire video sequence before denoising starts. The registration maps and occlusion masks obtained after image transformation are saved to the computer hard disk, and the subsequent denoising algorithm directly loads these locally pre-computed registered images and occlusion masks. This avoids repeatedly performing optical flow estimation and image transformation on the same pair of images during denoising and saves computing resources and time.
  • For the multi-frame fusion denoising part, besides using online learning to denoise the frames of the video sequence one by one, an offline learning approach can also be used: the whole video sequence is trained frame by frame for multiple rounds, where one pass over the entire video sequence counts as one iteration. The weights of the convolutional neural network are updated over these frame-by-frame iterations to obtain the convolutional neural network corresponding to the noisy reference frames, and this network is then applied to the entire video sequence (the multi-frame noisy registered images and noisy reference frames) to obtain the denoised video sequence.
  • When designing the loss function, the occlusion-mask term may also be omitted. Because multiple frames are fused, the information is partly redundant and the denoising network is fairly robust, so a comparable or even better denoising result can still be obtained.
  • Table 1 compares the PSNR of different algorithms on 7 video sequences selected from the Derf data set. In the table, Ehret et al. denotes the unsupervised video denoising algorithm proposed by Ehret et al., Proposed-TVL1 denotes the video denoising algorithm of the present invention built with the traditional TV-L1 optical flow estimation algorithm, and Proposed-Flownet2 denotes the method of the present invention built with the deep-learning-based Flownet2 network. Bold entries mark the algorithm achieving the highest PSNR on each video.
  • The present invention achieves a substantial PSNR improvement on all 7 videos.
  • The present invention can significantly improve the clarity of image details after video denoising, enhance the recognizability of the image to the human eye, improve the subjective image quality, and at the same time improve the objective metrics.
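  • For reference, the PSNR metric used in the comparison can be computed as follows (a standard definition; the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between a clean image and a denoised estimate."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```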
  • An embodiment of the present invention also provides a deep-learning-based blind video denoising device, as described in the following embodiments. Since the principle by which the device solves the problem is similar to that of the deep-learning-based blind video denoising method, the implementation of the device can refer to the implementation of the method, and repeated details are not described again.
  • the term "unit” or "module” can be a combination of software and/or hardware that implements a predetermined function.
  • Fig. 8 is a structural block diagram of an apparatus for blind video denoising based on deep learning according to an embodiment of the present invention, as shown in Fig. 8, including:
  • the optical flow estimation module 02 is used to take a video sequence containing a preset number of frames from the video sequence to be denoised, and use the middle frame of the video sequence as a noisy reference frame, and for each other in the noisy reference frame and the video sequence Perform optical flow estimation on images corresponding to frames, and obtain multiple optical flow fields between two frames of images;
  • the image transformation module 04 is used to convert the images corresponding to each other frame in the video sequence to the noisy reference frame for registration according to the optical flow field between multiple two frames of images, to obtain a multi-frame noisy registration image;
  • Multi-frame image fusion denoising module 06, used to build a denoising network based on a convolutional neural network, using the multi-frame noisy registration images as the input of the convolutional neural network and the noisy reference frame as its reference image, and using the noise2noise method for frame-by-frame iterative training and denoising to obtain the denoised image corresponding to the noisy reference frame.
  • the optical flow estimation module 02 is specifically used for:
  • the image transformation module 04 is specifically used for:
  • the image corresponding to each other frame in the video sequence is converted to a noisy reference frame for registration through a spatial change network, and a multi-frame noisy registration image is obtained.
  • the optical flow estimation module 02 is specifically used for:
  • the image transformation module 04 is specifically used for:
  • the images corresponding to each other frame in the video sequence are converted to noisy reference frames for registration.
  • the number of the optical flow estimation network is a preset number minus one, and the preset number minus one optical flow estimation network has the same network structure and parameters;
  • the optical flow estimation module 02 is specifically used for:
  • noisy reference frame and the image corresponding to the other frame in the video sequence are used as the input of an optical flow estimation network, and an optical flow field between two frames of images is obtained through optical flow estimation;
  • for the preset number minus one optical flow estimation networks, the preset number minus one optical flow fields between pairs of frames are obtained.
  • the optical flow estimation module 02 is also used for:
  • the optical flow estimation network is pre-trained to obtain a pre-training model.
  • the number of the spatial change network is a preset number minus one
  • the image transformation module 04 is specifically used for:
  • Each spatial change network converts the image corresponding to the other frame in the video sequence to a noisy reference frame for registration, and obtains a registered video sequence
  • the multi-frame image fusion denoising module 06 is specifically used for:
  • the multi-channel image is used as the input of the convolutional neural network and the noisy reference frame as its reference image; the noise2noise method is used for frame-by-frame iterative training and denoising, and the denoised images output by the denoising network over the entire iterative training of each frame are summed and averaged to obtain the final denoised image of the noisy reference frame.
  • the loss function in the convolutional neural network may adopt formula (1).
  • the multi-frame image fusion denoising module 06 is also used for:
  • the loss function in the convolutional neural network is determined according to the final occlusion mask.
  • the binarized occlusion mask is defined according to formula (2).
  • the loss function may adopt formula (3).
  • An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the above-mentioned method when the processor executes the computer program.
  • the embodiment of the present invention also provides a computer-readable storage medium, and the computer-readable storage medium stores a computer program for executing the above-mentioned method.
  • the method and device for blind video denoising based on deep learning proposed in the present invention have the following advantages:
  • Using optical flow estimation and image transformation, optical flow estimation and image registration are carried out between the frames of the video sequence; then, through the noise2noise training idea, zero-shot learning can be performed with only a single video, achieving blind denoising of the video sequence without a large amount of noisy and clean data and without an accurate noise distribution model.
  • The multi-frame fusion method makes full use of the temporal information of the video sequence, solving the loss of temporal information caused by changes of the field of view due to lens zooming, camera movement, and object motion, which helps obtain better denoised image quality.
  • The video is trained and denoised frame by frame, which effectively solves the failure of a pre-trained model caused by changes of the noise distribution during video acquisition.
  • The outputs of the denoising network are summed and averaged, which effectively balances over-fitting and under-fitting during online learning, stabilizes the fluctuation of the network output, yields a better denoising result, and improves the continuity and consistency of the denoising effect between video frames.
  • the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • computer-usable storage media including but not limited to disk storage, CD-ROM, optical storage, etc.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

一种基于深度学习的视频盲去噪方法及装置，该方法包括：从待去噪视频中取包含预设数量帧的视频序列，将该视频序列的中间帧作为带噪参考帧，对带噪参考帧和视频序列中的其他每一帧图像进行光流估计，获得多个两帧图像之间的光流场；根据光流场将视频序列中的其他每一帧图像分别转换到带噪参考帧进行配准，获得多帧带噪配准图像；基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为网络输入，以带噪参考帧作为网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。该方案仅利用单个视频而无需获得大量的噪声数据、干净的数据、准确的噪声分布模型，就能实现对视频的盲去噪。

Description

基于深度学习的视频盲去噪方法及装置 技术领域
本发明涉及视频去噪技术领域,特别涉及一种基于深度学习的视频盲去噪方法及装置。
背景技术
去噪是图像和视频处理中的基础问题。尽管去噪算法和摄像传感器近年来都有一定的提升，但是对于暗光条件下的拍摄视频以及对于利用短曝光时间捕获高速运动物体的视频中仍然存在大量噪声，同时广泛使用的监控摄像头、手机等设备大部分使用质量较低的摄像传感器，即使在光照良好的条件下采集的图像与视频仍然存在大量噪声。所以，去噪是视频图像处理中必不可少的一部分。
一般的图像去噪算法常常将图像的噪声建模为加性噪声（与信号的关系是相加，不管有没有信号，噪声都存在），并且将噪声假设为高斯白噪声，然后通过在干净图像上添加高斯白噪声生成大量的数据，以数据驱动的方式训练去噪模型。使用高斯白噪声来对噪声建模是因为对于CCD/CMOS成像系统中的观测信号通常可以建模为泊松-高斯联合分布，而泊松-高斯联合分布又可以通过方差稳定变换（VST）转换为加性高斯白噪声。但是，在许多应用中，所需处理的数据并不是直接来自于成像传感器的原始数据，成像传感器的输出经过了量化、去马赛克、伽马校正、压缩等操作，此外对于手机等设备生成的图像和视频可能还会经过压缩、滤镜等处理。因此很多情况下，图像或视频中的噪声信号并不能简单地用加性高斯白噪声进行建模。
此外,现有的深度学习去噪算法常以数据驱动的方式构造去噪模型。当噪声模型已知时,可以获得优异的去噪性能,但当应用到噪声模型未知的数据时,这些模型的去噪性能将会受到很大限制。现在也有将多种不同噪声分布的数据进行混合训练,但是其去噪性能往往不及于在特定噪声分布下训练获得的模型。此外训练去噪模型所需的真实场景下的噪声数据以及对应的干净数据是通常也难以获取。
VBM3D是基于图像去噪算法BM3D的在视频去噪上的扩展。VBM3D基于视频序列利用其时域和空域上的相关性,在相邻帧以及当前帧中获取相似块。VBM3D算法在去噪性能以及计算复杂度上可以获得较好的折中。VBM3D的去噪效果往往会因视频序列中的视角变换、物体运动而影响到块匹配的准确性,从而导致较差的去噪效果,同时 VBM3D算法针对的是加性高斯白噪声,在去噪前需要先估计带噪图像的噪声水平,而实际场景中带噪图像的噪声水平往往无法直接获得,并且噪声分布也不满足高斯分布,因此该算法在应用具有一定的局限性。
Ehret等人提出了无监督的视频去噪算法,利用DnCNN网络,首先在含高斯白噪声的数据进行预训练,然后对未知噪声分布的视频进行逐帧训练,在未知视频中图像噪声模型分布情况下,实现对视频的盲去噪。具体是利用传统光流算法对视频的前后帧进行光流估计,再根据光流将相邻帧映射到当前帧进行配准,从而获得一对具有相同内容的带噪图像,再利用noise2noise的思想进行逐帧训练,实现了对含任意噪声分布的视频的盲去噪。通过获取相邻两帧的图像,经过光流进行运动补偿后,对这一对图像进行训练,可以达到去噪效果,但是其对高斯白噪声的去噪效果稍逊于直接使用预训练好的DnCNN网络。此外仅使用相邻两帧图像而未充分利用视频序列的时域信息,使得去噪效果受到一定的限制。同时在线学习过程中对单张图像多次迭代后去噪效果存在一定的不稳定性,视频序列之间的去噪效果存在一定的波动性,降低了视频的视觉效果。
发明内容
本发明实施例提供了一种基于深度学习的视频盲去噪方法及装置,解决了现有技术中仅使用相邻两帧图像而未充分利用视频序列的时域信息,使得去噪效果受到一定的限制的技术问题。
本发明实施例提供了一种基于深度学习的视频盲去噪方法,该方法包括:
从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场;
根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
本发明实施例还提供了一种基于深度学习的视频盲去噪装置,该装置包括:
光流估计模块,用于从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场;
图像变换模块,用于根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
多帧图像融合去噪模块，用于基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
本发明实施例还提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现上述所述方法。
本发明实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有执行上述所述方法的计算机程序。
在本发明实施例中,获取待去噪视频中预设数量帧的视频序列的中间帧作为带噪参考帧,利用该带噪参考帧和其他帧图像进行配准,再通过noise2noise的训练思想,仅利用一个视频就可以进行零样本学习,实现对视频序列的盲去噪,而无需获得大量的噪声数据和干净的数据,也不需要获取准确的噪声分布模型。利用多帧融合的方法,可以充分利用到视频序列的时域信息,解决时域信息缺失问题,有助于获得更优的去噪图像质量。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的一种基于深度学习的视频盲去噪方法流程图;
图2是本发明实施例提供的一种基于深度学习的视频盲去噪方法具体流程图;
图3是一种类似DnCNN结构的网络结构示意图;
图4是本发明实施例提供的一种Derf数据集中的station2视频序列中某帧图像的噪声图;
图5是一种使用VBM3D方法处理的去噪图;
图6是一种使用Ehret等人提出的无监督视频去噪方法处理的去噪图;
图7是一种使用本发明方法处理的去噪图;
图8是本发明实施例提供的一种基于深度学习的视频盲去噪装置结构框图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
在本发明实施例中,提供了一种基于深度学习的视频盲去噪方法,如图1所示,该方法包括:
步骤101:从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场。
步骤102:根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
步骤103：基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
在本发明实施例中,如图2所示,对于步骤101:去噪时,依次取待去噪视频中N帧视频序列,并取该视频序列的中间帧作为带噪参考帧,可以利用N-1个光流估计网络对带噪参考帧和该视频序列中的其他每一帧对应的图像进行光流估计,这些光流估计网络拥有相同的网络结构和参数,每个光流估计网络取序列中对应的一帧对应的图像和带噪参考帧作为输入,获得这两帧图像之间的稠密光流场作为运动估计。
记带噪参考帧为 $I_t$，t表示该带噪参考帧在视频序列中为第t帧，N帧中的其他帧相对于带噪参考帧可表示为 $I_{t+i}$，i大于0表示其他帧相对于带噪参考帧是后i帧，i小于0表示其他帧相对于带噪参考帧是前i帧，故i的取值范围为 $[-(N-1)/2,(N-1)/2]$。记带噪参考帧 $I_t$ 对应的干净图像为 $U_t$，记 $v_{t,t+i}$ 表示从t帧到t+i帧的光流场，将 $I_{t+i}$ 和 $v_{t,t+i}$ 经过空间变化网络（Spatial Transformer Networks，STN）变换后得到 $\hat{I}_{t+i}=w(I_{t+i},\,v_{t,t+i})$，其所对应的干净图像为 $\hat{U}_{t+i}=w(U_{t+i},\,v_{t,t+i})$，其中，w表示对图像进行空间变换。
光流估计网络可以采用Flownet2、SpyNet、PWCNet等光流网络。本发明实施例对光流估计网络不做具体限定,实现前后帧的光流估计即可,本实施例优选Flownet2。在进行视频去噪前,首先对光流估计网络进行预训练获得预训练模型,可以使用Sintel数据集进行训练,具体训练方式因不同网络而异。训练时用的数据集不局限于此,也可直接使用已公开的预训练权重,本实施例优选使用已公开的预训练权重。在进行视频去噪时,光流估计网络可以通过反向传播进行精调,也可以冻结网络权重,不进行梯度更新,仅使用预训练网络进行光流估计。本实施例优选冻结光流估计网络的权重,不进行反向传播更新网络权重。
关于N的选取,本实例优选N=11,即取包含11帧图像的序列,取第6帧作为带噪参考帧,剩下的每帧和参考帧作为光流估计网络的输入获得对应的光流估计。准确的光流估计往往因为视频序列中光线的变化、运动模糊、遮挡等问题而变得困难,经过图像变换后的配准图像往往会出现边界与带噪参考帧无法对齐以及因两帧图像之间图像差异、视角变换而造成的信息缺失。因此通过多帧视频序列进行光流估计,可以弥补仅使用前后两帧进行光流而造成的信息损失。但是选取的两帧图像时间间隔越大,光流估计越不准确,所能带来的有效时域信息越少,同时也会增加***的复杂性和计算量,所以N的大小需要一定权衡。
对于光流估计，如果去噪时不更新光流估计网络的参数，那么也可以使用传统的光流估计算法，而不使用光流估计网络进行光流估计，例如可以使用TV-L1算法完成光流估计，同样可以达到优异的去噪效果。
在本发明实施例中,如图2所示,对于步骤102:可以通过空间变换网络(Spatial Transformer Networks,STN)将输入帧变换到参考帧进行配准。具体而言,每个空间变换网络把视频序列中对应的图像转换到参考帧的视图上,N-1帧图像则需使用N-1个空间变换网络。
在根据光流场对输入帧变换到参考帧时需要进行双线性插值,使用空间变换网络可以通过一种可微的图像采样方式,在进行去噪训练时,损失函数的梯度能够从图像去噪网络反向传播至光流估计步骤,使得光流估计网络能够根据不同的视频进行微调,从而使整个视频去噪网络进行端到端的训练。
对于图像变换部分,如果整个网络不更新光流估计网络的参数或者使用传统的光流 估计算法,也可以不使用空间变换网络,而是通过opencv使用传统图像处理算法完成对图像的空间变换。
在本发明实施例中，如图2所示，对于步骤103：使用一个卷积神经网络进行逐帧迭代训练和去噪。具体而言，将N-1帧配准后的图像堆叠在一起组成一个多通道的图像，再送入去噪网络进行训练，基于noise2noise算法的训练思想，无需任何干净图像作为训练数据，而是以N帧图像的中间帧作为带噪参考帧。训练时采用在线学习的策略进行逐帧迭代训练，并将迭代训练过程中的网络输出进行融合取平均，获得最终该带噪参考帧所对应的去噪图像。
对于卷积神经网络,可以采用当前的主流去噪网络,如DnCNN、U-Net等网络,具体使用的卷积神经去噪网络不局限于此。参考图3,本实例优选此种类似DnCNN结构的网络作为去噪网络。本实例包括了17个卷积层,第一个卷积层使用3x3的卷积核,并用ReLU作为激活函数,输出64个特征图,接下来的15层卷积层同样使用64个3x3的卷积核,并使用批量归一化和ReLU做激活,网络的输出层仅使用一个3x3的卷积核做卷积。与DnCNN不同的是,本实例并没有使用残差学习,即网络的输出是估计的去噪图而不是估计的噪声,因为网络的输入是N-1帧图像堆叠形成的图像,而网络的输出是参考帧的去噪估计图。
对于卷积神经网络的初始化设计，卷积层参数的初始化使用Kaiming初始化，可以有效避免反向传播过程中的梯度弥散或者梯度爆炸，加速网络收敛。
对于卷积神经网络的输入和输出，本发明将N-1帧经过STN变换后的图像堆叠在一起。若原图像的大小是(H,W,C)，其中H是图像的高度，W是图像的宽度，C是图像的通道数，经过堆叠后获得(H,W,(N-1)×C)大小的多通道图像，将其作为去噪网络的输入。在此，做出以下假设：t+i帧视频序列中经过STN变换后所得图像 $\hat{I}_{t+i}$ 对应的干净图像 $\hat{U}_{t+i}$ 和带噪参考帧 $I_t$ 对应的干净图像 $U_t$ 在对应的像素点上近似匹配；N帧视频序列中每帧图像中的噪声是独立同分布的。因此经过STN变换后的图像 $\hat{I}_{t+i}$ 和带噪参考帧 $I_t$ 构成一对近似具有相同干净图像但包含独立同分布噪声的图像，因此可以将 $\hat{I}_{t+i}$ 作为卷积神经网络输入，将 $I_t$ 作为卷积神经网络的带噪参考帧，利用noise2noise的思想进行训练而无需使用干净图像。本发明在此基础上做进一步拓展，将序列中N-1帧经过STN变换后堆叠在一起的多通道图像作为卷积神经网络的输入，再以 $I_t$ 作为卷积神经网络的带噪参考帧，仍然可以达到相同的去噪目的，并且可以获得更好的去噪效果。
对于卷积神经网络的损失函数，在基于noise2noise算法进行训练时，使用的损失函数取决于噪声的分布。如果已知噪声分布，可以有针对性地选择损失函数，例如对于高斯噪声或者泊松噪声，可以使用 $L_2$ 损失函数，对于随机脉冲噪声，可以使用 $L_1$ 损失函数。而在实际应用中，往往无法获得噪声分布模型，或者视频中的噪声是多种分布混合在一起，这种情况下，可以通过实验的方法来确定最优的损失函数。对于 $L_2$ 损失函数，可以表示为：

$$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}\left\|\hat{O}_t(x)-I_t(x)\right\|^2 \qquad (1)$$

其中，$L_2(\cdot)$ 表示 $L_2$ 损失函数；$I_t$ 表示带噪参考帧，t表示该带噪参考帧在视频序列中为第t帧；$\hat{O}_t$ 表示经过空间变换后的多通道图像经过去噪网络后输出的去噪图像；x表示视频序列的像素点的位置；$I_t(x)$ 表示带噪参考帧在x位置处的像素值；$\hat{O}_t(x)$ 表示去噪图像在x位置处的像素值。
此外，经过STN变换后的图像常常存在一定的光流场的遮挡区域，即在估计从 $I_t$ 到 $I_{t+i}$ 的光流时，在 $I_t$ 中出现的区域在 $I_{t+i}$ 中可能并未出现，但是计算出的光流场 $v_{t,t+i}$ 在该区域仍然会有赋值。可以将光流散度绝对值大于一定阈值的区域记为遮挡区域，由此可定义一个二值化的遮挡掩膜：

$$M_{t,t+i}(x)=\begin{cases}1, & \left|\operatorname{div}\left(v_{t,t+i}(x)\right)\right|<\tau\\[2pt] 0, & \text{其他}\end{cases} \qquad (2)$$

其中，$v_{t,t+i}$ 为带噪参考帧到t+i帧图像的光流场；$M_{t,t+i}$ 为该光流场对应的遮挡掩膜；$\tau$ 为设定阈值；div表示散度。
由此将N-1个光流场 $v_{t,t+i}$ 对应的遮挡掩膜求和取平均获得最终的遮挡掩膜 $M_t$。在计算损失时，用该掩膜来屏蔽遮挡部分参与损失函数的计算。此外，镜头变焦推拉、机位前后移动、物体运动等带来画面视野的变化，往往导致光流估计网络无法获得拥有有效边缘的光流场，由此获得的遮挡掩膜在图像边缘处取值总为0，故无法获得图像边缘处的损失，从而影响对图像边缘的去噪。因此，本发明将遮挡掩膜的边缘一定宽度内填充为1，由此可避免去噪图像边缘处的严重失真。由此 $L_2$ 损失函数可以表示为：

$$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}M_t(x)\left\|\hat{O}_t(x)-I_t(x)\right\|^2 \qquad (3)$$
对于卷积神经网络的训练方式，采用在线学习的思想，逐帧进行训练，即在同一帧图像上进行多次迭代，记对一帧图像的迭代次数为Epochs。若Epochs设置过大，可能导致网络过拟合，即随着迭代次数的增加，去噪效果会逐渐变差；设置过小则会导致网络欠拟合，无法达到最优的去噪效果。不同视频的场景和噪声分布可能不同，因此Epochs的最优选取也会不同，本实例中取Epochs为25到100之间，具体选取可以通过实验观察（例如不同迭代次数下输出图像的PSNR）获得。同时，本发明将迭代过程中的输出图像进行求和取平均，从而获得最终的去噪效果，既可以均衡迭代开始时的欠拟合以及后期的过拟合，同时也消除训练过程中去噪效果浮动带来的影响，相比直接取一定次数迭代后的去噪图像，能够获得更优的去噪效果和更好的视觉效果。同时逐帧训练，可以有效解决视频采集过程中因环境、天气等变换而造成的噪声变化，实现了终生学习。
在本发明实施例中,对于光流估计和图像变换,如果去噪时不更新光流估计网络的参数或者使用传统的光流估计算法,可以在去噪进行之前,对整个视频序列进行逐帧光流估计和图像变换,将图像变换后获得的配准图和遮挡掩膜保存到计算机硬盘中,后续的去噪算法可以直接调用本地处理计算好的配准图像和遮挡掩膜,避免了去噪过程中可能对同一对图像重复进行光流估计和图像变换的情况,可以节省计算资源和时间。
对于多帧融合去噪部分,除了使用在线学习来对视频序列中的每帧图像进行依次去噪,也可以使用离线学习的方法,对整个视频序列进行逐帧多轮迭代训练,以整个视频序列训练一遍为一次迭代,通过逐帧多轮迭代进行卷积神经网络的权重更新,获得带噪参考帧对应的卷积神经网络,最后再利用带噪参考帧对应的卷积神经网络对整个视频序列(多帧带噪配准图像和带噪参考帧)进行测试,获得去噪视频序列。
对于多帧融合去噪部分,设计损失函数时可以不使用遮挡掩膜部分。由于使用了多帧图像进行融合,信息具有一定的冗余性,去噪网络也具有一定的鲁棒性,因此也可以获得相当甚至更优的去噪效果。
下面举例说明本发明方法的优点。
参照表1，使用不同算法对从Derf数据集中选取的7个视频序列进行去噪的PSNR量化指标对比。关于噪声序列的生成，首先把视频序列通过平均R、G、B三通道分量获得灰度图，再下采样2倍，确保视频序列中没有噪声。然后添加σ=25的高斯白噪声，再以质量因子为10进行JPEG压缩，获得对应的噪声视频序列。表中Ehret等人对应其提出的无监督视频去噪算法，Proposed-TVL1表示本发明方法中使用传统的TV-L1光流估计算法构建的视频去噪算法，Proposed-Flownet2表示本发明方法中使用基于深度学习的Flownet2网络构建的视频去噪算法。表中加粗表示当前视频中获得最高PSNR的算法。
表1
［表1原文以图像形式给出：不同算法在7个视频序列上去噪结果的PSNR对比数据］
可以看出本发明在7个视频中都获得较大的PSNR提升。
参考图4至图7分别表示Derf数据集中的station2视频序列中某帧图像的噪声图，以及分别使用VBM3D方法、Ehret等人提出的无监督视频去噪算法以及本发明处理的去噪图。所添加的噪声与表1中的噪声相同。从视觉效果上可以看出，即使在噪声分布及噪声水平未知、且没有进行任何去噪预训练的情况下，本发明也可以获得很好的去噪效果，图7中可以较为清晰地看到铁轨和高架电车线。而VBM3D在设定噪声水平为25的情况下，去噪的结果出现了多处伪影（即图5），Ehret等人提出的无监督视频去噪算法（即图6）去噪的结果则过于模糊，失去了图像很多的细节信息。
可以看出,本发明可以显著提高视频去噪后图像细节清晰度,增强图像对于人眼的可识别性,改善图像主观质量,同时提高客观指标。
基于同一发明构思,本发明实施例中还提供了一种基于深度学习的视频盲去噪装置,如下面的实施例所述。由于基于深度学习的视频盲去噪装置解决问题的原理与基于深度学习的视频盲去噪方法相似,因此基于深度学习的视频盲去噪装置的实施可以参见基于深度学习的视频盲去噪方法的实施,重复之处不再赘述。以下所使用的,术语“单元”或者“模块”可以实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。
图8是本发明实施例的基于深度学习的视频盲去噪装置的结构框图,如图8所示,包括:
光流估计模块02,用于从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场;
图像变换模块04,用于根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
多帧图像融合去噪模块06，用于基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
在本发明实施例中,光流估计模块02具体用于:
利用光流估计网络,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计;
图像变换模块04具体用于:
通过空间变化网络将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像。
在本发明实施例中,光流估计模块02具体用于:
利用光流估计算法对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计;
图像变换模块04具体用于:
利用图像处理算法,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准。
在本发明实施例中,所述光流估计网络的数量为预设数量减1个,所述预设数量减1个光流估计网络具有相同的网络结构和参数;
光流估计模块02具体用于:
带噪参考帧和视频序列中的其他一帧对应的图像作为一个光流估计网络的输入,经过光流估计获得一个两帧图像之间的光流场;
对于预设数量减1个光流估计网络,获得预设数量减1个两帧图像之间的光流场。
在本发明实施例中,光流估计模块02还用于:
在进行光流估计之前,对所述光流估计网络进行预训练,获得预训练模型。
在本发明实施例中,所述空间变化网络的数量为预设数量减1个;
图像变换模块04具体用于:
每个空间变化网络将视频序列中的其他一帧对应的图像转换到带噪参考帧进行配准,获得一个配准后的视频序列;
对于预设数量减1个空间变化网络,获得预设数量减1个配准后的视频序列。
在本发明实施例中,多帧图像融合去噪模块06具体用于:
将多帧带噪配准图像堆叠在一起组成多通道图像;
以多通道图像为卷积神经网络的输入，带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，将每帧图像的整个迭代训练过程中去噪网络的输出去噪图像进行求和取平均，获得最终带噪参考帧的去噪图像。
在本发明实施例中,所述卷积神经网络中的损失函数可以采用公式(1)。
在本发明实施例中,多帧图像融合去噪模块06还用于:
根据光流场确定光流散度;
将光流散度的绝对值和设定阈值进行比较,将光流散度的绝对值大于设定阈值的区域记为光流场的遮挡区域;
根据所述遮挡区域定义二值化的遮挡掩膜;
将多个两帧图像之间的光流场对应的二值化的遮挡掩膜求和去平均,获得最终遮挡掩膜;
根据所述最终遮挡掩膜确定卷积神经网络中的损失函数。
在本发明实施例中,按照如公式(2)定义二值化的遮挡掩膜。
在本发明实施例中,所述损失函数可以采用公式(3)。
本发明实施例还提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现上述所述方法。
本发明实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有执行上述所述方法的计算机程序。
综上所述,本发明提出的基于深度学习的视频盲去噪方法和装置具有如下优点:
利用光流估计、图像变换的方法对视频序列的前后帧进行光流估计、图像配准，再通过noise2noise的训练思想，仅利用一个视频就可以进行零样本学习，实现对视频序列的盲去噪，而无需获得大量的噪声数据和干净的数据，也不需要获取准确的噪声分布模型。利用多帧融合的方法，可以充分利用视频序列的时域信息，解决因镜头变焦推拉、机位前后移动、物体运动等带来画面视野变化而造成的时域信息缺失问题，有助于获得更优的去噪图像质量。对视频进行逐帧训练去噪，有效解决了视频采集过程中噪声分布变化导致已训练模型失效的问题。采用在线学习的输出平均策略，对去噪网络结果求和取平均，有效均衡在线学习过程中过拟合和欠拟合的问题，并稳定了网络输出的波动，获得更优去噪效果，提升了视频帧之间去噪效果的连贯性与一致性。
本领域内的技术人员应明白,本发明的实施例可提供为方法、***、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(***)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述仅为本发明的优选实施例而已,并不用于限制本发明,对于本领域的技术人员来说,本发明实施例可以有各种更改和变化。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (14)

  1. 一种基于深度学习的视频盲去噪方法,其特征在于,包括:
    从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场;
    根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
    基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
  2. 如权利要求1所述的基于深度学习的视频盲去噪方法,其特征在于,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,包括:
    利用光流估计网络,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计;
    将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,包括:
    通过空间变化网络将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像。
  3. 如权利要求1所述的基于深度学习的视频盲去噪方法,其特征在于,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,包括:
    利用光流估计算法对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计;
    将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,包括:
    利用图像处理算法,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准。
  4. 如权利要求2所述的基于深度学习的视频盲去噪方法,其特征在于,所述光流估计网络的数量为预设数量减1个,所述预设数量减1个光流估计网络具有相同的网络结构和参数;
    利用光流估计网络,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场,包括:
    带噪参考帧和视频序列中的其他一帧对应的图像作为一个光流估计网络的输入,经过光流估计获得一个两帧图像之间的光流场;
    对于预设数量减1个光流估计网络,获得预设数量减1个两帧图像之间的光流场。
  5. 如权利要求2所述的基于深度学习的视频盲去噪方法,其特征在于,在进行光流估计之前,还包括:
    对所述光流估计网络进行预训练,获得预训练模型。
  6. 如权利要求4所述的基于深度学习的视频盲去噪方法,其特征在于,所述空间变化网络的数量为预设数量减1个;
    通过空间变化网络将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像,包括:
    每个空间变化网络将视频序列中的其他一帧对应的图像转换到带噪参考帧进行配准,获得一个配准后的视频序列;
    对于预设数量减1个空间变化网络,获得预设数量减1个配准后的视频序列。
  7. 如权利要求1所述的基于深度学习的视频盲去噪方法，其特征在于，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像，包括：
    将多帧带噪配准图像堆叠在一起组成多通道图像;
    以多通道图像为卷积神经网络的输入，带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，将每帧图像的整个迭代训练过程中去噪网络的输出去噪图像进行求和取平均，获得最终带噪参考帧的去噪图像。
  8. 如权利要求7所述的基于深度学习的视频盲去噪方法,其特征在于,所述卷积神经网络中的损失函数为:
    $$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}\left\|\hat{O}_t(x)-I_t(x)\right\|^2$$
    其中，$L_2(\cdot)$ 表示 $L_2$ 损失函数；$I_t$ 表示带噪参考帧，t表示该带噪参考帧在视频序列中为第t帧；$\hat{O}_t$ 表示经过空间转换后的多通道图像经过去噪网络后输出的去噪图像；x表示视频序列的像素点的位置。
  9. 如权利要求7所述的基于深度学习的视频盲去噪方法,其特征在于,还包括:
    根据光流场确定光流散度;
    将光流散度的绝对值和设定阈值进行比较,将光流散度的绝对值大于设定阈值的区域记为光流场的遮挡区域;
    根据所述遮挡区域定义二值化的遮挡掩膜;
    将多个两帧图像之间的光流场对应的二值化的遮挡掩膜求和去平均,获得最终遮挡掩膜;
    根据所述最终遮挡掩膜确定卷积神经网络中的损失函数。
  10. 如权利要求9所述的基于深度学习的视频盲去噪方法,其特征在于,按照如下方式定义二值化的遮挡掩膜:
    $$M_{t,t+i}(x)=\begin{cases}1, & \left|\operatorname{div}\left(v_{t,t+i}(x)\right)\right|<\tau\\[2pt] 0, & \text{其他}\end{cases}$$
    其中，$v_{t,t+i}$ 为带噪参考帧到t+i帧图像的光流场；$M_{t,t+i}$ 为该光流场对应的遮挡掩膜；$\tau$ 为设定阈值；div表示散度，x表示视频序列的像素点的位置，i大于0表示其他帧相对于带噪参考帧是后i帧，i小于0表示其他帧相对于带噪参考帧是前i帧，故i的取值范围为 $[-(N-1)/2,(N-1)/2]$，N表示视频序列的帧数。
  11. 如权利要求10所述的基于深度学习的视频盲去噪方法,其特征在于,所述损失函数为:
    $$L_2\left(\hat{O}_t,\ I_t\right)=\sum_{x}M_t(x)\left\|\hat{O}_t(x)-I_t(x)\right\|^2$$
    其中，$L_2(\cdot)$ 表示 $L_2$ 损失函数；$I_t$ 表示带噪参考帧，t表示该带噪参考帧在视频序列中为第t帧；$\hat{O}_t$ 表示经过空间转换后的多通道图像经过去噪网络后输出的去噪图像；x表示视频序列的像素点的位置；$M_t$ 表示对多个光流场对应的遮挡掩膜求平均得到的遮挡掩膜。
  12. 一种基于深度学习的视频盲去噪装置,其特征在于,包括:
    光流估计模块,用于从待去噪视频序列中取包含预设数量帧的视频序列,将视频序列的中间帧作为带噪参考帧,对带噪参考帧和视频序列中的其他每一帧对应的图像进行光流估计,获得多个两帧图像之间的光流场;
    图像变换模块,用于根据多个两帧图像之间的光流场,将视频序列中的其他每一帧对应的图像分别转换到带噪参考帧进行配准,获得多帧带噪配准图像;
    多帧图像融合去噪模块，用于基于卷积神经网络构建去噪网络，以多帧带噪配准图像作为卷积神经网络的输入，以带噪参考帧作为卷积神经网络的参考图像，利用noise2noise方法进行逐帧迭代训练和去噪，获得带噪参考帧对应的去噪图像。
  13. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至11任一项所述方法。
  14. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有执行权利要求1至11任一项所述方法的计算机程序。
PCT/CN2020/086094 2020-04-15 2020-04-22 基于深度学习的视频盲去噪方法及装置 WO2021208122A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010294520.3A CN111539879B (zh) 2020-04-15 2020-04-15 基于深度学习的视频盲去噪方法及装置
CN202010294520.3 2020-04-15

Publications (1)

Publication Number Publication Date
WO2021208122A1 true WO2021208122A1 (zh) 2021-10-21

Family

ID=71976774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086094 WO2021208122A1 (zh) 2020-04-15 2020-04-22 基于深度学习的视频盲去噪方法及装置

Country Status (3)

Country Link
US (1) US11216914B2 (zh)
CN (1) CN111539879B (zh)
WO (1) WO2021208122A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113962964A (zh) * 2021-10-25 2022-01-21 北京影谱科技股份有限公司 基于时序图像数据的指定对象擦除方法及装置
US20220029665A1 (en) * 2020-07-27 2022-01-27 Electronics And Telecommunications Research Institute Deep learning based beamforming method and apparatus

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2541179B (en) 2015-07-31 2019-10-30 Imagination Tech Ltd Denoising filter
US11842460B1 (en) * 2020-06-19 2023-12-12 Apple Inc. Burst image fusion and denoising using end-to-end deep neural networks
US20220156946A1 (en) * 2020-11-13 2022-05-19 Qualcomm Incorporated Supervised learning and occlusion masking for optical flow estimation
CN112488943B (zh) * 2020-12-02 2024-02-02 北京字跳网络技术有限公司 模型训练和图像去雾方法、装置、设备
CN113159019B (zh) * 2021-03-08 2022-11-08 北京理工大学 一种基于光流变换的暗光视频增强方法
CN113034401B (zh) * 2021-04-08 2022-09-06 中国科学技术大学 视频去噪方法及装置、存储介质及电子设备
CN113591761B (zh) * 2021-08-09 2023-06-06 成都华栖云科技有限公司 一种视频镜头语言识别方法
CN114339409B (zh) * 2021-12-09 2023-06-20 腾讯科技(上海)有限公司 视频处理方法、装置、计算机设备及存储介质
CN114485417B (zh) * 2022-01-07 2022-12-13 哈尔滨工业大学 一种结构振动位移识别方法及***
CN114529815A (zh) * 2022-02-10 2022-05-24 中山大学 一种基于深度学习的流量检测方法、装置、介质及终端
CN114626445B (zh) * 2022-02-28 2024-04-09 四川省水利科学研究院 基于光流网络与高斯背景建模的大坝白蚁视频识别方法
CN114529479B (zh) * 2022-03-02 2024-06-18 北京大学 一种基于互信息损失函数的无监督一锅式多帧图像去噪方法
CN114972061B (zh) * 2022-04-04 2024-05-31 北京理工大学 一种暗光视频去噪增强方法及***
CN115267899B (zh) * 2022-08-15 2024-01-12 河北地质大学 基于边界保持的DnCNN混合震源地震数据分离方法和***
US20240071035A1 (en) * 2022-08-26 2024-02-29 Samsung Electronics Co., Ltd. Efficient flow-guided multi-frame de-fencing
CN115482162B (zh) * 2022-09-02 2023-10-24 华南理工大学 一种基于随机重排和无标签模型的隐式图像盲去噪方法
CN116347248B (zh) * 2022-11-18 2024-02-06 上海玄戒技术有限公司 图像获取方法和装置、电子设备、介质及芯片
CN115880340B (zh) * 2023-02-03 2023-07-14 清华大学 小鼠行为分析方法、装置及电子设备
CN116823973B (zh) * 2023-08-25 2023-11-21 湖南快乐阳光互动娱乐传媒有限公司 一种黑白视频上色方法、装置及计算机可读介质
CN117422627B (zh) * 2023-12-18 2024-02-20 卓世科技(海南)有限公司 基于图像处理的ai仿真教学方法及***

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331433A (zh) * 2016-08-25 2017-01-11 上海交通大学 基于深度递归神经网络的视频去噪方法
US20170372479A1 (en) * 2016-06-23 2017-12-28 Intel Corporation Segmentation of objects in videos using color and depth information
CN108257105A (zh) * 2018-01-29 2018-07-06 南华大学 一种针对视频图像的光流估计与去噪联合学习深度网络模型
CN109147331A (zh) * 2018-10-11 2019-01-04 青岛大学 一种基于计算机视觉的道路拥堵状态检测方法
CN109816692A (zh) * 2019-01-11 2019-05-28 南京理工大学 一种基于Camshift算法的运动目标跟踪方法
CN110351511A (zh) * 2019-06-28 2019-10-18 上海交通大学 基于场景深度估计的视频帧率上变换***及方法
CN110852961A (zh) * 2019-10-28 2020-02-28 北京影谱科技股份有限公司 一种基于卷积神经网络的实时视频去噪方法及***
CN110855998A (zh) * 2018-08-20 2020-02-28 华为技术有限公司 融合候选者列表构建方法、装置及的编/解方法及装置
CN110992381A (zh) * 2019-12-17 2020-04-10 嘉兴学院 一种基于改进Vibe+算法的运动目标背景分割方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9311690B2 (en) * 2014-03-11 2016-04-12 Adobe Systems Incorporated Video denoising using optical flow
US9449374B2 (en) * 2014-03-17 2016-09-20 Qualcomm Incoporated System and method for multi-frame temporal de-noising using image alignment
US10755395B2 (en) * 2015-11-27 2020-08-25 Canon Medical Systems Corporation Dynamic image denoising using a sparse representation
US10289951B2 (en) * 2016-11-02 2019-05-14 Adobe Inc. Video deblurring using neural networks
CN106934769A (zh) * 2017-01-23 2017-07-07 武汉理工大学 基于近景遥感的去运动模糊方法
US10841549B2 (en) * 2019-03-22 2020-11-17 Qualcomm Incorporated Methods and apparatus to facilitate enhancing the quality of video
US11151694B2 (en) * 2019-05-15 2021-10-19 Gopro, Inc. Method and apparatus for convolutional neural network-based video denoising

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170372479A1 (en) * 2016-06-23 2017-12-28 Intel Corporation Segmentation of objects in videos using color and depth information
CN106331433A (zh) * 2016-08-25 2017-01-11 上海交通大学 基于深度递归神经网络的视频去噪方法
CN108257105A (zh) * 2018-01-29 2018-07-06 南华大学 一种针对视频图像的光流估计与去噪联合学习深度网络模型
CN110855998A (zh) * 2018-08-20 2020-02-28 华为技术有限公司 融合候选者列表构建方法、装置及的编/解方法及装置
CN109147331A (zh) * 2018-10-11 2019-01-04 青岛大学 一种基于计算机视觉的道路拥堵状态检测方法
CN109816692A (zh) * 2019-01-11 2019-05-28 南京理工大学 一种基于Camshift算法的运动目标跟踪方法
CN110351511A (zh) * 2019-06-28 2019-10-18 上海交通大学 基于场景深度估计的视频帧率上变换***及方法
CN110852961A (zh) * 2019-10-28 2020-02-28 北京影谱科技股份有限公司 一种基于卷积神经网络的实时视频去噪方法及***
CN110992381A (zh) * 2019-12-17 2020-04-10 嘉兴学院 一种基于改进Vibe+算法的运动目标背景分割方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220029665A1 (en) * 2020-07-27 2022-01-27 Electronics And Telecommunications Research Institute Deep learning based beamforming method and apparatus
US11742901B2 (en) * 2020-07-27 2023-08-29 Electronics And Telecommunications Research Institute Deep learning based beamforming method and apparatus
CN113962964A (zh) * 2021-10-25 2022-01-21 北京影谱科技股份有限公司 基于时序图像数据的指定对象擦除方法及装置

Also Published As

Publication number Publication date
CN111539879B (zh) 2023-04-14
CN111539879A (zh) 2020-08-14
US11216914B2 (en) 2022-01-04
US20210327031A1 (en) 2021-10-21

Similar Documents

Publication Publication Date Title
WO2021208122A1 (zh) 基于深度学习的视频盲去噪方法及装置
Godard et al. Deep burst denoising
CN111667442B (zh) 一种基于事件相机的高质量高帧率图像重建方法
Wang et al. Joint filtering of intensity images and neuromorphic events for high-resolution noise-robust imaging
CN111091503B (zh) 基于深度学习的图像去失焦模糊方法
CN111028150B (zh) 一种快速时空残差注意力视频超分辨率重建方法
Tran et al. GAN-based noise model for denoising real images
TW202134997A (zh) 用於對影像進行去雜訊的方法、用於擴充影像資料集的方法、以及使用者設備
CN111462019A (zh) 基于深度神经网络参数估计的图像去模糊方法及***
CN112164011A (zh) 基于自适应残差与递归交叉注意力的运动图像去模糊方法
Gao et al. Atmospheric turbulence removal using convolutional neural network
Zhang et al. Deep motion blur removal using noisy/blurry image pairs
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
Chen et al. Image denoising via deep network based on edge enhancement
CN113724155A (zh) 用于自监督单目深度估计的自提升学习方法、装置及设备
CN112509144A (zh) 人脸图像处理方法、装置、电子设备及存储介质
Wang et al. An improved image blind deblurring based on dark channel prior
Xin et al. Video face super-resolution with motion-adaptive feedback cell
CN114494050A (zh) 一种基于事件相机的自监督视频去模糊和图像插帧方法
Shen et al. Spatial temporal video enhancement using alternating exposures
Yue et al. Rvideformer: Efficient raw video denoising transformer with a larger benchmark dataset
CN116523790A (zh) 一种sar图像去噪优化方法、***和存储介质
Cai et al. Real-time super-resolution for real-world images on mobile devices
CN117044215A (zh) 用于低光照媒体增强的方法和***
Anand et al. Hdrvideo-gan: deep generative hdr video reconstruction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20931438

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20931438

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20931438

Country of ref document: EP

Kind code of ref document: A1