WO2023125440A1 - 一种降噪方法、装置、电子设备及介质 - Google Patents

一种降噪方法、装置、电子设备及介质 Download PDF

Info

Publication number
WO2023125440A1
WO2023125440A1 · PCT/CN2022/142016 · CN2022142016W
Authority
WO
WIPO (PCT)
Prior art keywords
image
noise
images
free
data
Prior art date
Application number
PCT/CN2022/142016
Other languages
English (en)
French (fr)
Inventor
朱才志
王林
周晓
汝佩哲
孙耀晖
Original Assignee
英特灵达信息技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英特灵达信息技术(深圳)有限公司 filed Critical 英特灵达信息技术(深圳)有限公司
Publication of WO2023125440A1 publication Critical patent/WO2023125440A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • The present application relates to the technical field of image processing, and in particular to a noise reduction method, apparatus, electronic device, and medium.
  • At present, noise reduction for images and videos mainly uses a multi-frame method: several frames before and after the image to be denoised are obtained, and the frames are superimposed and averaged.
  • However, the shooting device may shake and the photographed object may itself be moving, so superimposing and averaging multiple frames produces ghosting in the denoised image or video.
  • The purpose of the embodiments of the present application is to provide a noise reduction method, apparatus, electronic device, and medium that solve the problem of ghosting in video or image noise reduction.
  • The specific technical scheme is as follows:
  • An embodiment of the present application provides a noise reduction method, including: inputting an image to be processed into an image noise reduction model;
  • the image noise reduction model is obtained by training a convolutional neural network model on a preset training set, where the preset training set includes multiple sets of labeled data and the sample data corresponding to each set; a set of labeled data includes multiple noise-free images obtained by performing simulated motion processing on a reference noise-free image, and the sample data corresponding to that set includes the images obtained by superimposing noise on each of those noise-free images; acquiring the noise-reduced image data output by the image noise reduction model; and converting the image data into an image to obtain the noise-reduced image corresponding to the image to be processed.
  • In some embodiments, the preset training set is obtained through the following steps:
  • Collect multiple reference noise-free images. For each reference noise-free image, crop multiple images that have the same shape, the same area, and overlapping regions but are not identical, and use the cropped images as a set of labeled data. Superimpose noise on each of the images in every set of labeled data to obtain the sample data corresponding to that set. Generate the preset training set from the resulting sets of labeled data and their corresponding sample data.
  • In some embodiments, the preset training set is obtained through the following steps:
  • Collect multiple reference noise-free images, and each time select two of them as a foreground image and a background image. Crop the background image according to a first specified shape and a first specified size. Crop the foreground image according to the first specified shape and a second specified size, and apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape. Superimpose the foreground image of the second specified shape at a starting position on the cropped background image, translate it across the background image in a preset direction at a preset speed, and use the images at multiple moments during the translation as a set of labeled data. Superimpose noise on the images in each set of labeled data to obtain the corresponding sample data, and generate the preset training set from the sets of labeled data and their corresponding sample data.
  • In some embodiments, before generating the preset training set from the obtained sets of labeled data and their corresponding sample data, the method further includes: selecting two images from the reference noise-free images as a foreground image and a background image respectively; cropping the background image according to the first specified shape and the first specified size; cropping the foreground image according to the first specified shape and a second specified size, and applying mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; superimposing the foreground image of the second specified shape at a starting position on the background image, translating it across the background image in a preset direction at a preset speed, and collecting the images at multiple moments during the translation as a set of labeled data.
  • In some embodiments, cropping multiple same-shape, same-area, overlapping but not identical images from each reference noise-free image and using them as a set of labeled data includes: for each reference noise-free image, randomly selecting a position in the image as the initial cropping coordinates; cropping a square with a specified side length at the initial cropping coordinates as a cropped image; randomly offsetting the current cropping coordinates to obtain the next cropping coordinates, and cropping a square with the specified side length at those coordinates as another cropped image; and repeating the random offset and crop until a preset number of cropped images is obtained, then taking the preset number of cropped images as a set of labeled data.
  • In some embodiments, collecting multiple reference noise-free images includes:
  • In some embodiments, the image denoising model is trained through the following steps: splice the first frame image in a set of sample data from the preset training set with itself, and input the spliced image into the convolutional neural network model, where it is processed in turn by the first convolutional network and the second convolutional network included in the model; obtain the intermediate noise-reduced image output by the first convolutional network and the final noise-reduced image output by the second convolutional network; calculate a loss function value based on the final noise-reduced image and the first frame image; judge from the loss function value whether the convolutional neural network model has converged; if not, adjust the parameters of the model based on the loss function value, splice the next frame image in the set of sample data with the latest intermediate noise-reduced image output by the first convolutional network, and return to the step of inputting the spliced image into the model; once the model converges, use the trained convolutional neural network model as the image noise reduction model.
  • An embodiment of the present application provides a noise reduction device, including:
  • a first input module, configured to input the image to be processed into the image noise reduction model;
  • the image noise reduction model being a model obtained by training a convolutional neural network model on a preset training set, where the preset training set includes multiple sets of labeled data and the sample data corresponding to each set;
  • a set of labeled data includes multiple noise-free images obtained by performing simulated motion processing on a reference noise-free image;
  • the sample data corresponding to the set of labeled data includes the images obtained by superimposing noise on each of those noise-free images;
  • a first acquisition module, configured to acquire the denoised image data output by the image denoising model;
  • a conversion module, configured to convert the image data into an image, obtaining the noise-reduced image corresponding to the image to be processed.
  • In some embodiments, the device further includes: an acquisition module, configured to acquire multiple reference noise-free images; an intercepting module, configured to crop, from each reference noise-free image, multiple images that have the same shape, the same area, and overlapping regions but are not identical, and to use the cropped images as a set of labeled data; a first superposition module, configured to superimpose noise on the images in each set of labeled data to obtain the corresponding sample data; and a first generation module, configured to generate the preset training set from the sets of labeled data and their corresponding sample data.
  • the device further includes: an acquisition module, configured to acquire multiple reference noise-free images; a first selection module, configured to select two from the multiple reference noise-free images each time The images are respectively used as the foreground image and the background image; the first cropping module is used to cut the background image according to the first designated shape and the first designated size; the first processing module is used to cut the background image according to the first designated shape and the second designated size Crop the foreground image in size, and perform mask processing on the cropped foreground image to obtain a foreground image of a second specified shape; a first translation module is used to superimpose the foreground image of the second specified shape on the cropped background The starting position on the image, the foreground image of the second specified shape is translated on the background image in a preset direction according to a preset speed, and the images at multiple moments during the translation process are used as a set of label data; the first The second superimposition module is used to superimpose noise on multiple images included in each group of labeled data to obtain
  • In some embodiments, the device further includes: a second selection module, configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image; a second cropping module, configured to crop the background image according to a first specified shape and a first specified size; a second processing module, configured to crop the foreground image according to the first specified shape and a second specified size, and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; and a second translation module, configured to superimpose the foreground image of the second specified shape at a starting position on the background image, translate it across the background image in a preset direction at a preset speed, and collect the images at multiple moments during the translation as a set of labeled data.
  • In some embodiments, the intercepting module is specifically configured to: for each reference noise-free image, randomly select a position in the image as the initial cropping coordinates; crop a square with a specified side length at the initial cropping coordinates as a cropped image; randomly offset the current cropping coordinates to obtain the next cropping coordinates, and crop a square with the specified side length at those coordinates as another cropped image; repeat the random offset and crop until a preset number of cropped images is obtained; and use the preset number of cropped images as a set of labeled data.
  • In some embodiments, the collection module is specifically configured to: collect multiple static RAW images captured by a shooting device whose sensitivity is set to the lowest value; and, for each static RAW image, process the image into different brightness values to obtain multiple reference noise-free images.
  • In some embodiments, the device further includes: a second input module, configured to splice the first frame image in a set of sample data from the preset training set with itself and input the spliced image into the convolutional neural network model, where it is processed in turn by the first convolutional network and the second convolutional network included in the model; and a second acquisition module, configured to acquire the intermediate noise-reduced image output by the first convolutional network and the final noise-reduced image output by the second convolutional network.
  • Once the model converges, the trained convolutional neural network model is used as the image noise reduction model.
  • An embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is configured to store a computer program, and the processor is configured to execute the program stored in the memory to implement the method steps of any one of the implementations of the first aspect above.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the method steps of any one of the implementations of the first aspect above are implemented.
  • An embodiment of the present application further provides a computer program product including instructions which, when run on a computer, cause the computer to execute any one of the noise reduction methods described above.
  • With the embodiments of the present application, the image to be processed can be denoised by the image denoising model, which is trained on the preset training set; the labeled data in the preset training set are noise-free images obtained by simulated motion processing of a reference noise-free image,
  • and the sample data corresponding to the labeled data are the images obtained by superimposing noise on those noise-free images. The sample data and labeled data thus simulate image motion, so during training the image denoising model can learn how to denoise multiple frames that contain both noise and motion, making the denoised image closer to its corresponding noise-free image and reducing ghosting in the denoised image.
  • Therefore, the image denoising model trained with the preset training set has a de-ghosting effect during denoising, so that the denoised image is free of ghosting and clearer.
  • Of course, any product or method implementing the present application does not necessarily need to achieve all of the above advantages at the same time.
  • FIG. 1 is a flow chart of a noise reduction method provided in an embodiment of the present application
  • FIG. 2 is a flow chart of another noise reduction method provided by the embodiment of the present application.
  • FIG. 3 is an exemplary schematic diagram of an image interception method provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of another noise reduction method provided by the embodiment of the present application.
  • FIG. 5 is an exemplary schematic diagram of an image mask processing provided by an embodiment of the present application.
  • FIG. 6 is an exemplary schematic diagram of image translation provided by an embodiment of the present application.
  • FIG. 7 is a flow chart of another noise reduction method provided by the embodiment of the present application.
  • FIG. 8 is a flow chart of another noise reduction method provided by the embodiment of the present application.
  • FIG. 9 is a structural diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 10 is an exemplary schematic diagram of a comparison of image noise reduction effects provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a noise reduction device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • An embodiment of the present application provides a noise reduction method that can be applied to electronic devices with image processing capabilities, such as desktop computers, servers, tablet computers, or mobile phones.
  • the method includes the following steps:
  • The image denoising model is obtained by training a convolutional neural network model on a preset training set, where the preset training set includes multiple sets of labeled data and the sample data corresponding to each set; a set of labeled data includes multiple noise-free images obtained by performing simulated motion processing on a reference noise-free image, and the corresponding sample data includes the images obtained by superimposing noise on each of those noise-free images.
  • a set of labeled data corresponds to a set of sample data, and there is a one-to-one correspondence between the noise-free images in a set of labeled data and the noisy images in a set of sample data.
  • The image to be processed may be any image that needs denoising, for example a photo or a single frame of a video.
  • The image to be processed may also be multiple continuously captured images, or multiple consecutive video frames. When feeding the image noise reduction model, the frames can be input in sequence according to the image acquisition order or the video frame order, and the noise-reduced image data output by the model for each input image is then obtained.
  • The image data output by the noise reduction model is data in the RAW image format (RAW) of the image after noise reduction.
  • In order to present the best visual effect to the user, the image data needs to be converted into a standard Red Green Blue (sRGB) image, for example through an image signal processing (ISP) algorithm.
  • For a video, each video frame can be denoised separately to obtain a denoised video: each frame is used as an image to be processed, and noise reduction is performed on it by the method in FIG. 1.
  • With the embodiments of the present application, the image to be processed can be denoised by the image denoising model, which is trained on the preset training set; the labeled data in the preset training set are noise-free images obtained by simulated motion processing of a reference noise-free image,
  • and the sample data corresponding to the labeled data are the images obtained by superimposing noise on those noise-free images.
  • The sample data and labeled data thus simulate image motion, so during training the image denoising model can learn how to denoise multiple frames that contain both noise and motion, making the denoised image closer to its corresponding noise-free image and reducing ghosting. Therefore, the image denoising model trained with the preset training set has a de-ghosting effect during denoising, so that the denoised image is free of ghosting and clearer.
  • the preset training set is obtained through the following steps:
  • This step may specifically be implemented as: collecting multiple static RAW images captured by a shooting device whose sensitivity is set to the lowest value; and, for each static RAW image, processing the image into different brightness values to obtain multiple reference noise-free images.
  • The sensitivity in the embodiments of the present application may be the ISO sensitivity specified by the International Organization for Standardization.
  • the photographing device may be a device capable of photographing images such as a smart phone or a digital camera, which is not specifically limited in this embodiment of the present application.
  • the photographing device and the foregoing electronic device may be the same device or different devices, which is not specifically limited in this embodiment of the present application.
  • multiple static RAW images at a fixed brightness may be captured, and then each static RAW image may be processed into different brightness values to obtain multiple reference noise-free images.
  • The reference noise-free image at each brightness can be obtained by the following formula:
  • gt_i = gt_0 × ratio_i
  • where gt_0 is the RAW data of the static image captured at a fixed brightness, ratio_i is the brightness magnification, which can be an integer such as 1, 2, or 3 and can be set according to actual needs, and gt_i is the resulting reference noise-free image after the brightness adjustment.
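The brightness scaling above can be sketched in a few lines. This is an illustrative sketch only; it assumes RAW values normalized to [0, 1], and the function and variable names are not from the patent:

```python
import numpy as np

def scale_brightness(gt0, ratios=(1, 2, 3)):
    """Derive reference noise-free images at several brightness levels from
    one low-ISO static RAW capture (gt_i = gt_0 * ratio_i), exploiting the
    linearity of RAW data. Values are clipped to the normalized white level."""
    return [np.clip(gt0 * r, 0.0, 1.0) for r in ratios]

base = np.full((4, 4), 0.1)        # toy normalized "RAW" frame
variants = scale_brightness(base)  # three brightness variants of the frame
```

Because RAW data is linear in the incident light intensity, a single multiplication is enough; the same operation would not be valid on gamma-encoded sRGB data.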
  • The RAW-domain data of an image has a linear relationship with the incident light intensity, and RAW-domain noise can be modeled from the physical imaging process, so the convolutional neural network model can be trained well with supervised learning on the RAW-domain data of images, thereby obtaining the image noise reduction model.
  • The multiple cropped images have the same shape and area; every two cropped images overlap, and the overlapping area is smaller than the area of a single cropped image.
  • a position may be randomly selected from the reference noise-free image as the initial cropping coordinates.
  • the specified side length can be expressed as PATCH_SIZE
  • PATCH_SIZE is the input size of the convolutional neural network kernel of the above-mentioned image noise reduction model.
  • the initial cropping coordinates are randomly shifted to obtain the next cropping coordinates, and a square with a specified side length is cropped at the next cropping coordinates of the reference noise-free image as a cropped image.
  • Let the initial cropping coordinates be (x_0, y_0) and the subsequent cropping coordinates be (x_i, y_i); the subsequent cropping coordinates are calculated by the following formula:
  • x_i = x_0 + random_x, y_i = y_0 + random_y
  • where random_x is the random offset of the abscissa x_0 and random_y is the random offset of the ordinate y_0; the random offsets can be set by the technician.
  • So that every offset crop still fits inside the image, the abscissa and ordinate of the initial cropping coordinates (x_0, y_0) are taken in the range:
  • RANGE ≤ x_0 ≤ W − PATCH_SIZE − RANGE, RANGE ≤ y_0 ≤ H − PATCH_SIZE − RANGE
  • where W and H are the width and height of the reference noise-free image, and RANGE is the preset maximum random offset, for example 100.
  • Each time the cropping coordinates are randomly shifted, one cropped image is cropped from the reference noise-free image, until a preset number of cropped images is obtained; the preset number of cropped images is used as a set of labeled data.
  • Figure 3 is a schematic diagram of cropping an image by the above method. The outermost rectangle in Figure 3 represents the reference noise-free image, and each equal-size square inside it is a cropped image. As Figure 3 shows, every two of the cropped images overlap but do not completely coincide, which is equivalent to simulating multiple images, or consecutive video frames, captured while the shooting device is shaking; the method thus yields images that simulate real motion of the shooting device.
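The jittered cropping procedure can be sketched as follows. The concrete values of PATCH_SIZE and RANGE here are illustrative, not from the patent, and the function name is hypothetical:

```python
import numpy as np

PATCH_SIZE = 64   # crop side length; illustrative value
RANGE = 8         # maximum random offset per step; illustrative value

def simulate_shake_crops(ref, n_frames, seed=None):
    """Crop n_frames overlapping PATCH_SIZE squares from one reference
    noise-free image, jittering the crop origin between crops to mimic
    a shaking shooting device."""
    rng = np.random.default_rng(seed)
    h, w = ref.shape[:2]
    # start far enough from the border that jittered crops still fit
    x = int(rng.integers(RANGE, w - PATCH_SIZE - RANGE))
    y = int(rng.integers(RANGE, h - PATCH_SIZE - RANGE))
    crops = []
    for _ in range(n_frames):
        crops.append(ref[y:y + PATCH_SIZE, x:x + PATCH_SIZE].copy())
        # random offset of the current cropping coordinates
        x = int(np.clip(x + rng.integers(-RANGE, RANGE + 1), 0, w - PATCH_SIZE))
        y = int(np.clip(y + rng.integers(-RANGE, RANGE + 1), 0, h - PATCH_SIZE))
    return crops

ref = np.random.default_rng(0).random((256, 256))
group = simulate_shake_crops(ref, n_frames=5, seed=0)
```

Because each offset is at most RANGE pixels while the crops are PATCH_SIZE pixels wide, consecutive crops are guaranteed to overlap without coinciding, matching the property described above.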
  • Noise is superimposed on each image by the following formula:
  • noise_i = poisson(gt_i) + gaussian
  • where noise_i is the RAW data of the image after superimposing noise, gt_i is the RAW data of the image without noise, the function poisson() superimposes Poisson noise, and gaussian superimposes Gaussian noise.
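A minimal sketch of this Poisson-Gaussian noise synthesis follows; the photon count and read-noise sigma are illustrative parameters, not values from the patent:

```python
import numpy as np

def add_poisson_gaussian(gt, photons=1000.0, read_sigma=0.01, seed=None):
    """Synthesize a noisy RAW frame per noise_i = poisson(gt_i) + gaussian:
    Poisson shot noise applied to the clean signal, plus additive Gaussian
    read noise. Input gt is normalized clean RAW data."""
    rng = np.random.default_rng(seed)
    shot = rng.poisson(gt * photons) / photons      # signal-dependent noise
    read = rng.normal(0.0, read_sigma, gt.shape)    # signal-independent noise
    return shot + read

clean = np.full((8, 8), 0.5)
noisy = add_poisson_gaussian(clean, seed=0)
```

Scaling by a photon count before sampling and dividing after keeps the noisy output in the same normalized range as the clean input while preserving the signal-dependent variance of shot noise.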
  • Multiple reference noise-free images can be collected in advance, and multiple same-shape, same-area, overlapping but not identical images can be cropped from each of them and used as a set of labeled data, simulating the shaking of the shooting device when capturing images or video. Superimposing noise on the labeled data then yields what is effectively a noisy video frame sequence captured by a shaking device.
  • Training the convolutional neural network with such noisy frame sequences as sample data takes camera shake, that is, image motion, into account during training, so the trained noise reduction model can denoise videos or images without introducing ghosting, improving the noise reduction effect compared with the existing technology.
  • the preset training set is obtained through the following steps:
  • Two images may be randomly selected from multiple reference noise-free images, or two images may be selected in a certain order, which is not limited in this embodiment of the present application.
  • One of the selected images can be used as the foreground image and the other image as the background image.
  • The first specified shape can be a rectangle, specifically a square; correspondingly, the first specified size is the side length of the square, which can be PATCH_SIZE, the input size of the convolutional neural network of the above image noise reduction model.
  • a background image can be cropped to a square image with side length PATCH_SIZE.
  • the second size is much smaller than the first size, assuming that the length and width of the cropped foreground image are w and h respectively, then w and h are much smaller than PATCH_SIZE.
  • the second designated shape may be a rectangle, an ellipse, a rhombus, etc., or may be an irregular shape, such as a combination of multiple shapes such as a rectangle, an ellipse, and a rhombus.
  • Since the cropped foreground image is rectangular, in order to simulate the shape of a moving object, the cropped image needs to be masked to obtain an irregular foreground image.
  • The positions marked "1" in Figure 5 are the pixels retained in the foreground image after mask processing, and the positions marked "0" are set to be transparent; after mask processing, only the area marked "1" in Figure 5 is visible to the naked eye.
  • The mask processing makes the foreground image closer to the shape of a real object, so that the noise reduction model trained on the preset training set performs better when denoising object edges in the image.
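The mask processing above can be sketched with a binary mask: positions marked 1 are kept, positions marked 0 become transparent. The elliptical mask here is only one illustrative choice; the patent allows any regular or irregular shape:

```python
import numpy as np

def mask_foreground(fg):
    """Turn a rectangular foreground crop into a non-rectangular one via a
    binary mask (1 = keep pixel, 0 = transparent). Here the mask is an
    ellipse inscribed in the crop rectangle."""
    h, w = fg.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (((xx - w / 2) / (w / 2)) ** 2 +
            ((yy - h / 2) / (h / 2)) ** 2) <= 1.0
    return fg * mask, mask

fg = np.ones((10, 16))
shaped, mask = mask_foreground(fg)
```

When the masked foreground is later composited onto the background, only the mask's 1-region replaces background pixels, so the simulated moving object has a non-rectangular outline.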
  • the foreground image of the second specified shape may be translated on the cropped background image, and the image at each moment during the translation process is a superimposed image of the foreground image of the second specified shape and the cropped background image.
  • The position (x_i, y_i) of the foreground image on the background image after translation can be calculated by the following formula:
  • x_i = x_0 + v·i·dx, y_i = y_0 + v·i·dy
  • where (x_0, y_0) is the starting position, (dx, dy) is the unit vector of the preset direction, v is the preset speed (for example, a random integer between 0 and 40), and i is the moving moment.
  • The position (x_i, y_i) of the foreground image needs to satisfy:
  • 0 ≤ x_i ≤ PATCH_SIZE − w, 0 ≤ y_i ≤ PATCH_SIZE − h
  • where PATCH_SIZE is the side length of the background image, and w and h are the width and height of the foreground image respectively.
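The translation and compositing can be sketched as below. This is a sketch under simplifying assumptions (axis-aligned integer direction, starting position at the origin); the function name and parameter defaults are illustrative:

```python
import numpy as np

def translate_sequence(bg, fg, mask, v=5, direction=(1, 0), n_frames=5):
    """Paste the masked foreground onto the background at successive
    positions (x_i, y_i) = (x_0 + v*i*dx, y_0 + v*i*dy), clamped so the
    foreground stays fully inside the PATCH_SIZE x PATCH_SIZE background;
    each composited frame is one image of the labeled-data group."""
    patch = bg.shape[0]
    h, w = fg.shape[:2]
    dx, dy = direction
    frames = []
    for i in range(n_frames):
        x = min(max(v * i * dx, 0), patch - w)   # 0 <= x_i <= PATCH_SIZE - w
        y = min(max(v * i * dy, 0), patch - h)   # 0 <= y_i <= PATCH_SIZE - h
        frame = bg.copy()
        region = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = np.where(mask, fg, region)
        frames.append(frame)
    return frames

bg = np.zeros((64, 64))
fg = np.ones((8, 8))
mask = np.ones((8, 8), dtype=bool)
seq = translate_sequence(bg, fg, mask)
```

Each frame in the returned sequence shows the foreground shifted by v pixels relative to the previous one, which is the simulated object motion the labeled data encodes.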
  • the background image in FIG. 6 is image A.
  • the foreground image is image B, and image B is translated within image A according to the above method.
  • A set of labeled data is obtained each time S402-S405 are executed, and each time a set is obtained, noise is superimposed on the images it contains to produce the corresponding sample data.
  • An image denoising model trained on a preset training set built from such labeled data and its corresponding sample data can perform good noise reduction, without ghosting, on images or videos captured by the camera that include moving objects.
  • the preset training set can also be obtained through the following steps:
  • S701-S702 are consistent with S201-S202, and reference may be made to relevant descriptions in S201-S202 above, which will not be repeated here.
  • the annotation data processed in this step includes the annotation data obtained in S702, and also includes the annotation data obtained in S706.
  • The embodiments of the present application provide two kinds of simulated motion scenes: the scene where the photographed object is still while the shooting device shakes, and the scene where the shooting device is still while the photographed object moves. The preset training set can be obtained by simulating either one of the two motion scenes, or both.
  • cutting multiple images of the same shape and area that overlap without coinciding from a collected reference noise-free image, and using the cut-out images as a set of labeled data, simulates the scene where the photographing device shakes and the photographed object is still; selecting two reference noise-free images as foreground and background and translating the foreground over the background simulates the scene where the photographing device is still and the object moves.
  • the final preset training set is thus obtained by simulating both kinds of motion scenes. An image denoising model trained on it can therefore better handle images or videos captured by the device under various motion conditions: during training the model learns how to denoise noisy multi-frame images in various scenes, so the denoised image is closer to its corresponding noise-free image. This reduces ghosting in the denoised result and improves both the denoising and the deghosting capability.
  • the image denoising model is trained through the following steps:
  • S801. Splice the first frame image in a set of sample data included in the preset training set with itself, input the spliced image into the convolutional neural network model, and process the spliced image sequentially through the first convolutional network and the second convolutional network included in the model.
  • the multiple images in a set of sample data are obtained by processing the same reference noise-free image, so a set of sample data can be regarded as a sequence of video frames, whose first image is the first frame image.
  • S802. Acquire the intermediate noise-reduced image output by the first convolutional network and the final noise-reduced image output by the second convolutional network.
  • as shown in FIG. 9, the convolutional neural network model includes a first convolutional network and a second convolutional network, each of which includes at least a convolution-activation layer and a convolutional layer; the embodiments of the present application do not limit their specific structures.
  • when processing the current frame image, the first convolutional network outputs an intermediate noise-reduced image, i.e., an image obtained by partially denoising the current frame; the final noise-reduced image is the image obtained after the current frame passes through both the first and the second convolutional networks.
  • as shown in FIG. 9, frame[t] denotes the t-th frame image in a set of labeled data, and fused_frame[t-1] denotes the intermediate noise-reduced image of the (t-1)-th frame. frame[t] and fused_frame[t-1] are spliced and input into the first convolutional network to obtain the intermediate noise-reduced image fused_frame[t] of the t-th frame; fused_frame[t] is then input into the second convolutional network, whose output is added (add) to fused_frame[t], finally yielding the output result output[t], i.e., the final denoised image of frame[t].
  • when the (t+1)-th frame frame[t+1] in the set of labeled data is input into the convolutional neural network model, fused_frame[t] is spliced with frame[t+1] and the spliced result is input into the first convolutional network to obtain the intermediate noise-reduced image fused_frame[t+1] of the (t+1)-th frame; fused_frame[t+1] is then input into the second convolutional network, whose output is added to fused_frame[t+1] to obtain the final output output[t+1], i.e., the final denoised image of frame[t+1].
  • note that when t = 0, i.e., when the first frame is input, there is no intermediate noise-reduced image of a previous frame, so the first frame is spliced with itself.
  • for ease of understanding, FIG. 9 shows the model processing two consecutive frames; in practice, the convolutional neural network model of the embodiments of the present application includes only one first convolutional network and one second convolutional network from FIG. 9.
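  The recurrent splice-and-denoise structure around FIG. 9 can be sketched as follows. The two "networks" here are trivial numpy stand-ins (an average and a zero residual) chosen only so the data flow is runnable; in the patent they are trained convolutional networks.

```python
import numpy as np

def first_network(spliced):
    """Stand-in for the first convolutional network: maps the spliced pair
    (current frame, previous intermediate image) to an intermediate
    denoised image. Here simply the average of the two halves."""
    frame, prev_fused = np.split(spliced, 2, axis=0)
    return 0.5 * (frame + prev_fused)

def second_network(fused):
    """Stand-in for the second convolutional network: predicts a residual
    correction to the intermediate image. Here zero, purely illustrative."""
    return np.zeros_like(fused)

def denoise_sequence(frames):
    """Recurrent forward pass sketched from FIG. 9: fused[t] =
    first_network(concat(frame[t], fused[t-1])) and output[t] =
    second_network(fused[t]) + fused[t]."""
    outputs, prev_fused = [], None
    for frame in frames:
        if prev_fused is None:
            prev_fused = frame  # the first frame is spliced with itself
        spliced = np.concatenate([frame, prev_fused], axis=0)
        fused = first_network(spliced)            # fused_frame[t]
        output = second_network(fused) + fused    # residual add -> output[t]
        outputs.append(output)
        prev_fused = fused
    return outputs
```

  Note that the recurrence carries only the intermediate image fused_frame[t] to the next step, not the final output.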
  • S803. Calculate a loss function value based on the final noise-reduced image and the first frame image.
  • in the embodiments of the present application, the final noise-reduced image is the denoised image obtained after the image input into the convolutional neural network model, from the sample data in the preset training set, has been processed by the model.
  • the loss function value is calculated as:
  • loss = |output − gt_i|
  • where loss is the loss function value, output is the RAW data of the final noise-reduced image output by the convolutional neural network model, and gt_i is the RAW data of the corresponding image in the labeled data of the preset training set.
  • for example, if the image input this time is the first frame image in a set of sample data, the loss function value can be calculated between the final noise-reduced image and the RAW data of the image in the labeled data corresponding to that first frame image.
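  A minimal sketch of the loss computation; reducing the element-wise absolute difference to a scalar by taking the mean is our assumption, since the formula only states |output − gt_i|.

```python
import numpy as np

def l1_loss(output_raw, gt_raw):
    """loss = |output - gt_i|, reduced here to the mean absolute
    difference over all RAW pixels (the reduction is an assumption)."""
    return float(np.mean(np.abs(output_raw - gt_raw)))
```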
  • S804. Judge, according to the loss function value, whether the convolutional neural network model has converged. If not, adjust the parameters of the model based on the loss function value, splice the next frame image in the set of sample data with the intermediate noise-reduced image most recently output by the first convolutional network, and return to the step of inputting the spliced image into the convolutional neural network model, until the model converges; the trained convolutional neural network model is then taken as the image denoising model.
  • with the embodiments of the present application, when the convolutional neural network model denoises each image during training, the image is concatenated with the intermediate noise-reduced image of the previous image before being input into the model. Two adjacent frames reflect the motion of the photographing device or the photographed object, so the spliced input exhibits ghosting, while the labeled image corresponding to the input is ghost-free and noise-free. Computing the loss function value between the model's final denoised image of the spliced input and the ghost-free, noise-free labeled image therefore enables the trained image denoising model to remove both ghosting and noise, producing clearer denoised images.
  • it can be understood that, when applying the image denoising model, if the image to be processed is a frame in a video frame sequence, the image to be processed is concatenated with the intermediate denoising result of the previous frame and then input into the image denoising model, which outputs the denoised image data of the image to be processed. In this way the denoised image data of each frame in the video frame sequence, and thus the denoised video, can be obtained.
  • if the image to be processed is a static image, it is spliced with itself and input into the image denoising model, which then outputs the denoised image data of the image to be processed.
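  The inference-time driving loop described above can be sketched as below, assuming a hypothetical model callable that returns both the intermediate and the final denoised image (the return signature is our assumption).

```python
import numpy as np

def denoise_video(frames, image_denoise_model):
    """Drive a trained model over a frame sequence: each frame is spliced
    with the previous frame's intermediate denoised image; the first
    frame is spliced with itself. image_denoise_model is assumed to
    return the pair (intermediate image, final denoised image)."""
    outputs, prev_fused = [], None
    for frame in frames:
        partner = frame if prev_fused is None else prev_fused
        spliced = np.concatenate([frame, partner], axis=0)
        fused, output = image_denoise_model(spliced)
        outputs.append(output)
        prev_fused = fused
    return outputs

def denoise_still(image, image_denoise_model):
    # A static image is the single-frame case: spliced with itself.
    return denoise_video([image], image_denoise_model)[0]
```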
  • as shown in FIG. 10, image A is the original noisy image captured by the user at time t.
  • image B is obtained by denoising image A with the existing multi-frame averaging method, i.e., superimposing and averaging the 5 frames captured at times t-2, t-1, t, t+1 and t+2 to obtain the denoised image at time t. Compared with image A, image B shows a denoising effect in the static background, but the foreground, i.e., the moving portrait, suffers from severe ghosting.
  • image C is obtained by denoising image A with the denoising model provided by the present application. In image C, both the static background and the moving foreground portrait are well denoised, and the foreground portrait shows no ghosting. Compared with the prior art, the denoising model provided by the present application therefore denoises the image well without introducing ghosting.
  • corresponding to the above method embodiments, an embodiment of the present application further provides an image noise reduction apparatus; as shown in FIG. 11, the apparatus includes:
  • a first input module 1101 for inputting the image to be processed into the image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set including multiple sets of labeled data and the sample data corresponding to each set; a set of labeled data includes multiple noise-free images obtained by applying simulated motion processing to one reference noise-free image, and the sample data corresponding to that set includes images obtained by superimposing noise on each of those noise-free images;
  • the first acquisition module 1102 is configured to acquire image data after noise reduction output by the image noise reduction model
  • the conversion module 1103 is configured to convert the image data into an image to obtain a noise-reduced image corresponding to the image to be processed.
  • in another embodiment, the apparatus further includes: an acquisition module for collecting multiple reference noise-free images; an intercepting module for cutting out, from each reference noise-free image, multiple images that have the same shape and the same area, have overlapping regions, and are not completely identical, and taking the cut-out images as a set of labeled data; a first superimposition module for superimposing noise on the images included in each set of labeled data to obtain the sample data corresponding to each set; and a first generating module for generating the preset training set from the obtained sets of labeled data and the corresponding sample data.
  • in another embodiment, the apparatus further includes: an acquisition module for collecting multiple reference noise-free images; a first selection module for selecting, each time, two images from the multiple reference noise-free images as the foreground image and the background image respectively; a first cropping module for cropping the background image according to the first specified shape and the first specified size; a first processing module for cropping the foreground image according to the first specified shape and the second specified size and applying mask processing to the cropped foreground image to obtain the foreground image of the second specified shape; a first translation module for superimposing the foreground image of the second specified shape at a starting position on the cropped background image, translating it across the background image in a preset direction at a preset speed, and taking the images at multiple moments during the translation as a set of labeled data; a second superimposition module for superimposing noise on the images included in each set of labeled data to obtain the corresponding sample data; and a second generating module for generating the preset training set from the obtained sets of labeled data and the corresponding sample data.
  • the device further includes:
  • the second selection module is used to select two images as foreground images and background images respectively from multiple reference noise-free images at a time;
  • the second cropping module is used to crop the background image according to the first specified shape and the first specified size
  • the second processing module is configured to crop the foreground image according to the first specified shape and the second specified size, and perform mask processing on the cropped foreground image to obtain the foreground image of the second specified shape, and the second specified size is smaller than the first specified size ;
  • the second translation module is used to superimpose the foreground image of the second specified shape on the starting position of the background image, translate the foreground image of the second specified shape on the background image in a preset direction at a preset speed, and collect Images at multiple moments in the translation process are used as a set of labeled data.
  • the interception module is specifically used for:
  • for each reference noise-free image, randomly select a position in it as the starting cropping coordinate; crop a square of a specified side length at the starting cropping coordinate as one cropped image; randomly offset the starting cropping coordinate to obtain the next cropping coordinate, and crop a square of the specified side length at that coordinate as another cropped image; perform one random offset of the cropping coordinate per cropped image until a preset number of cropped images is obtained, and take the preset number of cropped images as a set of labeled data.
  • the acquisition module is specifically used for: collecting multiple static RAW images captured by a photographing device whose sensitivity is set to its lowest value; and, for each static RAW image, processing it into different brightness values to obtain multiple reference noise-free images.
  • the device further includes:
  • a second input module for splicing the first frame image in a set of sample data included in the preset training set with itself and inputting the spliced image into the convolutional neural network model, the spliced image being processed sequentially by the first convolutional network and the second convolutional network included in the model;
  • the second acquisition module is used to acquire the intermediate noise reduction image output by the first convolution network and the final noise reduction image output by the second convolution network;
  • a calculation module configured to calculate a loss function value based on the final noise-reduced image and the first frame image
  • a judging module configured to judge, according to the loss function value, whether the convolutional neural network model has converged; if not, an adjustment module is triggered to adjust the parameters of the model based on the loss function value and to splice the next frame image in the set of sample data with the intermediate noise-reduced image most recently output by the first convolutional network, triggering the second input module to perform the step of inputting the spliced image into the convolutional neural network model, until the judging module determines that the model has converged, whereupon the trained convolutional neural network model is taken as the image denoising model.
  • an embodiment of the present application further provides an electronic device, as shown in FIG. 12, including a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, where the processor 1201, the communication interface 1202 and the memory 1203 communicate with one another via the communication bus 1204.
  • the memory 1203 is used to store computer programs; the processor 1201 is used to implement the method steps in the above method embodiments when executing the programs stored in the memory 1203 .
  • the communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the electronic device and other devices.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one magnetic disk memory.
  • the memory may also be at least one storage device located far away from the aforementioned processor.
  • the above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a computer-readable storage medium is also provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the above noise reduction methods.
  • a computer program product including instructions is also provided, which, when run on a computer, causes the computer to execute any noise reduction method in the above embodiments.
  • in the above embodiments, all or part may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)).
  • the embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others.
  • in particular, the apparatus embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments.


Abstract

Embodiments of the present application provide a noise reduction method, apparatus, electronic device and medium, relating to the technical field of image processing. The method includes: inputting an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set including multiple sets of labeled data and sample data corresponding to each set of labeled data, where a set of labeled data includes multiple noise-free images obtained by applying simulated motion processing to one reference noise-free image, and the sample data corresponding to that set includes images obtained by superimposing noise on each of the multiple noise-free images; acquiring the denoised image data output by the image denoising model; and converting the image data into an image to obtain the denoised image corresponding to the image to be processed. This can effectively avoid ghosting in the denoised image.

Description

A noise reduction method, apparatus, electronic device and medium
This application claims priority to Chinese patent application No. 202111663744.8, entitled "A noise reduction method, apparatus, electronic device and medium" and filed with the Chinese Patent Office on December 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to a noise reduction method, apparatus, electronic device and medium.
Background
At present, images and videos are denoised mainly by multi-frame noise reduction: several frames before and after the frame to be denoised are acquired, and the acquired frames are superimposed and averaged to reduce noise.
However, when capturing an image or video the capturing device may shake, and the photographed object may itself be moving, so this averaging of multiple frames causes ghosting in the denoised image or video.
Summary
The purpose of the embodiments of the present application is to provide a noise reduction method, apparatus, electronic device and medium, so as to solve the problem that denoising a video or image produces ghosting. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a noise reduction method, including:
inputting an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set including multiple sets of labeled data and sample data corresponding to each set of labeled data, where a set of labeled data includes multiple noise-free images obtained by applying simulated motion processing to one reference noise-free image, and the sample data corresponding to that set includes images obtained by superimposing noise on each of the multiple noise-free images; acquiring the denoised image data output by the image denoising model; and converting the image data into an image to obtain the denoised image corresponding to the image to be processed.
In a possible implementation, the preset training set is obtained through the following steps:
collecting multiple reference noise-free images; for each reference noise-free image, cutting out from it multiple images that have the same shape and the same area, have overlapping regions, and are not completely identical, and taking the cut-out images as a set of labeled data; superimposing noise on the images included in each set of labeled data to obtain the sample data corresponding to each set; and generating the preset training set from the obtained sets of labeled data and the sample data corresponding to each set.
In a possible implementation, the preset training set is obtained through the following steps:
collecting multiple reference noise-free images; each time selecting two images from the multiple reference noise-free images as a foreground image and a background image respectively; cropping the background image according to a first specified shape and a first specified size; cropping the foreground image according to the first specified shape and a second specified size, and applying mask processing to the cropped foreground image to obtain a foreground image of a second specified shape; superimposing the foreground image of the second specified shape at a starting position on the cropped background image, translating it across the background image in a preset direction at a preset speed, and taking the images at multiple moments during the translation as a set of labeled data; superimposing noise on the images included in each set of labeled data to obtain the corresponding sample data; and generating the preset training set from the obtained sets of labeled data and the corresponding sample data.
In a possible implementation, before generating the preset training set from the obtained sets of labeled data and the sample data corresponding to each set, the method further includes: each time selecting two images from the multiple reference noise-free images as a foreground image and a background image respectively; cropping the background image according to a first specified shape and a first specified size; cropping the foreground image according to the first specified shape and a second specified size, and applying mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; superimposing the foreground image of the second specified shape at a starting position on the background image, translating it across the background image in a preset direction at a preset speed, and collecting the images at multiple moments during the translation as a set of labeled data.
In a possible implementation, cutting out, for each reference noise-free image, multiple images that have the same shape and area, overlap, and are not completely identical, and taking the cut-out images as a set of labeled data includes: for each reference noise-free image, randomly selecting a position in it as a starting cropping coordinate; cropping a square of a specified side length at the starting cropping coordinate as one cropped image; randomly offsetting the starting cropping coordinate to obtain the next cropping coordinate, and cropping a square of the specified side length at that coordinate as another cropped image; and performing one random offset of the cropping coordinate per cropped image until a preset number of cropped images is obtained, taking the preset number of cropped images as a set of labeled data.
In a possible implementation, collecting multiple reference noise-free images includes:
collecting multiple static RAW images captured by a photographing device whose sensitivity is set to its lowest value; and, for each static RAW image, processing it into different brightness values to obtain multiple reference noise-free images.
In a possible implementation, the image denoising model is trained through the following steps: splicing the first frame image in a set of sample data included in the preset training set with itself, inputting the spliced image into the convolutional neural network model, and processing the spliced image sequentially by a first convolutional network and a second convolutional network included in the model; acquiring an intermediate denoised image output by the first convolutional network and a final denoised image output by the second convolutional network; calculating a loss function value based on the final denoised image and the first frame image; judging, according to the loss function value, whether the convolutional neural network model has converged; and, if not, adjusting the parameters of the model based on the loss function value, splicing the next frame image in the set of sample data with the intermediate denoised image most recently output by the first convolutional network, and returning to the step of inputting the spliced image into the model until the model converges, whereupon the trained convolutional neural network model is taken as the image denoising model.
In a second aspect, an embodiment of the present application provides a noise reduction apparatus, including:
a first input module configured to input an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set including multiple sets of labeled data and sample data corresponding to each set of labeled data, where a set of labeled data includes multiple noise-free images obtained by applying simulated motion processing to one reference noise-free image, and the sample data corresponding to that set includes images obtained by superimposing noise on each of the multiple noise-free images; a first acquisition module configured to acquire the denoised image data output by the image denoising model; and a conversion module configured to convert the image data into an image to obtain the denoised image corresponding to the image to be processed.
In a possible implementation, the apparatus further includes: an acquisition module configured to collect multiple reference noise-free images; an intercepting module configured to cut out, from each reference noise-free image, multiple images that have the same shape and the same area, have overlapping regions, and are not completely identical, and to take the cut-out images as a set of labeled data; a first superimposition module configured to superimpose noise on the images included in each set of labeled data to obtain the sample data corresponding to each set; and a first generating module configured to generate the preset training set from the obtained sets of labeled data and the sample data corresponding to each set.
In a possible implementation, the apparatus further includes: an acquisition module configured to collect multiple reference noise-free images; a first selection module configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively; a first cropping module configured to crop the background image according to a first specified shape and a first specified size; a first processing module configured to crop the foreground image according to the first specified shape and a second specified size and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape; a first translation module configured to superimpose the foreground image of the second specified shape at a starting position on the cropped background image, to translate it across the background image in a preset direction at a preset speed, and to take the images at multiple moments during the translation as a set of labeled data; a second superimposition module configured to superimpose noise on the images included in each set of labeled data to obtain the corresponding sample data; and a second generating module configured to generate the preset training set from the obtained sets of labeled data and the corresponding sample data.
In a possible implementation, the apparatus further includes: a second selection module configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively; a second cropping module configured to crop the background image according to a first specified shape and a first specified size; a second processing module configured to crop the foreground image according to the first specified shape and a second specified size and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; and a second translation module configured to superimpose the foreground image of the second specified shape at a starting position on the background image, to translate it across the background image in a preset direction at a preset speed, and to collect the images at multiple moments during the translation as a set of labeled data.
In a possible implementation, the intercepting module is specifically configured to: for each reference noise-free image, randomly select a position in it as a starting cropping coordinate; crop a square of a specified side length at the starting cropping coordinate as one cropped image; randomly offset the starting cropping coordinate to obtain the next cropping coordinate, and crop a square of the specified side length at that coordinate as another cropped image; and perform one random offset of the cropping coordinate per cropped image until a preset number of cropped images is obtained, taking the preset number of cropped images as a set of labeled data.
In a possible implementation, the acquisition module is specifically configured to: collect multiple static RAW images captured by a photographing device whose sensitivity is set to its lowest value; and, for each static RAW image, process it into different brightness values to obtain multiple reference noise-free images.
In a possible implementation, the apparatus further includes: a second input module configured to splice the first frame image in a set of sample data included in the preset training set with itself and to input the spliced image into the convolutional neural network model, the spliced image being processed sequentially by a first convolutional network and a second convolutional network included in the model; a second acquisition module configured to acquire the intermediate denoised image output by the first convolutional network and the final denoised image output by the second convolutional network; a calculation module configured to calculate a loss function value based on the final denoised image and the first frame image; and a judging module configured to judge, according to the loss function value, whether the convolutional neural network model has converged and, if not, to trigger an adjustment module to adjust the parameters of the model based on the loss function value and to splice the next frame image in the set of sample data with the intermediate denoised image most recently output by the first convolutional network, triggering the second input module to perform the step of inputting the spliced image into the model, until the judging module determines that the model has converged, whereupon the trained convolutional neural network model is taken as the image denoising model.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another via the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any of the first aspect when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the method steps of any of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the noise reduction methods described above.
With the embodiments of the present application, the image to be processed can be denoised by the image denoising model, which is trained on a preset training set in which the labeled data are multiple noise-free images obtained by applying simulated motion processing to a reference noise-free image, and the corresponding sample data include those noise-free images with noise superimposed. Since the sample and labeled data simulate image motion, during training the model can learn how to denoise noisy multi-frame images that contain motion, so that the denoised image is closer to its corresponding noise-free image, reducing ghosting in the denoised result. The image denoising model trained on this preset training set therefore has a deghosting effect while denoising, producing ghost-free, clearer images. Of course, implementing any product or method of the present application does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
The accompanying drawings described here are provided for a further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions serve to explain the present application and do not constitute an undue limitation on it.
FIG. 1 is a flowchart of a noise reduction method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another noise reduction method provided by an embodiment of the present application;
FIG. 3 is an exemplary schematic diagram of an image cropping method provided by an embodiment of the present application;
FIG. 4 is a flowchart of another noise reduction method provided by an embodiment of the present application;
FIG. 5 is an exemplary schematic diagram of image mask processing provided by an embodiment of the present application;
FIG. 6 is an exemplary schematic diagram of image translation provided by an embodiment of the present application;
FIG. 7 is a flowchart of another noise reduction method provided by an embodiment of the present application;
FIG. 8 is a flowchart of another noise reduction method provided by an embodiment of the present application;
FIG. 9 is a structural diagram of a convolutional neural network provided by an embodiment of the present application;
FIG. 10 is an exemplary schematic diagram comparing image denoising effects, provided by an embodiment of the present application;
FIG. 11 is a structural schematic diagram of a noise reduction apparatus provided by an embodiment of the present application;
FIG. 12 is a structural schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To obtain clearer denoised images or videos, an embodiment of the present application provides a noise reduction method. The method may be applied to an electronic device, for example a device with image processing capability such as a desktop computer, a server, a tablet computer or a mobile phone. As shown in FIG. 1, the method includes the following steps:
S101. Input the image to be processed into an image denoising model.
The image denoising model is a model obtained by training a convolutional neural network model on a preset training set. The preset training set includes multiple sets of labeled data and sample data corresponding to each set; a set of labeled data includes multiple noise-free images obtained by applying simulated motion processing to one reference noise-free image, and the corresponding sample data includes images obtained by superimposing noise on each of those noise-free images.
In other words, one set of labeled data corresponds to one set of sample data, and the noise-free images in a set of labeled data are in one-to-one correspondence with the noisy images in the corresponding set of sample data.
The image to be processed may be any image requiring noise reduction, for example a photograph, or each frame of a video.
The image to be processed may also be multiple consecutively captured images, or multiple consecutive video frames of a video; in that case the frames may be input into the image denoising model one by one in capture order, or the video frames one by one in their sequence order, and the model outputs denoised image data for each input image.
S102. Acquire the denoised image data output by the image denoising model.
The image data output by the denoising model is the RAW (RAW Image Format) data of the denoised image.
S103. Convert the image data into an image to obtain the denoised image corresponding to the image to be processed.
In the embodiments of the present application, to present the best visual effect to the user, the image data needs to be converted into a standard Red Green Blue (sRGB) image, for example by an Image Signal Processor (ISP) algorithm; that is, the denoised sRGB image can be obtained.
It can be understood that if a video needs to be denoised, each of its video frames can be denoised separately to obtain the denoised video; that is, each video frame can be taken as the image to be processed and denoised by the method of FIG. 1.
With the embodiments of the present application, the image to be processed can be denoised by the image denoising model, which is trained on a preset training set in which the labeled data are multiple noise-free images obtained by applying simulated motion processing to reference noise-free images, and the corresponding sample data include those images with noise superimposed. Since the sample and labeled data simulate image motion, during training the model can learn how to denoise noisy multi-frame images that contain motion, so that the denoised image is closer to its corresponding noise-free image, reducing ghosting in the denoised result. An image denoising model trained on this preset training set therefore removes ghosting while denoising, producing ghost-free, clearer images.
To realize the method flow shown in FIG. 1, the image denoising model must first be trained; and for the trained model to handle images captured by a device in motion, sample images that accurately simulate camera motion must first be generated. The method of generating the training set is described below.
In one implementation, to simulate overall motion of the capturing device during shooting, for example camera shake, the preset training set is obtained through the following steps, as shown in FIG. 2:
S201. Collect multiple reference noise-free images.
Specifically, this step may be implemented as: collecting multiple static RAW images captured by a photographing device whose sensitivity is set to its lowest value; and, for each static RAW image, processing it into different brightness values to obtain multiple reference noise-free images.
The sensitivity in the embodiments of the present application may be the sensitivity specified by the International Standards Organization (ISO).
When capturing an image, the lower the ISO value of the photographing device, the weaker the noise in the captured image.
Therefore, in the embodiments of the present application, to obtain reference noise-free images, multiple static RAW images captured by a photographing device with its ISO value set to the lowest value may be used as the reference noise-free images.
The photographing device may be any device with an image capturing function, such as a smartphone or a digital camera, which is not specifically limited in the embodiments of the present application. The photographing device and the above electronic device may be the same device or different devices, which is likewise not specifically limited.
In the embodiments of the present application, multiple static RAW images may be captured at a fixed brightness, and each static RAW image may then be processed into different brightness values to obtain multiple reference noise-free images.
Since the RAW-domain data of an image is linearly related to the illumination intensity, reference noise-free images at multiple brightness levels can be obtained by the following formula:
gt_i = gt_0 / ratio_i
where gt_0 is the data of the static RAW image captured at the fixed brightness, ratio_i is the brightness ratio, which may be an integer value such as 1, 2 or 3 and can be set according to actual needs, and gt_i is the reference noise-free image obtained after setting the brightness value.
The RAW-domain data of an image is linear in the incident light intensity, and RAW-domain noise can be modeled from the physical imaging process, so supervised learning on RAW-domain data is well suited to training the convolutional neural network model into the image denoising model.
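A sketch of the brightness-variant generation described above; the ratio values used here are illustrative, not taken from the text.

```python
import numpy as np

def brightness_variants(gt0_raw, ratios=(1, 2, 4)):
    """gt_i = gt_0 / ratio_i: RAW data is linear in scene illuminance, so
    dividing the clean RAW image by a brightness ratio simulates the same
    scene captured at lower brightness (ratio values here are examples)."""
    return [gt0_raw / r for r in ratios]
```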
S202. For each reference noise-free image, cut out multiple images that have the same shape and the same area, have overlapping regions, and are not completely identical, and take the cut-out images as a set of labeled data.
That is, the multiple images cut from a reference noise-free image all have the same shape and area, every two of them share an overlapping region, and the overlapping area is smaller than the area of a single cut image.
For each reference noise-free image, a position may be randomly selected in it as the starting cropping coordinate, and a square of a specified side length cropped at that coordinate as one cropped image; for example, the starting coordinate may serve as the top-left corner of the cropped image, or as its center.
The specified side length may be denoted PATCH_SIZE, the input size of the convolutional neural network kernel of the image denoising model.
Then the starting cropping coordinate is randomly offset to obtain the next cropping coordinate, and a square of the specified side length is cropped there as another cropped image.
Assuming the starting cropping coordinate is (x_0, y_0) and a subsequent cropping coordinate is (x_i, y_i), the subsequent coordinate is calculated by:
x_i = x_0 + random_x
y_i = y_0 + random_y
where random_x is a random offset of the abscissa x_0 and random_y is a random offset of the ordinate y_0; the random offsets may be set by the technician.
Assuming the reference noise-free image has height H and width W, to guarantee that cropped images do not shift outside it, the abscissa and ordinate of the starting coordinate (x_0, y_0) must satisfy:
RANGE ≤ x_0 ≤ W − 1 − PATCH_SIZE − RANGE
RANGE ≤ y_0 ≤ H − 1 − PATCH_SIZE − RANGE
where RANGE is the preset maximum random offset, for example 100.
One random offset of the cropping coordinate is performed per cropped image, cropping one image from the reference noise-free image each time, until a preset number of cropped images is obtained; the preset number of cropped images is taken as a set of labeled data.
FIG. 3 is a schematic diagram of cropping images in the above manner: the outermost rectangle is the reference noise-free image, and each of the equally sized squares inside it is one cropped image. As can be seen in FIG. 3, every two of the cropped images overlap without coinciding completely, which is equivalent to simulating the multiple images or consecutive video frames captured while the photographing device shakes; this method yields images that simulate the device's real motion.
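The jittered cropping of S202 can be sketched as follows. Offsetting each subsequent coordinate from the starting coordinate follows the formulas above, and the bound on the starting coordinate guarantees every crop stays inside the image; the parameter values are illustrative.

```python
import numpy as np

def jittered_crops(ref_image, patch_size, num_crops, offset_range=100, seed=0):
    """Cut num_crops overlapping patch_size x patch_size squares from one
    reference noise-free image. The starting coordinate (x0, y0) is drawn
    so that the patch plus the maximum offset RANGE stays inside the image;
    each subsequent crop offsets the starting coordinate by a random amount
    in [-offset_range, offset_range] (simulated camera shake)."""
    rng = np.random.default_rng(seed)
    h, w = ref_image.shape[:2]
    x0 = int(rng.integers(offset_range, w - patch_size - offset_range))
    y0 = int(rng.integers(offset_range, h - patch_size - offset_range))
    crops = []
    for i in range(num_crops):
        dx = int(rng.integers(-offset_range, offset_range + 1)) if i else 0
        dy = int(rng.integers(-offset_range, offset_range + 1)) if i else 0
        xi, yi = x0 + dx, y0 + dy
        crops.append(ref_image[yi:yi + patch_size, xi:xi + patch_size].copy())
    return crops
```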
S203. Superimpose noise on the images included in each set of labeled data to obtain the sample data corresponding to each set.
In the embodiments of the present application, noise is superimposed on the images in the labeled data by the following formula:
noise_i = possion(gt_i) + gaussian
where noise_i is the RAW data of the image after noise is superimposed, gt_i is the RAW data of the image before noise is superimposed, the function possion() superimposes Poisson noise, and gaussian superimposes Gaussian noise.
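The noise model noise_i = possion(gt_i) + gaussian can be sketched as below; the Gaussian standard deviation is an assumed value, since the text does not give the noise parameters.

```python
import numpy as np

def add_raw_noise(gt_raw, read_std=2.0, seed=0):
    """Shot noise is drawn from a Poisson distribution whose mean is the
    clean RAW signal, and additive Gaussian read noise is superimposed
    (read_std is an assumed value)."""
    rng = np.random.default_rng(seed)
    shot = rng.poisson(np.clip(gt_raw, 0, None)).astype(np.float64)
    read = rng.normal(0.0, read_std, size=gt_raw.shape)
    return shot + read
```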
S204. Generate the preset training set from the obtained sets of labeled data and the sample data corresponding to each set.
That is, each set of labeled data is paired one-to-one with its sample data to form the preset training set.
With this method, multiple reference noise-free images are collected in advance, and multiple images of the same shape and area that overlap without coinciding are cut from each one as a set of labeled data, simulating shake of the photographing device while capturing an image or video; superimposing noise on the labeled data is then equivalent to obtaining the noisy video frame sequence that would be captured under device shake. Training the convolutional neural network on such noisy frame sequences means that camera shake, i.e., image motion, is accounted for during training, so the trained denoising model can denoise videos or images without leaving ghosting, improving the denoising effect over the prior art.
In another implementation, to simulate the case where the photographing device is still but the photographed object is moving, the preset training set is obtained through the following steps, as shown in FIG. 4:
S401. Collect multiple reference noise-free images.
For the collection process, refer to the description in S201 above, which is not repeated here.
S402. Each time, select two images from the multiple reference noise-free images as the foreground image and the background image respectively.
The two images may be selected randomly from the multiple reference noise-free images or in a certain order, which is not limited in the embodiments of the present application; one selected image serves as the foreground image and the other as the background image.
S403. Crop the background image according to a first specified shape and a first specified size.
The first specified shape may be a rectangle, specifically a square, in which case the first specified size is the square's side length, which may be PATCH_SIZE, the input size of the convolutional neural network kernel of the image denoising model.
The background image may be cropped into a square image with side length PATCH_SIZE.
S404. Crop the foreground image according to the first specified shape and a second specified size, and apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape.
The second size is much smaller than the first size: assuming the cropped foreground image has width w and height h, then w and h are much smaller than PATCH_SIZE.
The second specified shape may be a rectangle, an ellipse, a rhombus, etc., or an irregular shape, for example a combined variant of several figures such as rectangles, ellipses and rhombuses.
Since moving objects in real shooting are usually irregular in shape while the cropped foreground image is rectangular, mask processing is applied to the cropped image to obtain an irregular foreground image that simulates a moving object's shape.
As shown in FIG. 5, the positions marked "1" are the pixels retained in the foreground image after masking, and the positions marked "0" are set to transparent, so that after masking only the region marked "1" in FIG. 5 remains visible.
Mask processing makes the foreground image closer to the shape of a real object, so that the denoising model trained on the preset training set denoises the edges of objects in an image better.
S405、将第二指定形状的前景图像叠加在裁剪后的背景图像上的起始位置,将第二指定形状的前景图像在背景图像上按照预设速度朝预设方向平移,将平移过程中的多个时刻的图像作为一组标注数据。
其中,可以将第二指定形状的前景图像在裁剪后的背景图像上进行平移,平移过程中每个时刻的图像为第二指定形状的前景图像和裁剪后的背景图像的叠加图像。
假设前景图像在背景图像上的平移起始位置为(x 0,y 0),则后续图像平移后在背景图像上的位置(x i,y i)可以由下列公式计算得到:
x i=x 0+v*i
y i=y 0+v*i
其中v表示预设速度,例如v为0到40之间的随机整数,i表示移动时刻。
为了防止前景图像移动出背景图像的范围,前景图像的位置(x i,y i)需要满足以下条件:
0≤x i≤PATCH_SIZE-1-w
0≤y i≤PATCH_SIZE-1-h
其中,PATCH_SIZE为背景图像边长,w和h分别表示前景图像的宽和高。
以此来保证前景图像始终在背景图像内部平移。
如图6所示,图6中背景图像为图像A。前景图像为图像B,图像B在图像A内部按上述方法进行平移。
S406、对每组标注数据包括的多张图像分别叠加噪声,得到每组标注数据对应的样本数据。
可以理解的是,每执行一次S402-S405可得到一组标注数据,每得到一组标注数据即可对该组标注数据包括的多张图像分别叠加噪声。
其中,叠加噪声方式与上述S203中描述方法一致,可参考S203中的相关描述,此处不再赘述。
S407、将得到的多组标注数据以及每组标注数据对应的样本数据生成预设训练集。
即,将每组标注数据与样本数据一一对应,作为预设训练集。
采用该方法,可以通过预先采集多张基准无噪声图像,从中选取两张图像,分别裁剪成前景图像和背景图像,将前景图像叠加在背景图像上的起始位置,再从起始位置开始平移,将平移过程中的多个时刻的图像作为一组标注数据,对标注数据叠加噪声,相当于得到了在拍摄设备在静止情况 下拍摄运动物体得到的有噪声的视频,也就得到了包括标注数据和每组标注数据对应的样本数据的预设训练集。通过该预设训练集训练得到的图像降噪模型,可以对相机拍摄到的包括运动物体的图像或视频进行很好的降噪,不会产生重影。
在另一实施方式中,如图7所示,预设训练集还可以通过以下步骤得到:
S701、采集多张基准无噪声图像。
S702、针对每张基准无噪声图像,从该基准无噪声图像截取多张形状相同、面积相同、具有重叠区域且不完全相同的图像,将截取出的图像作为一组标注数据。
S701-S702与S201-S202一致,可以参考上述S201-S202中的有关描述,此处不再赘述。
S703、每次从多张基准无噪声图像中选择两张图像分别作为前景图像和背景图像。
S704、按照第一指定形状和第一指定尺寸裁剪背景图像。
S705、按照第一指定形状和第二指定尺寸裁剪前景图像,对裁剪后的前景图像进行掩膜处理,得到第二指定形状的前景图像,第二指定尺寸小于第一指定尺寸。
S706、将第二指定形状的前景图像叠加在背景图像上的起始位置,将第二指定形状的前景图像在所述背景图像上按照预设速度朝预设方向平移,采集平移过程中多个时刻的图像作为一组标注数据。其中,本步骤中的背景图像均为裁剪后的背景图像。
S703至S706具体实施方式与上述S402至S405实施方式一致,可以参考上述S402至S405实施例中的有关描述,此处不再赘述。
S707、对每组标注数据包括的多张图像分别叠加噪声,得到每组标注数据对应的样本数据。其中,该步骤中处理的标注数据包括S702中得到的标注数据,也包括S706中得到的标注数据。
S708、将得到的多组标注数据以及每组标注数据对应的样本数据生成预设训练集。
需要说明的是,本申请实施例提供了两种模拟运动场景,分别为拍摄设备抖动而被拍摄物体静止场景和拍摄设备静止而被拍摄物体运动场景,预设训练集可以为对其中一种运动场景进行模拟得到,也可以为对两种运动场景都进行模拟得到。
采用该方法,从采集到的基准无噪声图像截取多张形状相同、面积相同、具有重叠区域且不完全相同的图像,将截取出的图像作为一组标注数据,相当于模拟了拍摄设备抖动而被拍摄物体静止场景;从多张基准无噪声图像中选择两张图像分别作为前景图像和背景图像,让前景图像在背景图像上平移,相当于模拟了拍摄设备静止而被拍摄物体运动场景,也就是说,最终得到的预设训练集为对两种运动场景都进行模拟得到,进而,使用预设训练集训练的图像降噪模型可以更好地处理拍摄设备在各种运动情况下拍摄的图像或视频,即,使图像降噪模型在训练中,可以学习如何对各种场景下的有噪声的多帧图像进行降噪,使得降噪后的图像更接近其对应的无噪声图像,因此降低了降噪后的图像出现重影的情况,提升了降噪能力和去除重影能力。
在本申请另一实施例中,如图8所示,图像降噪模型通过以下步骤训练得到:
S801、将预设训练集包括的一组样本数据中的第一帧图像与自身进行拼接后,将拼接后的图像输入卷积神经网络模型,依次由卷积神经网络模型包括的第一卷积网络和第二卷积网络处理拼接后的图像。
其中,一组样本数据包括的多张图像是对相同的基准无噪声图像处理得到的,所以一组样本数据可以看作一个视频帧序列,该组样本数据中的第一张图像即为第一帧图像。
S802、获取第一卷积网络输出的中间降噪图像和第二卷积网络输出的最终降噪图像。
本申请实施例中,如图9所示,卷积神经网络模型包括第一卷积网络和第二卷积网络。其中,第一卷积网络至少包括卷积激活层和卷积层,第二卷积网络至少包括卷积激活层和卷积层。本申请实施例不对第一卷积网络和第二卷积网络的具体结构进行限制。其中第一卷积网络在处理当前帧图像时会输出一个中间降噪图像,即对当前帧图像进行部分降噪后得到的图像。最终降噪图像为当前帧经过第一和第二两个卷积网络降噪处理得到的图像。
如图9所示,frame[t]表示一组标注数据中的第t帧图像,fused_frame[t-1]表示第t-1帧图像的中间降噪图像,将frame[t]和fused_frame[t-1]拼接输入第一卷积网络,得到第t帧图像的中间降噪图像fused_frame[t],然后将fused_frame[t]输入第二卷积网络,再将第二卷积网络的输出结果与fused_frame[t]相加(add),最终得到输出结果output[t],即对frame[t]降噪后的最终降噪图像。
在将该组标注数据中的第t+1帧图像frame[t+1]输入该卷积神经网络模型时,需将fused_frame[t]与frame[t+1]进行拼接,将拼接结果输入到第一卷积网络,得到第t+1帧图像的中间降噪图像fused_frame[t+1],再将fused_frame[t+1]输入第二卷积网络,再将第二卷积网络的输出结果与fused_frame[t+1]相加,得到最终输出结果output[t+1],即对frame[t+1]降噪后的最终降噪图像。
需要说明的是,当t=0时,即第一帧输入时,由于没有上一帧的中间降噪图像,所以拼接时为第一帧与自身拼接。
为了方便理解,图9中示出了卷积神经网络模型对连续两帧图像的处理过程,实际应用中,本申请实施例中的卷积神经网络模型只包括图9中的一个第一卷积网络和一个第二卷积网络。
S803、基于最终降噪图像与第一帧图像计算损失函数值。
本申请实施例中,最终降噪图像为预设训练集中样本数据中,本次输入卷积神经网络模型的图像经过卷积神经网络模型处理后得到的降噪图像。
损失函数值计算公式为:
loss=|output-gt i|
其中,loss表示损失函数值,output表示卷积神经网络模型输出的最终降噪图像的RAW数据,gt i表示预设训练集中与输入的样本数据中的图像对应的标注数据中的图像的RAW数据。
例如,本次输入的图像为一组样本数据中的第一帧图像,则可计算最终降噪图像与该第一帧图像对应的标注数据中的图像的RAW数据之间的损失函数值。
S804、根据损失函数值判断卷积神经网络模型是否收敛。若未收敛,则基于损失函数值调整卷积神经网络模型的参数,并将一组样本数据中的下一帧图像与第一卷积网络最近一次输出的中间降噪图像进行拼接,返回将拼接后的图像输入卷积神经网络模型的步骤,直至卷积神经网络模型收敛时,则将训练得到的卷积神经网络模型作为图像降噪模型。
采用本申请实施例,由于训练卷积神经网络模型时,在采用卷积神经网络模型对每张图像进行降噪时,将该图像与上一张图像的中间降噪图像拼接后输入该卷积神经网络模型,且相邻两帧图像能够反映拍摄设备或所拍摄的物体的运动,将该图像与上一张图像的中间降噪图像拼接后会有重影。而该图像对应的标注图像是无重影无噪声的,所以基于该卷积神经网络对拼接图像的最终降噪图像与无重影无噪声的标注图像计算损失函数值,可以使得训练完成的图像降噪模型能够去除图像中的重影和噪声,得到更清晰的降噪图像。
可以理解的是,在应用图像降噪模型的过程中,如果待处理图像为视频帧序列中的一帧,则将该待处理图像和上一帧图像的中间降噪结果拼接后输入图像降噪模型,进而可获取图像降噪模型输出的对待处理图像降噪后的图像数据,按照该方法可以获取视频帧序列中的每一帧图像降噪后的图像数据,进而可以得到降噪后的视频。
如果待处理图像为静态图像,则将待处理图像与自身拼接后输入图像降噪模型,进而可获取图像降噪模型输出的对待处理图像降噪后的图像数据。
如图10所示,图10中图像A为用户拍摄的t时刻的原始噪声的图像。
图像B为采用现有对多帧图像叠加平均值的降噪方法对图像A进行降噪处理得到的图像,即对用户拍摄到的t-2、t-1、t、t+1、t+2时刻的5帧图像进行叠加平均值处理,得到的t时刻的降噪图像。可以看出图像B相比于图像A,在图像静止的背景部分有降噪效果,但对于图像前景部分,也就是图像中运动的人像,存在严重重影的问题。
图像C为使用本申请提供的降噪模型对图像A进行降噪后的图像,图像C中,不论是静止的背景部分还是运动的前景人像,都有很好的降噪效果,并且前景人像部分不存在重影问题,可以看出,与现有技术相比,本申请提供的降噪模型可很好的对图像进行降噪,并且不会产生重影问题。
Corresponding to the above method embodiments, an embodiment of the present application further provides an image denoising apparatus. As shown in Fig. 11, the apparatus includes:
a first input module 1101, configured to input an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set including multiple groups of labeled data and the sample data corresponding to each group of labeled data; wherein a group of labeled data includes multiple noise-free images obtained by applying simulated-motion processing to one reference noise-free image, and the sample data corresponding to the group of labeled data includes images obtained by respectively superimposing noise on the multiple noise-free images;
a first acquisition module 1102, configured to acquire the denoised image data output by the image denoising model; and
a conversion module 1103, configured to convert the image data into an image to obtain the denoised image corresponding to the image to be processed.
In another embodiment of the present application, the apparatus further includes: a collection module, configured to collect multiple reference noise-free images; an extraction module, configured to, for each reference noise-free image, crop from that reference noise-free image multiple images that are identical in shape and area, have overlapping regions, and are not entirely identical, and take the cropped images as a group of labeled data; a first superimposition module, configured to respectively superimpose noise on the multiple images included in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and a first generation module, configured to generate the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
In another embodiment of the present application, the apparatus further includes: a collection module, configured to collect multiple reference noise-free images; a first selection module, configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively; a first cropping module, configured to crop the background image according to a first specified shape and a first specified size; a first processing module, configured to crop the foreground image according to the first specified shape and a second specified size, and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape; a first translation module, configured to superimpose the foreground image of the second specified shape at a start position on the cropped background image, translate the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and take the images at multiple moments during the translation as a group of labeled data; a second superimposition module, configured to respectively superimpose noise on the multiple images included in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and a second generation module, configured to generate the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
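The simulated-motion data generation above can be sketched as follows: a masked foreground patch is pasted onto a cropped background at positions that advance by a preset speed in a preset direction, and one noise-free image is collected per moment. All shapes, sizes, the start position and the velocity below are illustrative assumptions, not values from the specification.

```python
import numpy as np

def synthesize_motion_group(background, foreground, mask, start, velocity, steps):
    # Paste the masked foreground onto the background at positions that
    # advance by `velocity` pixels per step, collecting one noise-free
    # image per moment as a group of labeled data.
    h, w = foreground.shape
    group = []
    y, x = start
    for _ in range(steps):
        img = background.copy()
        region = img[y:y + h, x:x + w]
        img[y:y + h, x:x + w] = np.where(mask, foreground, region)
        group.append(img)
        y += velocity[0]
        x += velocity[1]
    return group

bg = np.zeros((6, 8))                # cropped background (first shape / first size)
fg = np.ones((2, 2))                 # cropped foreground (second size < first size)
mask = np.ones((2, 2), dtype=bool)   # mask defining the second specified shape
group = synthesize_motion_group(bg, fg, mask, start=(2, 0), velocity=(0, 2), steps=3)
```

Noise would then be superimposed on each image of the group to form the corresponding sample data.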
In another embodiment of the present application, the apparatus further includes:
a second selection module, configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively;
a second cropping module, configured to crop the background image according to a first specified shape and a first specified size;
a second processing module, configured to crop the foreground image according to the first specified shape and a second specified size, and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; and
a second translation module, configured to superimpose the foreground image of the second specified shape at a start position on the background image, translate the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and collect the images at multiple moments during the translation as a group of labeled data.
In another embodiment of the present application, the extraction module is specifically configured to:
for each reference noise-free image, randomly select a position in that reference noise-free image as a start cropping coordinate, and crop a square of a specified side length at the start cropping coordinate of the reference noise-free image as one cropped image;
apply a random offset to the start cropping coordinate to obtain the next cropping coordinate, and crop a square of the specified side length at the next cropping coordinate of the reference noise-free image as one cropped image; and, each time the cropping coordinate is randomly offset, crop one cropped image from the reference noise-free image, until a preset number of cropped images are obtained, and take the preset number of cropped images as a group of labeled data.
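The random-offset cropping above can be sketched as follows: equal-sized squares whose top-left corner performs a small random walk, so consecutive crops overlap but differ, emulating slight camera motion over a static scene. The offset range and the clipping of coordinates to the image bounds are assumptions for illustration.

```python
import numpy as np

def crop_group(image, side, count, max_offset=4, seed=0):
    # Cut `count` squares of the same side length; each square's top-left
    # corner is a random offset of the previous one, so the crops share
    # overlapping regions without being entirely identical.
    rng = np.random.default_rng(seed)
    h, w = image.shape
    y = int(rng.integers(0, h - side + 1))  # start cropping coordinate
    x = int(rng.integers(0, w - side + 1))
    crops = []
    for _ in range(count):
        crops.append(image[y:y + side, x:x + side])
        # Random offset, clipped so the square stays inside the image.
        y = int(np.clip(y + rng.integers(-max_offset, max_offset + 1), 0, h - side))
        x = int(np.clip(x + rng.integers(-max_offset, max_offset + 1), 0, w - side))
    return crops

base = np.arange(100.0).reshape(10, 10)  # toy reference noise-free image
group = crop_group(base, side=5, count=4)
```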
In another embodiment of the present application, the collection module is specifically configured to:
collect multiple static RAW images captured by a capture device whose sensitivity is set to its lowest value; and, for each static RAW image, process that static RAW image into different brightness values respectively, to obtain multiple reference noise-free images.
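One plausible way to realize the brightness processing above is a digital gain applied to the low-ISO (essentially noise-free) RAW data, clipped to the sensor's value range. The gain values and the 10-bit white level below are assumptions for illustration, not values from the specification.

```python
import numpy as np

def brightness_variants(raw, gains, white_level=1023):
    # From one low-ISO static RAW image, produce several reference
    # noise-free images at different brightness levels by applying a
    # digital gain and clipping to the assumed 10-bit sensor range.
    return [np.clip(raw * g, 0, white_level) for g in gains]

raw = np.full((2, 2), 100.0)  # toy static RAW frame
variants = brightness_variants(raw, gains=[0.5, 1.0, 2.0, 12.0])
```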
In another embodiment of the present application, the apparatus further includes:
a second input module, configured to concatenate the first frame image of a group of sample data included in the preset training set with itself, input the concatenated image into the convolutional neural network model, and have the concatenated image processed in turn by the first convolutional network and the second convolutional network included in the convolutional neural network model;
a second acquisition module, configured to acquire the intermediate denoised image output by the first convolutional network and the final denoised image output by the second convolutional network;
a computation module, configured to compute a loss function value based on the final denoised image and the first frame image; and
a judgment module, configured to determine, according to the loss function value, whether the convolutional neural network model has converged; if it has not converged, trigger an adjustment module to adjust the parameters of the convolutional neural network model based on the loss function value, concatenate the next frame image of the group of sample data with the intermediate denoised image most recently output by the first convolutional network, and trigger the second input module to perform the step of inputting the concatenated image into the convolutional neural network model, until the judgment module determines that the convolutional neural network model has converged, whereupon the trained convolutional neural network model is taken as the image denoising model.
An embodiment of the present application further provides an electronic device. As shown in Fig. 12, the device includes a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, wherein the processor 1201, the communication interface 1202 and the memory 1203 communicate with one another via the communication bus 1204; the memory 1203 is configured to store a computer program; and the processor 1201 is configured to implement the method steps of the above method embodiments when executing the program stored in the memory 1203.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the above electronic device and other devices.
The memory may include Random Access Memory (RAM), and may also include Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment provided by the present application, a computer-readable storage medium is also provided, the computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above noise reduction methods.
In a further embodiment provided by the present application, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to perform any of the noise reduction methods of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., Solid State Disk (SSD)), or the like.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (16)

  1. A noise reduction method, characterized by comprising:
    inputting an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set comprising multiple groups of labeled data and sample data corresponding to each group of labeled data; wherein a group of labeled data comprises multiple noise-free images obtained by applying simulated-motion processing to one reference noise-free image, and the sample data corresponding to the group of labeled data comprises images obtained by respectively superimposing noise on the multiple noise-free images;
    acquiring denoised image data output by the image denoising model; and
    converting the image data into an image to obtain a denoised image corresponding to the image to be processed.
  2. The method according to claim 1, characterized in that the preset training set is obtained by the following steps:
    collecting multiple reference noise-free images;
    for each reference noise-free image, cropping from that reference noise-free image multiple images that are identical in shape and area, have overlapping regions and are not entirely identical, and taking the cropped images as a group of labeled data;
    respectively superimposing noise on the multiple images comprised in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and
    generating the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
  3. The method according to claim 1, characterized in that the preset training set is obtained by the following steps:
    collecting multiple reference noise-free images;
    selecting, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively;
    cropping the background image according to a first specified shape and a first specified size;
    cropping the foreground image according to the first specified shape and a second specified size, and applying mask processing to the cropped foreground image to obtain a foreground image of a second specified shape;
    superimposing the foreground image of the second specified shape at a start position on the cropped background image, translating the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and taking the images at multiple moments during the translation as a group of labeled data;
    respectively superimposing noise on the multiple images comprised in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and
    generating the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
  4. The method according to claim 2, characterized in that, before the generating the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data, the method further comprises:
    selecting, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively;
    cropping the background image according to a first specified shape and a first specified size;
    cropping the foreground image according to the first specified shape and a second specified size, and applying mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; and
    superimposing the foreground image of the second specified shape at a start position on the background image, translating the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and collecting the images at multiple moments during the translation as a group of labeled data.
  5. The method according to claim 2, characterized in that the step of, for each reference noise-free image, cropping from that reference noise-free image multiple images that are identical in shape and area, have overlapping regions and are not entirely identical, and taking the cropped images as a group of labeled data, comprises:
    for each reference noise-free image, randomly selecting a position in that reference noise-free image as a start cropping coordinate;
    cropping a square of a specified side length at the start cropping coordinate of the reference noise-free image as one cropped image;
    applying a random offset to the start cropping coordinate to obtain a next cropping coordinate, and cropping a square of the specified side length at the next cropping coordinate of the reference noise-free image as one cropped image; and
    each time the cropping coordinate is randomly offset, cropping one cropped image from the reference noise-free image, until a preset number of cropped images are obtained, and taking the preset number of cropped images as a group of labeled data.
  6. The method according to claim 2 or 3, characterized in that the collecting multiple reference noise-free images comprises:
    collecting multiple static RAW images captured by a capture device whose sensitivity is set to its lowest value; and
    for each static RAW image, processing that static RAW image into different brightness values respectively, to obtain multiple reference noise-free images.
  7. The method according to any one of claims 2-4, characterized in that the image denoising model is obtained through training by the following steps:
    concatenating the first frame image of a group of sample data comprised in the preset training set with itself, inputting the concatenated image into the convolutional neural network model, and processing the concatenated image in turn by a first convolutional network and a second convolutional network comprised in the convolutional neural network model;
    acquiring an intermediate denoised image output by the first convolutional network and a final denoised image output by the second convolutional network;
    computing a loss function value based on the final denoised image and the first frame image;
    determining, according to the loss function value, whether the convolutional neural network model has converged; and
    if it has not converged, adjusting parameters of the convolutional neural network model based on the loss function value, concatenating the next frame image of the group of sample data with the intermediate denoised image most recently output by the first convolutional network, and returning to the step of inputting the concatenated image into the convolutional neural network model, until the convolutional neural network model converges, whereupon the trained convolutional neural network model is taken as the image denoising model.
  8. A noise reduction apparatus, characterized by comprising:
    a first input module, configured to input an image to be processed into an image denoising model, the image denoising model being a model obtained by training a convolutional neural network model on a preset training set, the preset training set comprising multiple groups of labeled data and sample data corresponding to each group of labeled data; wherein a group of labeled data comprises multiple noise-free images obtained by applying simulated-motion processing to one reference noise-free image, and the sample data corresponding to the group of labeled data comprises images obtained by respectively superimposing noise on the multiple noise-free images;
    a first acquisition module, configured to acquire denoised image data output by the image denoising model; and
    a conversion module, configured to convert the image data into an image to obtain a denoised image corresponding to the image to be processed.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a collection module, configured to collect multiple reference noise-free images;
    an extraction module, configured to, for each reference noise-free image, crop from that reference noise-free image multiple images that are identical in shape and area, have overlapping regions and are not entirely identical, and take the cropped images as a group of labeled data;
    a first superimposition module, configured to respectively superimpose noise on the multiple images comprised in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and
    a first generation module, configured to generate the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
  10. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a collection module, configured to collect multiple reference noise-free images;
    a first selection module, configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively;
    a first cropping module, configured to crop the background image according to a first specified shape and a first specified size;
    a first processing module, configured to crop the foreground image according to the first specified shape and a second specified size, and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape;
    a first translation module, configured to superimpose the foreground image of the second specified shape at a start position on the cropped background image, translate the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and take the images at multiple moments during the translation as a group of labeled data;
    a second superimposition module, configured to respectively superimpose noise on the multiple images comprised in each group of labeled data to obtain the sample data corresponding to each group of labeled data; and
    a second generation module, configured to generate the preset training set from the obtained multiple groups of labeled data and the sample data corresponding to each group of labeled data.
  11. The apparatus according to claim 9, characterized in that the apparatus further comprises:
    a second selection module, configured to select, each time, two images from the multiple reference noise-free images as a foreground image and a background image respectively;
    a second cropping module, configured to crop the background image according to a first specified shape and a first specified size;
    a second processing module, configured to crop the foreground image according to the first specified shape and a second specified size, and to apply mask processing to the cropped foreground image to obtain a foreground image of a second specified shape, the second specified size being smaller than the first specified size; and
    a second translation module, configured to superimpose the foreground image of the second specified shape at a start position on the background image, translate the foreground image of the second specified shape across the background image in a preset direction at a preset speed, and collect the images at multiple moments during the translation as a group of labeled data.
  12. The apparatus according to claim 9, characterized in that the extraction module is specifically configured to:
    for each reference noise-free image, randomly select a position in that reference noise-free image as a start cropping coordinate;
    crop a square of a specified side length at the start cropping coordinate of the reference noise-free image as one cropped image;
    apply a random offset to the start cropping coordinate to obtain a next cropping coordinate, and crop a square of the specified side length at the next cropping coordinate of the reference noise-free image as one cropped image; and
    each time the cropping coordinate is randomly offset, crop one cropped image from the reference noise-free image, until a preset number of cropped images are obtained, and take the preset number of cropped images as a group of labeled data.
  13. The apparatus according to claim 9 or 10, characterized in that the collection module is specifically configured to:
    collect multiple static RAW images captured by a capture device whose sensitivity is set to its lowest value; and
    for each static RAW image, process that static RAW image into different brightness values respectively, to obtain multiple reference noise-free images.
  14. The apparatus according to any one of claims 9-11, characterized in that the apparatus further comprises:
    a second input module, configured to concatenate the first frame image of a group of sample data comprised in the preset training set with itself, input the concatenated image into the convolutional neural network model, and have the concatenated image processed in turn by the first convolutional network and the second convolutional network comprised in the convolutional neural network model;
    a second acquisition module, configured to acquire the intermediate denoised image output by the first convolutional network and the final denoised image output by the second convolutional network;
    a computation module, configured to compute a loss function value based on the final denoised image and the first frame image; and
    a judgment module, configured to determine, according to the loss function value, whether the convolutional neural network model has converged; if it has not converged, trigger an adjustment module to adjust parameters of the convolutional neural network model based on the loss function value, concatenate the next frame image of the group of sample data with the intermediate denoised image most recently output by the first convolutional network, and trigger the second input module to perform the step of inputting the concatenated image into the convolutional neural network model, until the judgment module determines that the convolutional neural network model has converged, whereupon the trained convolutional neural network model is taken as the image denoising model.
  15. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
    the memory is configured to store a computer program; and
    the processor is configured to implement the method steps of any one of claims 1-7 when executing the program stored in the memory.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any one of claims 1-7.
PCT/CN2022/142016 2021-12-31 2022-12-26 Noise reduction method and apparatus, electronic device, and medium WO2023125440A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111663744.8 2021-12-31
CN202111663744.8A CN114331902B (zh) 2021-12-31 2021-12-31 Noise reduction method and apparatus, electronic device, and medium

Publications (1)

Publication Number Publication Date
WO2023125440A1 true WO2023125440A1 (zh) 2023-07-06

Family

ID=81021832

Country Status (2)

Country Link
CN (1) CN114331902B (zh)
WO (1) WO2023125440A1 (zh)




Also Published As

Publication number Publication date
CN114331902A (zh) 2022-04-12
CN114331902B (zh) 2022-09-16

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase — Ref document number: 18024922; Country of ref document: US
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 22914694; Country of ref document: EP; Kind code of ref document: A1