WO2021179147A1 - Image processing method and apparatus based on neural network - Google Patents

Image processing method and apparatus based on neural network

Info

Publication number
WO2021179147A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, processed, neural network, images, frame
Application number
PCT/CN2020/078484
Other languages: English (en), French (fr)
Inventors: 李蒙 (Li Meng), 胡慧 (Hu Hui), 陈海 (Chen Hai), 郑成林 (Zheng Chenglin)
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2020/078484
Priority to CN202080098211.7A (published as CN115244569A)
Publication of WO2021179147A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10024: Color image
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]

Definitions

  • This application relates to the field of image processing technology, and in particular to a neural network-based image processing method and device.
  • The mobile terminal performs image signal processing (ISP) on the image signal.
  • The main function of the ISP is to post-process the image signal output by the front-end image sensor, so that images obtained under different optical conditions can better restore the details of the scene.
  • the ISP processing flow is shown in Figure 1.
  • The natural scene 101 passes through a lens 102 to obtain a Bayer image, then an analog electrical signal 105 is obtained through photoelectric conversion 104, and a digital electrical signal (i.e., a raw image) 107 is further obtained through denoising and analog-to-digital processing 106; the raw image then enters the digital signal processing chip 100.
  • the steps in the digital signal processing chip 100 are the core steps of ISP processing.
  • The digital signal processing chip 100 generally includes modules such as black level compensation (BLC) 108, lens shading correction (LSC) 109, bad pixel correction (BPC) 110, demosaic 111, Bayer-domain noise reduction (denoise) 112, auto white balance (AWB) 113, Ygamma 114, auto exposure (AE) 115, auto focus (AF) (not shown in Figure 1), color correction (CC) 116, gamma correction 117, color gamut conversion 118, color denoising/detail enhancement 119, color enhancement (CE) 120, formatter 121, and input/output (I/O) control 122.
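  • For orientation, below is a minimal, runnable Python sketch of the stage ordering in Figure 1. Every stage is stubbed as an identity function, and the stage names are illustrative placeholders rather than a real ISP API.

```python
import numpy as np

def identity(img):
    # Placeholder: a real ISP stage would transform the image here.
    return img

# Stage order follows Figure 1; AE/AF control loops and I/O are omitted.
STAGES = [
    ("black_level_compensation", identity),      # BLC (108)
    ("lens_shading_correction", identity),       # LSC (109)
    ("bad_pixel_correction", identity),          # BPC (110)
    ("demosaic", identity),                      # (111)
    ("bayer_denoise", identity),                 # (112)
    ("auto_white_balance", identity),            # AWB (113)
    ("ygamma", identity),                        # (114)
    ("color_correction", identity),              # CC (116)
    ("gamma_correction", identity),              # (117)
    ("color_gamut_conversion", identity),        # (118)
    ("color_denoise_detail_enhance", identity),  # (119)
    ("color_enhance", identity),                 # CE (120)
]

def isp_pipeline(raw):
    img = raw.astype(np.float32)
    for _, stage in STAGES:
        img = stage(img)
    return img

raw = np.zeros((4, 4), dtype=np.uint16)  # toy Bayer raw frame
out = isp_pipeline(raw)
```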
  • ISP based on deep learning has achieved certain results in many tasks.
  • A deep-learning-based ISP processes the image data through a neural network and then outputs it.
  • However, the processing complexity of the neural network is generally very high; the expected result can be achieved, but scenarios that require real-time processing generally face problems such as energy consumption and running time.
  • Therefore, neural-network-based ISP needs to be further optimized.
  • This application provides a neural-network-based image processing method and device, in order to optimize neural-network-based image signal processing performance.
  • In a first aspect, a neural-network-based image processing method is provided, which uses a first neural network and a second neural network to process an image to be processed and output the processed image.
  • The image to be processed includes a first component image and a second component image.
  • The image to be processed is input into the first neural network for computation to obtain a first image; the first image is the first component image of the image to be processed after processing by the first neural network.
  • The first image and the image to be processed are vector-spliced to obtain a first to-be-processed image matrix.
  • The first to-be-processed image matrix is input into the second neural network for computation to obtain a second image; the second image is the second component image of the image to be processed after processing by the second neural network. Based on the second image, the processed image is obtained.
  • In this way, the first neural network processes part of the component images of the image to be processed to obtain an intermediate result, the first image.
  • The first image is spliced with the image to be processed, and the spliced result is processed by the second neural network to obtain the second image.
  • The intermediate result can thus be applied to the processing of the second neural network, reducing the computational complexity of the second neural network while ensuring the quality of image processing.
  • Optionally, the first component image is the brightness component of the image to be processed.
  • The brightness component is an important component in the image processing process and accounts for a relatively high proportion of the network complexity.
  • The first neural network can therefore process the brightness component first.
  • The processing result of the brightness component is input into the second neural network as an intermediate result, which reduces the complexity requirement on the second neural network.
  • In one manner, the first image and the second image are combined to generate the processed image.
  • In another manner, a third image is obtained at the same time as the second image; the third image is the first component image of the image to be processed after processing by the second neural network. Correspondingly, the third image and the second image are combined to generate the processed image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the computing power required by the first component image is higher than the computing power required by the second component image.
  • The image to be processed may be one or more frames.
  • Optionally, the image to be processed includes multiple frames of temporally adjacent images.
  • Correspondingly, the first image is multiple frames and the second image is multiple frames.
  • Each frame of the image to be processed corresponds to one frame of the first image and one frame of the second image.
  • The first neural network handles the computationally heavy inter-frame processing across the multiple frames of images, while the second neural network handles the less demanding intra-frame processing of each frame and outputs the multiple frames of processed images.
  • The combined computing power of the first neural network and the second neural network is thus spread over multiple frames of images, so the processing complexity of each frame is reduced compared with the above-mentioned solutions while the quality of the image or video is guaranteed.
  • For example, the first component image is the luminance channel and the second component image is the chrominance channel.
  • The first neural network can resolve the problem of inter-frame motion among the multiple frames of images.
  • The luminance channel and the chrominance channel of the corresponding frame are then input into the second neural network, which processes the chromaticity of each frame of image; with the processed luminance channel result serving as a guide, a second neural network with smaller computing power can solve the intra-frame color problem.
  • the image processing system provided by the present application has lower complexity in image processing and guarantees the quality of the image or video.
  • Generally, the computing power required by the color channel is less than that required by the brightness channel.
  • For example, YUV images generally use the 4:2:0 sampling format, in which the resolution of the color channels is half that of the brightness channel in each dimension.
  • Optionally, the computation performed in the second neural network includes the following steps: obtaining a feature map matrix of the first to-be-processed image matrix according to the first to-be-processed image matrix, and vector-splicing the feature map matrix with each frame of the first image to obtain a plurality of second to-be-processed image matrices, where each frame of the second image is obtained from a corresponding second to-be-processed image matrix.
  • Optionally, the vector splicing of the first image and the image to be processed may be implemented in the following manner: grouping the multiple frames of temporally adjacent images to obtain multiple sub-groups of images, and vector-splicing each frame of the first image with one sub-group of images to generate multiple to-be-processed image sub-matrices.
  • The first image and the sub-group of images that are vector-spliced correspond to the same frame of the image to be processed.
  • the first component image is the brightness component of the image to be processed.
  • the second component image is one or more chrominance components, or one or more color components of the image to be processed.
  • the first component image and the second component image are respectively different color components of the image to be processed.
  • the first neural network and the second neural network constitute an image processing system, and the image processing system is used to process the image to be processed for noise reduction and mosaic effect elimination.
  • The format of the image to be processed may be a red-green-blue (RGB) format, a luma-chroma separation (YUV) format, or a Bayer format.
  • In a second aspect, a neural-network-based image processing device is provided, which can be a terminal device, a device in a terminal device (such as a chip, a chip system, or a circuit), or a device that can be used in combination with the terminal device.
  • The device may include modules that correspond one-to-one to the methods/operations/steps/actions described in the first aspect.
  • the modules may be hardware circuits, software, or hardware circuits combined with software.
  • the device processes the image to be processed and obtains the processed image.
  • the image to be processed includes a first component image and a second component image.
  • The device may include an arithmetic module and a splicing module.
  • The arithmetic module is configured to input the image to be processed into a first neural network for computation to obtain a first image, the first image being the first component image of the image to be processed after processing by the first neural network.
  • The splicing module is configured to vector-splice the first image and the image to be processed to obtain a first to-be-processed image matrix.
  • The arithmetic module is further configured to input the first to-be-processed image matrix into a second neural network for computation to obtain a second image, the second image being the second component image of the image to be processed after processing by the second neural network, and to obtain the processed image based on the second image.
  • In one implementation, when obtaining the processed image based on the second image, the arithmetic module is specifically configured to combine the first image and the second image to generate the processed image.
  • In another implementation, a third image is obtained at the same time as the second image; the third image is the first component image of the image to be processed after processing by the second neural network. Correspondingly, the third image and the second image are combined to generate the processed image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the computing power required by the first component image is higher than the computing power required by the second component image.
  • the image to be processed may be one or more frames.
  • the image to be processed includes multiple frames of temporally adjacent images.
  • Correspondingly, the first image is multiple frames and the second image is multiple frames.
  • Each frame of the image to be processed corresponds to one frame of the first image and one frame of the second image.
  • Optionally, in the computation performed in the second neural network, the arithmetic module is configured to obtain a feature map matrix of the first to-be-processed image matrix according to the first to-be-processed image matrix, and to vector-splice the feature map matrix with each frame of the first image to obtain a plurality of second to-be-processed image matrices, where each frame of the second image is obtained from a corresponding second to-be-processed image matrix.
  • Optionally, when vector-splicing the first image and the image to be processed, the splicing module is configured to group the multiple frames of temporally adjacent images to obtain multiple sub-groups of images, and to vector-splice each frame of the first image with one sub-group of images to generate multiple to-be-processed image sub-matrices.
  • The first image and the sub-group of images that are vector-spliced correspond to the same frame of the image to be processed.
  • the first component image is the brightness component of the image to be processed.
  • the second component image is one or more chrominance components, or one or more color components of the image to be processed.
  • the first component image and the second component image are respectively different color components of the image to be processed.
  • the first neural network and the second neural network constitute an image processing system, and the image processing system is used to process the image to be processed for noise reduction and mosaic effect elimination.
  • The format of the image to be processed may be a red-green-blue (RGB) format, a luma-chroma separation (YUV) format, or a Bayer format.
  • An embodiment of this application further provides a neural-network-based image processing device.
  • The device includes a processor, and the processor is configured to call a set of programs, instructions, or data to execute the method described in the first aspect or any possible design of the first aspect.
  • the device may also include a memory for storing programs, instructions or data called by the processor.
  • the memory is coupled with the processor, and when the processor executes the instructions or data stored in the memory, it can implement the method described in the first aspect or any possible design.
  • An embodiment of this application further provides a chip system, which includes a processor and may also include a memory, for implementing the method described in the first aspect or any possible design of the first aspect.
  • The chip system may consist of a chip, or may include a chip and other discrete devices.
  • The embodiments of this application also provide a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are run on a computer, the method described in the first aspect or any possible design of the first aspect is executed.
  • The embodiments of this application also provide a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or any possible design of the first aspect.
  • FIG. 1 is a schematic diagram of an ISP processing flow in the prior art.
  • FIG. 2 is a schematic structural diagram of the system architecture in an embodiment of this application.
  • FIG. 3 is a schematic diagram of the principle of a neural network in an embodiment of this application.
  • FIG. 4 is a schematic flowchart of a neural-network-based image processing method in an embodiment of this application.
  • FIG. 5a is a schematic diagram of implementation manner 1 of image processing in an embodiment of this application.
  • FIG. 5b is a schematic diagram of implementation manner 2 of image processing in an embodiment of this application.
  • FIG. 6 is the first schematic diagram of an RGrGbB image processing method in an embodiment of this application.
  • FIG. 7 is the second schematic diagram of the RGrGbB image processing method in an embodiment of this application.
  • FIG. 8a is the first schematic structural diagram of the first neural network in an embodiment of this application.
  • FIG. 8b is the second schematic structural diagram of the first neural network in an embodiment of this application.
  • FIG. 9a is a schematic diagram of part of the processing of a typical convolutional neural network in an embodiment of this application.
  • FIG. 9b is a schematic diagram of part of the processing of a multi-branch neural network in an embodiment of this application.
  • FIG. 10a is a schematic structural diagram of the second neural network in an embodiment of this application.
  • FIG. 10b is a partial schematic diagram of the multi-branch operation of the second neural network in an embodiment of this application.
  • FIG. 11 is a partial schematic diagram of a typical neural network operation used in the second neural network in an embodiment of this application.
  • FIG. 12 is a schematic diagram of vector splicing of the first image and the feature map matrix in an embodiment of this application.
  • FIG. 13 is the first schematic structural diagram of the neural-network-based image processing device in an embodiment of this application.
  • FIG. 14 is the second schematic structural diagram of the neural-network-based image processing device in an embodiment of this application.
  • The neural-network (NN)-based image processing method and device provided by the embodiments of this application can be applied to electronic equipment. The electronic equipment may be a mobile device such as a mobile terminal, a mobile station (MS), or user equipment (UE), or a fixed device such as a fixed telephone or a desktop computer, or a video monitor.
  • The electronic equipment is an image acquisition and processing device with image signal acquisition and processing functions, and has an ISP processing function.
  • The electronic equipment may also optionally have a wireless connection function, to provide users with a handheld device with voice and/or data connectivity, or to be another processing device connected to a wireless modem.
  • The electronic equipment may be a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal; it may also be a portable, pocket-sized, handheld, built-in-computer, or vehicle-mounted mobile device; of course, it may also be a wearable device (such as a smart watch or smart bracelet), a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a point-of-sale (POS) terminal, and so on.
  • In the following, a terminal device is used as an example for description.
  • FIG. 2 is a schematic diagram of an optional hardware structure of a terminal device 200 related to an embodiment of the application.
  • the terminal device 200 mainly includes a chipset and peripheral devices.
  • Components such as USB interface, memory, display screen, battery/mains power, earphone/speaker, antenna, sensor, etc. can be understood as peripheral devices.
  • The arithmetic processor, RAM, I/O, display interface, ISP, sensor hub, baseband, and other components in the chipset can form a system-on-a-chip (SOC), which is the main part of the chipset.
  • The components in the SOC can all be integrated into one complete chip, or some components in the SOC can be integrated while the others are not.
  • For example, the baseband communication module in the SOC may not be integrated with the other parts and instead remain an independent part.
  • The components in the SOC can be connected to each other through a bus or other connecting lines.
  • The PMU, voice codec, RF, and other parts outside the SOC usually include analog circuits, so they are often outside the SOC and not integrated into it.
  • the PMU is used to connect to the mains or battery to supply power to the SOC, and the mains can be used to charge the battery.
  • the voice codec is used as the sound codec unit to connect with earphones or speakers to realize the conversion between natural analog voice signals and digital voice signals that can be processed by the SOC.
  • The short-range module can include wireless fidelity (WiFi) and Bluetooth, and can also optionally include an infrared, near field communication (NFC), FM radio, or global positioning system (GPS) module, etc.
  • The RF is connected with the baseband communication module in the SOC to realize the conversion between the air-interface RF signal and the baseband signal, that is, mixing; for a mobile phone, receiving is down-conversion and sending is up-conversion.
  • Both the short-range module and the RF can have one or more antennas for signal transmission or reception.
  • The baseband is used for baseband communication and includes one or more of a variety of communication modes. It processes the wireless communication protocol, including the physical layer (layer 1), medium access control (MAC, layer 2), radio resource control (RRC, layer 3), and other protocol layers, and can support various cellular communication standards, such as long term evolution (LTE) communication or 5G new radio (NR) communication.
  • the Sensor hub is an interface between the SOC and external sensors, and is used to collect and process data from at least one external sensor.
  • the external sensors can be, for example, accelerometers, gyroscopes, control sensors, image sensors, and so on.
  • The arithmetic processor can be a general-purpose processor, such as a central processing unit (CPU); or one or more integrated circuits, such as one or more application-specific integrated circuits (ASICs); or one or more digital signal processors (DSPs); or microprocessors; or one or more field-programmable gate arrays (FPGAs); and so on.
  • The arithmetic processor can include one or more cores and can selectively schedule other units.
  • The RAM can store some intermediate data during calculation or processing, such as intermediate calculation data of the CPU and the baseband.
  • ISP is used to process the data collected by the image sensor.
  • The I/O is used for the SOC to interact with various external interfaces, such as a universal serial bus (USB) interface for data transmission.
  • the memory can be a chip or a group of chips.
  • the display screen can be a touch screen, which is connected to the bus through a display interface.
  • The display interface can be used for data processing before image display, such as overlaying of multiple layers to be displayed, buffering of display data, or control and adjustment of screen brightness.
  • the terminal device 200 involved in the embodiments of the present application includes an image sensor, which can collect external signals such as light from the outside, and process and convert the external signals into sensor signals, that is, electrical signals.
  • the sensor signal can be a static image signal or a dynamic video image signal.
  • the image sensor may be a camera, for example.
  • The terminal device 200 involved in the embodiments of this application also includes an image signal processor.
  • The image sensor collects sensor signals and transmits them to the image signal processor.
  • The image signal processor obtains the sensor signal and can perform image signal processing on it, in order to obtain an image signal whose sharpness, color, brightness, and other aspects conform to the characteristics of the human eye.
  • the image signal processor involved in the embodiment of the present application may be one or a group of chips, that is, it may be integrated or independent.
  • the image signal processor included in the terminal device 200 may be an integrated ISP chip integrated in the arithmetic processor.
  • the terminal device 200 involved in the embodiments of the present application has the function of taking photos or recording videos.
  • the neural network-based image processing method provided in the embodiments of the present application mainly focuses on how to perform image signal processing based on the neural network.
  • a neural network is used to process the multi-frame images to be processed.
  • A neural network (NN) is a network structure that imitates the behavioral characteristics of animal neural networks to perform information processing.
  • The neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes input signals $x_s$ and an intercept of 1 as inputs.
  • The output of the arithmetic unit can be as shown in formula (1-1): $h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • $f$ is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
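  • As an illustration, below is a minimal sketch of the single neural unit of formula (1-1), with sigmoid as the activation function f; the input values and weights are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, W, b):
    # Formula (1-1): output = f(sum_s W_s * x_s + b), here f = sigmoid.
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.0, 2.0])   # input signals x_s
W = np.array([0.1, 0.4, -0.2])   # weights W_s
print(neural_unit(x, W, b=0.05))
```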
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • The neural network 300 has N processing layers, where N ≥ 3 and N is a natural number.
  • the first layer of the neural network is the input layer 301, which is responsible for receiving input signals.
  • the last layer of the neural network is the output layer 303, which outputs the processing results of the neural network.
  • The layers other than the first and last layers are intermediate layers 304.
  • These intermediate layers together form the hidden layer 302; each intermediate layer of the hidden layer can receive and output signals, and the hidden layer is responsible for the processing of the input signal.
  • Each layer represents a logic level of signal processing. Through multiple layers, data signals can be processed by multiple levels of logic.
  • The input signal of the neural network may be a signal in various forms, such as a voice signal, a text signal, an image signal, or a temperature signal.
  • The image signals to be processed may be various sensor signals, such as a landscape signal captured by a camera (image sensor), an image signal of a community environment captured by a monitoring device, or a facial signal of a human face acquired by an access control system.
  • The input signals of the neural network also include various other engineering signals that can be processed by computers, which are not listed here one by one. If the neural network is used to perform deep learning on the image signal, the image quality can be improved.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • The DNN is divided according to the positions of the different layers.
  • The layers inside the DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers.
  • The layers are fully connected; that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • The operation of each layer of the DNN can be expressed as $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function.
  • Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because the DNN has many layers, the numbers of coefficients $W$ and offset vectors $\vec{b}$ are also large.
  • The definition of these parameters in the DNN is as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W^L_{jk}$.
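  • As an illustration, below is a minimal sketch of one fully connected DNN layer computing $\vec{y} = \alpha(W\vec{x} + \vec{b})$; tanh is used as the activation and the layer sizes are arbitrary.

```python
import numpy as np

def dense_layer(x, W, b, alpha=np.tanh):
    # Row j, column k of W is the coefficient W^L_{jk} from neuron k of
    # layer L-1 to neuron j of layer L, matching the notation above.
    return alpha(W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)       # 4 neurons in layer L-1
W = rng.standard_normal((2, 4))  # 2 neurons in layer L
b = np.zeros(2)
y = dense_layer(x, W, b)
```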
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In the convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layers.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel.
  • Sharing weights can be understood as meaning that the way of extracting image information has nothing to do with location.
  • The convolution kernel can be initialized in the form of a matrix of random size; during the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while reducing the risk of overfitting.
  • The neural network in the embodiments of this application may be a convolutional neural network, and of course it may also be another type of neural network, such as a recurrent neural network (RNN).
  • the images in the embodiments of the present application may be static images (or referred to as static pictures) or dynamic images (or referred to as dynamic pictures).
  • The images in this application may be videos or dynamic pictures, or they may be static pictures or photos.
  • For ease of description, static images and dynamic images are collectively referred to as images in the following.
  • This method is executed by a neural-network-based image processing device.
  • The neural-network-based image processing device may be any device or apparatus with an image processing function; for example, the method is executed by the terminal device 200 shown in FIG. 2, by an apparatus related to the terminal device, or by part of the components included in the terminal device.
  • multiple neural networks are used for image processing, for example, two neural networks are used to process the image to be processed, and the two neural networks are denoted as the first neural network and the second neural network.
  • the first neural network and the second neural network conform to the above description of the neural network.
  • the image to be processed includes component images of one or more dimensions.
  • the image to be processed includes a first component image and a second component image.
  • the process of performing image processing on the image to be processed includes processing the first component image and the second component image.
  • the neural network-based image processing method provided by the embodiment of the present application is as follows.
  • S401: Input the image to be processed into the first neural network for computation to obtain a first image.
  • The first image is the first component image of the image to be processed after processing by the first neural network.
  • S402: Vector-splice the first image and the image to be processed to obtain a first to-be-processed image matrix.
  • S403: Input the first to-be-processed image matrix into the second neural network for computation to obtain a second image.
  • The second image is the second component image of the image to be processed after processing by the second neural network.
  • In this way, the first neural network processes part of the component images of the image to be processed to obtain an intermediate result, the first image.
  • The first image is spliced with the image to be processed, and the spliced result is processed by the second neural network to obtain the second image.
  • The intermediate result can thus be applied to the processing of the second neural network, reducing the computational complexity of the second neural network while ensuring the quality of image processing.
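  • Below is a minimal sketch of steps S401 to S403, assuming the two trained networks are given as callables; vector splicing is modeled as channel-wise concatenation, and the stub networks and shapes are illustrative only.

```python
import numpy as np

def process(image, first_nn, second_nn):
    # S401: the first network produces the processed first component image.
    first_image = first_nn(image)
    # S402: vector-splice the first image with the image to be processed.
    first_matrix = np.concatenate([first_image, image], axis=0)
    # S403: the second network produces the processed second component image.
    second_image = second_nn(first_matrix)
    # Obtain the processed image based on the second image, here by
    # combining it with the first image.
    return np.concatenate([first_image, second_image], axis=0)

image = np.zeros((4, 8, 8), dtype=np.float32)  # toy 4-channel frame
first_nn = lambda x: x[1:2]                    # stub: one component channel
second_nn = lambda x: x[:3]                    # stub: three component channels
out = process(image, first_nn, second_nn)      # (4, 8, 8)
```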
  • Optionally, the first component image is the brightness component of the image to be processed.
  • The brightness component is an important component in the image processing process and accounts for a relatively high proportion of the network complexity; the brightness component can be processed first through the first neural network.
  • The processing result of the brightness component is input into the second neural network as an intermediate result, which reduces the complexity requirement on the second neural network.
  • the image to be processed includes a first component image and a second component image, and the processed image also includes the first component image and the second component image.
  • the first image is obtained based on the first neural network
  • the second image is obtained by the second neural network
  • the first image and the second image are combined to obtain the processed image.
  • Because the first image is the first component image processed by the first neural network and the second image is the second component image processed by the second neural network, combining the first image and the second image means combining the processed first component image with the processed second component image to obtain the processed image.
  • In another implementation, when the first to-be-processed image matrix is input into the second neural network for computation and the second image is obtained, a third image is obtained at the same time.
  • The third image is the first component image of the image to be processed after processing by the second neural network. In this way, the third image and the second image can be combined to generate the processed image.
  • the image to be processed may be one frame or multiple adjacent frames in the time domain.
  • the adjacent multi-frames in the time domain include consecutive multi-frames in the time domain.
  • the adjacent multi-frames in the time domain are referred to as multi-frames in the description below.
  • Correspondingly, the processed image is also multiple frames.
  • The first component image and the second component image are each multiple frames; the first image obtained after processing by the first neural network is multiple frames, and the second image obtained by the second neural network is multiple frames.
  • Each frame of the image to be processed corresponds to one frame of the first image and one frame of the second image.
  • Each frame of the processed image corresponds to one frame of the first image and one frame of the second image, or to one frame of the third image and one frame of the second image.
  • The solution of the embodiment of this application is as follows: input the multiple frames of images to be processed into the first neural network for processing to obtain the first image.
  • The first image comprises the multiple frames of first component images of the multiple frames of images to be processed after processing by the first neural network; each frame of the image to be processed yields one frame of the first component image after passing through the first neural network.
  • The first image and the image to be processed are vector-spliced to obtain the first to-be-processed image matrix; correspondingly, there are multiple first to-be-processed image matrices. Specifically, each frame of the first image and the corresponding frame of the image to be processed are vector-spliced to obtain one first to-be-processed image matrix.
  • the first image matrix to be processed is input into the second neural network for calculation to obtain a second image.
  • The second image comprises the multiple frames of second component images of the multiple frames of images to be processed after processing by the second neural network; the processed image is obtained based on the second image.
  • Two optional manners are as follows.
  • In one manner, the multiple frames of first component images processed by the first neural network and the multiple frames of second component images processed by the second neural network are combined to generate multiple frames of processed images.
  • In the other manner, a third image is also obtained; the third image comprises the multiple frames of first component images of the multiple frames of images to be processed after processing by the second neural network, and the third image and the second image are combined to generate the processed images.
  • the first component image may be the brightness component or the brightness channel of the image to be processed.
  • the second component image is one or more chrominance components of the image to be processed, or the second component image is one or more color components of the image to be processed, or the second component image is one or more color channels of the image to be processed Or chroma channel.
  • Alternatively, the first component image is one or more chrominance components of the image to be processed, and the second component image is one or more other chrominance components of the image to be processed; that is, the first component image and the second component image are different chrominance components of the image to be processed.
  • The chrominance component may also be referred to as a chrominance channel, a color component, or a color channel.
  • The format of the image to be processed may be a red-green-blue (RGB) format, a luma-chroma separation (YUV) format, or a Bayer format, which is not limited in this application.
  • For example, when the format of the image to be processed is RGB, the first component image may be the G channel, and the second component image may be the R and B channels.
  • the first image matrix to be processed is input to the second neural network for calculation to obtain a second image.
  • the second image is multiple frames.
  • the first to-be-processed image matrix is formed by vector stitching of the first image and the to-be-processed image.
  • the first image is multiple frames.
  • Correspondingly, the first to-be-processed image matrix is multiple matrices, or includes multiple to-be-processed image sub-matrices.
  • In one implementation, the feature map matrix of the first to-be-processed image matrix is obtained from the first to-be-processed image matrix, and the feature map matrix is vector-spliced with each frame of the first image to obtain multiple second to-be-processed image matrices, where each frame of the second image is obtained from a corresponding second to-be-processed image matrix.
  • vector stitching is performed on the first image and the image to be processed to obtain the first image matrix to be processed.
  • the first image is multiple frames
  • the to-be-processed image is multiple frames.
  • The multiple frames of images to be processed can be grouped to obtain multiple sub-groups of images, and each frame of the first image can be vector-spliced with one sub-group of images to obtain multiple to-be-processed image sub-matrices.
  • The first to-be-processed image matrix includes the multiple to-be-processed image sub-matrices; in other words, the multiple to-be-processed image sub-matrices form the first to-be-processed image matrix.
  • The first image and the sub-group of images that are vector-spliced correspond to the same frame of the image to be processed.
  • the number of multi-frame images to be processed is 4 frames, and 4 frames of to-be-processed images are input to the first neural network for calculation to obtain 4 first images.
  • Each frame of the image to be processed corresponds to one first image; for example, the first frame of the image to be processed corresponds to the first first image, and the second frame of the image to be processed corresponds to the second first image.
  • the first group of sub-group images corresponds to the first frame of images to be processed
  • the second group of sub-group images corresponds to the second frame of images to be processed.
  • the first group of sub-group images and the first first image are vector stitched.
  • the second group of sub-group images and the second first image are vector stitched.
  • The vector splicing of the multiple frames of first images with the multiple sub-groups of images can be regarded as an internal processing step of the second neural network.
  • In this case, the input to the second neural network is an overall matrix, that is, the first to-be-processed image matrix, and the process of splicing it can be decomposed into the aforementioned splicing of the multiple frames of first images with the multiple sub-groups of images.
  • The first neural network and the second neural network can be combined into an image processing system, and the image processing system is used to process the image to be processed to improve the quality of the image or video.
  • The processing may include noise reduction, mosaic-effect elimination, and other processing.
  • the complexity of the first neural network is higher than the complexity of the second neural network.
  • In some technologies, multiple frames of images are synthesized into one frame of output through a single neural network to improve image or video quality.
  • Such a neural network requires high complexity, and in a video scene a high processing speed is required.
  • For example, terminal video processing needs to achieve a processing speed of 30 frames/s for a video with 8K resolution, that is, a frame rate of 30.
  • When one neural network is used to synthesize multiple frames of images into one frame of output, it faces problems of computational complexity and computing resource consumption and incurs a large delay; blindly reducing the complexity of the neural network and using a lower-complexity network affects the quality of the image or video.
  • In this application, the first neural network handles the computationally heavy inter-frame processing across the multiple frames of images, while the second neural network handles the less demanding intra-frame processing of each frame and outputs the multiple frames of processed images.
  • The combined computing power of the first neural network and the second neural network is allocated across multiple frames of images, so the processing complexity of each frame is reduced compared with the above-mentioned solution while the quality of the image or video is guaranteed.
  • the first component image is the luminance channel
  • the second component image is the chrominance channel.
  • The first neural network can resolve the problem of inter-frame motion among the multiple frames of images, and the second neural network processes the chrominance of each frame of image.
  • In this way, through the joint processing of the two neural networks, the image processing system provided by this application achieves lower image processing complexity while ensuring the quality of the image or video, improving the applicability of deep learning technology in the field of image signal processing.
  • the first neural network and the second neural network are convolutional neural networks as an example.
  • the image to be processed is 4 frames, and the processed image is 4 frames.
  • The format of the image to be processed is a Bayer format, specifically the RGrGbB format; one frame of an RGrGbB image includes 4 channels (R, Gr, Gb, B).
  • the image processing system includes a first neural network and a second neural network.
  • the 16 channels include (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
  • The first component image is the Gr channel; after processing by the first neural network, 4 consecutive frames of Gr channel images are obtained.
  • the four consecutive Gr channel images include a first frame of Gr channel image Gr1, a second frame of Gr channel image Gr2, a third frame of Gr channel image Gr3, and a fourth frame of Gr channel image Gr4.
  • the second component image is the R, Gb, and B channels, and 4 consecutive RGbB images (R1, Gb1, B1, R2, Gb2, B2, R3, Gb3, B3, R4, Gb4, B4) are obtained.
  • 4 frames of continuous RGbB channel images include the first frame of RGbB channel images R1, Gb1, B1, the second frame of RGbB channel images R2, Gb2, B2, the third frame of RGbB channel images R3, Gb3, B3, and the fourth frame of RGbB channel images R4, Gb4, B4.
  • the 16 channels include (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
  • The first component image is the Gr channel; after processing by the first neural network, 4 consecutive frames of Gr channel images are obtained.
  • the four consecutive Gr channel images include a first frame of Gr channel image Gr1, a second frame of Gr channel image Gr2, a third frame of Gr channel image Gr3, and a fourth frame of Gr channel image Gr4.
  • The 4 frames of continuous Gr channel images and the 4 frames of continuous RGrGbB images to be processed are vector-spliced, and the resulting first to-be-processed image matrix is input into the second neural network for processing to obtain 4 continuous frames of processed images.
  • The 4 consecutive frames of processed images can also be regarded as 4 consecutive frames of second images and 4 consecutive frames of third images.
  • The third image is the Gr channel, and the 4 consecutive third images are (Gr1, Gr2, Gr3, Gr4).
  • The second component image is the RGbB channels, and 4 consecutive frames of RGbB images (R1, Gb1, B1, R2, Gb2, B2, R3, Gb3, B3, R4, Gb4, B4) are obtained.
  • The 4 consecutive frames of RGbB channel images include the first frame R1, Gb1, B1, the second frame R2, Gb2, B2, the third frame R3, Gb3, B3, and the fourth frame R4, Gb4, B4.
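  • The channel layout of this 4-frame RGrGbB example can be sketched as follows; the frame contents and sizes are toy placeholders.

```python
import numpy as np

# 4 toy frames, each with channels (R, Gr, Gb, B).
frames = [np.zeros((4, 8, 8), dtype=np.float32) for _ in range(4)]

# Splice the 4 frames into the 16-channel input
# (R1, Gr1, Gb1, B1, ..., R4, Gr4, Gb4, B4).
packed = np.concatenate(frames, axis=0)        # (16, H, W)

# First component images: the 4 Gr channels (indices 1, 5, 9, 13).
gr = packed[1::4]                              # (4, H, W): Gr1..Gr4

# Second component images: R, Gb, B of each frame,
# ordered (R1, Gb1, B1, R2, Gb2, B2, ..., R4, Gb4, B4).
rgbb = np.concatenate(
    [np.delete(f, 1, axis=0) for f in frames], axis=0)  # (12, H, W)
```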
  • The architecture of the first neural network is shown in FIGS. 8a and 8b. Because the drawing of the first neural network is too large, it is divided into two parts, shown in FIGS. 8a and 8b respectively; together they form the architecture of the first neural network. The add operation in FIG. 8a connects to the first layer in FIG. 8b.
  • In FIGS. 8a and 8b, a convolutional layer is represented by a rectangular box.
  • Conv2d represents a 2-dimensional convolution.
  • bias represents the bias term.
  • 1x1/3x3 represents the size of the convolution kernel.
  • Stride represents the step size.
  • _32_16 represents the numbers of input and output feature maps: 32 means the number of feature maps input to the layer is 32, and 16 means the number of feature maps output by the layer is 16.
  • Split represents the split layer, which splits the feature maps in the channel dimension.
  • Split 2 means splitting the image in two in the feature-map dimension; for example, an input image with 32 feature maps becomes two images with 16 feature maps each after this operation.
  • concat represents the skip-connection (concatenation) layer, which merges images in the feature-map dimension; for example, two images with 16 feature maps each are merged into one image with 32 feature maps.
  • add represents a matrix addition operation.
  • The convolutional layers of the first neural network shown in FIGS. 8a and 8b adopt multi-branch operations, which can resolve the interference of motion information among the luminance channels of multiple frames; a complex convolutional neural network thereby resolves the motion interference among the luminance channels of multiple frames of images.
  • When the neural network system in this embodiment is a noise-reduction network, the above multi-branch operation can be used to obtain the multi-frame luminance-channel noise-reduction results while ensuring that the results after noise reduction are free of problems such as motion blur and motion smearing.
  • The convolutional layer of the first neural network may also adopt a group convolution operation, where group convolution is a special convolution layer.
  • Assume the number of input channels is N and the number of groups for the group convolution is M.
  • The group convolution layer first divides the N channels into M groups, each group corresponding to N/M channels, and the convolution of each group is performed independently; after completion, the output feature maps are vector-concatenated (concat) together as the output channels of this layer.
  • The group convolution operation can achieve the same or a similar technical effect as the multi-branch manner.
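  • A minimal sketch of such a group convolution follows, using the `groups` argument of PyTorch's nn.Conv2d; the channel and group counts are illustrative.

```python
import torch
import torch.nn as nn

N, M = 32, 4  # N input channels divided into M groups of N/M channels each
grouped = nn.Conv2d(in_channels=N, out_channels=N, kernel_size=3,
                    padding=1, groups=M, bias=True)

x = torch.randn(1, N, 16, 16)
y = grouped(x)  # equivalent to M independent convolutions whose
                # output feature maps are concatenated channel-wise
```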
  • If a multi-branch neural network is not used, a typical convolutional neural network can also be used. With a typical convolutional neural network, the difference from FIG. 8b lies in the operations after the concat result of the last skip-connection layer, before the 4 frames of continuous Gr channel images are obtained. For a better comparison, the multi-branch operation of FIG. 8b is shown in FIG. 9b, and the typical convolutional neural network is shown in FIG. 9a.
  • The 32 feature maps resulting from the same skip-connection layer are used as the input of the classic neural network and of the multi-branch neural network, respectively. In the classic neural network, multiple convolutional layers are shared.
  • The 32 feature maps are operated on by 4 convolutional layers, and finally 4 feature maps (4 frames of Gr channel images) are output.
  • The multi-branch neural network adopts a 4-branch method: each branch independently obtains a one-channel feature-map output through a 4-layer convolution operation as the luminance-channel result of one frame, and the four branches respectively obtain the 4 luminance-channel results.
  • A neural network that mixes classic convolutional layers and multi-branch convolutional layers can not only solve the problem of image noise reduction, but also ensure that the simultaneous output of multi-frame Gr channels is free of problems such as motion blur and motion smearing.
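  • The 4-branch head described above can be sketched as follows; feeding all 32 feature maps to every branch and the intermediate channel counts are assumptions, since the exact wiring is given only in FIG. 9b.

```python
import torch
import torch.nn as nn

def make_branch():
    # Each branch: 4 convolutional layers producing a 1-channel output,
    # i.e., the luminance (Gr) result of one frame.
    return nn.Sequential(
        nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 1, 3, padding=1),
    )

branches = nn.ModuleList(make_branch() for _ in range(4))
feats = torch.randn(1, 32, 16, 16)                   # shared 32 feature maps
gr = torch.cat([b(feats) for b in branches], dim=1)  # (1, 4, H, W): Gr1..Gr4
```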
  • The architecture of the second neural network is described below. Exemplarily, the architecture of the second neural network is shown in FIG. 10a.
  • In FIG. 10a, _20_16 indicates that the number of feature maps input to the layer is 20 and the number of feature maps output by the layer is 16.
  • The convolutional layer of the second neural network shown in FIG. 10a adopts a multi-branch operation. If a multi-branch neural network is not used, a typical convolutional neural network can also be used. For a better comparison, the multi-branch operation part of FIG. 10a is shown in FIG. 10b, and the corresponding part of a typical convolutional neural network is shown in FIG. 11. FIGS. 10b and 11 also show the output Gr channel images and RGbB channel images.
  • The leftmost skip-connection layer (concat) in FIG. 10b and the leftmost skip-connection layer (concat) in FIG. 11 output the same result; the operations after the concat are different.
  • Each _17_3 in FIG. 11 indicates that the number of feature maps input to the layer is 17 and the number of feature maps output by the layer is 3.
  • _17_12 in FIG. 10b indicates that the number of feature maps input to the layer is 17 and the number of feature maps output by the layer is 12.
  • FIGS. 10b and 11 start from the same skip-connection layer result.
  • The 17 feature maps are used as the input of the classic neural network and of the multi-branch neural network, respectively. In the classic neural network, multiple convolutional layers are shared.
  • The 17 feature maps are operated on by one convolutional layer, and finally 12 feature maps (4 frames of R, Gb, B channel images) are output.
  • The multi-branch neural network adopts a 4-branch method: each branch independently links, by skip connection, the Gr channel of the first-neural-network result of the corresponding frame, and after one convolutional layer obtains a 3-channel feature-map output as one frame of color-channel (R, Gb, B) image; the four branches respectively obtain the 4 frames of color results. By using a multi-branch convolutional neural network and using the skip-connection layer to link the corresponding noise-reduced, clean luminance channel, the image noise-reduction problem can be solved with a very low-complexity network, while ensuring that the simultaneous output of multi-frame R, Gb, B channels is free of problems such as motion blur and motion smearing.
  • 4 consecutive Gr channel images (Gr1, Gr2, Gr3, Gr4) are obtained.
  • the 4 groups of sub-group images are: (R1, Gr1, Gb1, B1), (R2, Gr2, Gb2, B2), (R3, Gr3, Gb3, B3), (R4, Gr4, Gb4, B4) .
  • The first image and the sub-group of images that are vector-spliced correspond to the same frame of the image to be processed.
  • The first to-be-processed image matrix is obtained after the vector splicing.
  • The feature map matrix of the first to-be-processed image matrix is obtained according to the first to-be-processed image matrix.
  • The feature map matrix is vector-spliced with each frame of the first image to obtain multiple second to-be-processed image matrices.
  • The first image is obtained by the first neural network.
  • Each frame of the first image is input into the second neural network and vector-spliced with the feature map matrix.
  • The feature map matrix of the first to-be-processed image matrix corresponds to the result of the leftmost skip-connection layer (concat) in FIG. 12.
  • In FIG. 12, obtaining multiple second to-be-processed image matrices is taken as an example.
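  • A minimal sketch of this splicing step follows; the count of 16 feature maps is an assumption, chosen so that splicing one Gr frame yields the 17-map inputs of the _17_* layers above.

```python
import numpy as np

# Feature map matrix computed from the first to-be-processed image matrix.
feature_maps = np.zeros((16, 8, 8), dtype=np.float32)

# One first-image (Gr) frame per frame of the image to be processed.
gr_frames = [np.zeros((1, 8, 8), dtype=np.float32) for _ in range(4)]

# Splice the feature map matrix with each Gr frame: one second
# to-be-processed image matrix (17 maps) per frame.
second_matrices = [np.concatenate([feature_maps, g], axis=0)
                   for g in gr_frames]           # 4 x (17, H, W)
```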
  • the neural network model needs to be trained before the first neural network and the second neural network are used.
  • the training data can include training images and ground truth images.
  • the training image includes a first component image and a second component image.
  • When training the model of the first neural network, the collected training images are first processed to obtain an output image; the output image is compared with the ground-truth image of the first component until the network converges, which completes the training of the first neural network model.
  • So-called network convergence may mean, for example, that the difference between the output image and the first ground-truth image is smaller than a set first threshold.
  • Then the parameters obtained from the training of the first neural network are fixed, and the collected training images are processed, against the ground-truth image of the second component, to obtain an output image.
  • The output image is compared with the ground-truth image of the second component until the network converges, which completes the training of the second neural network model.
  • Here, network convergence may mean that the difference between the output image and the ground-truth image of the second component is smaller than a set second threshold. A training-loop sketch follows.
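The two-stage procedure can be written down as the following sketch. All names (network and loader signatures, the L1 criterion, the Adam optimizer, the learning rates) are assumptions; the document only fixes the ordering: train the first network to convergence, freeze it, train the second network, then fine-tune both with a reduced learning rate.

    import torch
    import torch.nn as nn

    def train_two_stage(net1, net2, loader, thr1=1e-3, thr2=1e-3, epochs=100):
        l1 = nn.L1Loss()

        opt1 = torch.optim.Adam(net1.parameters(), lr=1e-4)   # stage 1: first (luminance) network
        for _ in range(epochs):
            for frames, gt_luma, gt_color in loader:
                loss = l1(net1(frames), gt_luma)
                opt1.zero_grad(); loss.backward(); opt1.step()
            if loss.item() < thr1:                            # "network convergence"
                break

        for p in net1.parameters():                           # fix the stage-1 parameters
            p.requires_grad = False

        opt2 = torch.optim.Adam(net2.parameters(), lr=1e-4)   # stage 2: second (color) network
        for _ in range(epochs):
            for frames, gt_luma, gt_color in loader:
                stacked = torch.cat([net1(frames), frames], dim=1)   # concat, as described above
                loss = l1(net2(stacked), gt_color)
                opt2.zero_grad(); loss.backward(); opt2.step()
            if loss.item() < thr2:
                break

        for p in net1.parameters():                           # stage 3: joint fine-tune, smaller lr
            p.requires_grad = True
        opt3 = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()), lr=1e-5)
        for frames, gt_luma, gt_color in loader:
            stacked = torch.cat([net1(frames), frames], dim=1)
            loss = l1(net1(frames), gt_luma) + l1(net2(stacked), gt_color)
            opt3.zero_grad(); loss.backward(); opt3.step()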
  • Suppose the training images are in RGB format; when the first and second neural networks are trained, the input training images are four frames that are adjacent or consecutive in the time domain.
  • For example, the four consecutive training frames are R1G1B1, R2G2B2, R3G3B3 and R4G4B4.
  • The first component image is the luminance channel,
  • and the second component image is the color channel.
  • The training process of the first neural network (also called the inter-frame luminance network) and the second neural network (also called the intra-frame color network) is as follows. Two kinds of ground-truth images are constructed, one for the luminance channel and one for the color channel. First, the four frames' luminance channels G1 G2 G3 G4 are used as ground truth to train the inter-frame luminance network; after it converges, R1B1, R2B2, R3B3 and R4B4 are used as ground truth, the parameters of the inter-frame luminance network are fixed, and only the intra-frame color network is trained until it converges. Finally, the learning rate is reduced and the two networks are trained jointly, yielding the final models of both networks.
  • After that, the models of the two networks can be tested.
  • During testing there are two ways to output the test results.
  • In one optional way, the four frames R1G1B1, R2G2B2, R3G3B3, R4G4B4 are used as input; the inter-frame luminance network outputs G′1 G′2 G′3 G′4, the intra-frame color network then outputs R′1B′1, R′2B′2, R′3B′3, R′4B′4, and the two outputs are channel-combined to obtain the four output frames R′1G′1B′1, R′2G′2B′2, R′3G′3B′3, R′4G′4B′4.
  • In the other optional way, the four frames R1G1B1, R2G2B2, R3G3B3, R4G4B4 are used as input; the inter-frame luminance network outputs G′1 G′2 G′3 G′4, and the intra-frame color network then directly outputs R′1G′1B′1, R′2G′2B′2, R′3G′3B′3, R′4G′4B′4. Both paths are sketched below.
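A compact sketch of the two output paths; the function signatures and channel layout are assumptions.

    import torch

    def infer(net1, net2, frames, mode="combine"):
        luma = net1(frames)                        # G'1..G'4
        stacked = torch.cat([luma, frames], dim=1)
        out2 = net2(stacked)
        if mode == "combine":                      # way 1: net2 returns only the color planes
            return torch.cat([luma, out2], dim=1)  # channel-combine G' with R'B'
        return out2                                # way 2: net2 already returns full frames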
  • An image processing system is formed by the first neural network and the second neural network; the image processing system is used to process multiple frames of to-be-processed images and to output multiple frames of processed images.
  • The complexity of the second neural network is lower than the complexity of the first neural network.
  • Compared with schemes in some technologies that process multiple frames into one frame through a basic network, the amount of computation the image processing system spends on each frame of the to-be-processed images is reduced to a certain extent. In turn, the image-processing delay can be reduced while the quality of the image or video is guaranteed.
  • The computing power of the two neural networks for processing multiple frames of to-be-processed images is illustrated below with an example.
  • Suppose the to-be-processed images are 4 frames, and the processed images output by the first and second neural networks are 4 frames.
  • The basic network, by contrast, outputs one frame after processing.
  • The first neural network is shown in Fig. 8a and Fig. 8b, and the second neural network is shown in Fig. 10a.
  • The amount of computation of the first neural network is about the same as that of the basic network, about 12000 MAC.
  • The network complexity of the basic network is calculated as follows:
    (23*32*1*1+32*16*3*3)/4  # 1336
    +16*32*3*3/16  # 288
    +(32*32*3*3)/16  # 576
    +32*64*3*3/64  # 288
    +(64*96*3*3+(48*48*3*3*2+96*96*1*1*1)*2+96*64*3*3+32*32*3*3*2+64*64*1*1*1+64*64*3*3*1)/64  # 4240
    +(64*32*2*2)/16+(concat)  # 512
    +(64*32*3*3)/16  # 1152
    +(32*16*2*2)/4+(concat)  # 512
    +(32*16*3*3)/4  # 1152
    +(16*16*3*3+16*16*3*3+16*4*3*3)/4  # 1296
    = 11352
  • The complexity of the motion-detail layers of the second neural network is calculated as follows:
    (52*32*1*1+32*16*3*3)/4  # 1568
    +16*32*3*3/16  # 288
    +(32*32*3*3)/16  # 576
    +32*64*3*3/64  # 288
    +(64*64*3*3)/64  # 576
    +(64*32*2*2)/16+(concat)  # 512
    +(64*32*3*3)/16  # 1152
    +(32*16*2*2)/4+(concat)  # 512
    +(32*16*3*3)/4  # 1152
    = 6664
  • The amount of computation of the second neural network is therefore about 6000; assume 6000 in what follows.
  • Then, when 4 frames are input and 4 processed frames are output simultaneously, the computation of the image processing system is (6000+12000)/4 = 4500; when 8 frames are input and 8 processed frames are output simultaneously, it is (6000+12000)/8 = 2250; and when 16 frames are input and 16 processed frames are output simultaneously, it is (6000+12000)/16 = 1125. All of these are smaller than the computing power of 12000 needed to process multiple frames into one frame through the basic network. The multi-frame-input, multi-frame-output scheme carried out by the first and second neural networks can therefore reduce the amount of computation, thereby reducing the image-processing delay, and can meet the delay requirements of video scenarios.
  • The network computing-power requirement of a video with a resolution of 8 thousand (K) pixels and a frame rate of 30 frames per second is about 3000 MAC.
  • When the embodiment of this application outputs 8 frames, the amount of computation required by the image processing system can basically meet the network computing-power requirement of 8K 30 video. The arithmetic above is reproduced in the sketch below.
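A small helper that redoes the arithmetic of the listings above. The grouping of each listed term into (input maps, output maps, kernel, downscale divisor) is an assumption about how the listing was formed; feeding the helper the basic-network terms reproduces the 11352 figure in the same way.

    # Per-pixel multiply-accumulate (MAC) count: each term is roughly
    # (c_in * c_out * k * k) / downscale, where stride-2 stages run on
    # 1/4, 1/16, or 1/64 of the pixels.
    def total_macs(terms):
        return sum(cin * cout * k * k // down for cin, cout, k, down in terms)

    second_net = [
        (52, 32, 1, 4), (32, 16, 3, 4),   # 1568
        (16, 32, 3, 16),                  # 288
        (32, 32, 3, 16),                  # 576
        (32, 64, 3, 64),                  # 288
        (64, 64, 3, 64),                  # 576
        (64, 32, 2, 16),                  # 512 (with concat)
        (64, 32, 3, 16),                  # 1152
        (32, 16, 2, 4),                   # 512 (with concat)
        (32, 16, 3, 4),                   # 1152
    ]
    print(total_macs(second_net))         # ~6600, the "about 6000" figure above

    for n in (4, 8, 16):                  # amortized cost per output frame
        print(n, (12000 + 6000) // n)     # 4500, 2250, 1125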
  • It should be noted that the examples in the application scenarios of this application merely present some possible implementations, for a better understanding and explanation of the method; those skilled in the art can derive examples of variant forms from the methods provided in this application.
  • To realize the functions in the methods provided by the embodiments of this application, the neural-network-based image processing device may include a hardware structure and/or a software module, and implements the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • Whether one of the above functions is executed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and the design constraints of the technical solution.
  • Based on the same technical concept, an embodiment of the present application further provides a neural-network-based image processing device 1300.
  • The neural-network-based image processing device 1300 may be a mobile terminal or any device with image-processing functions.
  • The neural-network-based image processing device 1300 may include modules that correspond one-to-one to the methods/operations/steps/actions in the foregoing method embodiments.
  • The modules may be hardware circuits, software, or hardware circuits combined with software.
  • The neural-network-based image processing device 1300 may include an operation module 1301 and a concatenation module 1302.
  • The operation module 1301 is configured to input the to-be-processed image into the first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network;
  • the concatenation module 1302 is configured to vector-concatenate (concatenate) the first image and the to-be-processed image to obtain the first to-be-processed image matrix;
  • the operation module 1301 is further configured to input the first to-be-processed image matrix into the second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network, and to obtain the processed image based on the second image. This wiring is sketched below.
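As a sketch only, the two modules can be wired as follows (the class and method names are illustrative, and the simple channel-wise merge at the end assumes the Fig. 5a way of combining the two component images):

    import torch
    import torch.nn as nn

    class ImageProcessingDevice1300(nn.Module):
        def __init__(self, net1: nn.Module, net2: nn.Module):
            super().__init__()
            self.net1, self.net2 = net1, net2     # the operation module's two networks

        def forward(self, to_process):
            first = self.net1(to_process)                   # first component image
            matrix = torch.cat([first, to_process], dim=1)  # concatenation module
            second = self.net2(matrix)                      # second component image
            return torch.cat([first, second], dim=1)        # merge into the processed image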
  • The operation module 1301 and the concatenation module 1302 can also be used to perform other corresponding steps or operations in the above method embodiments, which are not repeated here one by one.
  • The division into modules in the embodiments of this application is illustrative and is only a division by logical function; in actual implementation there may be other division methods.
  • The functional modules in the various embodiments of this application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • Based on the same technical concept, an embodiment of the present application further provides a neural-network-based image processing device 1400.
  • The neural-network-based image processing device 1400 includes a processor 1401.
  • The processor 1401 is configured to call a group of programs so that the foregoing method embodiments are executed.
  • The neural-network-based image processing device 1400 further includes a memory 1402, and the memory 1402 is used to store the program instructions and/or data executed by the processor 1401.
  • The memory 1402 is coupled with the processor 1401.
  • The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the devices, units, or modules.
  • The processor 1401 may operate in cooperation with the memory 1402.
  • The processor 1401 may execute the program instructions stored in the memory 1402.
  • The memory 1402 may be included in the processor 1401.
  • The neural-network-based image processing device 1400 may be a chip system.
  • The chip system may be composed of chips, or may include chips and other discrete devices.
  • The processor 1401 is configured to: input the to-be-processed image into the first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network;
  • vector-concatenate (concatenate) the first image and the to-be-processed image to obtain a first to-be-processed image matrix;
  • and input the first to-be-processed image matrix into the second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network; based on the second image, the processed image is obtained.
  • The processor 1401 may also be used to perform other corresponding steps or operations in the foregoing method embodiments, which are not repeated here one by one.
  • The processor 1401 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present application may be embodied directly as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • The memory 1402 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • The memory is, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • The memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function, for storing program instructions and/or data. Some or all of the operations and functions described in the above method embodiments may be completed by a chip or an integrated circuit.
  • An embodiment of the present application further provides a chip including a processor, configured to support the neural-network-based image processing device in implementing the functions involved in the foregoing method embodiments.
  • In one possible design, the chip is connected to a memory, or the chip includes a memory, and the memory is used to store the program instructions and data necessary for the device.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program, the computer program including instructions for executing the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the foregoing method embodiments.
  • Those skilled in the art should understand that the embodiments of this application may be provided as methods, systems, or computer program products. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of this application.
  • Obviously, those skilled in the art can make various changes and variations to the embodiments of this application without departing from their spirit and scope. If these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.


Abstract

A neural-network-based image processing method and device, used to reduce the image-processing delay while guaranteeing image quality. The method includes: the to-be-processed image includes a first component image and a second component image; the to-be-processed image is input into a first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network; the first image and the to-be-processed image are vector-concatenated (concatenate) to obtain a first to-be-processed image matrix; the first to-be-processed image matrix is input into a second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network; and based on the second image, a processed image is obtained.

Description

Background
At present, deep learning is being applied more and more widely, and deep-learning-based ISPs have achieved certain results in many tasks. A deep-learning-based ISP passes the image data through a neural network for processing before output, but the processing complexity of a neural network is generally very high; in non-real-time scenarios the expected purpose can be achieved, but in scenarios that require real-time processing there are generally problems such as energy consumption and running time.
Therefore, neural-network-based ISPs need further optimization.
Summary
This application provides a neural-network-based image processing method and device, in order to optimize the performance of neural-network-based image signal processing.
In a first aspect, a neural-network-based image processing method is provided, in which a first neural network and a second neural network are used to process a to-be-processed image and a processed image is output. The to-be-processed image includes a first component image and a second component image, and the steps of the method are as follows: the to-be-processed image is input into the first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network; the first image and the to-be-processed image are vector-concatenated (concatenate) to obtain a first to-be-processed image matrix; the first to-be-processed image matrix is input into the second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network; and based on the second image, a processed image is obtained.
The first image, obtained by passing the to-be-processed image through the first neural network, processes one part of the component images of the to-be-processed image and yields an intermediate result. The first image is concatenated with the to-be-processed image, and the concatenated result is processed by the second neural network to obtain the second image. The intermediate result can thus be applied in the processing of the second neural network, reducing the computational complexity of the second neural network while guaranteeing image-processing quality. For example, the first component image is the luminance component of the to-be-processed image; the luminance component is an important component in image processing and accounts for a high proportion of the network complexity, and the first neural network can process the luminance component first. With the processing result of the luminance component fed into the second neural network as an intermediate result, the complexity requirement on the second neural network is reduced. Through the cooperative use of the two neural networks, a lower complexity can be achieved than with a single neural network when multiple frames are processed.
Optionally, obtaining the processed image based on the second image can be implemented in two possible ways.
In one way, the first image and the second image are merged to generate the processed image. In the other way, when the second image is obtained, a third image is obtained at the same time, the third image being the first component image of the to-be-processed image after processing by the second neural network; correspondingly, the third image and the second image are merged to generate the processed image.
Optionally, the complexity of the second neural network is lower than the complexity of the first neural network.
Optionally, the computing power required by the first component image is higher than the computing power required by the second component image.
Optionally, the to-be-processed image may be one frame or multiple frames. For example, the to-be-processed image includes multiple frames of temporally adjacent images; correspondingly, the first image is multiple frames, the second image is multiple frames, and each frame of the to-be-processed image corresponds to one frame of the first image and one frame of the second image. The first neural network handles the computation-heavy problem between multiple frames, the second neural network handles the lower-computation problem of each individual frame among the multiple frames, and multiple processed frames are output, so that the combined computing power of the first and second neural networks is spread over the multiple frames; the processing complexity of each frame is thus reduced compared with the scheme above while the quality of the image or video is guaranteed. For example, the first component image is the luminance channel and the second component image is the chrominance channel. The first neural network can solve the inter-frame motion problem between multiple frames; the luminance channel, together with the chrominance channel of the corresponding frame, is input into the second neural network, and the second neural network processes the chrominance of each frame. Guided by the already-processed result of the luminance channel, a second neural network of smaller computing power can solve the intra-frame color problem. Through the joint processing of the two neural networks, the image processing system provided by this application has low complexity during image processing while guaranteeing the quality of the image or video, promoting the application of deep-learning technology in the field of image signal processing. In theory the color channels require less computing power than the luminance channel; for example, YUV images generally use the 420 sampling format, i.e. the resolution of the color channels is half that of the luminance channel.
In one possible design, the computation performed in the second neural network includes the following steps: a feature-map matrix of the first to-be-processed image matrix is obtained from the first to-be-processed image matrix, and the feature-map matrix is vector-concatenated with each frame of the first image to obtain multiple second to-be-processed image matrices, where each frame of the second image is obtained from one second to-be-processed image matrix.
In one possible design, the vector concatenation of the first image and the to-be-processed image may be implemented as follows: the multiple frames of temporally adjacent images are grouped to obtain multiple sub-groups of images; each frame of the first image is vector-concatenated with one sub-group of images to generate multiple to-be-processed image sub-matrices.
Optionally, the first image and the sub-group of images that are vector-concatenated correspond to the same frame of the to-be-processed image.
In one possible design, the first component image is the luminance component of the to-be-processed image.
In one possible design, the second component image is one or more chrominance components, or one or more color components, of the to-be-processed image.
In one possible design, the first component image and the second component image are different color components of the to-be-processed image.
In one possible design, the first neural network and the second neural network form an image processing system, and the image processing system is used to perform noise-reduction and mosaic-effect-removal processing on the to-be-processed image.
Optionally, the format of the to-be-processed image may be the red-green-blue (RGB) format, the luminance-chrominance (YUV) format, or the Bayer format.
In a second aspect, a neural-network-based image processing device is provided. The device may be a terminal device, a device inside a terminal device (for example a chip, a chip system, or a circuit), or a device that can be used in conjunction with a terminal device. In one design, the device may include modules corresponding one-to-one to the methods/operations/steps/actions described in the first aspect; the modules may be hardware circuits, software, or hardware circuits combined with software. The device processes a to-be-processed image and obtains a processed image. The to-be-processed image includes a first component image and a second component image. In one design, the device may include an operation module and a concatenation module. Exemplarily:
An operation module, configured to input the to-be-processed image into the first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network; a concatenation module, configured to vector-concatenate (concatenate) the first image and the to-be-processed image to obtain a first to-be-processed image matrix; the operation module being further configured to input the first to-be-processed image matrix into the second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network, and to obtain the processed image based on the second image.
Optionally, when the operation module obtains the processed image based on the second image, there are two possible implementations. The operation module is specifically configured to:
In one way, merge the first image and the second image to generate the processed image. In the other way, when the second image is obtained, obtain a third image at the same time, the third image being the first component image of the to-be-processed image after processing by the second neural network; and correspondingly, merge the third image and the second image to generate the processed image.
Optionally, the complexity of the second neural network is lower than the complexity of the first neural network.
Optionally, the computing power required by the first component image is higher than the computing power required by the second component image.
Optionally, the to-be-processed image may be one frame or multiple frames. For example, the to-be-processed image includes multiple frames of temporally adjacent images; correspondingly, the first image is multiple frames, the second image is multiple frames, and each frame of the to-be-processed image corresponds to one frame of the first image and one frame of the second image.
In one possible design, for the computation performed in the second neural network, the operation module is configured to: obtain a feature-map matrix of the first to-be-processed image matrix from the first to-be-processed image matrix, and vector-concatenate the feature-map matrix with each frame of the first image to obtain multiple second to-be-processed image matrices, where each frame of the second image is obtained from one second to-be-processed image matrix.
In one possible design, when vector-concatenating the first image and the to-be-processed image, the concatenation module is configured to: group the multiple frames of temporally adjacent images to obtain multiple sub-groups of images; and vector-concatenate each frame of the first image with one sub-group of images to generate multiple to-be-processed image sub-matrices.
Optionally, the first image and the sub-group of images that are vector-concatenated correspond to the same frame of the to-be-processed image.
In one possible design, the first component image is the luminance component of the to-be-processed image.
In one possible design, the second component image is one or more chrominance components, or one or more color components, of the to-be-processed image.
In one possible design, the first component image and the second component image are different color components of the to-be-processed image.
In one possible design, the first neural network and the second neural network form an image processing system, and the image processing system is used to perform noise-reduction and mosaic-effect-removal processing on the to-be-processed image.
Optionally, the format of the to-be-processed image may be the red-green-blue (RGB) format, the luminance-chrominance (YUV) format, or the Bayer format.
For the beneficial effects of the second aspect, reference may be made to the corresponding effects of the first aspect, which are not repeated here.
In a third aspect, an embodiment of this application provides a neural-network-based image processing device. The device includes a processor, the processor being configured to call a group of programs, instructions, or data to execute the method described in the first aspect or any possible design of the first aspect. The device may further include a memory for storing the programs, instructions, or data called by the processor. The memory is coupled with the processor, and when the processor executes the instructions or data stored in the memory, the method described in the first aspect or any possible design of it can be implemented.
In a fourth aspect, an embodiment of this application provides a chip system. The chip system includes a processor and may further include a memory, for implementing the method described in the first aspect or any possible design of the first aspect. The chip system may be composed of chips, or may include chips and other discrete devices.
In a fifth aspect, an embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions which, when run on a computer, cause the method described in the first aspect or any possible design of the first aspect to be executed.
In a sixth aspect, an embodiment of this application further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the method described in the first aspect or any possible design of the first aspect.
Brief description of the drawings
Fig. 1 is a schematic diagram of an ISP processing flow in the prior art;
Fig. 2 is a schematic structural diagram of a system architecture in an embodiment of this application;
Fig. 3 is a schematic diagram of the principle of a neural network in an embodiment of this application;
Fig. 4 is a schematic flowchart of a neural-network-based image processing method in an embodiment of this application;
Fig. 5a is a schematic diagram of implementation 1 of image processing in an embodiment of this application;
Fig. 5b is a schematic diagram of implementation 2 of image processing in an embodiment of this application;
Fig. 6 is a first schematic diagram of an RGrGbB image processing method in an embodiment of this application;
Fig. 7 is a second schematic diagram of an RGrGbB image processing method in an embodiment of this application;
Fig. 8a is a first schematic structural diagram of the first neural network in an embodiment of this application;
Fig. 8b is a second schematic structural diagram of the first neural network in an embodiment of this application;
Fig. 9a is a schematic diagram of part of the processing of a typical convolutional neural network in an embodiment of this application;
Fig. 9b is a schematic diagram of part of the processing of a multi-branch neural network in an embodiment of this application;
Fig. 10a is a schematic structural diagram of the second neural network in an embodiment of this application;
Fig. 10b is a schematic diagram of the multi-branch operation part of the second neural network in an embodiment of this application;
Fig. 11 is a schematic diagram of the part of the second neural network that uses a typical neural network operation in an embodiment of this application;
Fig. 12 is a schematic diagram of the vector concatenation of the first image with the feature-map matrix in an embodiment of this application;
Fig. 13 is a first schematic structural diagram of a neural-network-based image processing device in an embodiment of this application;
Fig. 14 is a second schematic structural diagram of a neural-network-based image processing device in an embodiment of this application.
Detailed description of the embodiments
The embodiments of this application are described in detail below with reference to the accompanying drawings.
The neural-network (NN)-based image processing method and device provided in the embodiments of this application can be applied to electronic equipment. The electronic equipment may be a mobile device such as a mobile terminal, a mobile station (MS), or user equipment (UE), a fixed device such as a fixed telephone or a desktop computer, or a video monitor. The electronic equipment is an image acquisition and processing device with image-signal acquisition and processing functions and with ISP processing capability. The electronic equipment may also optionally have a wireless connection function, as a handheld device that provides voice and/or data connectivity to a user, or another processing device connected to a wireless modem. For example, the electronic equipment may be a mobile phone (or "cellular" phone), a computer with a mobile terminal, or a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device; it may of course also be a wearable device (such as a smart watch or a smart band), a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a point-of-sales (POS) terminal, and so on. The embodiments of this application take a terminal device as an example for description.
Fig. 2 is a schematic diagram of an optional hardware structure of the terminal device 200 involved in the embodiments of this application.
As shown in Fig. 2, the terminal device 200 mainly includes a chipset and peripheral devices. Components such as the power management unit (PMU), the voice codec, the short-range module, the radio frequency (RF) part, the computing processor, the random-access memory (RAM), the input/output (I/O), the display interface, the image signal processor (ISP), the sensor hub, and the baseband communication module in the solid-line box of Fig. 2 form the chip or chipset. Components such as the USB interface, the memory, the display screen, the battery/mains power, the earphone/speaker, the antennas, and the sensors can be understood as peripheral devices. The computing processor, RAM, I/O, display interface, ISP, sensor hub, baseband, and other components inside the chipset can form a system-on-a-chip (SOC), the main part of the chipset. All the components in the SOC may be integrated into one complete chip, or part of the components may be integrated while another part is not; for example, the baseband communication module in the SOC may remain a separate part not integrated with the others. The components in the SOC can be connected to one another through buses or other connection lines. The PMU, voice codec, RF, and similar parts outside the SOC usually include analog circuitry, so they are often outside the SOC and not integrated with one another.
In Fig. 2, the PMU is externally connected to the mains or the battery to supply power to the SOC, and mains power can be used to charge the battery. The voice codec, as the sound encoding/decoding unit, is externally connected to an earphone or a speaker to realize conversion between natural analog voice signals and digital voice signals that the SOC can process. The short-range module may include wireless fidelity (WiFi) and Bluetooth, and may optionally include infrared, near-field communication (NFC), radio (FM), or global positioning system (GPS) modules. The RF part is connected to the baseband communication module in the SOC to realize conversion between air-interface RF signals and baseband signals, i.e. frequency mixing; for a mobile phone, reception is down-conversion and transmission is up-conversion. The short-range module and the RF part may each have one or more antennas for signal transmission or reception. The baseband is used for baseband communication, including one or more of multiple communication modes, and handles the processing of the wireless communication protocol, covering protocol layers such as the physical layer (layer 1), medium access control (MAC, layer 2), and radio resource control (RRC, layer 3); it can support various cellular communication standards, such as long term evolution (LTE) communication or 5G new radio (NR) communication. The sensor hub is the interface between the SOC and external sensors, used to collect and process the data of at least one external sensor, such as an accelerometer, a gyroscope, a control sensor, or an image sensor. The computing processor may be a general-purpose processor, for example a central processing unit (CPU), or one or more integrated circuits, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), a microprocessor, or one or more field-programmable gate arrays (FPGAs). The computing processor may include one or more cores and can selectively schedule other units. The RAM can store intermediate data produced during calculation or processing, such as intermediate calculation data of the CPU and the baseband. The ISP is used to process the data collected by the image sensor. The I/O is used for interaction between the SOC and various external interfaces, for example a universal serial bus (USB) interface for data transmission. The memory may be one chip or a group of chips. The display screen may be a touch screen, connected to the bus through the display interface; the display interface can perform data processing before image display, such as blending of the multiple layers to be displayed, buffering of display data, or control and adjustment of screen brightness.
The terminal device 200 involved in the embodiments of this application includes an image sensor, which can collect external signals such as light and convert them into sensor signals, i.e. electrical signals. The sensor signal may be a static image signal or a dynamic video image signal. The image sensor may be, for example, a camera.
The terminal device 200 involved in the embodiments of this application further includes an image signal processor. The sensor signal collected by the image sensor is transmitted to the image signal processor, which performs image signal processing on the sensor signal to obtain an image signal whose sharpness, color, brightness, and other aspects all conform to the characteristics of the human eye.
It can be understood that the image signal processor involved in the embodiments of this application may be one chip or a group of chips, i.e. it may be integrated or independent. For example, the image signal processor included in the terminal device 200 may be an integrated ISP chip integrated in the computing processor.
The terminal device 200 involved in the embodiments of this application has the function of taking photos or recording videos.
The neural-network-based image processing method provided in the embodiments of this application mainly explains how to perform image signal processing based on neural networks.
For a better understanding of the solutions of the embodiments of this application, the conceptual terms involved in the embodiments are explained first.
(1) Neural network
In the embodiments of this application, a neural network is used to process the multiple frames of images to be processed. A neural network is a network structure that imitates the behavioral characteristics of animal neural networks to process information, abbreviated NN (neural networks).
A neural network may be composed of neural units. A neural unit may be an operation unit that takes an input signal x_s and an intercept 1 as inputs, and the output of the operation unit may be as shown in formula (1-1):

    h_{W,b}(x) = f(W^T x) = f( sum_{s=1}^{n} W_s * x_s + b )        (1-1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces non-linear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by linking many such single neural units together, i.e. the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field; the local receptive field may be a region composed of several neural units.
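As a toy numeric check of formula (1-1) (all values invented), with a sigmoid activation:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])   # inputs x_s
    W = np.array([0.1, 0.4, -0.2])   # weights W_s
    b = 0.3                          # bias of the neural unit

    def sigmoid(z):                  # activation function f
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(W @ x + b))        # the unit's output signal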
As shown in Fig. 3, which is a schematic diagram of the principle of a neural network, the neural network 300 has N processing layers, with N ≥ 3 and N a natural number. The first layer of the neural network is the input layer 301, responsible for receiving the input signal; the last layer is the output layer 303, which outputs the processing result of the neural network. The layers other than the first and the last are intermediate layers 304, and these intermediate layers together form the hidden layers 302; each intermediate layer in the hidden layers can both receive an input signal and output a signal, and the hidden layers are responsible for the processing of the input signal. Each layer represents one logical level of signal processing; through multiple layers, a data signal can be processed by multiple levels of logic.
In some feasible embodiments, the input signal of the neural network may be a signal in various forms, such as a voice signal, a text signal, an image signal, or a temperature signal. In this embodiment, the processed image signal may be any of various sensor signals, such as a landscape signal shot by a camera (image sensor), an image signal of a community environment captured by a monitoring device, or a facial signal of a human face acquired by an access-control system. The input signals of the neural network also include various other computer-processable engineering signals, which are not listed one by one here. If a neural network is used to perform deep learning on an image signal, the image quality can be improved.
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN by the positions of the different layers, the neural network inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers. The layers are fully connected, that is, any neuron in layer i is necessarily connected to any neuron in layer i+1.
Although the DNN looks complicated, the work of each layer is actually not complicated; simply put, it is the following linear relational expression: y = α(Wx + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Since a DNN has many layers, the numbers of coefficients W and bias vectors b are also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W_{24}^{3}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as W_{jk}^{L}.
Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better portray complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all the layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a neuron layer in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only part of the neurons of the adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as the way of extracting image information being independent of position. The convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also lowering the risk of overfitting.
The neural network in the embodiments of this application may be a convolutional neural network, and of course may also be another type of neural network, for example a recurrent neural network (RNN).
It should be understood that the images in the embodiments of this application may be static images (or called static pictures) or dynamic images (or called dynamic pictures); for example, the images in this application may be videos or dynamic pictures, or may be static pictures or photos. For convenience of description, the following embodiments of this application refer to static images and dynamic images collectively as images.
The neural-network-based image processing method provided by the embodiments of this application is introduced below. The method is executed by a neural-network-based image processing device. The neural-network-based image processing device may be any device or equipment with image-processing functions; for example, the method is executed by the terminal device 200 shown in Fig. 2, by a device related to the terminal device, or by part of the equipment contained in the terminal device.
In the embodiments of this application, multiple neural networks are used for image processing; for example, two neural networks, denoted the first neural network and the second neural network, are used to process the to-be-processed image. The first neural network and the second neural network conform to the introduction to neural networks above. The to-be-processed image includes component images of one or more dimensions; for example, the to-be-processed image includes a first component image and a second component image. The image processing of the to-be-processed image includes the processing of the first component image and the second component image.
As shown in Fig. 4, the neural-network-based image processing method provided by the embodiments of this application is as follows.
S401: Input the to-be-processed image into the first neural network for computation to obtain a first image.
The first image is the first component image of the to-be-processed image after processing by the first neural network.
S402: Vector-concatenate (concatenate) the first image and the to-be-processed image to obtain a first to-be-processed image matrix.
S403: Input the first to-be-processed image matrix into the second neural network for computation to obtain a second image.
The second image is the second component image of the to-be-processed image after processing by the second neural network.
S404: Obtain a processed image based on the second image.
Through the method shown in Fig. 4, the first image obtained by passing the to-be-processed image through the first neural network processes part of the component images of the to-be-processed image, yielding an intermediate result. The first image is concatenated with the to-be-processed image, and the concatenated result is processed by the second neural network to obtain the second image. The intermediate result can be applied in the processing of the second neural network, reducing the computational complexity of the second neural network while guaranteeing image-processing quality. For example, the first component image is the luminance component of the to-be-processed image; the luminance component is an important component in the image-processing process and accounts for a high proportion of the network complexity, and the first neural network can process the luminance component first. With the processing result of the luminance component fed into the second neural network as an intermediate result, the complexity requirement on the second neural network is reduced. Through the cooperative use of the two neural networks, a lower complexity can be achieved than with one neural network when multiple frames are processed.
Some optional designs of the neural-network-based image processing method provided by the embodiments of this application are described below.
The to-be-processed image includes a first component image and a second component image, and the processed image also includes a first component image and a second component image. As shown in Fig. 5a, in one possible implementation, the first image is obtained through the first neural network and the second image through the second neural network, and the processed image is obtained by merging the first image and the second image. Merging the first image and the second image can also be regarded as combining them: since the first image is the first component image after processing by the first neural network and the second image is the second component image after processing by the second neural network, merging the first and second images, i.e. merging the processed first component image and the processed second component image, yields the processed image. As shown in Fig. 5b, in another possible implementation, the first to-be-processed image matrix is input into the second neural network for computation, and when the second image is obtained a third image is obtained at the same time. The third image is the first component image of the to-be-processed image after processing by the second neural network. The third image and the second image can then be merged to generate the processed image.
In the embodiments of this application, the to-be-processed image may be one frame, or multiple temporally adjacent frames. Temporally adjacent frames include temporally consecutive frames, and temporally adjacent multiple frames are simply called multiple frames in the following description. When the to-be-processed image is multiple frames, after processing by the first and second neural networks the processed images are the corresponding multiple frames. When the to-be-processed image is multiple frames, the first component image and the second component image are each multiple frames, the first image obtained after processing by the first neural network is multiple frames, the second image obtained through the second neural network is multiple frames, and each frame of the to-be-processed image corresponds to one frame of the first image and one frame of the second image. Each frame of the processed image corresponds to one frame of the first image and one frame of the second image, or each frame of the processed image corresponds to one frame of the third image and one frame of the second image.
When the to-be-processed image is multiple frames, the scheme of the embodiments of this application is as follows. The multiple frames to be processed are input into the first neural network for processing to obtain the first image; the first image is the multiple frames of first component images of the to-be-processed frames after processing by the first neural network, i.e. each frame to be processed yields one frame of first component image after passing through the first neural network. The first image and the to-be-processed image are vector-concatenated to obtain the first to-be-processed image matrix; correspondingly, there are multiple first to-be-processed image matrices. Specifically, each frame of the first image is vector-concatenated with the corresponding frame of the to-be-processed image to obtain a first to-be-processed image matrix. The first to-be-processed image matrix is input into the second neural network for computation to obtain the second image; the second image is the multiple frames of second component images of the to-be-processed frames after processing by the second neural network. The processed image is obtained based on the second image.
Based on the two possible implementations of Fig. 5a and Fig. 5b, the two optional ways of obtaining the processed image based on the second image are as follows. In one possible implementation, the multiple frames of first component images processed by the first neural network and the multiple frames of second component images processed by the second neural network are merged to generate multiple frames of processed images. In the other possible way, the third image is obtained at the same time as the second image; the third image is the multiple frames of first component images of the to-be-processed frames after processing by the second neural network, and the third image and the second image are merged to generate the processed image.
The difference between the two optional ways of Fig. 5a and Fig. 5b is that in the way shown in Fig. 5b what is obtained through the second neural network is the full signal of the image, while in the way shown in Fig. 5a what is obtained through the second neural network is the second component image of the image. The full signal of the image consists of the second component image and the first component image.
In the embodiments of this application, the first component image may be the luminance component or luminance channel of the to-be-processed image. The second component image is one or more chrominance components of the to-be-processed image, or one or more color components of the to-be-processed image, or one or more color channels or chrominance channels of the to-be-processed image.
Alternatively, the first component image is one or more chrominance components of the to-be-processed image, and the second component image is one or more chrominance components of the to-be-processed image, the first component image and the second component image being different chrominance components of the to-be-processed image.
A chrominance component may also be called a chrominance channel, a color component, or a color channel.
In the embodiments of this application, optionally, the format of the to-be-processed image may be the red-green-blue (RGB) format, the luminance-chrominance (YUV) format, or the Bayer format; this is not limited in this application.
For example, if the format of the to-be-processed image is RGB, the first component image may be the G channel and the second component signal the R and B channels.
When the to-be-processed image is multiple frames, some possible designs are as follows.
In S403, the first to-be-processed image matrix is input into the second neural network for computation to obtain the second image; the second image is multiple frames. The first to-be-processed image matrix is formed by the vector concatenation of the first image and the to-be-processed image; the first image is multiple frames, and the first to-be-processed image matrix is multiple matrices, or includes multiple to-be-processed image sub-matrices. Optionally, in the second neural network, a feature-map matrix of the first to-be-processed image matrix is obtained from the first to-be-processed image matrix, and the feature-map matrix is vector-concatenated with each frame of the first image to obtain multiple second to-be-processed image matrices, where each frame of the second image is obtained from one second to-be-processed image matrix.
In S402, the first image and the to-be-processed image are vector-concatenated to obtain the first to-be-processed image matrix, where the first image is multiple frames and the to-be-processed image is multiple frames. The multiple frames to be processed can be grouped to obtain multiple sub-groups of images, and each frame of the first image is vector-concatenated with one sub-group of images to obtain multiple to-be-processed image sub-matrices. The first to-be-processed image matrix includes these sub-matrices; in other words, the multiple to-be-processed image sub-matrices compose the first to-be-processed image matrix. The first image and the sub-group of images that are vector-concatenated correspond to the same frame of the to-be-processed image.
For example, the number of frames to be processed is 4. The 4 frames are input into the first neural network for computation to obtain 4 first images, where each frame to be processed corresponds to one first image; for example, the 1st frame to be processed corresponds to the 1st first image, and the 2nd frame corresponds to the 2nd first image. The 4 first images and the 4 frames to be processed are vector-concatenated, where the 4 frames can be divided into 4 sub-groups of images, each sub-group corresponding to one frame; for example, the first sub-group corresponds to the first frame and the second sub-group to the second frame. The first sub-group is vector-concatenated with the 1st first image, and the second sub-group with the 2nd first image.
It can be understood that the vector concatenation of the multiple first images with the multiple sub-groups of images can be regarded as an internal processing step of the second neural network. What is input into the second neural network is one overall matrix, i.e. the first to-be-processed image matrix; the process of concatenating into the first to-be-processed image matrix can be decomposed into the above concatenations of multiple first images with multiple sub-groups of images.
In the embodiments of this application, the first neural network and the second neural network can be combined into an image processing system, and the image processing system is used to process the to-be-processed image to improve the quality of the image or video. The processing may include noise reduction, mosaic-effect removal, and similar processing.
In general, the complexity of the first neural network is higher than the complexity of the second neural network.
In some technologies, multiple frames are often synthesized into one output frame through a neural network to improve image or video quality. But such a neural network needs very high complexity, and in video scenarios a very high processing speed is required; for example, real-time video processing on a terminal needs to reach a processing speed of 30 frames/s for video with a resolution of 8K, i.e. a frame rate of 30. Under the high processing-speed requirements of video scenarios, using a neural network to synthesize multiple frames into one output frame faces the problems of high computational complexity and large consumption of computing resources, and incurs a large delay. Blindly lowering the complexity of the neural network and using a lower-complexity network would in turn affect the quality of the image or video.
In the embodiments of this application, the first neural network is used to handle the computation-heavy problem between multiple frames, the second neural network is used to handle the lower-computation problem of each frame among the multiple frames, and multiple processed frames are output, so that the combined computing power of the first and second neural networks is spread over the multiple frames. The processing complexity of each frame is thus reduced compared with the scheme above, while the quality of the image or video is guaranteed. For example, the first component image is the luminance channel and the second component image is the chrominance channel; the first neural network can solve the inter-frame motion problem between multiple frames, and the second neural network processes the chrominance of each frame. Through the joint processing of the two neural networks, the image processing system provided by this application has low complexity during image processing and guarantees the quality of the image or video, promoting the application of deep-learning technology in the field of image signal processing.
The following description takes the case where the first neural network and the second neural network are convolutional neural networks as an example. Suppose the to-be-processed images are 4 frames and the processed images are 4 frames. The format of the to-be-processed images is the Bayer format, specifically the RGrGbB format; one frame of an RGrGbB image includes 4 channels (R, Gr, Gb, B). After the 4 to-be-processed frames pass through the image processing system, 4 processed frames are output. The image processing system includes the first neural network and the second neural network.
As shown in Fig. 6, the 4 consecutive RGrGbB frames to be processed are split into 4*4 = 16 channels: (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4). The 4 consecutive RGrGbB frames are input into the first neural network to obtain 4 consecutive first images; for example, if the first component image is the Gr channel, 4 consecutive Gr-channel images (Gr1, Gr2, Gr3, Gr4) are obtained, comprising the first frame's Gr-channel image Gr1, the second frame's Gr2, the third frame's Gr3, and the fourth frame's Gr4.
The 4 consecutive Gr-channel images are vector-concatenated with the 4 consecutive to-be-processed RGrGbB frames, and the resulting first to-be-processed image matrix is input into the second neural network for processing to obtain 4 consecutive second images. For example, if the second component image comprises the R, Gb, and B channels, 4 consecutive RGbB images (R1, Gb1, B1, R2, Gb2, B2, R3, Gb3, B3, R4, Gb4, B4) are obtained, comprising the first frame's RGbB-channel images R1, Gb1, B1, the second frame's R2, Gb2, B2, the third frame's R3, Gb3, B3, and the fourth frame's R4, Gb4, B4.
The 4 consecutive Gr images and the 4 consecutive RGbB images are merged into 4 consecutive RGrGbB frames comprising 4*4 = 16 channels: (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
As shown in Fig. 7, the 4 consecutive RGrGbB frames to be processed are likewise split into 4*4 = 16 channels (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4) and input into the first neural network to obtain 4 consecutive first images; for example, if the first component image is the Gr channel, 4 consecutive Gr-channel images (Gr1, Gr2, Gr3, Gr4) are obtained, comprising the first frame's Gr1, the second frame's Gr2, the third frame's Gr3, and the fourth frame's Gr4.
The 4 consecutive Gr-channel images are vector-concatenated with the 4 consecutive to-be-processed RGrGbB frames, and the resulting first to-be-processed image matrix is input into the second neural network for processing to obtain 4 consecutive processed frames. The processed frames are RGrGbB images comprising 4*4 = 16 channels (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4). The 4 consecutive processed frames can also be regarded as 4 consecutive second images plus 4 consecutive third images. The third image is the Gr channel, the 4 consecutive third images being (Gr1, Gr2, Gr3, Gr4); the second component image comprises the R, Gb, and B channels, so 4 consecutive RGbB images (R1, Gb1, B1, R2, Gb2, B2, R3, Gb3, B3, R4, Gb4, B4) are obtained, comprising the first frame's R1, Gb1, B1, the second frame's R2, Gb2, B2, the third frame's R3, Gb3, B3, and the fourth frame's R4, Gb4, B4. The 4 consecutive third images (Gr images) and the 4 consecutive second images (RGbB images) are merged into 4 consecutive RGrGbB frames comprising 4*4 = 16 channels (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
Exemplarily, the architecture of the first neural network is shown in Fig. 8a and Fig. 8b. Because the drawing of the first neural network is too large, the network is split into two parts, shown in Fig. 8a and Fig. 8b respectively; together they form the architecture of the first neural network, with the add at the end of Fig. 8a connecting to the first layer in Fig. 8b.
In Fig. 8a and Fig. 8b, convolutional layers are represented by rectangular boxes. Conv2d+bias stride=2 3x3_16_32 in a rectangular box denotes a convolutional layer, where Conv2d stands for a 2-dimensional convolution, bias denotes the bias term, 1x1/3x3 denotes the convolution kernel size, stride denotes the stride, and _32_16 denotes the numbers of input and output feature maps: 32 means that 32 feature maps are input to the layer, and 16 means that 16 feature maps are output from it.
Split denotes a split layer, indicating that the feature maps are split along the channel dimension. Split 2 means splitting an image along the feature-map dimension; for example, an input image with 32 feature maps becomes, after this operation, two images with 16 feature maps each.
concat denotes a skip-connection layer, indicating that images are merged along the feature-map dimension; for example, two images with 16 feature maps each are merged into one image with 32 feature maps.
add denotes a matrix addition operation.
The convolutional layers of the first neural network shown in Fig. 8a and Fig. 8b adopt a multi-branch operation, which can well resolve the motion-information interference between the luminance channels of multiple frames, so that one complex convolutional neural network resolves the motion interference between the luminance channels of multiple frames. Assuming the neural-network system of the embodiments of this application is a network for noise reduction, the above multi-branch operation can well obtain the denoised results of the multi-frame luminance channels and can ensure that the denoised multi-frame luminance results do not suffer from problems such as motion blur and motion smearing. Optionally, the convolutional layers of the first neural network can also adopt a group convolution operation. Group convolution is a special convolutional layer: suppose the previous layer outputs N feature maps, i.e. the channel number channel = N, which means the previous layer has N convolution kernels, and suppose the group convolution has M groups. The operation of the group convolutional layer is then to first divide the N channels into M parts; each group corresponds to N/M channels, the group convolutions are carried out independently, and when they are finished the output feature maps are vector-concatenated (concat) together as the output channels of this layer. The group convolution operation can achieve the same or a similar technical effect as the branch approach; a short sketch follows.
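In PyTorch-style code, the group convolution described above corresponds to the groups argument of a convolution (the channel counts below are illustrative only):

    import torch
    import torch.nn as nn

    # N input feature maps are divided into M groups; each group is convolved
    # independently, and the per-group outputs are concatenated channel-wise.
    N, M = 32, 4
    group_conv = nn.Conv2d(in_channels=N, out_channels=N, kernel_size=3,
                           padding=1, groups=M)   # each group sees N/M = 8 channels

    x = torch.randn(1, N, 16, 16)
    y = group_conv(x)
    print(y.shape)                                 # torch.Size([1, 32, 16, 16])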
As shown in Fig. 9a, if a multi-branch neural network is not adopted, a typical convolutional neural network can also be used; in that case, what differs from Fig. 8b, before the 4 consecutive Gr-channel images are obtained, are the operations after the concat result of the last skip-connection layer. For better comparison, the multi-branch operation of Fig. 8b is shown in Fig. 9b. In Fig. 9a and Fig. 9b, the identical skip-connection result, 32 feature maps, serves as the input of the classic neural network and of the multi-branch neural network respectively. The classic neural network shares its multiple convolutional layers: the 32 feature maps pass through 4 convolutional layers and finally 4 feature maps (4 frames of Gr-channel images) are output. The multi-branch neural network adopts a 4-branch approach: each branch independently obtains one channel of feature-map output through 4 convolutional layers, as the luminance-channel result of one frame, and the four branches yield the 4 luminance-channel results respectively. A neural network that mixes classic convolutional layers and multi-branch convolutional layers can both solve the image-denoising problem well and ensure that the simultaneously output multi-frame Gr channels do not suffer from motion blur, motion smearing, and similar problems.
The architecture of the second neural network is described below. Exemplarily, the architecture of the second neural network is shown in Fig. 10a.
For the meaning of the parameters in Fig. 10a, refer to the descriptions for Fig. 8a and Fig. 8b. In Fig. 10a, _20_16 means that 20 feature maps are input to the layer and 16 feature maps are output from it.

Claims (25)

  1. A neural-network-based image processing method, a to-be-processed image including a first component image and a second component image, characterized by comprising:
    inputting the to-be-processed image into a first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network;
    vector-concatenating (concatenate) the first image and the to-be-processed image to obtain a first to-be-processed image matrix;
    inputting the first to-be-processed image matrix into a second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network;
    obtaining a processed image based on the second image.
  2. The method according to claim 1, characterized in that obtaining the processed image based on the second image comprises:
    merging the first image and the second image to generate the processed image.
  3. The method according to claim 1, characterized in that when the second image is obtained, a third image is obtained at the same time, the third image being the first component image of the to-be-processed image after processing by the second neural network;
    correspondingly, obtaining the processed image based on the second image comprises: merging the third image and the second image to generate the processed image.
  4. The method according to any one of claims 1-3, characterized in that the to-be-processed image includes multiple frames of temporally adjacent images; correspondingly, the first image is multiple frames, the second image is multiple frames, and each frame of the to-be-processed image corresponds to one frame of the first image and one frame of the second image.
  5. The method according to claim 4, characterized in that the computation performed in the second neural network comprises: obtaining a feature-map matrix of the first to-be-processed image matrix from the first to-be-processed image matrix, and vector-concatenating the feature-map matrix with each frame of the first image to obtain multiple second to-be-processed image matrices, wherein each frame of the second image is obtained from one second to-be-processed image matrix.
  6. The method according to claim 5, characterized in that vector-concatenating the first image and the to-be-processed image comprises:
    grouping the multiple frames of temporally adjacent images to obtain multiple sub-groups of images;
    vector-concatenating each frame of the first image with one sub-group of images to generate multiple to-be-processed image sub-matrices.
  7. The method according to claim 6, characterized in that the first image and the sub-group of images that are vector-concatenated correspond to the same frame of the to-be-processed image.
  8. The method according to any one of claims 1-7, characterized in that the first component image is a luminance component of the to-be-processed image.
  9. The method according to claim 8, characterized in that the second component image is one or more chrominance components, or one or more color components, of the to-be-processed image.
  10. The method according to any one of claims 1-7, characterized in that the first component image and the second component image are different color components of the to-be-processed image.
  11. The method according to any one of claims 1-10, characterized in that the first neural network and the second neural network form an image processing system, the image processing system being used to perform noise-reduction and mosaic-effect-removal processing on the to-be-processed image.
  12. A neural-network-based image processing device, a to-be-processed image including a first component image and a second component image, characterized by comprising:
    an operation module, configured to input the to-be-processed image into a first neural network for computation to obtain a first image, the first image being the first component image of the to-be-processed image after processing by the first neural network;
    a concatenation module, configured to vector-concatenate (concatenate) the first image and the to-be-processed image to obtain a first to-be-processed image matrix;
    the operation module being further configured to input the first to-be-processed image matrix into a second neural network for computation to obtain a second image, the second image being the second component image of the to-be-processed image after processing by the second neural network, and to obtain a processed image based on the second image.
  13. The device according to claim 12, characterized in that, when obtaining the processed image based on the second image, the operation module is configured to:
    merge the first image and the second image to generate the processed image.
  14. The device according to claim 12, characterized in that the operation module is further configured to obtain a third image at the same time as the second image, the third image being the first component image of the to-be-processed image after processing by the second neural network;
    correspondingly, when obtaining the processed image based on the second image, the operation module is configured to: merge the third image and the second image to generate the processed image.
  15. The device according to any one of claims 12-14, characterized in that the to-be-processed image includes multiple frames of temporally adjacent images; correspondingly, the first image is multiple frames, the second image is multiple frames, and each frame of the to-be-processed image corresponds to one frame of the first image and one frame of the second image.
  16. The device according to claim 15, characterized in that, for the computation performed in the second neural network, the operation module is configured to: obtain a feature-map matrix of the first to-be-processed image matrix from the first to-be-processed image matrix, and vector-concatenate the feature-map matrix with each frame of the first image to obtain multiple second to-be-processed image matrices, wherein each frame of the second image is obtained from one second to-be-processed image matrix.
  17. The device according to claim 16, characterized in that, when vector-concatenating the first image and the to-be-processed image, the concatenation module is configured to:
    group the multiple frames of temporally adjacent images to obtain multiple sub-groups of images;
    vector-concatenate each frame of the first image with one sub-group of images to generate multiple to-be-processed image sub-matrices.
  18. The device according to claim 17, characterized in that the first image and the sub-group of images that are vector-concatenated correspond to the same frame of the to-be-processed image.
  19. The device according to any one of claims 12-18, characterized in that the first component image is a luminance component of the to-be-processed image.
  20. The device according to claim 19, characterized in that the second component image is one or more chrominance components, or one or more color components, of the to-be-processed image.
  21. The device according to any one of claims 12-18, characterized in that the first component image and the second component image are different color components of the to-be-processed image.
  22. The device according to any one of claims 12-21, characterized in that the first neural network and the second neural network form an image processing system, the image processing system being used to perform noise-reduction and mosaic-effect-removal processing on the to-be-processed image.
  23. A chip, characterized in that the chip is connected to a memory and is configured to read and execute a software program stored in the memory to implement the method according to any one of claims 1-11.
  24. A neural-network-based image processing device, characterized by comprising a processor and a memory, the processor being configured to run a group of programs so that the method according to any one of claims 1-11 is executed.
  25. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when run on a neural-network-based image processing device, cause the device to execute the method according to any one of claims 1-11.
