CN116883232A - Image processing method, image processing apparatus, electronic device, storage medium, and program product - Google Patents

Image processing method, image processing apparatus, electronic device, storage medium, and program product

Info

Publication number
CN116883232A
CN116883232A (application CN202210316530.1A)
Authority
CN
China
Prior art keywords
image
feature
scale
determining
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210316530.1A
Other languages
Chinese (zh)
Inventor
孔祥宇
薛远洋
谢征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN202210316530.1A priority Critical patent/CN116883232A/en
Publication of CN116883232A publication Critical patent/CN116883232A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4015Image demosaicing, e.g. colour filter arrays [CFA] or Bayer patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present application provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product, and relate to the technical field of artificial intelligence. The method comprises the following steps: obtaining at least two first feature maps of different scales of an image; determining a first scale of each first feature map related to luminance information; determining a second scale of the image related to the luminance information based on the first scales; and demosaicing the image based on the second scale to obtain a result image. By exploiting the multi-scale features of the image, the implementation improves the accuracy of predicting the overall luminance scale of the image and thereby the accuracy of image processing. Meanwhile, the above image processing method performed by the electronic device may be performed using an artificial intelligence model.

Description

Image processing method, image processing apparatus, electronic device, storage medium, and program product
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method, an apparatus, an electronic device, a storage medium, and a program product.
Background
Demosaicing is an important step in image signal processing. However, in demosaicing of multispectral images, the luminance scale of the image cannot be accurately predicted, so the precision of the demosaicing result is low.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, electronic equipment, a storage medium and a program product, aiming at improving the accuracy of image processing. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided an image processing method including:
obtaining at least two first feature maps of different scales of an image;
determining a first scale of each first feature map respectively related to the brightness information;
determining a second scale of the image related to luminance information based on the first scale;
and demosaicing the image based on the second scale to obtain a result image.
According to another aspect of an embodiment of the present application, there is provided an image processing apparatus including:
the acquisition module is used for acquiring at least two first feature images with different scales of the image;
the first determining module is used for determining first scales of the first feature maps respectively related to the brightness information;
a second determining module, configured to determine a second scale of the image related to brightness information based on the first scale;
and the processing module is used for performing demosaicing processing on the image based on the second scale to obtain a result image.
According to another aspect of an embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of the above-described image processing method.
According to still another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described image processing method.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above-described image processing method.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
the application provides an image processing method, an image processing device, electronic equipment, a storage medium and a program product, in particular to the method, the device and the program product, which are characterized in that at least two first characteristic diagrams with different scales are obtained aiming at an input image, namely, the characteristics of the images with different scales can be extracted; then, determining first scales of the first feature images respectively related to the brightness information, and determining second scales of the image related to the brightness information based on the first scales, namely, determining the second scales based on a plurality of first scales; on the basis, demosaicing processing can be carried out on the image based on the second scale, and a result image is obtained. The embodiment of the application predicts the overall brightness scale of the image by utilizing the multi-scale characteristic of the image, namely the second scale, and the implementation is beneficial to improving the accuracy of the overall brightness scale prediction of the image so as to improve the precision of the demosaicing image processing result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of another image processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a network structure according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a network architecture corresponding to the processing unit shown in FIG. 3;
fig. 5 is a schematic diagram of another network structure according to an embodiment of the present application;
fig. 6 is a schematic diagram of another network structure according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a function of a CFA filter according to an embodiment of the present application;
fig. 8 is a schematic diagram of a filter pattern according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it defines, e.g., "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The present application relates to the technical field of artificial intelligence. Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence is a comprehensive discipline involving a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and other directions.
In particular, the present application may relate to the field of computer vision. Computer vision (CV) is the science of how to make machines "see": it uses cameras and computers instead of human eyes to recognize, track, and measure targets, and further performs graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, intelligent transportation, and the like, as well as common biometric technologies such as face recognition and fingerprint recognition.
The technical scheme provided by the present application may relate to image processing. The related art lacks a technical scheme for predicting the overall luminance scale of an image.
In view of the technical problems or needs to be improved in the related art, the present application provides an image processing method, an apparatus, an electronic device, a storage medium, and a program product.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application. The method may be executed by any electronic device, such as a terminal or a server. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an in-vehicle device, or the like; the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms, but is not limited thereto.
Specifically, as shown in fig. 1, the image processing method provided by the embodiment of the present application includes the following steps S101 to S104:
step S101: a first feature map of at least two different scales of the image is obtained.
Step S102: a first scale is determined for each first feature map, each first scale being associated with luminance information.
Step S103: a second scale of the image is determined based on the first scale in relation to the luminance information.
Step S104: and demosaicing the image based on the second scale to obtain a result image.
Specifically, first feature maps of different scales can represent semantic information to different degrees. In the embodiment of the present application, the first scale related to luminance information can be predicted for each first feature map. Then, the second scale relating the input image to luminance information can be determined from the first scales corresponding to the first feature maps, so the finally predicted luminance scale of the whole image depends on the first scale corresponding to each first feature map. Here, the luminance scale is a feature describing the overall brightness and color level of the image.
The input image in the embodiment of the present application may be a mosaic image, and the mosaic image may be a multispectral image. In particular, multispectral imaging may be performed by employing a color filter array (CFA) or an extended multi-spectral filter array (MSFA) and one image sensor to obtain multispectral images. In an image acquired through a CFA or MSFA, at least one band component (color component) is acquired per pixel location; the resulting image is a mosaic image, which may also be referred to as a CFA- or MSFA-mode multispectral image.
Optionally, first feature maps of different scales of the image may be obtained by downsampling; downsampling (subsampling) is also called reducing or decimating an image. For example, downsampling an image of size M×N by a factor of S yields an image of resolution (M/S)×(N/S). Specifically, in the embodiment of the present application, to obtain first feature maps of different scales, the downsampling operation may be performed multiple times (e.g., at least twice), continuously reducing the image size to 1/S of the previous image, so that first feature maps of multiple scales are obtained. Optionally, the downsampling may be implemented by convolution.
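As a rough illustration of the cascaded downsampling described above, the following PyTorch sketch (the module name, channel widths, and the factor S = 2 are illustrative assumptions, not taken from the present application) extracts first feature maps whose resolution halves at each level:

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Cascaded first feature layers: each stride-2 convolution halves the
    spatial size (S = 2), giving first feature maps of progressively
    smaller scale. Channel widths are illustrative assumptions."""
    def __init__(self, in_channels=4, width=32, num_levels=3):
        super().__init__()
        levels, c_in = [], in_channels
        for _ in range(num_levels):
            levels.append(nn.Sequential(
                nn.Conv2d(c_in, width, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            c_in = width
        self.levels = nn.ModuleList(levels)

    def forward(self, x):
        first_feature_maps = []
        for level in self.levels:
            x = level(x)                     # (M, N) -> (M/2, N/2)
            first_feature_maps.append(x)
        return first_feature_maps

mosaic = torch.randn(1, 4, 64, 64)           # toy CFA/MSFA-mode input
maps = MultiScaleExtractor()(mosaic)
print([tuple(m.shape) for m in maps])        # 32x32, 16x16, 8x8 feature maps
```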
The following describes the above steps S101 to S103 in detail with reference to the network structure schematic diagram shown in fig. 3.
Specifically, the network for performing image processing includes the luminance scale prediction module shown in fig. 3, and the module includes a processing unit (indicated by a dash-dot box), a fusion unit, and a local feature extraction unit formed by a feature pyramid network structure.
An image pyramid is a series of images derived from the same original image, arranged in a pyramid shape with progressively decreasing resolution. In the pyramid network structure, each level is obtained from the previous one by downsampling, until a termination condition is reached (in the embodiment of the present application, sampling stops when a preset number of sampling iterations is reached). The higher the pyramid level, the smaller the image and the lower its resolution. As shown in fig. 3, the feature extraction unit includes three first feature layers arranged in cascade (indicated by dashed boxes), and each first feature map extracted from the input image is indicated by a solid box; the solid lines with arrows represent the feature extraction process.
The processing unit may be composed of a fully connected layer (FC layer) and an activation layer (Sigmoid layer). Optionally, the number of fully connected layers and activation layers laid out in the processing unit depends on the number of first feature maps.
Wherein the two-dot chain line with an arrow indicates the process of scale prediction.
The fusion unit is configured to perform the fusion operation shown in fig. 3, which may be to sum and average the multiple first scales obtained by the processing units from the respective first feature maps.
In a possible embodiment, the step S101 of obtaining the first feature map of at least two different scales of the image includes the following steps A1:
step A1: and performing downsampling operation on the image through at least two first feature layers arranged in cascade in the network to obtain a first feature image output by each first feature layer.
In a possible embodiment, the determining the first scale of each first feature map related to the luminance information in step S102 includes step A2:
step A2: and respectively inputting each first feature map meeting preset conditions into a processing unit connected with a first feature layer of the first feature map in the network, and determining a first scale of the first feature map related to brightness information.
The preset condition may be set based on the receptive field: when the receptive field of the feature map obtained by the convolution operation satisfies a preset requirement (i.e., is sufficiently large), the corresponding first feature map may be regarded as a first feature map satisfying the preset condition.
In a possible embodiment, the determining, in step S103, the second scale of the image related to the brightness information based on the first scale includes the following step A3:
step A3: and calculating the average value of the first scale, and determining the average value as a second scale of the image related to the brightness information.
Specifically, as shown in fig. 3, if the scale prediction is performed by using the first feature maps with 3 scales, 3 first scales can be obtained, and then the 3 scales are added and averaged to obtain a final scale prediction value, that is, a second scale.
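In code, this fusion step is just an arithmetic mean over the scalar predictions; a trivial sketch (the values are toy numbers, not measured data):

```python
import torch

# First scales predicted from 3 first feature maps (toy values)
first_scales = torch.tensor([1.02, 0.97, 1.06])
second_scale = first_scales.mean()   # fusion: sum and average
print(second_scale.item())           # overall luminance scale of the image
```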
The processing unit in the embodiment of the present application is specifically described below with reference to the network structure schematic shown in fig. 4.
Specifically, in order to adapt the luminance scale prediction module provided by the embodiment of the present application to the demosaicing task for multispectral images, the embodiment of the present application improves the module of the SE (Squeeze-and-Excitation)-ResNet network and applies the improved module in the processing unit shown in fig. 3. The network uses a channel attention mechanism to capture the inter-channel dependency.
In a possible embodiment, the determining the first scale of each first feature map related to the brightness information in step S102 includes the following steps B1-B2:
step B1: at least one of a max-pooling, a mean-pooling and a min-pooling operation is performed for the first feature map of each input.
Step B2: and determining first scales of the first feature graphs respectively related to the brightness information aiming at the first feature graphs after the pooling operation.
In one case, in step B1, the pooling operation may be performed on all first feature maps obtained in step S101; alternatively, the first feature maps obtained in step S101 may first be screened based on a preset condition, and the pooling operation performed only on the first feature maps satisfying the preset condition.
Optionally, as shown in fig. 3, the feature maps are represented by solid boxes. The feature map closest to the input image in fig. 3 may be a feature map A obtained by a convolution operation; the second may be a first feature map B obtained after one downsampling operation; the third may be a first feature map C obtained after two downsampling operations; and the fourth may be a first feature map D obtained after three downsampling operations. The first feature map B, the first feature map C, and the first feature map D may then be input into the processing units to perform the operations of steps B1 to B2 described above.
Specifically, at least one of the average pooling (Avg-pooling), max pooling (Max-pooling), and min pooling (Min-pooling) operations may be used to process the per-channel features during feature extraction. Fig. 4 shows a possible embodiment that uses a max-pooling operation and a min-pooling operation for feature extraction. The implementation of step B1 can preserve multiple levels of the luminance scale of each local region, i.e., medium (corresponding to mean pooling), high (corresponding to max pooling), and low (corresponding to min pooling), so the luminance scale prediction for the final whole image is equivalent to a spatially weighted calculation. As shown in fig. 4, ReLU and Sigmoid can be understood as activation layers corresponding to different activation functions; H represents the number of pixel rows, W the number of pixel columns, and C the number of channels of the image.
In step B2, the feature dimension is first reduced to 1/r of the input; after ReLU activation, it is raised back to the original dimension through a fully connected layer (FC); a Sigmoid then yields normalized weights in the range [0,1]; finally, the normalized weights are applied to the features of each channel through the Scale operation of the scale prediction, and the first scale related to luminance information in the final image is thereby determined.
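The following PyTorch sketch illustrates a processing unit of this kind. It is a hedged reconstruction from the description above, not the application's own code; the reduction ratio r, the shared MLP across the two pooled statistics, and the final spatial averaging into a single scalar are assumptions:

```python
import torch
import torch.nn as nn

class ScalePredictionUnit(nn.Module):
    """SE-style processing unit: channel statistics -> shared MLP
    (FC reduces to 1/r, ReLU, FC restores) -> Sigmoid weights in [0, 1]
    -> Scale operation weighting each channel."""
    def __init__(self, channels, r=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),   # reduce to 1/r of the input
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels))   # restore original dimension

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, _, _ = feat.shape
        max_stat = feat.amax(dim=(2, 3))          # max-pooling over H, W
        min_stat = feat.amin(dim=(2, 3))          # min-pooling over H, W
        weights = torch.sigmoid(self.mlp(max_stat) + self.mlp(min_stat))
        scaled = feat * weights.view(b, c, 1, 1)  # Scale: weight each channel
        # Assumption: collapse the weighted features into one scalar
        # first scale per image
        return scaled.mean(dim=(1, 2, 3))
```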
In the embodiment of the present application, the second scale output by the luminance scale prediction module can be used in the demosaicing (Demosaic) process to improve the accuracy of recovering lost image data. Specifically, it can alleviate the problem of inaccurate interpolation when recovering a multispectral image from single-channel or multi-channel data during demosaicing; here, demosaicing can be understood as recovering degraded image content and filling in missing image information.
Demosaicing is a most important step in image signal processing (ISP): it converts RAW data into full-color RGB data, which is then passed on for noise reduction, color conversion, and tone mapping. The function of demosaicing is to convert the data output by the photosensitive sensor into a complete RGB data format viewable by human eyes for output on a display device. The sensors widely used in terminals are CMOS sensors, and each pixel of the sensor array is covered by a color filter array, i.e., a CFA, as shown in fig. 7. The data output from the photosensitive element is therefore single-channel two-dimensional data in which each pixel has only one color component; this data format is Bayer data, and the data must be processed to recover the two missing color components of each pixel — this process is demosaicing. A Bayer-arranged filter pattern may be used on a mobile phone camera; fig. 8 shows a Bayer filter pattern and a Bayer-like pattern (a sensor output filter pattern similar to Bayer). With the development of mobile phone cameras and AI technology, the need to recover hyperspectral image data from low-dimensional image data has arisen; the more image data must be recovered and filled in, the greater the difficulty and challenge.
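To make the Bayer data format concrete, the following NumPy sketch (a minimal illustration, not the application's code) builds an RGGB mosaic from a full RGB image, leaving exactly one color component per pixel; the two missing components at every position are what demosaicing must recover:

```python
import numpy as np

rgb = np.random.rand(4, 4, 3)            # toy full-color image (H, W, 3)
mosaic = np.zeros((4, 4))                # single-channel Bayer data
mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even columns
mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd columns
mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even columns
mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd columns
```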
However, most existing demosaicing methods adopt a U-Net network structure (an image segmentation network structure), raising the dimensionality of the image data through interpolation and then predicting the spectral value of each dimension through regression. Such methods cannot accurately estimate the overall luminance scale of the image, so the multispectral result after regression prediction is inevitably biased by local brightness and noise, and the interpolation accuracy is low.
In order to solve the problem of inaccurate interpolation in demosaicing of image data and to improve the interpolation precision of the algorithm, the embodiment of the present application also applies the luminance scale prediction module in the demosaicing process, reducing the interference of local-area brightness changes and noise with interpolation during demosaicing by predicting the overall luminance scale of the image.
The following describes the above step S104 in detail with reference to the network configuration diagrams shown in fig. 5 and 6.
In a possible embodiment, as shown in fig. 2, step S104 of the embodiment of the present application performs demosaicing processing on the image based on the second scale to obtain a result image, and includes the following steps S201 to S202:
step S201: and performing coding operation on the image to obtain coding characteristic information.
Step S202: and combining the coding characteristic information and the second scale, and performing decoding operation on the image to obtain a processed result image.
Specifically, as shown in fig. 5, the luminance-scale prediction process and the encoding process may be performed in parallel for an input image, where a dashed box in the luminance-scale prediction module shows an output result (second scale) of the module, and a dashed box in the demosaicing module shows an output result (encoding feature information) of the encoding module. Then, the second scale may be input into a demosaicing module, and the decoding unit performs a decoding operation on the image in combination with the encoding feature information and the second scale, and finally outputs a resultant image (demosaicing image).
In a possible embodiment, the network for image processing comprises a luminance scale prediction module and a demosaicing module.
Specifically, as shown in fig. 6, the brightness scale prediction module includes a local feature extraction unit, a processing unit and a fusion unit; the local feature extraction unit comprises at least two first feature layers (represented by a dotted line box) which are arranged in a cascading manner and is used for outputting a first feature map (represented by a solid line box) through a downsampling operation; the processing unit is used for modeling the correlation of the features among the channels and determining the corresponding first scale of each first feature map; and the fusion unit is used for fusing the output of each processing unit to obtain the second scale.
Specifically, as shown in fig. 5 and 6, the demosaicing module includes an encoding unit and a decoding unit. The encoding unit includes at least two second feature layers arranged in cascade (shown by dotted lines), which output the encoded feature information through downsampling operations, reducing the image size through convolution and downsampling and extracting shallow features. The decoding unit includes at least two third feature layers arranged in cascade (indicated by dotted lines); a second feature layer and a third feature layer that output feature maps of the same scale are connected to each other. Specifically, based on this connection relationship, a second feature map in the encoding unit can be input into a feature fusion layer, represented by a black rectangle in the decoding unit; after feature fusion is completed by the feature fusion layer, the result is output to a third feature layer for an upsampling operation. The third feature layer is configured to perform the upsampling operation based on the second scale (as shown in fig. 6, the second scale may also be input to each feature fusion layer) and the encoded feature information to obtain the result image (demosaiced image).
According to the embodiment of the present application, deep features are obtained through convolution and upsampling operations, deep and shallow features are combined to refine the image, and the second scale serves as an adjustment coefficient, so that the final result image has richer detail; the interference of local-area brightness variation and noise with interpolation during demosaicing is reduced, and the accuracy of recovering lost image data is improved.
Optionally, in the demosaicing module shown in fig. 6, a DPN network ("Deep Image Demosaicing for Submicron Image Sensors") may be employed, which uses a deep cascade network structure and a residual network. The network can be divided into left and right parts for analysis. The left part is the compression (also called encoding) process, i.e., the encoding unit of fig. 5, where the image size is reduced by convolution and downsampling to extract shallow features. The right part is the decoding process, i.e., the Decoder (corresponding to the decoding unit of fig. 5). Deep features are obtained through convolution and upsampling, and the encoded feature information (corresponding to the second feature maps) obtained in the encoding stage is combined with the third feature maps obtained in the decoding stage through Skip connection operations, so that deep and shallow features jointly refine the image and the final result image has richer detail.
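A single decoder step of this encode/decode scheme might look like the following PyTorch sketch. It is a hedged reconstruction: the fusion convolution, the transposed-convolution upsampling, and multiplying the fused feature by the second scale are illustrative assumptions about how the adjustment coefficient enters, not the application's confirmed design:

```python
import torch
import torch.nn as nn

class FusionUpBlock(nn.Module):
    """One decoder step: concatenate the skip-connected encoder feature
    with the decoder feature, fuse by convolution, modulate by the
    predicted second scale, then upsample."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)

    def forward(self, dec_feat, enc_feat, second_scale):
        fused = self.fuse(torch.cat([dec_feat, enc_feat], dim=1))  # skip connection
        fused = fused * second_scale       # second scale as adjustment coefficient
        return torch.relu(self.up(fused))  # upsample to the next resolution
```

Since the second scale is a per-image scalar, broadcasting it over the fused feature map is one natural reading of the "adjustment coefficient" role described above; exactly where it enters each fusion layer is an assumption.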
In a possible embodiment, the encoding operation is performed on the image in step S201 to obtain encoding feature information, which includes the following step C1:
step C1: performing downsampling operation on the image through at least two cascade-arranged second feature layers to obtain a second feature image output by each second feature layer; wherein the set of second feature maps forms encoded feature information.
Specifically, among the second feature layers arranged in cascade, the output of the second feature layer of the previous level serves as the input of the second feature layer of the current level. For example, the input of the second feature layer at the third level is the second feature map output by the second feature layer at the second level. It will be appreciated that each second feature layer outputs its own second feature map, and the set of all second feature maps constitutes the encoded feature information.
In a possible embodiment, in step S202, the decoding operation is performed on the image in combination with the encoding feature information and the second scale to obtain a processed result image, which includes the following step C2:
step C2: performing up-sampling operation on the image based on the coding feature information and the second scale through at least two third feature layers arranged in a cascading manner to obtain a processed result image; the input of each third feature layer comprises a second feature map of a second feature layer output with a corresponding level to the third feature layer, a third feature map of a third feature layer output of a previous level, and the second scale.
Specifically, illustrated with the third feature layer ordered at the third layer, the inputs of the third feature layer include: the second feature map output by the second feature layer ordered at the third layer, the third feature map output by the third feature layer ordered at the second layer, and the second scale output by the processing unit.
Alternatively, as shown in fig. 6, after the last third feature layer outputs the third feature map, a final output demosaicing image may be determined based on the second feature map obtained by performing convolution processing on the input image and the second scale.
Alternatively, the local feature extraction unit in the luma scale prediction module and the encoding unit in the demosaicing module may adopt the same network structure.
The embodiment of the application provides a brightness scale prediction module of an image, which can predict the overall brightness scale of the image by utilizing multi-scale characteristics of the image. Meanwhile, in order to enable the module to be suitable for a demosaicing task, an improved network structure based on a channel attention mechanism is provided, so that deviation of interpolation accuracy is reduced. On the basis, compared with the existing demosaicing method, the brightness scale of the whole predicted image is applied to demosaicing processing, and a more accurate interpolation result can be obtained.
In order to better illustrate the effects obtained by the examples of the present application, the following description is made in connection with the experimental data shown in table 1:
TABLE 1

    Method                                      PSNR on test set
    Existing demosaicing method (ResNet34)      38.16 dB
    Method of the present embodiment            45.00 dB
Table 1 compares the accuracy of the method provided by the embodiment of the present application with that of the existing method on open-source data. The experimental results show that the method provided by the embodiment of the present application is significantly better than the existing demosaicing method. In the experiment, ResNet34 is used as the existing demosaicing method, whose PSNR (peak signal-to-noise ratio) on the test set is 38.16 dB, while the method provided by the embodiment of the present application achieves a PSNR of 45.00 dB on the test set; the experimental results show that the implementation of this method helps improve the precision of the interpolated result. A higher PSNR value indicates better performance.
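For reference, PSNR is defined as 10·log10(MAX²/MSE) between the reference and the reconstructed image; a minimal implementation (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(reference, result, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means the demosaiced
    result is closer to the reference image."""
    mse = np.mean((reference - result) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```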
An embodiment of the present application provides an image processing apparatus, as shown in fig. 9, the image processing apparatus 100 may include: an acquisition module 101, a first determination module 102, a second determination module 103 and a processing module 104.
Wherein, the obtaining module 101 is configured to obtain at least two first feature maps with different scales of the image; the first determining module 102 is configured to determine a first scale of each first feature map related to luminance information; a second determining module 103 is configured to determine a second scale of the image related to luminance information based on the first scale; the processing module 104 is configured to perform demosaicing processing on the image based on the second scale, to obtain a result image.
In a possible embodiment, the obtaining module 101 is specifically configured to, when configured to perform the first feature map of at least two different scales of the obtained image:
downsampling the image through at least two first feature layers arranged in cascade in the network to obtain a first feature image output by each first feature layer;
the first determining module 102, when configured to perform determining a first scale that each first feature map is related to luminance information, is specifically configured to:
and respectively inputting each first feature map meeting preset conditions into a processing unit connected with a first feature layer of the first feature map in the network, and determining a first scale of the first feature map related to brightness information.
In a possible embodiment, the first determining module 102 is specifically configured to, when configured to determine the first scale of each first feature map related to luminance information,:
performing at least one of a maximum pooling, a mean pooling and a minimum pooling operation for the first feature map of each input;
and determining first scales of the first feature graphs respectively related to the brightness information aiming at the first feature graphs after the pooling operation.
In a possible embodiment, the second determining module 103 is specifically configured to, when configured to perform determining, based on the first scale, a second scale of the image related to luminance information:
and calculating the average value of the first scale, and determining the average value as a second scale of the image related to the brightness information.
In a possible embodiment, the processing module 104 is specifically configured to, when configured to perform a demosaicing process on the image based on the second scale, obtain a resultant image:
performing coding operation on the image to obtain coding characteristic information;
and combining the coding characteristic information and the second scale, and performing decoding operation on the image to obtain a processed result image.
In a possible embodiment, the network for image processing includes a luminance scale prediction module and a demosaicing module;
The brightness scale prediction module comprises a local feature extraction unit, a processing unit and a fusion unit; the local feature extraction unit comprises at least two first feature layers which are arranged in cascade, and is used for outputting a first feature map through downsampling operation; the processing unit is used for determining a corresponding first scale of each first characteristic diagram; the fusion unit is used for fusing the output of each processing unit to obtain the second scale;
the demosaicing module comprises an encoding unit and a decoding unit; the coding unit comprises at least two second feature layers which are arranged in cascade and is used for outputting the coding feature information through downsampling operation; the decoding unit comprises at least two third feature layers which are arranged in cascade, and the third feature layers are used for carrying out up-sampling operation on the basis of the second scale and the coding feature information to obtain the result image.
In a possible embodiment, the processing module 104 is configured to, when configured to perform an encoding operation on the image, obtain encoding feature information, specifically:
performing a downsampling operation on the image through at least two second feature layers arranged in cascade to obtain a second feature map output by each second feature layer; wherein the set of second feature maps forms the encoded feature information;
The processing module 104 is configured to, when executing a decoding operation on the image to obtain a processed result image by combining the encoding feature information and the second scale, specifically:
performing up-sampling operation on the image based on the coding feature information and the second scale through at least two third feature layers arranged in a cascading manner to obtain a processed result image; the input of each third feature layer comprises a second feature map of a second feature layer output with a corresponding level to the third feature layer, a third feature map of a third feature layer output of a previous level, and the second scale.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
The embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of the image processing method. Compared with the related art, the following can be achieved: at least two first feature maps of different scales are obtained for the input image, so that features of the image at different scales can be extracted; then, a first scale related to luminance information is determined for each first feature map, and a second scale of the image related to luminance information is determined based on these first scales, i.e., the second scale is determined from a plurality of first scales; on this basis, demosaicing can be performed on the image based on the second scale to obtain a result image. The embodiment of the present application predicts the overall luminance scale of the image, i.e., the second scale, using the multi-scale features of the image; this implementation helps improve the accuracy of the overall luminance scale prediction and thereby the precision of the demosaicing result.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 10, the electronic device 4000 shown in fig. 10 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
Bus 4002 may include a path to transfer information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 4003 is used for storing a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute a computer program stored in the memory 4003 to realize the steps shown in the foregoing method embodiment.
The processor may include one or more processors. The one or more processors may be general-purpose processors (e.g., central processing units (CPUs), application processors (APs), etc.), graphics-dedicated processors (e.g., graphics processing units (GPUs), vision processing units (VPUs)), and/or AI-dedicated processors (e.g., neural processing units (NPUs)).
The one or more processors control the processing of the input data according to predefined operating rules or Artificial Intelligence (AI) models stored in the non-volatile memory and the volatile memory. Predefined operational rules or artificial intelligence models are provided through training or learning.
Here, providing by learning refers to deriving a predefined operation rule or an AI model having a desired characteristic by applying a learning algorithm to a plurality of learning data. The learning may be performed in the apparatus itself in which the AI according to the embodiment is performed, and/or may be implemented by a separate server/system.
The AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and the calculation of one layer is performed using the calculation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), bidirectional recurrent deep neural networks (BRDNNs), generative adversarial networks (GANs), and deep Q-networks.
A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of learning data to make, allow, or control the target device to make a determination or prediction. Examples of such learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the steps and corresponding contents of the embodiment of the method when being executed by a processor.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing is merely an optional implementation manner of some of the implementation scenarios of the present application, and it should be noted that, for those skilled in the art, other similar implementation manners based on the technical ideas of the present application are adopted without departing from the technical ideas of the scheme of the present application, and the implementation manner is also within the protection scope of the embodiments of the present application.

Claims (11)

1. An image processing method, comprising:
obtaining at least two first feature maps of different scales of an image;
determining a first scale of each first feature map respectively related to the brightness information;
determining a second scale of the image related to luminance information based on the first scale;
and demosaicing the image based on the second scale to obtain a result image.
2. The method of claim 1, wherein obtaining a first feature map of at least two different scales of an image comprises:
downsampling the image through at least two first feature layers arranged in cascade in the network to obtain a first feature image output by each first feature layer;
the determining the first scale of each first feature map related to the brightness information respectively includes:
and respectively inputting each first feature map meeting preset conditions into a processing unit connected with a first feature layer of the first feature map in the network, and determining a first scale of the first feature map related to brightness information.
3. The method according to claim 1 or 2, wherein determining the first scale of each first feature map related to the luminance information comprises:
performing at least one of a maximum pooling, a mean pooling and a minimum pooling operation for the first feature map of each input;
and determining first scales of the first feature graphs respectively related to the brightness information aiming at the first feature graphs after the pooling operation.
4. The method of claim 1, wherein the determining a second scale of the image related to luminance information based on the first scale comprises:
and calculating the average value of the first scale, and determining the average value as a second scale of the image related to the brightness information.
5. The method of claim 1, wherein demosaicing the image based on the second scale results in a resultant image, comprising:
performing coding operation on the image to obtain coding characteristic information;
and combining the coding characteristic information and the second scale, and performing decoding operation on the image to obtain a processed result image.
6. The method of claim 5, wherein the network for image processing comprises a luma scale prediction module and a demosaicing module;
The brightness scale prediction module comprises a local feature extraction unit, a processing unit and a fusion unit; the local feature extraction unit comprises at least two first feature layers which are arranged in cascade, and is used for outputting a first feature map through downsampling operation; the processing unit is used for determining a corresponding first scale of each first characteristic diagram; the fusion unit is used for fusing the output of each processing unit to obtain the second scale;
the demosaicing module comprises an encoding unit and a decoding unit; the coding unit comprises at least two second feature layers which are arranged in cascade and is used for outputting the coding feature information through downsampling operation; the decoding unit comprises at least two third feature layers which are arranged in cascade, and the third feature layers are used for carrying out up-sampling operation on the basis of the second scale and the coding feature information to obtain the result image.
7. The method of claim 5, wherein
the encoding operation is performed on the image to obtain encoding characteristic information, which comprises the following steps:
performing downsampling operation on the image through at least two cascade-arranged second feature layers to obtain a second feature image output by each second feature layer; wherein the set of second feature maps forms encoded feature information;
And performing decoding operation on the image by combining the coding feature information and the second scale to obtain a processed result image, wherein the decoding operation comprises the following steps:
performing up-sampling operation on the image based on the coding feature information and the second scale through at least two third feature layers arranged in a cascading manner to obtain a processed result image; the input of each third feature layer comprises a second feature map of a second feature layer output with a corresponding level to the third feature layer, a third feature map of a third feature layer output of a previous level, and the second scale.
8. An image processing apparatus, comprising:
the acquisition module is used for acquiring at least two first feature images with different scales of the image;
the first determining module is used for determining first scales of the first feature maps respectively related to the brightness information;
a second determining module, configured to determine a second scale of the image related to brightness information based on the first scale;
and the processing module is used for performing demosaicing processing on the image based on the second scale to obtain a result image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202210316530.1A 2022-03-28 2022-03-28 Image processing method, image processing apparatus, electronic device, storage medium, and program product Pending CN116883232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210316530.1A CN116883232A (en) 2022-03-28 2022-03-28 Image processing method, image processing apparatus, electronic device, storage medium, and program product


Publications (1)

Publication Number Publication Date
CN116883232A true CN116883232A (en) 2023-10-13

Family

ID=88261011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210316530.1A Pending CN116883232A (en) 2022-03-28 2022-03-28 Image processing method, image processing apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN116883232A (en)


Legal Events

Date Code Title Description
PB01 Publication