CN110781893B - Feature map processing method, image processing method, device and storage medium

Info

Publication number
CN110781893B
Authority
CN
China
Prior art keywords
feature map
channel
processing
input feature
input
Prior art date
Legal status
Active
Application number
CN201910906974.9A
Other languages
Chinese (zh)
Other versions
CN110781893A (en)
Inventor
崔婵婕
任宇鹏
卢维
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN201910906974.9A
Publication of CN110781893A
Application granted
Publication of CN110781893B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a feature map processing method, an image processing method, a device, and a storage medium. The feature map processing method comprises the following steps: acquiring an input feature map, where the input feature map is a feature map produced by a convolutional network; determining an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; and processing the input feature map with the attention weights to obtain an output feature map. Because the channel information of the feature map is taken into account when forming the attention weights, the method greatly improves algorithm accuracy at a small cost in computation and memory.

Description

Feature map processing method, image processing method, device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a feature map processing method, an image processing method, a device, and a storage medium.
Background
The task of semantic segmentation is to classify each pixel in an image. It is a classic dense prediction problem, whose core challenges are accurate extraction of feature layers and accurate recovery of detail information. The introduction of fully convolutional networks (FCN) greatly improved the accuracy of semantic segmentation algorithms. A convolutional layer produces output features as linear combinations of convolution kernels and input features, and convolutional layers are commonly stacked to enlarge the receptive field and capture long-range semantic information, but this approach is inefficient.
Disclosure of Invention
To solve the above problem, the present application provides a feature map processing method, an image processing method, a device, and a storage medium that take the channel information of the feature map into account when forming the attention weights, greatly improving algorithm accuracy at a small cost in computation and memory.

The technical solution adopted by the present application is as follows: a feature map processing method is provided, comprising: acquiring an input feature map, where the input feature map is a feature map produced by a convolutional network; determining an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; and processing the input feature map with the attention weights to obtain an output feature map.
Determining the attention weight of each channel of the input feature map based on the spatial dimension information and the channel information comprises: performing triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy; determining the channel weight of each channel of the input feature map based on the spatial dimension information and the channel information; and determining the attention weight of each channel based on the first feature map copy, the second feature map copy and the channel weight.

Determining the channel weight of each channel of the input feature map based on the spatial dimension information and the channel information comprises: performing global average pooling and full convolution on the input feature map to obtain the channel weight of each channel.

Performing triple modeling on the input feature map to obtain the first, second and third feature map copies comprises: performing dimensionality reduction and 1×1 convolution on the input feature map to generate the first and second feature map copies; and performing 1×1 convolution on the input feature map to generate the third feature map copy.

Determining the attention weight of each channel based on the first feature map copy, the second feature map copy and the channel weight comprises: multiplying each element of each channel in the first feature map copy with the element at the corresponding position in the second feature map copy to obtain an intermediate value; and weighting the intermediate value with the channel weight of the corresponding channel to obtain the attention weight of that channel.

Processing the input feature map with the attention weights to obtain an output feature map comprises: summing the attention weights of the channels and normalizing to obtain an attention map; obtaining an intermediate feature map based on the third feature map copy and the attention map; and obtaining the output feature map based on the input feature map and the intermediate feature map.

Alternatively, determining the attention weight of each channel of the input feature map based on the spatial dimension information and the channel information comprises: performing triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy; and obtaining the attention weight of each channel based on the first and second feature map copies.

In this case, performing triple modeling on the input feature map comprises: performing dimension reduction on the input feature map to generate the first, second and third feature map copies.

Processing the input feature map with the attention weights then comprises: normalizing the attention weight of each channel to obtain an attention map; obtaining a first intermediate feature map based on the third feature map copy and the attention map; performing dimension raising on the first intermediate feature map to obtain a second intermediate feature map; and obtaining the output feature map based on the input feature map and the second intermediate feature map.
Another technical solution adopted by the present application is an image processing method, comprising: acquiring an image to be processed; encoding the image to be processed to obtain an input feature map; extracting semantic information from the input feature map using the feature map processing method described above to obtain an output feature map; and decoding the output feature map to obtain a processed image.

Another technical solution adopted by the present application is an image processing apparatus comprising a processor and a memory connected to each other, the memory storing program data and the processor executing the program data to implement the method described above.

Another technical solution adopted by the present application is an image processing apparatus comprising: an encoding module for encoding the image to be processed to obtain an input feature map; a processing module for extracting semantic information from the input feature map using the method described above to obtain an output feature map; and a decoding module for decoding the output feature map to obtain a processed image.

The encoding module comprises a plurality of convolutional layers, a downsampling layer, and a plurality of convolutional layer combinations. The decoding module comprises: a first upsampling layer connected to the processing module; a data connection layer connecting the first of the plurality of convolutional layer combinations and the first upsampling layer, for concatenating the data they output; a plurality of convolutional layers; and a second upsampling layer.

Another technical solution adopted by the present application is a computer storage medium storing program data which, when executed by a processor, implements the method described above.

The feature map processing method provided by the present application comprises: acquiring an input feature map; determining an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information; and processing the input feature map with the attention weights to obtain an output feature map. Because the channel information of the feature map is taken into account when forming the attention weights, the method greatly improves algorithm accuracy at a small cost in computation and memory.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a feature map processing method provided in an embodiment of the present application;
FIG. 4 is another schematic flowchart of a feature map processing method provided in an embodiment of the present application;
FIG. 5 is a network diagram of a feature map processing method provided in an embodiment of the present application;
FIG. 6 is a schematic flowchart of another feature map processing method provided in an embodiment of the present application;
FIG. 7 is another network diagram of a feature map processing method provided in an embodiment of the present application;
FIG. 8 is another schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application. The apparatus comprises an encoding module 11, a processing module 12 and a decoding module 13. The encoding module 11 encodes the acquired image, the processing module 12 processes the encoded image, and the decoding module 13 decodes the processed image. The image processing method comprises the following steps:
Step 21: acquire the image to be processed.

Step 22: encode the image to be processed to obtain an input feature map.

Step 23: extract semantic information from the input feature map to obtain an output feature map.

Step 24: decode the output feature map to obtain a processed image.
In this embodiment, the encoding is performed by the encoding module 11, the processing is performed by the processing module 12, and the decoding is performed by the decoding module 13.
The encoding module 11 may adopt a deep learning network model. A network model is the carrier of deep learning (DL), a branch of machine learning that realizes artificial intelligence in a computing system by building artificial neural networks (ANNs) with hierarchical structures. Because a hierarchical ANN extracts and filters the input information layer by layer, deep learning has the capability of representation learning and can realize end-to-end supervised and unsupervised learning. In addition, deep learning can participate in building reinforcement learning systems, forming deep reinforcement learning.
Optionally, this embodiment may use a ResNet-101 network model, which comprises multiple convolutional layers, a downsampling (pooling) layer, and multiple convolutional layer combinations. A convolutional layer extracts features from its input data. It contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias vector, analogous to a neuron of a feedforward neural network. Each neuron in a convolutional layer is connected to several neurons in a nearby region of the previous layer, the size of the region depending on the size of the convolution kernel.
After the convolutional layer performs feature extraction, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer applies a preset pooling function that replaces the value at a single point in the feature map with a statistic of its neighboring region. The pooling layer selects pooling regions in the same way a convolution kernel scans the feature map, controlled by the pooling size, stride, and padding.
After the pooling layer, feature extraction is performed with the ResNet residual network to form the feature map.
Specifically, the encoding module 11 comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a downsampling layer, a first convolutional layer combination, a second convolutional layer combination, a third convolutional layer combination and a fourth convolutional layer combination.

Optionally, the second and third convolutional layers may be replaced with dilated (hole) convolutions with dilation rates (expansion coefficients) of 2 and 4, respectively, so that the output feature map is 1/8 the size of the original image.
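The following is a minimal sketch of such an encoder, assuming PyTorch/torchvision as the framework (the patent does not name one): torchvision's ResNet-101 can swap the strides of its last two stages for dilations, which keeps the output feature map at 1/8 of the input resolution with dilation rates 2 and 4, matching the expansion coefficients above.

    import torchvision

    # ResNet-101 backbone with the last two stages dilated instead of strided.
    # replace_stride_with_dilation=[False, True, True] dilates layer3 (rate 2)
    # and layer4 (rate 4), so the output stays at 1/8 of the input resolution.
    backbone = torchvision.models.resnet101(
        weights=None,  # training from scratch; pretrained weights are optional
        replace_stride_with_dilation=[False, True, True],
    )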
The method performed by the processing module 12 is described below.
Referring to fig. 3, fig. 3 is a schematic flowchart of a processing method of a feature map provided in an embodiment of the present application, where the method includes:
Step 31: acquire an input feature map.
The input feature map is a feature map that is output after being encoded by the encoding module 11.
Step 32: determine an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map.
A feature map has three dimensions: channel, height, and width, where height and width are the spatial dimensions. Prior methods consider only the spatial dimension information when determining the attention weight, whereas this embodiment also considers the channel information. The following embodiments describe this step in detail.
Step 33: process the input feature map with the attention weights to obtain an output feature map.
The acquisition and processing of the attention weight is described below in two embodiments.
Referring to fig. 4 and 5, fig. 4 is another schematic flow chart of a processing method of a feature map provided in an embodiment of the present application, and fig. 5 is a network diagram of a processing method of a feature map provided in an embodiment of the present application, where the method includes:
step 41: and acquiring an input feature map.
The input feature map is a feature map that is output after being encoded by the encoding module 11.
Step 42: perform triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy.

As shown in fig. 5, triple modeling of the input feature map H ∈ ℝ^(C×H×W) (C is the channel count, H the height, W the width) generates the three copies Q, K and V (query, key, value).

Optionally, dimensionality reduction and 1×1 convolution are performed on the input feature map to generate the first and second feature map copies, and a 1×1 convolution is performed on the input feature map to generate the third feature map copy.

Specifically, the feature map H is reduced in dimension and passed through 1×1 convolutions to generate the two feature maps Q, K ∈ ℝ^(C′×H×W), where C′ denotes the reduced channel count; a 1×1 convolution on H generates the feature map V ∈ ℝ^(C×H×W), the same size as H.
Step 43: determine the channel weight of each channel of the input feature map based on the spatial dimension information and the channel information.

In this embodiment, besides the three branches Q, K and V, the input feature map is passed through a further branch that applies global average pooling and full convolution, yielding the channel weight of each channel of the input feature map.

Specifically, the feature map H undergoes global average pooling with pooling kernel size H×W followed by full convolution, outputting the channel weight S, one scalar weight per channel.
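As a shape-level illustration of this branch, here is a minimal sketch assuming PyTorch; the sigmoid squashing and the 1×1 kernel of the "full convolution" are assumptions of the sketch, not details given by the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(2, 64, 32, 32)        # hypothetical input: B=2, C=64, H=W=32

    # Global average pooling with kernel H x W collapses each channel to one value.
    pooled = F.adaptive_avg_pool2d(x, 1)  # shape (2, 64, 1, 1)

    # The "full convolution" is read here as a 1x1 convolution across channels.
    fc = nn.Conv2d(64, 64, kernel_size=1)
    s = torch.sigmoid(fc(pooled))         # channel weights S, shape (2, 64, 1, 1)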
Step 44: and determining the attention weight of each channel of the input feature map based on the first feature map copy, the second feature map copy and the channel weight.
Optionally, multiplying each element of each channel in the first feature map copy by an element at a corresponding position in the second feature map copy to obtain an intermediate value; and weighting the intermediate value by using the channel weight of the corresponding channel to obtain the attention weight of the corresponding channel.
Specifically, each element of each channel in Q is multiplied by a single row and column of elements in K, and the corresponding attention weight A is obtained by multiplying each channel element by the value in the corresponding weight S
Figure BDA0002213559140000075
Summing the channels for the corresponding positions and normalizing them with the softmax function yields an attention-seeking diagram a ″
Figure BDA0002213559140000076
Step 45: process the input feature map with the attention weights to obtain an output feature map.

The attention weights of the channels are summed and normalized to obtain the attention map; an intermediate feature map is obtained based on the third feature map copy and the attention map; and the output feature map is obtained based on the input feature map and the intermediate feature map.

Specifically, the value at each position of R ∈ ℝ^(C×H×W) equals the sum of the products of the elements of V lying in the same row and column with the weights at the corresponding position in A′; finally, adding R and H yields the output feature map H′.
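Steps 42 to 45 can be summarized in code. The sketch below is one possible PyTorch reading of this embodiment, in which "same row and column" is implemented as row/column (criss-cross style) attention; the framework, the reduction factor C′ = C/8, the sigmoid on the channel weights, and the learnable residual scale gamma are all assumptions of the sketch rather than details fixed by the patent. Note that weighting the per-channel Q·K products by S and summing over channels is algebraically the same as scaling Q by S before the dot products, which is how the code folds in the channel branch.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def neg_inf_diag(B, H, W, device):
        # -inf on the diagonal keeps each position from being counted twice
        # (once along its row and once along its column).
        d = torch.diag(torch.full((H,), float("-inf"), device=device))
        return d.unsqueeze(0).repeat(B * W, 1, 1)

    class ChannelWeightedAxialAttention(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            c = channels // reduction
            self.query = nn.Conv2d(channels, c, 1)         # Q: first feature map copy
            self.key = nn.Conv2d(channels, c, 1)           # K: second feature map copy
            self.value = nn.Conv2d(channels, channels, 1)  # V: third feature map copy
            self.channel_fc = nn.Conv2d(channels, c, 1)    # channel-weight branch (step 43)
            self.gamma = nn.Parameter(torch.zeros(1))      # residual scale (assumed)

        def forward(self, x):
            B, C, H, W = x.shape
            # Step 43: global average pooling + "full convolution" -> channel weights S.
            s = torch.sigmoid(self.channel_fc(F.adaptive_avg_pool2d(x, 1)))
            # Step 44: scaling Q by S is equivalent to weighting the per-channel
            # Q*K products by S and summing over channels.
            q = self.query(x) * s
            k = self.key(x)
            v = self.value(x)
            # Rearrange so rows and columns can be attended with batched matmuls.
            q_col = q.permute(0, 3, 1, 2).reshape(B * W, -1, H).permute(0, 2, 1)
            q_row = q.permute(0, 2, 1, 3).reshape(B * H, -1, W).permute(0, 2, 1)
            k_col = k.permute(0, 3, 1, 2).reshape(B * W, -1, H)
            k_row = k.permute(0, 2, 1, 3).reshape(B * H, -1, W)
            v_col = v.permute(0, 3, 1, 2).reshape(B * W, -1, H)
            v_row = v.permute(0, 2, 1, 3).reshape(B * H, -1, W)
            e_col = torch.bmm(q_col, k_col) + neg_inf_diag(B, H, W, x.device)
            e_col = e_col.view(B, W, H, H).permute(0, 2, 1, 3)  # (B, H, W, H)
            e_row = torch.bmm(q_row, k_row).view(B, H, W, W)    # (B, H, W, W)
            # Softmax over the H+W-1 usable weights per position: the attention map A'.
            attn = F.softmax(torch.cat([e_col, e_row], dim=3), dim=3)
            a_col = attn[:, :, :, :H].permute(0, 2, 1, 3).reshape(B * W, H, H)
            a_row = attn[:, :, :, H:].reshape(B * H, W, W)
            # Step 45: aggregate V along each position's row and column -> R.
            r_col = torch.bmm(v_col, a_col.permute(0, 2, 1)).view(B, W, C, H).permute(0, 2, 3, 1)
            r_row = torch.bmm(v_row, a_row.permute(0, 2, 1)).view(B, H, C, W).permute(0, 2, 1, 3)
            return x + self.gamma * (r_col + r_row)             # H' = H + R

A module built this way is drop-in: ChannelWeightedAxialAttention(512)(torch.randn(2, 512, 64, 64)) returns a tensor of the same shape as its input.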
Different from the prior art, the feature map processing method provided by this embodiment comprises: acquiring an input feature map; determining an attention weight for each channel of the input feature map based on the channel information of the input feature map; and processing the input feature map with the attention weights to obtain an output feature map. By weighting the channels during the computation of the attention weights, channel information is fused in, and algorithm accuracy is greatly improved at a small cost in computation and memory.
Referring to fig. 6 and 7, fig. 6 is a schematic flowchart of another processing method of a feature map provided in an embodiment of the present application, and fig. 7 is another network diagram of the processing method of the feature map provided in the embodiment of the present application, where the method includes:
Step 61: acquire an input feature map.
The input feature map is a feature map that is output after being encoded by the encoding module 11.
Step 62: perform triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy.

As shown in fig. 7, triple modeling of the input feature map H ∈ ℝ^(C×H×W) (C is the channel count, H the height, W the width) generates the three copies Q, K and V (query, key, value).

Optionally, dimension reduction is performed on the input feature map to generate the first, second and third feature map copies.

Specifically, the feature map H is reduced in dimension and passed through 1×1 convolutions to generate the three feature maps Q, K, V ∈ ℝ^((C/8)×H×W), the channel count being reduced to 1/8 that of H.
Step 63: obtain the attention weight of each channel based on the first and second feature map copies.

Specifically, each element of each channel in Q is multiplied with the elements of K lying in the same row and column of the same channel, obtaining the corresponding attention weight M of size (C/8)×(H+W−1)×H×W.
Step 64: normalize the attention weight of each channel to obtain an attention map.

Specifically, normalizing the attention weight M with the softmax function generates the attention map A.

Step 65: obtain a first intermediate feature map based on the third feature map copy and the attention map.

Specifically, the value at each position of R (the first intermediate feature map) equals the sum of the products of the elements of V lying in the same row and column of the corresponding channel with the weights at the corresponding position in A.

Step 66: perform dimension raising on the first intermediate feature map to obtain a second intermediate feature map.

Specifically, a 1×1 convolution is performed on R to raise it back to C channels, yielding the feature map R′ (the second intermediate feature map).

Step 67: obtain the output feature map based on the input feature map and the second intermediate feature map.

Specifically, R′ and H are added to obtain the feature map H′ (the output feature map).
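A sketch of this second embodiment's structure, again assuming PyTorch: the attention itself (steps 63 to 65) is any module that operates on the reduced C/8-channel space, for example the row/column attention sketched earlier with its channel branch omitted. The names below are illustrative.

    import torch.nn as nn

    class ReducedAttentionBlock(nn.Module):
        def __init__(self, channels, attention: nn.Module, reduction=8):
            super().__init__()
            c = channels // reduction
            self.reduce = nn.Conv2d(channels, c, 1)  # step 62: dimension reduction
            self.attention = attention               # steps 63-65 in the reduced space
            self.expand = nn.Conv2d(c, channels, 1)  # step 66: dimension raising

        def forward(self, x):
            r = self.attention(self.reduce(x))       # first intermediate feature map R
            r2 = self.expand(r)                      # second intermediate feature map R'
            return x + r2                            # step 67: H' = H + R'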
Different from the prior art, the feature map processing method provided by this embodiment comprises: acquiring an input feature map; determining an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information; and processing the input feature map with the attention weights to obtain an output feature map. In this way, channel semantic information and spatial semantic information are fused into one multi-dimensional attention mechanism, so that both are taken into account simultaneously with a more unified structure.
The feature map can be processed in the above manner and is decoded afterwards. The decoding module 13 of this embodiment is designed to match the encoding module 11 and comprises a first upsampling layer, a data connection layer, a plurality of convolutional layers, and a second upsampling layer.

Specifically, the decoding module 13 comprises: a first upsampling layer connected to the last of the plurality of convolutional layer combinations; a data connection layer connecting the first of the plurality of convolutional layer combinations and the first upsampling layer, for concatenating the data they output; a plurality of convolutional layers; and a second upsampling layer.

To make full use of the spatial position information in the low-level feature layers, this proposal connects the first convolutional layer combination of the encoding module 11 to the data connection layer that follows the first upsampling layer of the decoding module 13.
The decoding proceeds as follows:

First upsampling layer (Upsample1): takes the output of the processing module 12 as input and outputs at 1/4 of the original image size.
Data connection layer (Concate): concatenates the outputs of the first convolutional layer combination and the first upsampling layer.
Fourth convolutional layer (Cat_conv): consists of two consecutive 3×3 convolutions with 512 channels and a Dropout (random deactivation) layer with rate 0.1.
Fifth convolutional layer (Cls_conv): a convolution with a 1×1 kernel that outputs the segmentation result at 1/8 of the original image size.
Second upsampling layer (Upsample2): upsamples the segmentation result of the previous layer to the original image size.
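The decoder above translates into the following sketch, assuming PyTorch; the low-level channel count (256, as in the first ResNet-101 stage) and the class count are illustrative assumptions, not values given by the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Decoder(nn.Module):
        def __init__(self, high_ch=512, low_ch=256, num_classes=21):
            super().__init__()
            # Cat_conv: two consecutive 3x3 convolutions with 512 channels + Dropout(0.1).
            self.cat_conv = nn.Sequential(
                nn.Conv2d(high_ch + low_ch, 512, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
                nn.Dropout(0.1),
            )
            self.cls_conv = nn.Conv2d(512, num_classes, 1)  # Cls_conv: 1x1 classifier

        def forward(self, high, low, out_size):
            # Upsample1: bring the processed feature map up to the low-level resolution.
            high = F.interpolate(high, size=low.shape[2:], mode="bilinear", align_corners=False)
            x = torch.cat([high, low], dim=1)  # Concate: fuse in low-level spatial detail
            x = self.cls_conv(self.cat_conv(x))
            # Upsample2: back to the original image size.
            return F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)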
It can be understood that the image processing and feature map processing described above are mainly applied to semantic segmentation. In most semantic segmentation networks, the updated values of all parameters can only be learned from a loss constructed on the final segmentation result, and intermediate results are difficult to supervise. Optionally, in another embodiment, to improve the accuracy of feature extraction and thereby of the segmentation result, a convolution operation is added in the feature extraction stage of the encoding module 11 (for example, at the third convolutional layer combination) to predict a segmentation result. The loss computed on this prediction of the third convolutional layer combination is recorded as loss1, the loss computed on the segmentation result output by the decoding module 13 is recorded as loss2, and the final loss is loss = 0.4·loss1 + 0.6·loss2. In this way, the low-level feature layers rich in spatial position information are connected to the high-level feature layers rich in semantic information, improving the accuracy of the segmentation result.
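In code, the weighted auxiliary loss could look like the sketch below (PyTorch assumed; cross-entropy is an assumption, since the patent does not name the loss function):

    import torch.nn.functional as F

    def total_loss(aux_logits, main_logits, target):
        loss1 = F.cross_entropy(aux_logits, target)   # auxiliary head at the third combination
        loss2 = F.cross_entropy(main_logits, target)  # decoder's final segmentation output
        return 0.4 * loss1 + 0.6 * loss2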
Referring to fig. 8, fig. 8 is another schematic structural diagram of an image processing apparatus provided in an embodiment of the present application, where the image processing apparatus 80 includes a processor 81 and a memory 82 connected to each other, the memory 82 is used for storing program data, and the processor 81 is used for executing the program data to implement the following methods:
Acquiring an image to be processed; encoding the image to be processed to obtain an input feature map; extracting semantic information from the input feature map to obtain an output feature map; and decoding the output feature map to obtain a processed image.
Optionally, when extracting semantic information from the feature map, the processor 81 is specifically configured to: acquire an input feature map; determine an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; and process the input feature map with the attention weights to obtain an output feature map.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: performing triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy; determining a channel weight of each channel of the input feature map based on the channel information of the input feature map; and determining the attention weight of each channel based on the first feature map copy, the second feature map copy and the channel weight.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: performing global average pooling and full convolution on the input feature map to obtain the channel weight of each channel of the input feature map.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: performing dimensionality reduction and 1×1 convolution on the input feature map to generate the first and second feature map copies; and performing 1×1 convolution on the input feature map to generate the third feature map copy.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: multiplying each element of each channel in the first feature map copy with the element at the corresponding position in the second feature map copy to obtain an intermediate value; and weighting the intermediate value with the channel weight of the corresponding channel to obtain the attention weight of that channel.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: summing the attention weights of the channels and normalizing to obtain an attention map; obtaining an intermediate feature map based on the third feature map copy and the attention map; and obtaining the output feature map based on the input feature map and the intermediate feature map.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: performing triple modeling on the input feature map to obtain a first feature map copy, a second feature map copy and a third feature map copy; and obtaining the attention weight of each channel based on the first and second feature map copies.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: performing dimension reduction on the input feature map to generate the first, second and third feature map copies.

Optionally, the processor 81 is further configured to execute the program data to implement the following method: normalizing the attention weight of each channel to obtain an attention map; obtaining a first intermediate feature map based on the third feature map copy and the attention map; performing dimension raising on the first intermediate feature map to obtain a second intermediate feature map; and obtaining the output feature map based on the input feature map and the second intermediate feature map.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application, where the computer storage medium 90 stores program data 91, and the program data 91, when executed by a processor, is used to implement the following method:
acquiring an image to be processed; encoding the image to be processed to obtain an input feature map; extracting semantic information from the input feature map to obtain an output feature map; and decoding the output feature map to obtain a processed image.
Optionally, when extracting semantic information from the feature map, the program data is specifically used to: acquire an input feature map; determine an attention weight for each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; and process the input feature map with the attention weights to obtain an output feature map.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit described above, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made according to the content of the present specification and the accompanying drawings, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (7)

1. A method for processing a feature map, the method comprising:
acquiring an input feature map; wherein the input feature map is a feature map obtained after processing by a convolutional network;
determining an attention weight of each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; the method comprises the following steps: performing dimensionality reduction processing and 1×1 convolution processing on the input feature map to generate a first feature map copy and a second feature map copy; performing 1×1 convolution processing on the input feature map to generate a third feature map copy; determining a channel weight of each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map; multiplying each element of each channel in the first feature map copy with an element at a corresponding position in the second feature map copy respectively to obtain an intermediate value; weighting the intermediate value by using the channel weight of the corresponding channel to obtain the attention weight of the corresponding channel;

processing the input feature map by using the attention weight to obtain an output feature map; the method comprises the following steps: summing the attention weights of each channel, and performing normalization processing to obtain an attention map; obtaining an intermediate feature map based on the third feature map copy and the attention map; and obtaining the output feature map based on the input feature map and the intermediate feature map.
2. The method of claim 1,
the determining the channel weight of each channel of the input feature map based on the spatial dimension information and the channel information of the input feature map comprises:
and carrying out global average pooling processing and full convolution processing on the input feature map to obtain the channel weight of each channel of the input feature map.
3. An image processing method, characterized in that the method comprises:
acquiring an image to be processed;
encoding the image to be processed to obtain an input feature map;
extracting semantic information from the input feature map by using the method according to any one of claims 1-2 to obtain an output feature map;
and decoding the output feature map to obtain a processed image.
4. An image processing apparatus, characterized in that the image processing apparatus comprises a processor and a memory connected to each other, the memory being adapted to store program data, the processor being adapted to execute the program data to implement the method according to any of claims 1-3.
5. An image processing apparatus characterized by comprising:
The encoding module is used for encoding the image to be processed to obtain an input feature map;
a processing module, configured to perform semantic information extraction on the input feature map by using the method according to any one of claims 1-2 to obtain an output feature map;
and the decoding module is used for decoding the output feature map to obtain a processed image.
6. The apparatus of claim 5,
the encoding module includes:
a plurality of convolutional layers;
a downsampling layer;
a plurality of convolutional layer combinations;
the decoding module includes:
the first upper sampling layer is connected with the processing module;
a data connection layer connecting a first one of the plurality of convolution layer combinations and the upsampling layer, for connecting data output by the first one of the plurality of convolution layer combinations and the upsampling layer;
a plurality of convolutional layers;
a second upsampling layer.
7. A computer storage medium, characterized in that program data are stored in the computer storage medium, which program data, when being executed by a processor, are adapted to carry out the method of any one of claims 1-3.
CN201910906974.9A 2019-09-24 2019-09-24 Feature map processing method, image processing method, device and storage medium Active CN110781893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906974.9A CN110781893B (en) 2019-09-24 2019-09-24 Feature map processing method, image processing method, device and storage medium


Publications (2)

Publication Number Publication Date
CN110781893A CN110781893A (en) 2020-02-11
CN110781893B true CN110781893B (en) 2022-06-07

Family

ID=69384250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906974.9A Active CN110781893B (en) 2019-09-24 2019-09-24 Feature map processing method, image processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110781893B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274999B (en) * 2020-02-17 2024-04-19 北京迈格威科技有限公司 Data processing method, image processing device and electronic equipment
CN111639652A (en) * 2020-04-28 2020-09-08 博泰车联网(南京)有限公司 Image processing method and device and computer storage medium
CN111627038B (en) * 2020-05-27 2021-05-11 杭州王道控股有限公司 Background removing method, device and equipment and readable storage medium
CN112348057A (en) * 2020-10-20 2021-02-09 歌尔股份有限公司 Target identification method and device based on YOLO network
CN113052771B (en) * 2021-03-19 2023-09-05 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and storage medium
CN113327203A (en) * 2021-05-28 2021-08-31 北京百度网讯科技有限公司 Image processing network model, method, apparatus and medium
CN113435578B (en) * 2021-06-25 2022-04-05 重庆邮电大学 Feature map coding method and device based on mutual attention and electronic equipment
CN114095728B (en) * 2022-01-21 2022-07-15 浙江大华技术股份有限公司 End-to-end video compression method, device and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872306A (en) * 2019-01-28 2019-06-11 腾讯科技(深圳)有限公司 Medical image cutting method, device and storage medium
CN109948699A (en) * 2019-03-19 2019-06-28 北京字节跳动网络技术有限公司 Method and apparatus for generating characteristic pattern
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN110136122A (en) * 2019-05-17 2019-08-16 东北大学 A kind of brain MR image partition method rebuild based on attention depth characteristic
CN110188685A (en) * 2019-05-30 2019-08-30 燕山大学 A kind of object count method and system based on the multiple dimensioned cascade network of double attentions
CN110188765A (en) * 2019-06-05 2019-08-30 京东方科技集团股份有限公司 Image, semantic parted pattern generation method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CBAM: Convolutional Block Attention Module; Sanghyun Woo et al.; arXiv; 2018-07-18; entire document *
Dual Attention Network for Scene Segmentation; Jun Fu et al.; arXiv; 2019-04-21; abstract, sections 1 and 3, figure 3 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant