CN115936067A - Neural network with ECA channel attention mechanism - Google Patents

Neural network with ECA channel attention mechanism

Info

Publication number
CN115936067A
CN115936067A (application number CN202211539305.0A)
Authority
CN
China
Prior art keywords
data
quantization
channel
eca
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211539305.0A
Other languages
Chinese (zh)
Inventor
谢宇嘉
王晓峰
李悦
周辉
赵雄波
张辉
吴松龄
李晓敏
杨钧宇
路坤峰
张隽
丛龙剑
盖一帆
李山山
吴敏
林玉野
靳蕊溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute filed Critical Beijing Aerospace Automatic Control Research Institute
Priority to CN202211539305.0A priority Critical patent/CN115936067A/en
Publication of CN115936067A publication Critical patent/CN115936067A/en
Priority to PCT/CN2023/111615 priority patent/WO2024113945A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure relates to a neural network with an ECA channel attention mechanism, the neural network comprising an ECA channel attention device. The ECA channel attention device comprises: a first hierarchical quantization unit configured to perform hierarchical quantization on the input data and convert floating-point input data into fixed-point input data, the entire input tensor sharing one quantization step and one quantization zero point; a channel-level quantization unit that quantizes the output of the activation layer, calculating a quantization step and a quantization zero point separately for each channel; and a channel-weighted multiplication of the hierarchically quantized output data and the channel-level quantized output data. The result of the one-dimensional convolution is passed on with lossless precision, the activation layer module is quantized along the channel direction, and a hierarchical quantization scheme is applied to the remaining data, which addresses the reduction in model accuracy caused by quantization.

Description

Neural network with ECA channel attention mechanism
Technical Field
The patent belongs to the field of deep learning, and particularly relates to a neural network with an ECA (Efficient Channel Attention) channel attention mechanism.
Background
The channel attention mechanism has proven to have great potential in improving the performance of deep convolutional neural networks. For different tasks, adding an ECA (Efficient Channel Attention) channel attention mechanism to backbone networks such as ResNet and MobileNetV2 can effectively improve model performance. The performance of the present application can likewise be improved by adding an ECA attention mechanism to the backbone network and deploying it at the edge to perform inference at the edge.
Current deep learning network models introduce a large number of parameters and computations and consume substantial computing resources. On terminal devices, storage space and computing resources are strictly limited, so deploying a deep learning network model on such a hardware platform is difficult.
To reduce computational consumption, one hardware approach is to perform the computation with low-precision data. Network model quantization is the process of converting the trained data used by the network model from high precision to low precision; model quantization can reduce memory and storage occupation, reduce power consumption and increase computation speed. However, the reduced parameter bit width caused by quantization often leads to a loss of accuracy in model prediction.
For the attention module, quantization of the model parameters and feature values strongly affects the output of the whole model and may prevent the model from meeting its accuracy requirement. A reasonable quantization scheme therefore needs to be investigated so that the model accuracy remains essentially lossless while the storage and computing-power requirements of the model are reduced.
Disclosure of Invention
The present disclosure is made in view of the above needs of the prior art, and the technical problem to be solved by the present disclosure is to provide a quantization scheme for the ECA channel attention mechanism in which, according to application requirements, the model is quantized without accuracy loss while the storage and computing-power requirements of the model are reduced.
In order to solve the above technical problem, the technical solution adopted by the present disclosure includes:
A neural network having an ECA channel attention mechanism, the neural network comprising an ECA channel attention device, the ECA channel attention device comprising: a first hierarchical quantization unit, which hierarchically quantizes the input data and converts floating-point input data into fixed-point input data; in the first hierarchical quantization unit, the entire input tensor shares one quantization step and one quantization zero point; a global average pooling layer, which performs global average pooling on the data processed by the first hierarchical quantization unit; a convolutional layer connected to the global average pooling layer, which comprises convolution kernels for a plurality of channels and performs convolution calculation on the data from the global average pooling layer; an activation layer connected to the convolution kernel group with full-precision data transfer; a channel-level quantization unit, which quantizes the output of the activation layer and calculates a quantization step and a quantization zero point separately for each channel; and a channel-multiplication weighting module, which performs channel-weighted multiplication on the output data of the first hierarchical quantization and the output data of the channel-level quantization.
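For illustration only, the following PyTorch-style sketch shows one way the device described above could be assembled, with the quantization units injected as placeholder modules; the class name, constructor parameters and quantizer objects are assumptions made for this example and are not part of the disclosure.

```python
# Illustrative sketch (not the disclosed implementation): an ECA channel attention
# block with hooks for the quantization units described in this disclosure.
import torch
import torch.nn as nn

class QuantizedECA(nn.Module):
    def __init__(self, channels: int, k_size: int = 3,
                 layer_quant_in=None, layer_quant_gap=None,
                 channel_quant=None, layer_quant_out=None):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                        # global average pooling layer
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,           # one-dimensional convolution layer
                              padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()                               # activation layer, full precision
        # Quantization units; identity when not provided.
        self.layer_quant_in = layer_quant_in or nn.Identity()     # first hierarchical quantization unit
        self.layer_quant_gap = layer_quant_gap or nn.Identity()   # hierarchical quantization of GAP output
        self.channel_quant = channel_quant or nn.Identity()       # channel-level quantization unit
        self.layer_quant_out = layer_quant_out or nn.Identity()   # hierarchical quantization of the output

    def forward(self, x):                                         # x: (N, C, H, W)
        xq = self.layer_quant_in(x)                               # hierarchical quantization of the input
        y = self.gap(xq)                                          # (N, C, 1, 1)
        y = self.layer_quant_gap(y)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))            # 1-D conv across channels, full precision
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))       # sigmoid in full precision
        w = self.channel_quant(y)                                 # channel-level quantization of the weights
        out = xq * w                                              # channel-weighted multiplication
        return self.layer_quant_out(out)                          # hierarchical quantization of the output
```

In this arrangement the one-dimensional convolution and the sigmoid exchange full-precision data, and only the sigmoid output is quantized channel-wise, matching the data flow described above.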
Preferably, the activation layer comprises a sigmoid activation layer, and full-precision data transfer is adopted between the convolutional layer and the sigmoid activation layer; the sigmoid function is calculated using high-precision floating-point numbers.
Preferably, the convolutional layers are convolutional layers of one-dimensional convolutional kernels.
Preferably, the neural network with the ECA channel attention mechanism further comprises a second-level quantization unit, wherein the second-level quantization unit is arranged between the global average pooling layer and the convolutional layer and quantizes data output by the global average pooling layer.
Preferably, the neural network with the ECA channel attention mechanism further comprises a third-level quantization unit, wherein the third-level quantization unit is located before the output end of the neural network with the ECA channel attention mechanism and quantizes the output result before outputting the result.
Preferably, the quantization processing of the data by the hierarchical quantization unit comprises: preprocessing the data to be quantized, and performing two rounds of forward inference calculation on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprise obtaining the maximum value of the data, the minimum value of the data, and the mean and variance of the data; calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment; and calculating the quantization step and quantization zero point required for quantization according to the data truncation range and the hardware configuration on which the model is to be deployed.
Preferably, the quantization processing of the data by the channel-level quantization unit comprises: preprocessing the data to be quantized, and performing two rounds of forward inference calculation on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprise obtaining the maximum value of the data, the minimum value of the data, and the mean and variance of the data; calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment; and calculating the quantization step and quantization zero point required for quantization according to the data truncation range and the hardware configuration on which the model is to be deployed.
Preferably, when preprocessing the data to be quantized, only forward inference is performed and no back propagation is performed.
Preferably, the ECA device comprises a multiplication module, the channel-level quantization unit is located in the multiplication module, and the weight result output by the activation layer is quantized there.
Preferably, the ECA channel attention module is implemented by an FPGA.
According to the technical scheme, the result of the one-dimensional convolution is output with lossless precision, the activation layer module is quantized along the channel direction, and a hierarchical quantization scheme is applied to the remaining data, which addresses the reduction in model accuracy. The scheme reduces the storage space of the feature maps and the data communication bandwidth required during forward inference of the model and improves computation efficiency, while reducing the accuracy loss caused by quantization.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some of the embodiments described in this specification, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an ECA channel attention device having a neural network with an ECA channel attention mechanism in accordance with an embodiment of the present disclosure;
FIG. 2 is a process diagram of a quantization operation in an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating the common step size and zero of different tensors in hierarchical quantization according to an embodiment of the present disclosure;
fig. 4 is a diagram illustrating the use of different step sizes and zeros for different tensors in channel level quantization in accordance with an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the embodiments of the present disclosure, it should be noted that, unless otherwise explicitly specified or limited, the term "connected" should be interpreted broadly, e.g., as fixedly connected, detachably connected or integrally connected, mechanically or electrically connected, and directly connected or indirectly connected through an intermediary. The specific meaning of the above terms in the present disclosure can be understood by those of ordinary skill in the art as appropriate.
For the purpose of facilitating understanding of the embodiments of the present application, the following description will be made in terms of specific embodiments with reference to the accompanying drawings, which are not intended to limit the embodiments of the present application.
The present embodiments provide a quantization method for the ECA channel attention mechanism.
As shown in fig. 1, the squares are the internal operators of the ECA channel attention module, and the rectangles are the quantization processing modules.
Specifically, in the present embodiment, the input data of the ECA module, the output data of the ECA module, and the data transmitted between the internal operators of the ECA module all need to be quantized. In this embodiment, the quantization processing comprises the following hierarchical quantization step and channel-level quantization step.
The hierarchical quantization step applies hierarchical quantization to the input of the ECA module (i.e. the input of the global average pooling layer), the output of the global average pooling layer, and the output of the multiplication (i.e. the output of the ECA module).
The channel-level quantization operation applies channel-level quantization to the output of the sigmoid activation layer. Full-precision data transfer is used between the one-dimensional convolution layer and the sigmoid activation layer.
In the hierarchical quantization operation, the entire input tensor shares one quantization step and one quantization zero point, see fig. 3. The operation comprises the following steps:
S1001 preprocessing the data to be quantized
In this step, the trained floating-point model and the training data set are used, and two rounds of forward inference calculation are performed on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram. In this step, the ECA module only completes forward inference; back propagation is not calculated and the weight parameters are not updated.
The distribution statistics of the input data comprise obtaining the maximum value r_max of the data, the minimum value r_min of the data, and the mean m and variance σ² of the data. In the subsequent process, the quantized data range is revised according to the data distribution.
This corresponds to the first two blocks in fig. 2. At this stage, the model only completes forward inference, does not calculate back propagation, and does not update the weight parameters. If there is a normalization layer in the network model, the maximum and minimum values of the data [r_min, r_max] and the mean m and variance σ² of the data can be obtained at this stage. In subsequent processes, the quantized data range is revised according to the data distribution.
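A minimal sketch of this preprocessing stage, assuming a PyTorch model and a forward hook on the module to be quantized; the function name, the hook mechanism and the histogram bin count are illustrative choices, not the disclosed implementation.

```python
# Sketch of the preprocessing stage: run forward inference only (no back propagation,
# no weight updates) and record the distribution statistics of the module input.
import torch

@torch.no_grad()                                     # forward inference only
def collect_input_stats(model, module, loader, rounds=2, bins=2048):
    samples = []
    handle = module.register_forward_hook(
        lambda m, inp, out: samples.append(inp[0].detach().float().flatten()))
    for _ in range(rounds):                          # two rounds of forward inference
        for x, *_ in loader:                         # loader is assumed to yield (input, ...) tuples
            model(x)
    handle.remove()
    data = torch.cat(samples)
    return {
        "r_min": data.min().item(),                  # minimum value of the data
        "r_max": data.max().item(),                  # maximum value of the data
        "mean": data.mean().item(),                  # mean m
        "var": data.var().item(),                    # variance sigma^2
        "hist": torch.histc(data, bins=bins,         # statistical histogram
                            min=data.min().item(), max=data.max().item()),
    }
```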
S1002 calculating the truncation range
After preprocessing is finished and the data distribution is obtained, factors such as the application scenario, hardware constraints, total amount of data, processing accuracy, mean and variance of the data, and the influence of the current layer within the network model are considered comprehensively, and different schemes are selected to determine the upper and lower limits of data truncation [r′_min, r′_max]. The schemes include selecting the truncation range according to the percentage of the amount of data to be truncated, or according to the KL divergence.
For example, when selecting the truncation range as a percentage of the amount of data to be truncated, for the input of the ECA one may select the range covering 95% of the data and truncate the remaining 5%. For the output of the global average pooling layer (GAP), because each channel is a 1×1 tensor, the range covering 100% of the data may be selected so that as much data as possible is retained.
When hardware constraints are taken into account, if only a symmetric quantization mode can be used and the upper and lower limits must be symmetric about the zero point, the maximum value of the truncation range is set to r′_max = MAX(|r′_max|, |r′_min|), and the upper and lower limits of truncation are then [-r′_max, r′_max].
When regular quantization is required, i.e. when the quantization step must be an integer power of 2, the truncation range, for example [-r′_max, r′_max], can be appropriately enlarged or reduced within a certain range to obtain the quantization range [r_min, r_max].
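The percentage-based scheme from the example above could be sketched as follows; the quantile-based implementation and the keep_ratio parameter are assumptions made for illustration.

```python
# Sketch of truncation-range selection: keep a given fraction of the data (e.g. 0.95
# for the ECA input, 1.0 for the GAP output) and optionally force symmetry about 0
# for hardware that only supports symmetric quantization.
import torch

def truncation_range(data: torch.Tensor, keep_ratio: float = 0.95,
                     symmetric: bool = False):
    flat = data.float().flatten()
    tail = (1.0 - keep_ratio) / 2.0                  # truncate equally at both tails
    r_min = torch.quantile(flat, tail).item()
    r_max = torch.quantile(flat, 1.0 - tail).item()
    if symmetric:                                    # limits symmetric about the zero point
        r_max = max(abs(r_max), abs(r_min))
        r_min = -r_max
    return r_min, r_max
```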
S1003 calculating the quantization parameters
Let the range after quantization be limited to [q_min, q_max]; for example, when 8-bit symmetric quantization is used, q_min = -128 and q_max = 127. The quantization step s and quantization zero point z required for quantization are calculated from the data truncation range obtained in step S1002 and the hardware configuration on which the model is to be deployed. Assuming the hardware supports b-bit calculation, the quantization step s is calculated as follows:
s = (r′_max - r′_min) / (2^b - 1)
If the hardware has other constraints, the quantization parameters can be adjusted accordingly. For example, some FPGA hardware only supports regular quantization in which the quantization step is an integer power of 2. In that case s can be adjusted, for instance, as follows:
s_1 = 2^ceil(log2(s))
s_2 = 2^floor(log2(s))
s is then replaced by s_1 or s_2, i.e. by an adjacent integer power of 2,
where ceil returns the smallest integer not less than its floating-point argument and floor returns the largest integer not greater than its floating-point argument.
The quantization zero point is selected as
z = Round(r′_min / s - q_min)
so that the lower truncation bound r′_min maps to q_min, where Round rounds off the truncated decimal places. Similar to the quantization step, the quantization zero point may also be adjusted according to the hardware; for example, for hardware that only supports symmetric quantization, 0 may be selected as the quantization zero point.
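A sketch of this parameter calculation, using the formulas as reconstructed above (step equal to the truncation range divided by 2^b - 1, zero point chosen so that r′_min maps to q_min, optional snapping of the step to an integer power of 2); the function name and the rounding-up choice in the power-of-2 case are assumptions.

```python
# Sketch of S1003: compute quantization step s and zero point z for b-bit hardware.
import math

def quant_params(r_min, r_max, bits=8, power_of_two=False, symmetric=False):
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # e.g. [-128, 127] for 8 bits
    s = (r_max - r_min) / (2 ** bits - 1)                      # quantization step
    if power_of_two and s > 0:
        # Regular quantization: replace the step by an adjacent integer power of 2
        # (rounding up here so the truncation range stays covered; rounding down is
        # equally possible).
        s = 2.0 ** math.ceil(math.log2(s))
    z = 0 if symmetric else round(r_min / s - q_min)           # quantization zero point
    return s, z, q_min, q_max
```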
S1004 performing quantization calculation according to the determined parameters
In this step, in order to verify the running behaviour of the neural network model on fixed-point devices, simulated quantization is adopted during the forward inference and back propagation of the algorithm. The specific calculation comprises the following steps:
First, the data x to be quantized is divided by the quantization step s and the quantization zero point z is subtracted, giving the first intermediate data x′, according to
x′ = x / s - z
Then, the x′ obtained in the previous step is rounded to obtain the second intermediate data x″. Rounding may be performed by rounding to the nearest integer, rounding down, rounding up, or in another rounding mode; for rounding to the nearest integer the formula is: x″ = SIGN(x′) * floor(ABS(x′) + 0.5), where floor returns the largest integer not greater than its floating-point argument, ABS returns the absolute value of the value, and SIGN returns the sign of the value.
The x″ obtained in the previous step is saturated to compute the final quantization result x_q, according to
x_q = MIN(MAX(x″, q_min), q_max)
The quantized result x_q is dequantized by adding the quantization zero point z and multiplying by the quantization step s, giving the dequantization result
x̂ = (x_q + z) * s
This converts the previously quantized fixed-point data back into the single-precision data domain so that forward inference and back propagation of the model can continue; the complete simulated-quantization formula is
x̂ = (MIN(MAX(SIGN(x/s - z) * floor(ABS(x/s - z) + 0.5), q_min), q_max) + z) * s
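The four sub-steps of S1004 compose into a single simulated-quantization routine; a minimal sketch, assuming the per-tensor parameters computed by the hypothetical quant_params helper above.

```python
# Sketch of per-tensor simulated (fake) quantization: scale and shift, round to the
# nearest integer, saturate to [q_min, q_max], then dequantize back to floating point
# so forward inference and back propagation can continue.
import torch

def fake_quantize(x: torch.Tensor, s: float, z: float, q_min: int, q_max: int):
    x1 = x / s - z                                             # x' = x / s - z
    x2 = torch.sign(x1) * torch.floor(torch.abs(x1) + 0.5)     # round to nearest
    xq = torch.clamp(x2, q_min, q_max)                         # saturation
    return (xq + z) * s                                        # dequantization
```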
Channel-level quantization requires that a quantization step and a quantization zero point be calculated separately for each channel, see fig. 4. Similar to hierarchical quantization, the above steps are performed once for each channel to obtain a quantization step and a quantization zero point per channel, and a different calculation is then applied to the data x of each channel. In this section, assume there are c channels and i ∈ [0, c-1].
S2001, the first stage, is the preprocessing stage. In the preprocessing stage, the trained floating-point model and the training data set are used, and two rounds of forward inference calculation are performed on the input data of the quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, as shown in the first two blocks of fig. 2. At this stage, the model only completes forward inference, does not calculate back propagation, and does not update the weight parameters. At this stage, the maximum and minimum values of the data [R_min, R_max] and the mean M and variance Σ² of the data can be obtained. In subsequent processes, the quantized data range is revised according to the data distribution.
S2002, the second stage, is the calculation of the truncation range. After preprocessing is completed and the data distribution is obtained, factors such as the application scenario, hardware constraints, total amount of data, processing accuracy, mean and variance of the data, and the influence of the current layer within the network model are considered comprehensively, and different schemes are selected to determine the upper and lower limits of data truncation [R′_min, R′_max], where R′_min and R′_max are tensors of length c. The schemes include, but are not limited to, the following two methods:
selecting the truncation range according to the percentage of the amount of data to be truncated;
dividing the truncation range into multiple intervals and selecting the percentage of the truncation range within the total data range.
For example:
For the output of the ECA, because each channel is a 1×1 tensor, the range covering 100% of the data can be selected, keeping as much data as possible.
Considering the hardware constraints, if only a symmetric quantization mode can be used, the upper and lower limits must be symmetric about the zero point, and for each channel i the following is calculated:
R′_i,max = MAX(|R′_i,max|, |R′_i,min|)
The upper and lower limits of truncation are then [-R′_i,max, R′_i,max].
In the case of regular quantization, since the quantization step must be an integer power of 2, the truncation range can be appropriately widened or narrowed within a certain range.
S2003, the third stage, is the calculation of the quantization parameters.
Assume the quantized range is [Q_min, Q_max]; for example, when channel i is quantized symmetrically with 8 bits, Q_i,min = -128 and Q_i,max = 127.
The quantization step s_i and quantization zero point z_i required for quantization are calculated from the data truncation range obtained in step S2002 and the hardware configuration on which the model is to be deployed. For channel i, the quantization step s_i is calculated as follows:
s_i = (R′_i,max - R′_i,min) / (2^b - 1)
If the hardware has other constraints, the quantization parameters can be adjusted accordingly. For example, some FPGA hardware only supports regular quantization in which the quantization step is an integer power of 2. In that case s_i can be adjusted, for instance, as follows:
s_i,1 = 2^ceil(log2(s_i))
s_i,2 = 2^floor(log2(s_i))
s_i is then replaced by s_i,1 or s_i,2, i.e. by an adjacent integer power of 2.
The quantization zero point is selected as
z_i = Round(R′_i,min / s_i - Q_i,min)
Similar to the quantization step, the quantization zero may also be adjusted according to hardware, for example, for hardware supporting only symmetric quantization, 0 may be selected as the quantization zero.
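A vectorized sketch of this per-channel parameter calculation, reusing the hypothetical quant_params helper sketched earlier; R′_min and R′_max are assumed to be tensors of length c.

```python
# Sketch of S2003: one quantization step s_i and zero point z_i per channel i.
import torch

def channel_quant_params(r_min: torch.Tensor, r_max: torch.Tensor,
                         bits: int = 8, power_of_two: bool = False,
                         symmetric: bool = False):
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    steps, zeros = [], []
    for i in range(r_min.numel()):                   # i in [0, c-1]
        s_i, z_i, _, _ = quant_params(r_min[i].item(), r_max[i].item(),
                                      bits, power_of_two, symmetric)
        steps.append(s_i)
        zeros.append(float(z_i))
    return torch.tensor(steps), torch.tensor(zeros), q_min, q_max
```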
S2004, the fourth stage, is the quantization calculation on the input.
In order to verify the running behaviour of the neural network model on fixed-point devices, simulated quantization is adopted during the forward inference and back propagation of the algorithm. For channel i the calculation is as follows.
The data x_i to be quantized is divided by the quantization step s_i and the quantization zero point z_i is subtracted, giving x′_i, according to
x′_i = x_i / s_i - z_i
The data obtained in the previous step is then rounded. Rounding may be performed by rounding to the nearest integer, rounding down, rounding up, or in another rounding mode; for rounding to the nearest integer the formula is:
x″_i = SIGN(x′_i) * floor(ABS(x′_i) + 0.5)
where floor returns the largest integer not greater than its floating-point argument, ABS returns the absolute value of the value, and SIGN returns the sign of the value.
The x″_i obtained in the previous step is saturated to compute the final quantization result x_q,i:
x_q,i = MIN(MAX(x″_i, Q_i,min), Q_i,max)
The quantized result x_q,i is dequantized by adding the quantization zero point z_i and multiplying by the quantization step s_i; the previously quantized fixed-point data is thereby converted back into the single-precision data domain, and forward inference and back propagation of the model can continue. The specific formula is:
x̂_i = (x_q,i + z_i) * s_i
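A sketch of the per-channel simulated quantization of S2004, assuming an (N, C, 1, 1) weight tensor such as the sigmoid output, with the per-channel steps and zero points broadcast along the channel axis.

```python
# Sketch of per-channel fake quantization: every channel i uses its own s_i and z_i.
import torch

def fake_quantize_per_channel(x: torch.Tensor, s: torch.Tensor, z: torch.Tensor,
                              q_min: int, q_max: int):
    s = s.view(1, -1, 1, 1)                          # broadcast s_i along the channel axis
    z = z.view(1, -1, 1, 1)                          # broadcast z_i along the channel axis
    x1 = x / s - z                                   # x'_i = x_i / s_i - z_i
    x2 = torch.sign(x1) * torch.floor(torch.abs(x1) + 0.5)   # round to nearest
    xq = torch.clamp(x2, q_min, q_max)               # saturate to [Q_i,min, Q_i,max]
    return (xq + z) * s                              # dequantize: (x_q,i + z_i) * s_i
```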
Considering that the quantization operation has a large influence on the attention mechanism, full-precision data transfer is adopted between the one-dimensional convolution layer and the sigmoid activation layer, and the sigmoid function is calculated using high-precision floating-point numbers. In the multiplication module of the ECA, the weights (i.e. the sigmoid output) are quantized channel-wise, while the feature map (the input of the ECA module) and the output feature map are quantized hierarchically.
The weight input is the sigmoid output and uses channel-level quantization, with a different quantization step for each channel, as described above.
The feature-value input is the input of the ECA module (the global average pooling layer) and uses hierarchical quantization, as described above.
To address the quantization of the ECA module, the present disclosure provides a scheme in which the one-dimensional convolution output is passed on with lossless precision, the sigmoid module is quantized along the channel direction, and hierarchical quantization is applied to the remaining data, thereby addressing the reduction in model accuracy. The method reduces the storage space of the feature maps and the data communication bandwidth required during forward inference of the model and improves computation efficiency, while reducing the accuracy loss caused by quantization.
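Putting the sketches together, a hypothetical usage could look as follows; it assumes the helper functions defined above and uses random tensors standing in for a real feature map and sigmoid output.

```python
# Sketch: hierarchical quantization of the feature map, channel-level quantization of
# the sigmoid weights, then the channel-weighted multiplication of the ECA module.
import torch

x = torch.randn(8, 64, 32, 32)                       # calibration batch of feature maps (ECA input)
w = torch.rand(8, 64, 1, 1)                          # sigmoid outputs: one weight per channel

s_x, z_x, q_min, q_max = quant_params(x.min().item(), x.max().item(), bits=8)
x_q = fake_quantize(x, s_x, z_x, q_min, q_max)       # hierarchical (per-tensor) quantization

r_min = w.amin(dim=(0, 2, 3))                        # per-channel truncation bounds over the batch
r_max = w.amax(dim=(0, 2, 3))
s_w, z_w, q_min, q_max = channel_quant_params(r_min, r_max, bits=8)
w_q = fake_quantize_per_channel(w, s_w, z_w, q_min, q_max)

out = x_q * w_q                                      # channel-weighted multiplication
```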

Claims (10)

1. A neural network having an ECA channel attention mechanism, the neural network comprising an ECA channel attention device, the ECA channel attention device comprising:
a first hierarchical quantization unit, which hierarchically quantizes the input data and converts floating-point input data into fixed-point input data, wherein in the first hierarchical quantization unit the entire input tensor shares one quantization step and one quantization zero point;
a global average pooling layer, which performs global average pooling on the data processed by the first hierarchical quantization unit;
a convolutional layer connected to the global average pooling layer, which comprises convolution kernels for a plurality of channels and performs convolution calculation on the data from the global average pooling layer;
an activation layer connected to the convolutional layer with full-precision data transfer;
a channel-level quantization unit, which quantizes the output of the activation layer and calculates a quantization step and a quantization zero point separately for each channel; and
a channel-multiplication weighting module, which performs channel-weighted multiplication on the output data of the first hierarchical quantization and the output data of the channel-level quantization.
2. The neural network with an ECA channel attention mechanism of claim 1, wherein the activation layer comprises a sigmoid activation layer, full-precision data transfer is employed between the convolutional layer and the sigmoid activation layer, and the sigmoid function is calculated using high-precision floating-point numbers.
3. The neural network with an ECA channel attention mechanism as claimed in claim 1, wherein said convolutional layer is a convolutional layer of one-dimensional convolutional kernels, and said convolutional kernels are hierarchically quantized.
4. The neural network with the ECA channel attention mechanism as claimed in claim 1, further comprising a second-level quantization unit, wherein the second-level quantization unit is disposed between the global average pooling layer and the convolutional layer, and quantizes data output by the global average pooling layer.
5. The neural network with the ECA channel attention mechanism as claimed in claim 4, wherein the neural network with the ECA channel attention mechanism further comprises a third level quantization unit, the third level quantization unit is located before the output end of the neural network with the ECA channel attention mechanism, and output results are quantized before the output results are output.
6. The neural network with the ECA channel attention mechanism as claimed in claim 1, wherein the hierarchical quantization unit performing quantization processing on the data comprises:
preprocessing data needing to be quantized, and performing two rounds of forward reasoning calculation on input data of a hierarchical quantization module to complete distribution statistics on the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprises obtaining a maximum value of the data, a minimum value of the data, and a mean value and a variance of the data;
calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment;
and calculating the quantization step length and the quantization zero point of the parameters required by quantization according to the data truncation range and the hardware configuration required to be deployed.
7. The neural network with the ECA channel attention mechanism as claimed in claim 6, wherein the channel level quantization unit performing quantization processing on the data comprises:
preprocessing data needing to be quantized, and performing two rounds of forward reasoning calculation on input data of a hierarchical quantization module to complete distribution statistics on the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprises obtaining a maximum value of the data, a minimum value of the data, and a mean value and a variance of the data;
calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment;
and calculating the quantization step length and the quantization zero point of the parameters required by quantization according to the data truncation range and the hardware configuration required to be deployed.
8. The neural network with an ECA channel attention mechanism as claimed in claim 6 or 7, wherein, when preprocessing the data to be quantized, only forward inference is performed and no back propagation is performed.
9. The neural network with the ECA channel attention mechanism as claimed in claim 1, wherein the ECA device comprises a multiplication module, the channel-level quantization unit is located in the multiplication module, and the weight result output by the activation layer is quantized.
10. The neural network with an ECA channel attention mechanism of claim 1, wherein said ECA channel attention module is implemented by an FPGA.
CN202211539305.0A 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism Pending CN115936067A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211539305.0A CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism
PCT/CN2023/111615 WO2024113945A1 (en) 2022-12-01 2023-08-08 Neural network having efficient channel attention (eca) mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211539305.0A CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism

Publications (1)

Publication Number Publication Date
CN115936067A true CN115936067A (en) 2023-04-07

Family

ID=86698667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211539305.0A Pending CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism

Country Status (2)

Country Link
CN (1) CN115936067A (en)
WO (1) WO2024113945A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113945A1 (en) * 2022-12-01 2024-06-06 北京航天自动控制研究所 Neural network having efficient channel attention (eca) mechanism

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11947061B2 (en) * 2019-10-18 2024-04-02 Korea University Research And Business Foundation Earthquake event classification method using attention-based convolutional neural network, recording medium and device for performing the method
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
CN113780551B (en) * 2021-09-03 2023-03-24 北京市商汤科技开发有限公司 Model quantization method, device, equipment, storage medium and computer program product
CN114638002B (en) * 2022-03-21 2023-04-28 华南理工大学 Compressed image encryption method supporting similarity retrieval
CN114663857A (en) * 2022-03-22 2022-06-24 深圳海星智驾科技有限公司 Point cloud target detection method and device and domain controller
CN115936067A (en) * 2022-12-01 2023-04-07 北京航天自动控制研究所 Neural network with ECA channel attention mechanism

Also Published As

Publication number Publication date
WO2024113945A1 (en) 2024-06-06

Similar Documents

Publication Publication Date Title
CN110378468B (en) Neural network accelerator based on structured pruning and low bit quantization
Eshratifar et al. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services
CN107340993B (en) Arithmetic device and method
CN110555450B (en) Face recognition neural network adjusting method and device
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2020176250A1 (en) Neural network layer processing with normalization and transformation of data
Zhao et al. Focused quantization for sparse CNNs
WO2024113945A1 (en) Neural network having efficient channel attention (eca) mechanism
CN113159276A (en) Model optimization deployment method, system, equipment and storage medium
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN114418057A (en) Operation method of convolutional neural network and related equipment
CN116502691A (en) Deep convolutional neural network mixed precision quantization method applied to FPGA
CN114970853A (en) Cross-range quantization convolutional neural network compression method
CN113610227A (en) Efficient deep convolutional neural network pruning method
US20240104342A1 (en) Methods, systems, and media for low-bit neural networks using bit shift operations
CN113918882A (en) Data processing acceleration method of dynamic sparse attention mechanism capable of being realized by hardware
CN114462591A (en) Inference method for dynamic quantitative neural network
CN113902109A (en) Compression method and device for regular bit serial computation of neural network
CN112561050B (en) Neural network model training method and device
CN113283591B (en) Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN112232477A (en) Image data processing method, apparatus, device and medium
CN115983343A (en) YOLOv4 convolutional neural network lightweight method based on FPGA
CN114372565A (en) Target detection network compression method for edge device
Lu et al. A very compact embedded CNN processor design based on logarithmic computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination