CN115936067A - Neural network with ECA channel attention mechanism - Google Patents

Neural network with ECA channel attention mechanism

Info

Publication number
CN115936067A
CN115936067A (application number CN202211539305.0A)
Authority
CN
China
Prior art keywords
data
quantization
channel
eca
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211539305.0A
Other languages
Chinese (zh)
Inventor
谢宇嘉
王晓峰
李悦
周辉
赵雄波
张辉
吴松龄
李晓敏
杨钧宇
路坤峰
张隽
丛龙剑
盖一帆
李山山
吴敏
林玉野
靳蕊溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute filed Critical Beijing Aerospace Automatic Control Research Institute
Priority to CN202211539305.0A priority Critical patent/CN115936067A/en
Publication of CN115936067A publication Critical patent/CN115936067A/en
Priority to PCT/CN2023/111615 priority patent/WO2024113945A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure relates to a neural network with an ECA channel attention mechanism, the neural network comprising an ECA channel attention device. The ECA channel attention device comprises: a first hierarchical quantization unit configured to perform hierarchical quantization on the input data and convert floating-point input data into fixed-point input data, the entire input tensor sharing one quantization step and one quantization zero point; a channel-level quantization unit that quantizes the output of the activation layer, calculating a quantization step and a quantization zero point separately for each channel; and a channel-weighted multiplication of the hierarchically quantized output data and the channel-level quantized output data. The result of the one-dimensional convolution is passed on with lossless precision, the activation layer module is quantized along the channel direction, and a hierarchical quantization scheme is applied to the remaining data, which addresses the reduction in model accuracy caused by quantization.

Description

Neural network with ECA channel attention mechanism
Technical Field
The patent belongs to the field of deep learning, and particularly relates to a neural network with an ECA (Efficient Channel Attention) channel attention mechanism.
Background
The channel attention mechanism has proven to have great potential in improving the performance of deep convolutional neural networks. For different tasks, adding an ECA (Efficient Channel Attention) channel attention mechanism to backbone networks such as ResNet and MobileNetV2 can effectively improve model performance. The performance of the present application can likewise be improved by adding an ECA attention mechanism to the backbone network and deploying it at the edge to perform inference at the edge.
Current deep learning network models introduce a large number of parameters and computations and consume substantial computing resources. On terminal devices, storage space and computing resources are strictly limited, so deploying a deep learning network model on such a hardware platform is difficult.
To reduce computational consumption, one hardware approach is to perform the computation with low-precision data. Network model quantization is the process of converting the trained data used by the network model from high precision to low precision; model quantization can reduce memory and storage occupation, reduce power consumption and increase computation speed. However, the reduced parameter bit width caused by quantization often leads to a loss of accuracy in model prediction.
For the attention module, quantization of the model parameters and feature values strongly affects the output of the whole model and may prevent the model from meeting its accuracy requirement. A reasonable quantization scheme therefore needs to be investigated so that the model accuracy remains essentially lossless while the storage and computing-power requirements of the model are reduced.
Disclosure of Invention
The present disclosure is made in view of the above needs of the prior art, and the technical problem to be solved by the present disclosure is to provide a quantization scheme for the ECA channel attention mechanism in which, according to application requirements, the model is quantized without accuracy loss while the storage and computing-power requirements of the model are reduced.
In order to solve the above technical problem, the technical solution adopted by the present disclosure includes:
A neural network having an ECA channel attention mechanism, the neural network comprising an ECA channel attention device, the ECA channel attention device comprising: a first hierarchical quantization unit, which hierarchically quantizes the input data and converts floating-point input data into fixed-point input data; in the first hierarchical quantization unit, the entire input tensor shares one quantization step and one quantization zero point; a global average pooling layer, which performs global average pooling on the data processed by the first hierarchical quantization unit; a convolutional layer connected to the global average pooling layer, which comprises convolution kernels for a plurality of channels and performs convolution calculation on the data from the global average pooling layer; an activation layer connected to the convolution kernel group with full-precision data transfer; a channel-level quantization unit, which quantizes the output of the activation layer and calculates a quantization step and a quantization zero point separately for each channel; and a channel-multiplication weighting module, which performs channel-weighted multiplication on the output data of the first hierarchical quantization and the output data of the channel-level quantization.
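For illustration only, the following PyTorch-style sketch shows one way the device described above could be assembled, with the quantization units injected as placeholder modules; the class name, constructor parameters and quantizer objects are assumptions made for this example and are not part of the disclosure.

```python
# Illustrative sketch (not the disclosed implementation): an ECA channel attention
# block with hooks for the quantization units described in this disclosure.
import torch
import torch.nn as nn

class QuantizedECA(nn.Module):
    def __init__(self, channels: int, k_size: int = 3,
                 layer_quant_in=None, layer_quant_gap=None,
                 channel_quant=None, layer_quant_out=None):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                        # global average pooling layer
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,           # one-dimensional convolution layer
                              padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()                               # activation layer, full precision
        # Quantization units; identity when not provided.
        self.layer_quant_in = layer_quant_in or nn.Identity()     # first hierarchical quantization unit
        self.layer_quant_gap = layer_quant_gap or nn.Identity()   # hierarchical quantization of GAP output
        self.channel_quant = channel_quant or nn.Identity()       # channel-level quantization unit
        self.layer_quant_out = layer_quant_out or nn.Identity()   # hierarchical quantization of the output

    def forward(self, x):                                         # x: (N, C, H, W)
        xq = self.layer_quant_in(x)                               # hierarchical quantization of the input
        y = self.gap(xq)                                          # (N, C, 1, 1)
        y = self.layer_quant_gap(y)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))            # 1-D conv across channels, full precision
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))       # sigmoid in full precision
        w = self.channel_quant(y)                                 # channel-level quantization of the weights
        out = xq * w                                              # channel-weighted multiplication
        return self.layer_quant_out(out)                          # hierarchical quantization of the output
```

In this arrangement the one-dimensional convolution and the sigmoid exchange full-precision data, and only the sigmoid output is quantized channel-wise, matching the data flow described above.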
Preferably, the activation layer comprises a sigmoid activation layer, and full-precision data transfer is adopted between the convolutional layer and the sigmoid activation layer; the sigmoid function is calculated using high-precision floating-point numbers.
Preferably, the convolutional layers are convolutional layers of one-dimensional convolutional kernels.
Preferably, the neural network with the ECA channel attention mechanism further comprises a second-level quantization unit, wherein the second-level quantization unit is arranged between the global average pooling layer and the convolutional layer and quantizes data output by the global average pooling layer.
Preferably, the neural network with the ECA channel attention mechanism further comprises a third-level quantization unit, wherein the third-level quantization unit is located before the output end of the neural network with the ECA channel attention mechanism and quantizes the output result before outputting the result.
Preferably, the quantization processing of the data by the hierarchical quantization unit comprises: preprocessing the data to be quantized, and performing two rounds of forward inference calculation on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprise obtaining the maximum value of the data, the minimum value of the data, and the mean and variance of the data; calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment; and calculating the quantization step and quantization zero point required for quantization according to the data truncation range and the hardware configuration on which the model is to be deployed.
Preferably, the quantization processing of the data by the channel-level quantization unit comprises: preprocessing the data to be quantized, and performing two rounds of forward inference calculation on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprise obtaining the maximum value of the data, the minimum value of the data, and the mean and variance of the data; calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment; and calculating the quantization step and quantization zero point required for quantization according to the data truncation range and the hardware configuration on which the model is to be deployed.
Preferably, when preprocessing the data to be quantized, only forward inference is performed and no back propagation is performed.
Preferably, the ECA device comprises a multiplication module, the channel-level quantization unit is located in the multiplication module, and the weight result output by the activation layer is quantized there.
Preferably, the ECA channel attention module is implemented by an FPGA.
According to the technical scheme, the result of the one-dimensional convolution is output with lossless precision, the activation layer module is quantized along the channel direction, and a hierarchical quantization scheme is applied to the remaining data, which addresses the reduction in model accuracy. The scheme reduces the storage space of the feature maps and the data communication bandwidth required during forward inference of the model and improves computation efficiency, while reducing the accuracy loss caused by quantization.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some of the embodiments described in this specification, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an ECA channel attention device having a neural network with an ECA channel attention mechanism in accordance with an embodiment of the present disclosure;
FIG. 2 is a process diagram of a quantization operation in an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating the common step size and zero of different tensors in hierarchical quantization according to an embodiment of the present disclosure;
fig. 4 is a diagram illustrating the use of different step sizes and zeros for different tensors in channel level quantization in accordance with an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the embodiments of the present disclosure, it should be noted that, unless otherwise explicitly specified or limited, the term "connected" should be interpreted broadly, e.g., as fixedly connected, detachably connected or integrally connected, mechanically or electrically connected, and directly connected or indirectly connected through an intermediary. The specific meaning of the above terms in the present disclosure can be understood by those of ordinary skill in the art as appropriate.
For the purpose of facilitating understanding of the embodiments of the present application, the following description will be made in terms of specific embodiments with reference to the accompanying drawings, which are not intended to limit the embodiments of the present application.
The present embodiments provide a quantization method for the ECA channel attention mechanism.
As shown in fig. 1, the squares are the internal operators of the ECA channel attention module, and the rectangles are the quantization processing modules.
Specifically, in the present embodiment, the input data of the ECA module, the output data of the ECA module, and the data transmitted between the internal operators of the ECA module all need to be quantized. In this embodiment, the quantization processing comprises the following hierarchical quantization step and channel-level quantization step.
The hierarchical quantization step applies hierarchical quantization to the input of the ECA module (i.e. the input of the global average pooling layer), the output of the global average pooling layer, and the output of the multiplication (i.e. the output of the ECA module).
The channel-level quantization operation applies channel-level quantization to the output of the sigmoid activation layer. Full-precision data transfer is used between the one-dimensional convolution layer and the sigmoid activation layer.
In the hierarchical quantization operation, the entire input tensor shares one quantization step and one quantization zero point, see fig. 3. The operation comprises the following steps:
S1001 preprocessing the data to be quantized
In this step, the trained floating-point model and the training data set are used, and two rounds of forward inference calculation are performed on the input data of the hierarchical quantization module to complete the distribution statistics of the input data and obtain a statistical histogram. In this step, the ECA module only completes forward inference; back propagation is not calculated and the weight parameters are not updated.
The distribution statistics of the input data comprise obtaining the maximum value r_max of the data, the minimum value r_min of the data, and the mean m and variance σ² of the data. In the subsequent process, the quantized data range is revised according to the data distribution.
This corresponds to the first two blocks in fig. 2. At this stage, the model only completes forward inference, does not calculate back propagation, and does not update the weight parameters. If there is a normalization layer in the network model, the maximum and minimum values of the data [r_min, r_max] and the mean m and variance σ² of the data can be obtained at this stage. In subsequent processes, the quantized data range is revised according to the data distribution.
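A minimal sketch of this preprocessing stage, assuming a PyTorch model and a forward hook on the module to be quantized; the function name, the hook mechanism and the histogram bin count are illustrative choices, not the disclosed implementation.

```python
# Sketch of the preprocessing stage: run forward inference only (no back propagation,
# no weight updates) and record the distribution statistics of the module input.
import torch

@torch.no_grad()                                     # forward inference only
def collect_input_stats(model, module, loader, rounds=2, bins=2048):
    samples = []
    handle = module.register_forward_hook(
        lambda m, inp, out: samples.append(inp[0].detach().float().flatten()))
    for _ in range(rounds):                          # two rounds of forward inference
        for x, *_ in loader:                         # loader is assumed to yield (input, ...) tuples
            model(x)
    handle.remove()
    data = torch.cat(samples)
    return {
        "r_min": data.min().item(),                  # minimum value of the data
        "r_max": data.max().item(),                  # maximum value of the data
        "mean": data.mean().item(),                  # mean m
        "var": data.var().item(),                    # variance sigma^2
        "hist": torch.histc(data, bins=bins,         # statistical histogram
                            min=data.min().item(), max=data.max().item()),
    }
```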
S1002 calculating the truncation range
After preprocessing is finished and the data distribution is obtained, factors such as the application scenario, hardware constraints, total amount of data, processing accuracy, mean and variance of the data, and the influence of the current layer within the network model are considered comprehensively, and different schemes are selected to determine the upper and lower limits of data truncation [r′_min, r′_max]. The schemes include selecting the truncation range according to the percentage of the amount of data to be truncated, or according to the KL divergence.
For example, when selecting the truncation range as a percentage of the amount of data to be truncated, for the input of the ECA one may select the range covering 95% of the data and truncate the remaining 5%. For the output of the global average pooling layer (GAP), because each channel is a 1×1 tensor, the range covering 100% of the data may be selected so that as much data as possible is retained.
When hardware constraints are taken into account, if only a symmetric quantization mode can be used and the upper and lower limits must be symmetric about the zero point, the maximum value of the truncation range is set to r′_max = MAX(|r′_max|, |r′_min|), and the upper and lower limits of truncation are then [-r′_max, r′_max].
When regular quantization is required, i.e. when the quantization step must be an integer power of 2, the truncation range, for example [-r′_max, r′_max], can be appropriately enlarged or reduced within a certain range to obtain the quantization range [r_min, r_max].
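The percentage-based scheme from the example above could be sketched as follows; the quantile-based implementation and the keep_ratio parameter are assumptions made for illustration.

```python
# Sketch of truncation-range selection: keep a given fraction of the data (e.g. 0.95
# for the ECA input, 1.0 for the GAP output) and optionally force symmetry about 0
# for hardware that only supports symmetric quantization.
import torch

def truncation_range(data: torch.Tensor, keep_ratio: float = 0.95,
                     symmetric: bool = False):
    flat = data.float().flatten()
    tail = (1.0 - keep_ratio) / 2.0                  # truncate equally at both tails
    r_min = torch.quantile(flat, tail).item()
    r_max = torch.quantile(flat, 1.0 - tail).item()
    if symmetric:                                    # limits symmetric about the zero point
        r_max = max(abs(r_max), abs(r_min))
        r_min = -r_max
    return r_min, r_max
```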
S1003 calculating the quantization parameters
Let the range after quantization be limited to [q_min, q_max]; for example, when 8-bit symmetric quantization is used, q_min = -128 and q_max = 127. The quantization step s and quantization zero point z required for quantization are calculated from the data truncation range obtained in step S1002 and the hardware configuration on which the model is to be deployed. Assuming the hardware supports b-bit calculation, the quantization step s is calculated as follows:
s = (r′_max - r′_min) / (2^b - 1)
If the hardware has other constraints, the quantization parameters can be adjusted accordingly. For example, some FPGA hardware only supports regular quantization in which the quantization step is an integer power of 2. In that case s can be adjusted, for instance, as follows:
s_1 = 2^ceil(log2(s))
s_2 = 2^floor(log2(s))
s is then replaced by s_1 or s_2, i.e. by an adjacent integer power of 2,
where ceil returns the smallest integer not less than its floating-point argument and floor returns the largest integer not greater than its floating-point argument.
The quantization zero point is selected as
z = Round(r′_min / s - q_min)
so that the lower truncation bound r′_min maps to q_min, where Round rounds off the truncated decimal places. Similar to the quantization step, the quantization zero point may also be adjusted according to the hardware; for example, for hardware that only supports symmetric quantization, 0 may be selected as the quantization zero point.
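A sketch of this parameter calculation, using the formulas as reconstructed above (step equal to the truncation range divided by 2^b - 1, zero point chosen so that r′_min maps to q_min, optional snapping of the step to an integer power of 2); the function name and the rounding-up choice in the power-of-2 case are assumptions.

```python
# Sketch of S1003: compute quantization step s and zero point z for b-bit hardware.
import math

def quant_params(r_min, r_max, bits=8, power_of_two=False, symmetric=False):
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # e.g. [-128, 127] for 8 bits
    s = (r_max - r_min) / (2 ** bits - 1)                      # quantization step
    if power_of_two and s > 0:
        # Regular quantization: replace the step by an adjacent integer power of 2
        # (rounding up here so the truncation range stays covered; rounding down is
        # equally possible).
        s = 2.0 ** math.ceil(math.log2(s))
    z = 0 if symmetric else round(r_min / s - q_min)           # quantization zero point
    return s, z, q_min, q_max
```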
S1004 performing quantization calculation according to the determined parameters
In this step, in order to verify the running behaviour of the neural network model on fixed-point devices, simulated quantization is adopted during the forward inference and back propagation of the algorithm. The specific calculation comprises the following steps:
First, the data x to be quantized is divided by the quantization step s and the quantization zero point z is subtracted, giving the first intermediate data x′, according to
x′ = x / s - z
Then, the x′ obtained in the previous step is rounded to obtain the second intermediate data x″. Rounding may be performed by rounding to the nearest integer, rounding down, rounding up, or in another rounding mode; for rounding to the nearest integer the formula is: x″ = SIGN(x′) * floor(ABS(x′) + 0.5), where floor returns the largest integer not greater than its floating-point argument, ABS returns the absolute value of the value, and SIGN returns the sign of the value.
The x″ obtained in the previous step is saturated to compute the final quantization result x_q, according to
x_q = MIN(MAX(x″, q_min), q_max)
The quantized result x_q is dequantized by adding the quantization zero point z and multiplying by the quantization step s, giving the dequantization result
x̂ = (x_q + z) * s
This converts the previously quantized fixed-point data back into the single-precision data domain so that forward inference and back propagation of the model can continue; the complete simulated-quantization formula is
x̂ = (MIN(MAX(SIGN(x/s - z) * floor(ABS(x/s - z) + 0.5), q_min), q_max) + z) * s
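The four sub-steps of S1004 compose into a single simulated-quantization routine; a minimal sketch, assuming the per-tensor parameters computed by the hypothetical quant_params helper above.

```python
# Sketch of per-tensor simulated (fake) quantization: scale and shift, round to the
# nearest integer, saturate to [q_min, q_max], then dequantize back to floating point
# so forward inference and back propagation can continue.
import torch

def fake_quantize(x: torch.Tensor, s: float, z: float, q_min: int, q_max: int):
    x1 = x / s - z                                             # x' = x / s - z
    x2 = torch.sign(x1) * torch.floor(torch.abs(x1) + 0.5)     # round to nearest
    xq = torch.clamp(x2, q_min, q_max)                         # saturation
    return (xq + z) * s                                        # dequantization
```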
Channel-level quantization requires that a quantization step and a quantization zero point be calculated separately for each channel, see fig. 4. Similar to hierarchical quantization, the above steps are performed once for each channel to obtain a quantization step and a quantization zero point per channel, and a different calculation is then applied to the data x of each channel. In this section, assume there are c channels and i ∈ [0, c-1].
S2001, the first stage, is the preprocessing stage. In the preprocessing stage, the trained floating-point model and the training data set are used, and two rounds of forward inference calculation are performed on the input data of the quantization module to complete the distribution statistics of the input data and obtain a statistical histogram, as shown in the first two blocks of fig. 2. At this stage, the model only completes forward inference, does not calculate back propagation, and does not update the weight parameters. At this stage, the maximum and minimum values of the data [R_min, R_max] and the mean M and variance Σ² of the data can be obtained. In subsequent processes, the quantized data range is revised according to the data distribution.
S2002, the second stage, is the calculation of the truncation range. After preprocessing is completed and the data distribution is obtained, factors such as the application scenario, hardware constraints, total amount of data, processing accuracy, mean and variance of the data, and the influence of the current layer within the network model are considered comprehensively, and different schemes are selected to determine the upper and lower limits of data truncation [R′_min, R′_max], where R′_min and R′_max are tensors of length c. The schemes include, but are not limited to, the following two methods:
selecting the truncation range according to the percentage of the amount of data to be truncated;
dividing the truncation range into multiple intervals and selecting the percentage of the truncation range within the total data range.
For example:
For the output of the ECA, because each channel is a 1×1 tensor, the range covering 100% of the data can be selected, keeping as much data as possible.
Considering the hardware constraints, if only a symmetric quantization mode can be used, the upper and lower limits must be symmetric about the zero point, and for each channel i the following is calculated:
R′_i,max = MAX(|R′_i,max|, |R′_i,min|)
The upper and lower limits of truncation are then [-R′_i,max, R′_i,max].
In the case of regular quantization, since the quantization step must be an integer power of 2, the truncation range can be appropriately widened or narrowed within a certain range.
S2003, the third stage, is the calculation of the quantization parameters.
Assume the quantized range is [Q_min, Q_max]; for example, when channel i is quantized symmetrically with 8 bits, Q_i,min = -128 and Q_i,max = 127.
The quantization step s_i and quantization zero point z_i required for quantization are calculated from the data truncation range obtained in step S2002 and the hardware configuration on which the model is to be deployed. For channel i, the quantization step s_i is calculated as follows:
s_i = (R′_i,max - R′_i,min) / (2^b - 1)
If the hardware has other constraints, the quantization parameters can be adjusted accordingly. For example, some FPGA hardware only supports regular quantization in which the quantization step is an integer power of 2. In that case s_i can be adjusted, for instance, as follows:
s_i,1 = 2^ceil(log2(s_i))
s_i,2 = 2^floor(log2(s_i))
s_i is then replaced by s_i,1 or s_i,2, i.e. by an adjacent integer power of 2.
The quantization zero point is selected as
z_i = Round(R′_i,min / s_i - Q_i,min)
Similar to the quantization step, the quantization zero may also be adjusted according to hardware, for example, for hardware supporting only symmetric quantization, 0 may be selected as the quantization zero.
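A vectorized sketch of this per-channel parameter calculation, reusing the hypothetical quant_params helper sketched earlier; R′_min and R′_max are assumed to be tensors of length c.

```python
# Sketch of S2003: one quantization step s_i and zero point z_i per channel i.
import torch

def channel_quant_params(r_min: torch.Tensor, r_max: torch.Tensor,
                         bits: int = 8, power_of_two: bool = False,
                         symmetric: bool = False):
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    steps, zeros = [], []
    for i in range(r_min.numel()):                   # i in [0, c-1]
        s_i, z_i, _, _ = quant_params(r_min[i].item(), r_max[i].item(),
                                      bits, power_of_two, symmetric)
        steps.append(s_i)
        zeros.append(float(z_i))
    return torch.tensor(steps), torch.tensor(zeros), q_min, q_max
```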
S2004, the fourth stage, is the quantization calculation on the input.
In order to verify the running behaviour of the neural network model on fixed-point devices, simulated quantization is adopted during the forward inference and back propagation of the algorithm. For channel i the calculation is as follows.
The data x_i to be quantized is divided by the quantization step s_i and the quantization zero point z_i is subtracted, giving x′_i, according to
x′_i = x_i / s_i - z_i
The data obtained in the previous step is then rounded. Rounding may be performed by rounding to the nearest integer, rounding down, rounding up, or in another rounding mode; for rounding to the nearest integer the formula is:
x″_i = SIGN(x′_i) * floor(ABS(x′_i) + 0.5)
where floor returns the largest integer not greater than its floating-point argument, ABS returns the absolute value of the value, and SIGN returns the sign of the value.
The x″_i obtained in the previous step is saturated to compute the final quantization result x_q,i:
x_q,i = MIN(MAX(x″_i, Q_i,min), Q_i,max)
The quantized result x_q,i is dequantized by adding the quantization zero point z_i and multiplying by the quantization step s_i; the previously quantized fixed-point data is thereby converted back into the single-precision data domain, and forward inference and back propagation of the model can continue. The specific formula is:
x̂_i = (x_q,i + z_i) * s_i
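A sketch of the per-channel simulated quantization of S2004, assuming an (N, C, 1, 1) weight tensor such as the sigmoid output, with the per-channel steps and zero points broadcast along the channel axis.

```python
# Sketch of per-channel fake quantization: every channel i uses its own s_i and z_i.
import torch

def fake_quantize_per_channel(x: torch.Tensor, s: torch.Tensor, z: torch.Tensor,
                              q_min: int, q_max: int):
    s = s.view(1, -1, 1, 1)                          # broadcast s_i along the channel axis
    z = z.view(1, -1, 1, 1)                          # broadcast z_i along the channel axis
    x1 = x / s - z                                   # x'_i = x_i / s_i - z_i
    x2 = torch.sign(x1) * torch.floor(torch.abs(x1) + 0.5)   # round to nearest
    xq = torch.clamp(x2, q_min, q_max)               # saturate to [Q_i,min, Q_i,max]
    return (xq + z) * s                              # dequantize: (x_q,i + z_i) * s_i
```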
Considering that the quantization operation has a large influence on the attention mechanism, full-precision data transfer is adopted between the one-dimensional convolution layer and the sigmoid activation layer, and the sigmoid function is calculated using high-precision floating-point numbers. In the multiplication module of the ECA, the weights (i.e. the sigmoid output) are quantized channel-wise, while the feature map (the input of the ECA module) and the output feature map are quantized hierarchically.
The weight input is the sigmoid output and uses channel-level quantization, with a different quantization step for each channel, as described above.
The feature-value input is the input of the ECA module (the global average pooling layer) and uses hierarchical quantization, as described above.
To address the quantization of the ECA module, the present disclosure provides a scheme in which the one-dimensional convolution output is passed on with lossless precision, the sigmoid module is quantized along the channel direction, and hierarchical quantization is applied to the remaining data, thereby addressing the reduction in model accuracy. The method reduces the storage space of the feature maps and the data communication bandwidth required during forward inference of the model and improves computation efficiency, while reducing the accuracy loss caused by quantization.
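Putting the sketches together, a hypothetical usage could look as follows; it assumes the helper functions defined above and uses random tensors standing in for a real feature map and sigmoid output.

```python
# Sketch: hierarchical quantization of the feature map, channel-level quantization of
# the sigmoid weights, then the channel-weighted multiplication of the ECA module.
import torch

x = torch.randn(8, 64, 32, 32)                       # calibration batch of feature maps (ECA input)
w = torch.rand(8, 64, 1, 1)                          # sigmoid outputs: one weight per channel

s_x, z_x, q_min, q_max = quant_params(x.min().item(), x.max().item(), bits=8)
x_q = fake_quantize(x, s_x, z_x, q_min, q_max)       # hierarchical (per-tensor) quantization

r_min = w.amin(dim=(0, 2, 3))                        # per-channel truncation bounds over the batch
r_max = w.amax(dim=(0, 2, 3))
s_w, z_w, q_min, q_max = channel_quant_params(r_min, r_max, bits=8)
w_q = fake_quantize_per_channel(w, s_w, z_w, q_min, q_max)

out = x_q * w_q                                      # channel-weighted multiplication
```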

Claims (10)

1. A neural network having an ECA channel attention mechanism, the neural network comprising an ECA channel attention device, the ECA channel attention device comprising:
a first hierarchical quantization unit, which hierarchically quantizes the input data and converts floating-point input data into fixed-point input data, wherein in the first hierarchical quantization unit the entire input tensor shares one quantization step and one quantization zero point;
a global average pooling layer, which performs global average pooling on the data processed by the first hierarchical quantization unit;
a convolutional layer connected to the global average pooling layer, which comprises convolution kernels for a plurality of channels and performs convolution calculation on the data from the global average pooling layer;
an activation layer connected to the convolutional layer with full-precision data transfer;
a channel-level quantization unit, which quantizes the output of the activation layer and calculates a quantization step and a quantization zero point separately for each channel; and
a channel-multiplication weighting module, which performs channel-weighted multiplication on the output data of the first hierarchical quantization and the output data of the channel-level quantization.
2. The neural network with an ECA channel attention mechanism of claim 1, wherein the activation layer comprises a sigmoid activation layer, full-precision data transfer is employed between the convolutional layer and the sigmoid activation layer, and the sigmoid function is calculated using high-precision floating-point numbers.
3. The neural network with an ECA channel attention mechanism as claimed in claim 1, wherein said convolutional layer is a convolutional layer of one-dimensional convolutional kernels, and said convolutional kernels are hierarchically quantized.
4. The neural network with the ECA channel attention mechanism as claimed in claim 1, further comprising a second-level quantization unit, wherein the second-level quantization unit is disposed between the global average pooling layer and the convolutional layer, and quantizes data output by the global average pooling layer.
5. The neural network with the ECA channel attention mechanism as claimed in claim 4, wherein the neural network with the ECA channel attention mechanism further comprises a third level quantization unit, the third level quantization unit is located before the output end of the neural network with the ECA channel attention mechanism, and output results are quantized before the output results are output.
6. The neural network with the ECA channel attention mechanism as claimed in claim 1, wherein the hierarchical quantization unit performing quantization processing on the data comprises:
preprocessing data needing to be quantized, and performing two rounds of forward reasoning calculation on input data of a hierarchical quantization module to complete distribution statistics on the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprises obtaining a maximum value of the data, a minimum value of the data, and a mean value and a variance of the data;
calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment;
and calculating the quantization step length and the quantization zero point of the parameters required by quantization according to the data truncation range and the hardware configuration required to be deployed.
7. The neural network with the ECA channel attention mechanism as claimed in claim 6, wherein the channel level quantization unit performing quantization processing on the data comprises:
preprocessing data needing to be quantized, and performing two rounds of forward reasoning calculation on input data of a hierarchical quantization module to complete distribution statistics on the input data and obtain a statistical histogram, wherein the distribution statistics of the input data comprises obtaining a maximum value of the data, a minimum value of the data, and a mean value and a variance of the data;
calculating a truncation range, wherein after the data distribution obtained by preprocessing is available, the data truncation range is determined according to the constraint conditions of the application environment;
and calculating the quantization step length and the quantization zero point of the parameters required by quantization according to the data truncation range and the hardware configuration required to be deployed.
8. The neural network with an ECA channel attention mechanism as claimed in claim 6 or 7, wherein, when preprocessing the data to be quantized, only forward inference is performed and no back propagation is performed.
9. The neural network with the ECA channel attention mechanism as claimed in claim 1, wherein the ECA device comprises a multiplication module, the channel-level quantization unit is located in the multiplication module, and the weight result output by the activation layer is quantized.
10. The neural network with an ECA channel attention mechanism of claim 1, wherein said ECA channel attention module is implemented by an FPGA.
CN202211539305.0A 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism Pending CN115936067A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211539305.0A CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism
PCT/CN2023/111615 WO2024113945A1 (en) 2022-12-01 2023-08-08 Neural network having efficient channel attention (eca) mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211539305.0A CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism

Publications (1)

Publication Number Publication Date
CN115936067A true CN115936067A (en) 2023-04-07

Family

ID=86698667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211539305.0A Pending CN115936067A (en) 2022-12-01 2022-12-01 Neural network with ECA channel attention mechanism

Country Status (2)

Country Link
CN (1) CN115936067A (en)
WO (1) WO2024113945A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113945A1 (en) * 2022-12-01 2024-06-06 北京航天自动控制研究所 Neural network having efficient channel attention (eca) mechanism

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11947061B2 (en) * 2019-10-18 2024-04-02 Korea University Research And Business Foundation Earthquake event classification method using attention-based convolutional neural network, recording medium and device for performing the method
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
CN113780551B (en) * 2021-09-03 2023-03-24 北京市商汤科技开发有限公司 Model quantization method, device, equipment, storage medium and computer program product
CN114638002B (en) * 2022-03-21 2023-04-28 华南理工大学 Compressed image encryption method supporting similarity retrieval
CN114663857A (en) * 2022-03-22 2022-06-24 深圳海星智驾科技有限公司 Point cloud target detection method and device and domain controller
CN115936067A (en) * 2022-12-01 2023-04-07 北京航天自动控制研究所 Neural network with ECA channel attention mechanism

Also Published As

Publication number Publication date
WO2024113945A1 (en) 2024-06-06

Similar Documents

Publication Publication Date Title
CN110378468B (en) Neural network accelerator based on structured pruning and low bit quantization
Eshratifar et al. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services
CN107340993B (en) Arithmetic device and method
CN110555450B (en) Face recognition neural network adjusting method and device
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2020176250A1 (en) Neural network layer processing with normalization and transformation of data
Zhao et al. Focused quantization for sparse CNNs
WO2024113945A1 (en) Neural network having efficient channel attention (eca) mechanism
CN113159276A (en) Model optimization deployment method, system, equipment and storage medium
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN114418057A (en) Operation method of convolutional neural network and related equipment
CN116502691A (en) Deep convolutional neural network mixed precision quantization method applied to FPGA
CN114970853A (en) Cross-range quantization convolutional neural network compression method
CN113610227A (en) Efficient deep convolutional neural network pruning method
US20240104342A1 (en) Methods, systems, and media for low-bit neural networks using bit shift operations
CN113918882A (en) Data processing acceleration method of dynamic sparse attention mechanism capable of being realized by hardware
CN114462591A (en) Inference method for dynamic quantitative neural network
CN113902109A (en) Compression method and device for regular bit serial computation of neural network
CN112561050B (en) Neural network model training method and device
CN113283591B (en) Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN112232477A (en) Image data processing method, apparatus, device and medium
CN115983343A (en) YOLOv4 convolutional neural network lightweight method based on FPGA
CN114372565A (en) Target detection network compression method for edge device
Lu et al. A very compact embedded CNN processor design based on logarithmic computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination