CN114372565A - Target detection network compression method for edge device - Google Patents

Target detection network compression method for edge device

Info

Publication number
CN114372565A
Authority
CN
China
Prior art keywords
network
skynet
quantization
layer
merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210038592.0A
Other languages
Chinese (zh)
Inventor
钟胜
唐维伟
邹旭
徐文辉
谭富中
卢金仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202210038592.0A priority Critical patent/CN114372565A/en
Publication of CN114372565A publication Critical patent/CN114372565A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the technical field of target detection on edge devices, and discloses a target detection network compression method for edge devices, comprising the following steps: optimizing the network structure of SkyNet and quantizing the feature maps and weight parameters; reconstructing the forward-inference structure and merging part of the computation in the depthwise separable convolutions. Starting from the angle of algorithm optimization and taking SkyNet as an example, the invention provides a compression processing technique for target detection networks, reducing the difficulty of deploying them on edge devices. The invention prunes the network, making it better suited to edge devices; applies quantization, greatly reducing the size of the network model; and applies merging, greatly reducing the computational load of the network.

Description

Target detection network compression method for edge device
Technical Field
The invention belongs to the technical field of target detection on edge devices, and particularly relates to a target detection network compression method for edge devices.
Background
At present, the task of target detection is to find and identify targets of interest in an image; it is widely used in scenes such as autonomous driving, face detection and video surveillance. In recent years, target detection algorithms based on convolutional neural networks have achieved better performance than traditional methods, but because of their huge computation and parameter counts, most convolutional neural networks are deployed on general-purpose CPUs or GPUs, which consume much power and occupy a large volume, and real-time detection on edge devices remains difficult to achieve. A lightweight network, or a way of compressing the network, is therefore urgently needed so that the convolutional neural network can perform inference directly on the edge device.
To solve this problem, prior art 1 proposes the lightweight network MobileNet, and prior art 2 proposes the lightweight network Xception; both use depthwise separable convolution (DSC) in place of standard convolution to reduce computation and parameter counts, improving the operating efficiency of target detection networks on edge devices to some extent. Prior art 3 designs SkyNet, a hardware-friendly network for edge devices built on depthwise separable convolution; compared with MobileNet and Xception, the SkyNet structure is more regular and its module reuse rate is higher, but it still places high computational demands on the deployment platform.
To meet edge application scenarios, a number of neural network optimization techniques have been proposed, chiefly network compression, which divides roughly into quantization of the network and pruning of the network. Quantization replaces high-precision numbers with low-precision numbers in the convolution computation, trading a small loss of precision to avoid floating-point arithmetic, or replaces a group of weight values with a central value obtained by clustering. Pruning exploits network sparsity: the smaller a weight in the neural network, the less it influences the final prediction, so computation can be skipped by testing whether a weight of the network is zero.
Through the above analysis, the problems and defects of the prior art are as follows: existing target detection networks run inefficiently on edge devices, demand capable hardware, consume much power, occupy a large volume, and cannot perform real-time detection on edge devices.
The difficulty in solving these problems and defects is: convolutional neural networks can achieve high-precision target detection, but their large parameter and computation counts make implementation on edge devices difficult.
The significance of solving them is: with this set of compression methods, the parameter count and computation of the network can be reduced, the network can be deployed on low-power, low-cost edge devices, and the precision loss stays within a reasonable range.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a target detection network compression method for edge devices.
The invention is realized as follows: a target detection network compression method for edge devices, comprising:
optimizing the network structure of SkyNet and quantizing the feature maps and weight parameters; reconstructing the forward-inference structure and merging part of the computation in the depthwise separable convolutions.
Further, the target detection network compression method for edge devices comprises the following steps:
step one, performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
step two, compressing the SkyNet network: the weights are quantized to 7 bits, the feature maps are quantized to 8 bits, and the bias parameters and scaling coefficients are fused and fixed-pointed to 32 bits, after which the network is retrained;
step three, merging the SkyNet network structure: the convolution layers and normalization layers are merged, and the activation, quantization, inverse quantization and saturation truncation are merged into a single FETCH operation.
Further, the optimized pruning of the SkyNet network comprises the following steps:
first, the SkyNet branches are pruned, and, taking each depthwise separable convolution as a minimum unit layer, the output of the first layer is pruned to 32 channels;
second, a pooling operation is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network into a straight-through, single-branch structure.
Further, the optimized SkyNet network includes:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512 and CHL96; and a regression layer CHL30;
the convolutions between the layers of the optimized SkyNet network are depthwise separable convolutions.
Further, the compression of the retrained SkyNet network comprises the following steps:
(1) selecting, for each output channel, the maximum absolute value in the corresponding convolution kernel as the quantization maximum, and performing weight quantization in maximum-value mode:
scale_w = 63/max(|w|);
q_w = w × scale_w;
wherein w is the vector of original weights for each channel; q_w is the vector of quantized weights; scale_w is the scalar scaling coefficient;
(2) selecting a threshold using the KL relative entropy and quantizing the feature map with saturating quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally into -127 to +127, and values outside that range are saturated, i.e. represented directly by the saturation value.
Further, selecting the threshold using the KL relative entropy and quantizing the feature map with saturating quantization comprises:
1) selecting the threshold using the KL relative entropy:
H(p) = -Σ p(x) log p(x);
H(p, q) = -Σ p(x) log q(x);
D_KL(p||q) = H(p, q) - H(p);
wherein p is the original distribution before quantization; q is the distribution after quantization with threshold T; H(p) is the information entropy of the original distribution; H(p, q) is the cross entropy of the original and quantized distributions; D_KL(p||q) is the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127/T;
3) fixed-pointing the bias and scaling coefficients: the floating-point factors appearing in forward inference are merged, amplified and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the merged, amplified and rounded inverse quantization coefficient:
Bias_merge = int(bias × scale_next_fm × shift_coe);
Scale_merge = int(scale_next_fm × shift_coe/(scale_w × scale_fm));
wherein scale_w is the weight quantization coefficient; scale_fm is the quantization coefficient of the current layer's feature map; bias is the bias; scale_next_fm is the next layer's quantization coefficient; Scale_merge is the merged, amplified and rounded inverse quantization coefficient; Bias_merge is the merged bias coefficient; shift_coe is the amplification factor.
Further, merging the SkyNet network structure comprises:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw/√(σ² + ε);
B = γ(b - μ)/√(σ² + ε) + β;
wherein y1 is the convolution output; x is the input; w is the weight; b is the bias; x, w and b are vectors; μ is the mean; σ is the standard deviation; γ is the scaling coefficient; β is the scaling offset; ε is 1e-6; W is the post-fusion weight; B is the post-fusion bias;
(2) merging the activation, quantization, inverse quantization and saturation truncation into a FETCH operation; the FETCH operation converts 32 bits to 8 bits, performing ReLU activation and saturation truncation.
Further, performing ReLU activation and saturation truncation comprises:
checking the sign of the input data: if it is positive, saturation truncation is performed; if it is negative, the activated value is 0.
Another object of the present invention is to provide a target detection network compression system for edge devices, comprising:
a network pruning module, used to perform optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, obtaining the optimized SkyNet network;
a network compression module, used to compress the SkyNet network: the weights are quantized to 7 bits, the feature maps are quantized to 8 bits, and the bias parameters and scaling coefficients are fused and fixed-pointed to 32 bits, after which the network is retrained;
a network structure merging module, used to merge the SkyNet network structure: the convolution layers and normalization layers are merged, and the activation, quantization, inverse quantization and saturation truncation are merged into a single FETCH operation.
Combining all of the above technical schemes, the advantages and positive effects of the invention are:
Starting from the angle of algorithm optimization and taking SkyNet as an example, the invention provides a compression processing technique for target detection networks, reducing the difficulty of deploying them on edge devices. The invention prunes the network, making it better suited to edge devices; applies quantization, greatly reducing the size of the network model; and applies merging, greatly reducing the computational load of the network.
Drawings
Fig. 1 is a flowchart of a target detection network compression method for an edge device according to an embodiment of the present invention.
Fig. 2 is a diagram of the optimized Skynet network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a saturated truncated scaling quantization according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a merged convolutional layer and a normalization layer according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the FETCH operation provided by the embodiment of the present invention.
Fig. 6 is a schematic diagram of a calculation process of each layer before fixed-point processing according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a calculation process of each layer after the fixed point processing provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a target detection network compression method for an edge device, and the following describes the present invention in detail with reference to the accompanying drawings.
The target detection network compression method for edge devices provided by the embodiment of the invention comprises the following steps:
optimizing the network structure of SkyNet and quantizing the feature maps and weight parameters; reconstructing the forward-inference structure and merging part of the computation in the depthwise separable convolutions.
As shown in fig. 1, the target detection network compression method for edge devices according to an embodiment of the present invention comprises the following steps:
S101, performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
S102, compressing the SkyNet network: the weights are quantized to 7 bits, the feature maps are quantized to 8 bits, and the bias parameters and scaling coefficients are fused and fixed-pointed to 32 bits, after which the network is retrained;
S103, merging the SkyNet network structure: the convolution layers and normalization layers are merged, and the activation, quantization, inverse quantization and saturation truncation are merged into a single FETCH operation.
The optimized pruning of the SkyNet network provided by the embodiment of the invention comprises the following steps:
first, the SkyNet branches are pruned, and, taking each depthwise separable convolution as a minimum unit layer, the output of the first layer is pruned to 32 channels;
second, a pooling operation is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network into a straight-through, single-branch structure.
The optimized SkyNet network provided by the embodiment of the invention comprises:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512 and CHL96; and a regression layer CHL30;
the convolutions between the layers of the optimized SkyNet network are depthwise separable convolutions.
The compression of the retrained SkyNet network provided by the embodiment of the invention comprises the following steps:
(1) selecting, for each output channel, the maximum absolute value in the corresponding convolution kernel as the quantization maximum, and performing weight quantization in maximum-value mode:
scale_w = 63/max(|w|);
q_w = w × scale_w;
wherein w is the vector of original weights for each channel; q_w is the vector of quantized weights; scale_w is the scalar scaling coefficient;
(2) selecting a threshold using the KL relative entropy and quantizing the feature map with saturating quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally into -127 to +127, and values outside that range are saturated, i.e. represented directly by the saturation value.
Selecting the threshold using the KL relative entropy and quantizing the feature map with saturating quantization comprises:
1) selecting the threshold using the KL relative entropy:
H(p) = -Σ p(x) log p(x);
H(p, q) = -Σ p(x) log q(x);
D_KL(p||q) = H(p, q) - H(p);
wherein p is the original distribution before quantization; q is the distribution after quantization with threshold T; H(p) is the information entropy of the original distribution; H(p, q) is the cross entropy of the original and quantized distributions; D_KL(p||q) is the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127/T;
3) fixed-pointing the bias and scaling coefficients: the floating-point factors appearing in forward inference are merged, amplified and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the merged, amplified and rounded inverse quantization coefficient:
Bias_merge = int(bias × scale_next_fm × shift_coe);
Scale_merge = int(scale_next_fm × shift_coe/(scale_w × scale_fm));
wherein scale_w is the weight quantization coefficient; scale_fm is the quantization coefficient of the current layer's feature map; bias is the bias; scale_next_fm is the next layer's quantization coefficient; Scale_merge is the merged, amplified and rounded inverse quantization coefficient; Bias_merge is the merged bias coefficient; shift_coe is the amplification factor.
The SkyNet network structure merging provided by the embodiment of the invention comprises:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw/√(σ² + ε);
B = γ(b - μ)/√(σ² + ε) + β;
wherein y1 is the convolution output; x is the input; w is the weight; b is the bias; x, w and b are vectors; μ is the mean; σ is the standard deviation; γ is the scaling coefficient; β is the scaling offset; ε is 1e-6; W is the post-fusion weight; B is the post-fusion bias;
(2) merging the activation, quantization, inverse quantization and saturation truncation into a FETCH operation; the FETCH operation converts 32 bits to 8 bits, performing ReLU activation and saturation truncation.
The ReLU activation and saturation truncation provided by the embodiment of the invention comprises:
checking the sign of the input data: if it is positive, saturation truncation is performed; if it is negative, the activated value is 0.
The technical solution of the present invention is further described with reference to the following specific embodiments.
Example 1:
the target detection network compression method for the edge device provided by the embodiment of the invention comprises the following steps:
(1) Optimized pruning
The SkyNet branches are pruned, and, taking each depthwise separable convolution as a minimum unit layer, the output of the first layer is pruned to 32 channels; a pooling operation is then added after the sixth layer; finally, the last layer is changed to a depthwise separable convolution, so that the whole network structure is optimized into a straight-through, single-branch form.
As shown in fig. 2, the optimized SkyNet comprises 8 layers in total: a 3-channel input layer (CHL3), intermediate layers (CHL32, CHL96, CHL192, CHL384, CHL512, CHL96) and a regression layer (CHL30). The convolution between the layers is implemented with depthwise separable convolution (DSC); the network structure is regular, which facilitates module reuse and yields an efficient computation structure.
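To make the pruned structure concrete, the following PyTorch sketch assembles the straight-through pipeline of fig. 2. It is a sketch under stated assumptions: the channel progression follows fig. 2 and the pooling layer after the sixth block is the one named in the text, but the remaining pooling positions, the use of max pooling, and the internal layout of each depthwise separable block are assumptions, since the patent does not spell them out.

```python
import torch
import torch.nn as nn

def dsc(cin, cout):
    """Depthwise separable convolution: 3x3 depthwise + 1x1 pointwise,
    each followed by batch normalization and ReLU (the BN layers are
    later folded into the convolutions for inference)."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
        nn.BatchNorm2d(cin),
        nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class PrunedSkyNet(nn.Module):
    """Straight-through 8-layer pipeline: CHL3 -> CHL32 -> ... -> CHL30."""
    def __init__(self):
        super().__init__()
        chans = [3, 32, 96, 192, 384, 512, 96, 30]
        layers = []
        for i, (cin, cout) in enumerate(zip(chans[:-1], chans[1:])):
            layers.append(dsc(cin, cout))
            if i in (0, 1, 2, 5):               # the pool after the sixth layer is
                layers.append(nn.MaxPool2d(2))  # specified; the others are assumed
        self.body = nn.Sequential(*layers)

    def forward(self, x):  # x: (N, 3, 160, 160)
        return self.body(x)
```

With a 160 × 160 input, this assumed pooling placement yields a 10 × 10 × 30 regression map at the output.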
The input image size used in the present invention is 160 × 160, and the image size of each layer is shown in table 1:
TABLE 1 Detailed dimensions of the layers
[table rendered as an image in the original publication]
(2) Compressing the model
The quantization comprises three parts: quantization of the weights, quantization of the feature maps, and fixed-pointing of the bias and scaling coefficients.
The weights are quantized in maximum-value mode, the maximum being the largest absolute value in the convolution kernels corresponding to each output channel.
Let the original weight vector for each channel be w, the quantized weight vector q_w, and the scalar scaling coefficient scale_w; the correspondence is given by formulas (1) and (2). For ease of computation the boundary is taken as 63, so the resulting q_w is distributed over -63 to 63.
scale_w=63/max(|w|) (1)
q_w=w×scale_w (2)
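A minimal NumPy sketch of formulas (1) and (2), applied per output channel; rounding to the nearest integer when converting to fixed point is an assumption, as the patent does not state the rounding rule.

```python
import numpy as np

def quantize_weights(w):
    """Maximum-value weight quantization to [-63, 63] (7 bit), formulas (1)-(2).
    w: float array of shape (out_channels, ...)."""
    flat = w.reshape(w.shape[0], -1)
    scale_w = 63.0 / np.abs(flat).max(axis=1)   # formula (1), one scale per channel
    q_w = np.round(flat * scale_w[:, None])     # formula (2), assumed round-to-nearest
    return q_w.reshape(w.shape).astype(np.int8), scale_w
```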
Feature map quantization adopts saturating quantization, with the threshold selected using the KL relative entropy; this markedly reduces the precision loss.
Saturating quantization is shown in fig. 3: a threshold T is selected and the values of the original distribution within ±T are scaled proportionally into -127 to +127; the values marked in red in the figure fall outside this range and are saturated, the saturation value being taken directly to represent them.
Let the original distribution before quantization be p, the distribution after quantization with threshold T be q, the information entropy of the original distribution H(p), the cross entropy of the original and quantized distributions H(p, q), and the KL relative entropy D_KL(p||q); then:
H(p) = -Σ p(x) log p(x) (3)
H(p, q) = -Σ p(x) log q(x) (4)
D_KL(p||q) = H(p, q) - H(p) (5)
the available scaling factor is scale _ fm, where:
Figure BDA0003469305490000105
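The patent states the KL criterion but not the search procedure itself; the sketch below fills that gap with a common histogram-calibration loop in the style of TensorRT's entropy calibration. The histogram resolution (2048 bins), the 128 quantization levels and the smoothing constant are assumptions, not values from the patent.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) evaluates the KL divergence of (3)-(5)

def choose_threshold(activations, nbins=2048, qlevels=128):
    """Search for the saturation threshold T minimizing D_KL(p||q)."""
    hist, edges = np.histogram(np.abs(activations), bins=nbins)
    hist = hist.astype(np.float64)
    best_i, best_kl = nbins, np.inf
    for i in range(qlevels, nbins + 1):
        p = hist[:i].copy()
        p[-1] += hist[i:].sum()        # saturate everything beyond the candidate T
        # q: the same histogram re-expressed with only `qlevels` distinct levels
        q = np.zeros(i)
        for chunk in np.array_split(np.arange(i), qlevels):
            nz = hist[chunk] > 0
            if nz.any():
                q[chunk[nz]] = hist[chunk].sum() / nz.sum()
        kl = entropy(p + 1e-12, q + 1e-12)  # smoothed to keep the divergence finite
        if kl < best_kl:
            best_i, best_kl = i, kl
    T = edges[best_i]
    return T, 127.0 / T                # the threshold and scale_fm of formula (6)
```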
and (4) performing fixed-point processing on the bias and scaling coefficients, merging floating point number units appearing in forward reasoning, amplifying and rounding, and storing the coefficients by adopting 32-bit integer numbers. And setting the weight quantization coefficient as scale _ w, the current layer feature map quantization coefficient as scale _ fm, the bias as bias, and the next layer quantization coefficient as scale _ next _ fm.
Setting the inverse quantization coefficient after merging, amplification and rounding as Scale _ merge, the Bias coefficient as Bias _ merge, and the amplification factor as shift _ coe, including:
Bias_merge=int(bias×scale_next_fm×shift_coe) (7)
Figure BDA0003469305490000106
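A sketch of formulas (7) and (8), assuming scalar coefficients and shift_coe = 2^16; in practice scale_w, and therefore Scale_merge, would be computed per output channel.

```python
def merge_coefficients(scale_w, scale_fm, scale_next_fm, bias, shift_bits=16):
    """Fold the floating-point requantization factors into 32-bit integers,
    formulas (7)-(8). shift_coe = 2**shift_bits; 16 bits is an assumed choice."""
    shift_coe = 1 << shift_bits
    bias_merge = int(bias * scale_next_fm * shift_coe)                   # formula (7)
    scale_merge = int(scale_next_fm * shift_coe / (scale_w * scale_fm))  # formula (8)
    return scale_merge, bias_merge
```

Because shift_coe is a power of two, the later division by shift_coe reduces to a right shift, which is the point of the FETCH operation described below.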
(3) Merging the network structure
The convolution layer and the normalization layer are merged. The depthwise separable convolution includes a normalization layer which, during training, accelerates network convergence, controls overfitting and alleviates vanishing and exploding gradients. After model training is complete all parameters are fixed, and at that point the convolution layer parameters and normalization layer parameters in the network can be combined, as shown in fig. 4, effectively simplifying the network structure, reducing the amount of computation and improving computational efficiency.
Let the convolution output be y1, the input x, the weight w and the bias b, where x, w and b are vectors; let the normalization layer output be y2, the mean μ, the standard deviation σ, the scaling coefficient γ and the scaling offset β, and let ε be 1e-6. The convolution calculation formula is:
y1=wx+b (9)
the normalization layer calculation formula is as follows:
y2 = γ(y1 - μ)/√(σ² + ε) + β (10)
wherein:
μ = (1/m)Σ y1_i (11)
σ² = (1/m)Σ (y1_i - μ)² (12)
substituting equation (10) for equation (11), assuming W as the post-fusion weight and B as the post-fusion bias, with the combined output y 3:
Figure BDA0003469305490000114
wherein:
Figure BDA0003469305490000115
Figure BDA0003469305490000116
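A PyTorch sketch of this fusion, assuming μ and σ² are the running statistics stored in the BatchNorm2d layer:

```python
import torch

@torch.no_grad()
def fold_bn_into_conv(conv, bn):
    """Fold nn.BatchNorm2d `bn` into the preceding nn.Conv2d `conv`
    per formulas (13)-(15). Modifies `conv` in place and returns it."""
    std = torch.sqrt(bn.running_var + bn.eps)        # sqrt(sigma^2 + eps)
    scale = bn.weight / std                          # gamma / sqrt(sigma^2 + eps)
    conv.weight.mul_(scale.reshape(-1, 1, 1, 1))     # W = gamma*w/sqrt(...)        (14)
    b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    folded = scale * (b - bn.running_mean) + bn.bias # B = gamma*(b-mu)/sqrt(...)+beta (15)
    if conv.bias is None:
        conv.bias = torch.nn.Parameter(folded)
    else:
        conv.bias.copy_(folded)
    return conv                                      # y3 = W*x + B                 (13)
```

After folding, the normalization layer can be replaced by nn.Identity() in the model definition.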
and after fusion, normalization layer calculation is not needed any more, so that the size of the model is reduced, the calculation resource is saved, and the performance is improved for forward reasoning.
The activation, quantization, inverse quantization and saturation truncation operations are merged into the FETCH operation, as shown in fig. 5.
The FETCH operation performs the 32-bit to 8-bit conversion, essentially completing the ReLU activation and saturation truncation in one step. After computation with fixed-point data the result must be scaled back to 1/shift_coe of its value; on an edge device, since shift_coe is a power of 2, this can be done by directly taking the high-order bits. Along the way the sign of the input data is checked: if positive, saturation truncation is performed; if negative, the activated value is 0.
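A sketch of the FETCH step in pure integer arithmetic, using the merged coefficients of formulas (7) and (8); shift_bits (with shift_coe = 2^shift_bits) is an assumed parameter:

```python
def fetch(acc, scale_merge, bias_merge, shift_bits=16):
    """Requantize a 32-bit convolution accumulator to an 8-bit activation:
    ReLU activation and saturation truncation in one step."""
    out = acc * scale_merge + bias_merge    # fixed-point result, still 32-bit
    if out <= 0:
        return 0                            # negative input: ReLU outputs 0
    out >>= shift_bits                      # divide by shift_coe via bit shift
    return min(out, 127)                    # positive input: saturate to int8 max
```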
Fig. 6 shows the inference steps of each layer before fixed-pointing and network structure merging, and fig. 7 shows the inference steps after optimization.
Example 2:
(1) Optimized pruning and retraining
After the SkyNet network structure is optimized into the straight-through form, the model is retrained. Table 2 compares the accuracy of SkyNet before and after pruning; taking average precision (AP) as the evaluation index, the accuracy drops by less than 0.03, so the efficiency of real-time SkyNet computation on the edge device is greatly improved while practical application requirements are still met.
TABLE 2 Comparison of accuracy before and after SkyNet pruning

Model type             Average precision (AP)
Complete model         0.797
Model after pruning    0.770
(2) Compressing the model and merging the network structure
The pruned and retrained model is quantized, and the activation, quantization, inverse quantization and saturation truncation operations are merged. Compared with the uncompressed model, the precision drops by only 2.34%, while the size of the network model is reduced by 74.5%.
Model size, average precision and precision loss before and after compression optimization are compared in table 3.

TABLE 3 Comparison before and after model compression

                        Before compression       After compression
Data type               32-bit floating point    7-bit/8-bit/32-bit fixed point
Model size (MB)         1.41                     0.359
Average precision (AP)  0.770                    0.752
Precision loss          ---                      2.34%
Compression ratio       ---                      74.5%
The above description is only for the purpose of illustrating the present invention and is not intended to limit its scope; all modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention are intended to be covered by the appended claims.

Claims (8)

1. A target detection network compression method for an edge device, the method comprising:
optimizing the network structure of SkyNet and quantizing the feature maps and weight parameters; reconstructing the forward-inference structure and merging part of the computation in the depthwise separable convolutions.
2. The target detection network compression method for an edge device of claim 1, comprising the steps of:
step one, performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
step two, compressing the SkyNet network: the weights are quantized to 7 bits, the feature maps are quantized to 8 bits, and the bias parameters and scaling coefficients are fused and fixed-pointed to 32 bits, after which the network is retrained;
step three, merging the SkyNet network structure: the convolution layers and normalization layers are merged, and the activation, quantization, inverse quantization and saturation truncation are merged into a single FETCH operation.
3. The target detection network compression method for an edge device of claim 2, wherein the optimized pruning of the SkyNet network comprises the steps of:
first, the SkyNet branches are pruned, and, taking each depthwise separable convolution as a minimum unit layer, the output of the first layer is pruned to 32 channels;
second, a pooling operation is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network into a straight-through, single-branch structure.
4. The target detection network compression method for an edge device of claim 2, wherein the optimized SkyNet network comprises:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512 and CHL96; and a regression layer CHL30;
the convolutions between the layers of the optimized SkyNet network being depthwise separable convolutions.
5. The target detection network compression method for an edge device of claim 2, wherein the compression of the retrained SkyNet network comprises the steps of:
(1) selecting, for each output channel, the maximum absolute value in the corresponding convolution kernel as the quantization maximum, and performing weight quantization in maximum-value mode:
scale_w = 63/max(|w|);
q_w = w × scale_w;
wherein w is the vector of original weights for each channel; q_w is the vector of quantized weights; scale_w is the scalar scaling coefficient;
(2) selecting a threshold using the KL relative entropy and quantizing the feature map with saturating quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally into -127 to +127, and values outside that range are saturated, i.e. represented directly by the saturation value.
6. The method according to claim 5, wherein selecting the threshold using the KL relative entropy and quantizing the feature map with saturating quantization comprises:
1) selecting the threshold using the KL relative entropy:
H(p) = -Σ p(x) log p(x);
H(p, q) = -Σ p(x) log q(x);
D_KL(p||q) = H(p, q) - H(p);
wherein p is the original distribution before quantization; q is the distribution after quantization with threshold T; H(p) is the information entropy of the original distribution; H(p, q) is the cross entropy of the original and quantized distributions; D_KL(p||q) is the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127/T;
3) fixed-pointing the bias and scaling coefficients: the floating-point factors appearing in forward inference are merged, amplified and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the merged, amplified and rounded inverse quantization coefficient:
Bias_merge = int(bias × scale_next_fm × shift_coe);
Scale_merge = int(scale_next_fm × shift_coe/(scale_w × scale_fm));
wherein scale_w is the weight quantization coefficient; scale_fm is the quantization coefficient of the current layer's feature map; bias is the bias; scale_next_fm is the next layer's quantization coefficient; Scale_merge is the merged, amplified and rounded inverse quantization coefficient; Bias_merge is the merged bias coefficient; shift_coe is the amplification factor.
7. The target detection network compression method for an edge device of claim 2, wherein merging the SkyNet network structure comprises:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw/√(σ² + ε);
B = γ(b - μ)/√(σ² + ε) + β;
wherein y1 is the convolution output; x is the input; w is the weight; b is the bias; x, w and b are vectors; μ is the mean; σ is the standard deviation; γ is the scaling coefficient; β is the scaling offset; ε is 1e-6; W is the post-fusion weight; B is the post-fusion bias;
(2) merging the activation, quantization, inverse quantization and saturation truncation into a FETCH operation, the FETCH operation converting 32 bits to 8 bits and performing ReLU activation and saturation truncation, wherein performing ReLU activation and saturation truncation comprises:
checking the sign of the input data: if it is positive, saturation truncation is performed; if it is negative, the activated value is 0.
8. A target detection network compression system for an edge device, implementing the target detection network compression method for an edge device according to any one of claims 1 to 7, the system comprising:
a network pruning module, used to perform optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, obtaining the optimized SkyNet network;
a network compression module, used to compress the SkyNet network: the weights are quantized to 7 bits, the feature maps are quantized to 8 bits, and the bias parameters and scaling coefficients are fused and fixed-pointed to 32 bits, after which the network is retrained;
a network structure merging module, used to merge the SkyNet network structure: the convolution layers and normalization layers are merged, and the activation, quantization, inverse quantization and saturation truncation are merged into a single FETCH operation.
CN202210038592.0A 2022-01-13 2022-01-13 Target detection network compression method for edge device Pending CN114372565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210038592.0A CN114372565A (en) 2022-01-13 2022-01-13 Target detection network compression method for edge device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210038592.0A CN114372565A (en) 2022-01-13 2022-01-13 Target detection network compression method for edge device

Publications (1)

Publication Number Publication Date
CN114372565A true CN114372565A (en) 2022-04-19

Family

ID=81144914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210038592.0A Pending CN114372565A (en) 2022-01-13 2022-01-13 Target detection network compression method for edge device

Country Status (1)

Country Link
CN (1) CN114372565A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505774A (en) * 2021-07-14 2021-10-15 青岛全掌柜科技有限公司 Novel policy identification model size compression method


Similar Documents

Publication Publication Date Title
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN110378468B (en) Neural network accelerator based on structured pruning and low bit quantization
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN113159173B (en) Convolutional neural network model compression method combining pruning and knowledge distillation
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN112163628A (en) Method for improving target real-time identification network structure suitable for embedded equipment
CN112329922A (en) Neural network model compression method and system based on mass spectrum data set
CN110533022B (en) Target detection method, system, device and storage medium
CN113222138A (en) Convolutional neural network compression method combining layer pruning and channel pruning
CN113011570A (en) Adaptive high-precision compression method and system of convolutional neural network model
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN113610192A (en) Neural network lightweight method and system based on continuous pruning
CN114742211B (en) Convolutional neural network deployment and optimization method facing microcontroller
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN114372565A (en) Target detection network compression method for edge device
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN112561054B (en) Neural network filter pruning method based on batch characteristic heat map
CN116959477B (en) Convolutional neural network-based noise source classification method and device
CN117333497A (en) Mask supervision strategy-based three-dimensional medical image segmentation method for efficient modeling
CN112613604A (en) Neural network quantification method and device
CN117151178A (en) FPGA-oriented CNN customized network quantification acceleration method
CN112308213A (en) Convolutional neural network compression method based on global feature relationship
CN116757255A (en) Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination