CN108510063B - Acceleration method and accelerator applied to convolutional neural network - Google Patents
- Publication number
- CN108510063B (grant publication); application CN201810306577.3A
- Authority
- CN
- China
- Prior art keywords
- feature map
- preset threshold
- density
- neural network
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides an acceleration method and an accelerator applied to a convolutional neural network, wherein the method comprises the following steps: S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer; S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes; and S3, in the convolutional layer of the next layer, convolving each sparsely coded feature map with each convolution kernel of the convolutional neural network that has been sparsely coded in advance. The invention reduces the amount of convolution computation in the convolutional neural network and improves the operation speed.
Description
Technical Field
The invention belongs to the technical field of operation optimization, and particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to units within a local receptive field, making it well suited to processing large images. Convolutional neural networks are widely applied in fields such as image recognition and speech recognition, but their computational cost is very large.
The activation function ReLU (rectified linear unit) in a convolutional neural network produces a large number of sparse feature maps; meanwhile, training the network with methods such as pruning produces a large amount of sparse weight data. Exploiting the sparsity of the feature maps and weight data can greatly improve the computational efficiency of the network. At present, many methods improve computation speed based on this sparsity, and they fall roughly into two categories. One category compresses the zeros out of the input, so that no invalid computation is issued for inputs of 0. The other leaves the zeros in place but gates them, for example by not performing the multiplication when an input operand is 0, thereby reducing the number of operations. However, all of these methods take a sparse neural network as a premise and focus only on handling such networks. In practice, the feature maps output by the various layers of a convolutional neural network may or may not be sparse; in practical applications, the density of the weight data and feature maps of each layer is generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of elements with value 0 far exceeds the number of non-0 elements, and the non-0 elements are distributed irregularly. The prior art, on the one hand, can only process sparse convolutional neural networks, so that when the network is not sparse the computation amount is large and the operation speed is low; on the other hand, it can only handle the case where either the weight data or the feature maps of the convolutional neural network are sparse, and cannot handle the case where both are sparse.
Disclosure of Invention
In order to overcome the problem of low operation speed of the convolutional neural network or at least partially solve the problem, the invention provides an acceleration method and an accelerator applied to the convolutional neural network.
According to a first aspect of the present invention, there is provided an acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
and S3, in the convolutional layer of the next layer, convolving each sparsely coded feature map with each convolution kernel of the convolutional neural network that has been sparsely coded in advance.
Specifically, the step S1 specifically includes:
for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
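The density defined above can be sketched in a few lines; this is a minimal illustration, and the function name is an assumption rather than anything from the patent:

```python
import numpy as np

def feature_map_density(fmap):
    # density = (number of non-0 elements) / (total number of elements)
    fmap = np.asarray(fmap)
    return np.count_nonzero(fmap) / fmap.size
```

For example, a feature map with 10 non-0 elements out of 100 total elements has a density of 0.1, matching the worked example later in the description.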
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map;
and if the density of a feature map is greater than or equal to the second preset threshold, not sparsely coding the feature map.
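The two-threshold decision above can be sketched as follows; the concrete values of th1 and th2 are illustrative assumptions, since the description does not fix them:

```python
def classify_sparsity(density, th1=0.25, th2=0.75):
    # th1 < th2 as required; the returned letter selects the coding mode
    if density < th1:
        return "S"   # encode into a sparse matrix storage format
    elif density < th2:
        return "M"   # mark (guard) the 0 elements
    else:
        return "D"   # no sparse coding
```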
Specifically, the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold value, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the step S3 specifically includes:
when a mark exists in a feature map or convolution kernel, the elements corresponding to the mark in that feature map or convolution kernel are not calculated.
According to another aspect of the present invention, there is provided an accelerator applied to a convolutional neural network, comprising: a neural network computation array module and a dynamic sparse adjustment module;
the dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network calculation array module is used for carrying out convolution operation on each feature map subjected to sparse coding and each convolution kernel in the convolution neural network subjected to sparse coding in advance.
Specifically, the dynamic sparse adjustment module comprises an online density identification module, an output temporary registering module, a dynamic coding module and a dynamic sparse control module;
the on-line density identification module is used for counting, for any feature map, the number of non-0 elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-0 elements to the total number of all elements as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer of the convolutional neural network;
the dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds;
and the dynamic coding module is used for sparsely coding each feature map in the output temporary register module according to the comparison result.
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic encoding module is specifically configured to:
if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map;
and if the density of a feature map is greater than or equal to the second preset threshold, not sparsely coding the feature map.
Specifically, the dynamic encoding module is further configured to:
if the pre-calculated density of a convolution kernel is smaller than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the neural network computational array module is specifically configured to:
when a mark exists in a feature map or convolution kernel, the elements corresponding to the mark in that feature map or convolution kernel are not calculated.
The invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, sparsely codes feature maps in different sparse states in different modes, and, in the convolutional layer following each layer, convolves each sparsely coded feature map with the convolution kernels of the network that have been sparsely coded in advance, thereby reducing the amount of convolution computation in the convolutional neural network and improving the operation speed.
Drawings
Fig. 1 is a schematic overall flowchart of an acceleration method applied to a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a limit energy efficiency test result of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating comparison of the results of the limit energy efficiency test in the accelerator applied to the convolutional neural network according to the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In an embodiment of the present invention, an acceleration method applied to a convolutional neural network is provided. Fig. 1 is an overall flowchart of the acceleration method provided by the embodiment of the present invention, and the method includes:
s1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
In particular, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is finished, the convolution kernels of the network no longer change, so they do not need on-line dynamic sparse coding and are instead sparsely coded off-line once. Here, on-line refers to processing on the accelerator chip, and off-line refers to processing off the chip. In each convolution operation, the sparsely coded convolution kernels are read directly for the convolution calculation. When original image data is input, it is sparsely coded, and the sparsely coded original data and the sparsely coded convolution kernels are then input to the first convolutional layer of the network for the convolution calculation. Because original image data is generally not sparse, it may also be input directly without sparse coding. Sparse coding means storing the data in a sparse format.
In S1, the density of the feature maps output by each layer of the convolutional neural network differs from layer to layer; moreover, the feature maps themselves change dynamically with the input, so their densities also change dynamically. The density represents the degree of sparseness of each feature map. To better improve the operation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, and each feature map is sparsely coded according to its density.
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
In S2, the prior art sparsely codes all the feature maps output by each layer, which results in a large amount of computation. In this embodiment, the sparse state of each feature map output by the layer is obtained according to the preset thresholds, so that feature maps in different sparse states are sparsely coded in different forms.
And S3, in the convolutional layer of the next layer, convolving each sparsely coded feature map with each convolution kernel of the convolutional neural network that has been sparsely coded in advance.
In S3, each sparsely coded feature map and each convolution kernel of the previously sparsely coded convolutional neural network are taken as the input of the convolutional layer following the current layer, and a convolution operation is performed. The result of the convolution operation is then taken as the input of the next convolutional layer, and the sparse coding and convolution of the output feature maps continue until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse coding scheme of the convolution kernels.
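The layer-by-layer flow of S1–S3 can be sketched as below; a naive dense NumPy convolution stands in for the accelerator's sparse computation, the density measurement and re-coding step is reduced to a comment, and all names are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(fmap, kernel):
    # naive valid-mode 2-D convolution, sufficient for the sketch
    h, w = fmap.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(fmap[i:i + kh, j:j + kw] * kernel)
    return out

def run_network(image, kernels):
    fmap = image
    for kernel in kernels:                               # one kernel per layer
        fmap = np.maximum(conv2d_valid(fmap, kernel), 0)  # convolution + ReLU
        # here the accelerator would measure the density of fmap and
        # re-encode it before feeding it to the next convolutional layer
    return fmap
```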
In this embodiment, the density of each feature map output by each layer of the convolutional neural network is compared with a plurality of preset thresholds to obtain the sparse state of each feature map, feature maps in different sparse states are sparsely coded in different modes, and each sparsely coded feature map is then convolved, in the convolutional layer following each layer, with the convolution kernels of the network that have been sparsely coded in advance, thereby reducing the amount of convolution computation in the convolutional neural network and improving the operation speed.
On the basis of the foregoing embodiment, in this embodiment, the step S1 specifically includes: for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map; and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
Specifically, the density of each feature map is a ratio of the number of elements other than 0 in each feature map to the total number of all elements in each feature map. For example, if the number of non-0 elements in a feature map is 10 and the total number of all elements in the feature map is 100, the density of the feature map is 0.1.
On the basis of the above embodiment, in this embodiment, the preset threshold includes a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold; correspondingly, the step S2 specifically includes: if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format; if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map; and if the density of a feature map is greater than or equal to the second preset threshold, not sparsely coding the feature map.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. According to these two thresholds, the feature state AS of each feature map is divided into three states: a feature map with density smaller than the first preset threshold is in the fully sparse state S, a feature map with density greater than or equal to the first preset threshold and smaller than the second preset threshold is in the moderately sparse state M, and a feature map with density greater than or equal to the second preset threshold is in the fully non-sparse state D. If a feature map is in the sparse state S, it is encoded into a sparse matrix storage format, which comprises the non-0 data activ in the feature map and a sparse index, such as coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space and, at the same time, a large amount of computing time. If a feature map is in the moderately sparse state M, a mark guard is added to the 0 elements in the feature map to identify them; the marked elements need not participate in computation and storage, thereby reducing power consumption. Marking the 0 elements of a feature map is also a form of sparse coding. If a feature map is in the fully non-sparse state D, no dynamic coding is performed and its non-sparse data is output directly.
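Coordinate encoding is one of the sparse-index formats named above. A possible sketch of encoding a state-S feature map into its non-0 data activ plus a coordinate index (the function name is an assumption):

```python
import numpy as np

def encode_coo(fmap):
    # keep only the non-0 values and their (row, col) coordinates
    fmap = np.asarray(fmap)
    rows, cols = np.nonzero(fmap)
    activ = fmap[rows, cols]
    index = list(zip(rows.tolist(), cols.tolist()))
    return activ, index
```

A fully sparse feature map stored this way occupies space proportional to its non-0 count rather than its full size.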
On the basis of the foregoing embodiment, step S3 in this embodiment is preceded by: calculating the density of each convolution kernel in the trained convolutional neural network; if the density of a convolution kernel is smaller than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel; and if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely coding the convolution kernel.
Specifically, the density of each convolution kernel is the ratio of the number of non-0 elements in the kernel to the total number of all its elements. The state WS of each convolution kernel is divided into the same three states as the feature map, and each state corresponds to a different sparse coding mode. Since the feature map and the convolution kernel each have three states, there are 9 combined states, so the density of the convolutional neural network is partitioned at a finer granularity.
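The nine combined processing modes follow directly from crossing the two three-way states; the constant names below are illustrative:

```python
from itertools import product

FEATURE_STATES = ("S", "M", "D")   # feature-map state AS
KERNEL_STATES = ("S", "M", "D")    # convolution-kernel state WS

# each (AS, WS) pair is a distinct processing mode of the accelerator
COMBINED_MODES = list(product(FEATURE_STATES, KERNEL_STATES))
```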
On the basis of the foregoing embodiments, in this embodiment, the step S3 specifically includes: when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, the 0 elements are removed before input, which reduces storage space and avoids computing on 0 elements; when a feature map or convolution kernel is in the moderately sparse state M, the 0 elements are stored, but the elements corresponding to the marks are not calculated, which reduces computation.
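At the multiply-accumulate level, the state-M skipping described above might look like the following sketch; the guard flags and the function name are illustrative assumptions, not the patent's hardware implementation:

```python
def guarded_mac(activ, weights, activ_guard, weight_guard):
    # guard flags are True where an element is a marked 0 and must be skipped
    total = 0
    for a, w, ga, gw in zip(activ, weights, activ_guard, weight_guard):
        if ga or gw:
            continue   # marked element: no multiplication is performed
        total += a * w
    return total
```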
In another embodiment of the present invention, an accelerator applied to a convolutional neural network is provided. Fig. 2 is a schematic diagram of the overall structure of the accelerator provided by the embodiment of the present invention, which includes a neural network computation array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes. The neural network computation array module is used for performing the convolution operation on each sparsely coded feature map and each convolution kernel of the convolutional neural network that has been sparsely coded in advance.
In particular, the convolutional neural network may or may not include pooling layers. Firstly, the convolutional neural network is trained, and after the training is finished, the convolutional kernel in the convolutional neural network is not changed any more, so that the convolutional kernel in the convolutional neural network does not need on-line dynamic sparse coding, and can be directly subjected to off-line sparse coding once. And the neural network computing array module directly reads the convolution kernel of the offline sparse code for convolution computation during each convolution operation. When the convolution neural network inputs original image data, the dynamic sparse adjustment module conducts sparse coding on the original image data, and then the neural network calculation array module conducts convolution calculation according to the sparsely coded original data and sparsely coded convolution kernels. Because the original image data is not sparse in general, the original image data may not be sparsely encoded and may be directly input. The sparse coding is to store data in a sparse format.
Because the density of the feature maps output by each layer of the convolutional neural network differs, and the feature maps output by the different layers also change dynamically, the density changes dynamically as well. The density represents the degree of sparseness of each feature map. To better improve the operation speed of the convolutional neural network, the dynamic sparse adjustment module calculates the density of each feature map output by each layer, and sparsely codes each feature map according to its density.
The dynamic sparse adjustment module obtains the sparse state of each feature map output by the layer according to the plurality of preset thresholds, so that feature maps in different sparse states receive different forms of sparse coding rather than a single type. In the prior art, all the feature maps output by each layer are sparsely coded, which incurs a large amount of computation.
The neural network computation array module performs the convolution operation on each sparsely coded feature map and each convolution kernel of the convolutional neural network that has been sparsely coded in advance. If a pooling module is included, it performs a pooling operation on the result of the convolution operation. In addition, the accelerator comprises an intermediate data storage module, a main chip controller and an on-chip/off-chip data communication module. The main controller controls the operation and timing of the whole accelerator chip. The on-chip/off-chip data communication module is used for reading data from the off-chip memory or writing the data computed by the chip to the off-chip memory; for example, after initialization, the chip reads the original image data and the initial convolution kernels from the off-chip memory through this module under the control of the main controller. The intermediate data storage module is used for storing intermediate results produced by the neural network computation array module during calculation.
In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain its sparse state, and sparsely codes feature maps in different sparse states in different modes, so that the neural network computation array module convolves each sparsely coded feature map with the convolution kernels of the network that have been sparsely coded in advance. On the one hand, this reduces the amount of convolution computation in the convolutional neural network and improves the operation speed; on the other hand, the processing state of the accelerator is switched dynamically according to the different sparse states, which improves the flexibility of the accelerator.
On the basis of the above embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary register module, a dynamic coding module, and a dynamic sparse control module. The on-line density identification module is used for counting, for any feature map, the number of non-0 elements in the feature map and the total number of all its elements, and taking the ratio of the number of non-0 elements to the total number of all elements as the density of the feature map. The output temporary register module is used for storing each feature map output by each layer of the convolutional neural network. The dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds. The dynamic coding module is used for sparsely coding each feature map in the output temporary register module according to the comparison result.
Specifically, the dynamic sparse adjustment module comprises four modules. The on-line density identification module counts the number of non-0 elements in each feature map during computation so as to calculate its density. The output temporary register module temporarily stores, in a non-sparse format, the feature maps output by each layer of the convolutional neural network. The dynamic sparse control module determines the sparse state of each feature map by means of the plurality of preset thresholds. The dynamic coding module sparsely codes each feature map in the output temporary register module according to its sparse state, thereby speeding up the convolution operation.
On the basis of the above embodiment, the preset thresholds in this embodiment include a first preset threshold and a second preset threshold. Correspondingly, the dynamic coding module is specifically configured to: encode each feature map into a sparse matrix storage format if its density is smaller than the first preset threshold; mark the zero elements in each feature map whose density is greater than or equal to the first preset threshold and smaller than the second preset threshold; and leave uncoded each feature map whose density is greater than or equal to the second preset threshold.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. According to these two thresholds, the dynamic sparsity control module divides the state AS of each feature map into three states: a feature map whose density is smaller than the first preset threshold is assigned the fully sparse state S; a feature map whose density is greater than or equal to the first preset threshold and smaller than the second preset threshold is assigned the medium sparse state M; and a feature map whose density is greater than or equal to the second preset threshold is assigned the fully non-sparse state D.
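The three-way classification above reduces to two comparisons. A minimal sketch follows; the threshold values 0.3 and 0.7 are placeholders chosen for illustration — the patent only requires th1 < th2, not any particular numbers.

```python
def classify_sparse_state(density: float, th1: float = 0.3, th2: float = 0.7) -> str:
    """Map a feature-map density to one of the three sparse states."""
    if density < th1:
        return "S"   # fully sparse: encode into a sparse matrix storage format
    elif density < th2:
        return "M"   # medium sparse: mark zero elements with a guard flag
    else:
        return "D"   # fully non-sparse: output data unencoded

print(classify_sparse_state(0.1))  # S
print(classify_sparse_state(0.5))  # M
print(classify_sparse_state(0.9))  # D
```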
If a feature map is in the fully sparse state S, the dynamic coding module encodes it in the output temporary register module into a sparse matrix storage format, which comprises the non-zero data (activ) of the feature map and a sparse index, such as coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space as well as a large amount of computing time. If a feature map is in the medium sparse state M, the dynamic coding module adds a mark (guard) to the zero elements of the feature map in the output temporary register module; the marked elements do not participate in computation or storage, which reduces power consumption. If a feature map is in the fully non-sparse state D, no dynamic coding is needed, and the dynamic coding module directly outputs the non-sparse data of the feature map.
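The two non-trivial coding modes can be sketched as follows. This is a software illustration under assumed names (`encode_coo`, `mark_zeros`): coordinate (COO) encoding keeps only the non-zero data plus its index, while the medium-sparse mode keeps the dense layout and merely flags zeros.

```python
import numpy as np

def encode_coo(feature_map: np.ndarray):
    """Fully sparse state S: coordinate encoding -> non-zero data + sparse index."""
    rows, cols = np.nonzero(feature_map)
    activ = feature_map[rows, cols]                        # non-zero data
    index = list(zip(rows.tolist(), cols.tolist()))        # sparse index
    return activ, index

def mark_zeros(feature_map: np.ndarray) -> np.ndarray:
    """Medium sparse state M: keep dense layout, flag zero elements (guard)."""
    return feature_map == 0   # True where the guard mark applies

fm = np.array([[0, 5, 0],
               [7, 0, 0]])
activ, index = encode_coo(fm)
print(activ.tolist(), index)      # [5, 7] [(0, 1), (1, 0)]
print(int(mark_zeros(fm).sum()))  # 4 elements carry the guard mark
```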
On the basis of the foregoing embodiment, the dynamic coding module in this embodiment is further configured to: encode each convolution kernel into a sparse matrix storage format if its pre-computed density is smaller than the first preset threshold; mark the zero elements in each convolution kernel whose density is greater than or equal to the first preset threshold and smaller than the second preset threshold; and leave uncoded each convolution kernel whose density is greater than or equal to the second preset threshold.
Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the kernel to the total number of all elements in the kernel. Like the feature maps, the state WS of each convolution kernel takes one of the same three states, and each state corresponds to a different sparse-coding mode. Since the feature map and the convolution kernel each have three states, their combination yields 3 × 3 = 9 states, so the density of the convolutional neural network is partitioned at a finer granularity.
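The nine combined processing modes are simply the Cartesian product of the feature-map state AS and the kernel state WS, as this small enumeration (illustrative names only) shows:

```python
from itertools import product

FM_STATES = ("S", "M", "D")   # feature-map state AS
WS_STATES = ("S", "M", "D")   # convolution-kernel state WS

# Each (AS, WS) pair selects a processing mode of the accelerator.
combined = list(product(FM_STATES, WS_STATES))
print(len(combined))  # 9 combined states
```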
On the basis of the foregoing embodiments, the neural network computing array module in this embodiment is specifically configured to: when a mark exists in a feature map or a convolution kernel, skip the computation of the elements corresponding to that mark.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, its zero elements are removed before it is input to the neural network computing array module, which reduces storage space and means the zero elements never need to be computed at all. When a feature map or convolution kernel is in the medium sparse state M, its zero elements are still stored, but the elements corresponding to the mark are not computed, which reduces the amount of computation.
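The medium-sparse skip behaviour can be modelled as a multiply-accumulate loop that consumes guard flags. This is a hypothetical software sketch of the idea, not the hardware datapath: data stays dense in memory, but flagged (zero) elements never enter the multiplier.

```python
import numpy as np

def masked_dot(activations, weights, act_guard, w_guard):
    """Multiply-accumulate that skips any element flagged by a guard mark."""
    acc = 0.0
    for a, w, skip_a, skip_w in zip(activations, weights, act_guard, w_guard):
        if skip_a or skip_w:   # marked element: contributes nothing, costs no MAC
            continue
        acc += a * w
    return acc

a = np.array([0.0, 2.0, 0.0, 4.0])
w = np.array([1.0, 3.0, 5.0, 0.0])
# Only index 1 has both a non-zero activation and a non-zero weight.
print(masked_dot(a, w, a == 0, w == 0))  # 2.0 * 3.0 -> 6.0
```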
For example, the accelerator chip is fabricated in a 65 nm process, with a die area of 3 mm × 4 mm, an operating frequency of 20-200 MHz, and a power consumption of 20.5-248.4 mW. As shown in fig. 3, the peak energy efficiency in this embodiment rises rapidly as the density of the feature maps and convolution kernels decreases. When the density of the feature maps and convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, 6.2 times the peak energy efficiency without the accelerator. As shown in fig. 4, compared with an implementation that supports only feature-data sparsity, the energy efficiency of this embodiment is improved by 4.3 times; compared with an implementation without adaptive sparsity control, by 2.8 times; and compared with an implementation without density control but with variable quantization precision, by 2 times.
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (8)
1. An acceleration method applied to a convolutional neural network, comprising:
s1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
s2, comparing the density of each characteristic graph output by the layer with a plurality of preset thresholds, and carrying out sparse coding on each characteristic graph according to the comparison result; wherein, different comparison results correspond to different sparse coding modes;
s3, convolving each feature map after sparse coding and each convolution kernel in the convolutional neural network which is sparsely coded in advance based on the convolution layer of the next layer of the layer;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not sparsely coding each feature map;
the step S3 specifically includes:
when the marks exist in each feature map, not computing the elements corresponding to the marks in each feature map.
2. The method according to claim 1, wherein the step S1 specifically includes:
for any one feature map, counting the number of non-zero elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map.
3. The method according to claim 1, wherein the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not sparsely coding each convolution kernel.
4. The method according to claim 3, wherein the step S3 specifically includes:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
5. An accelerator applied to a convolutional neural network, comprising: a neural network computing array module and a dynamic sparsity adjustment module;
the dynamic sparsity adjustment module is used for calculating the density of each feature map output by each layer in the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely coding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network computing array module is used for performing a convolution operation on each sparsely coded feature map and each convolution kernel in the convolutional neural network that has been sparsely coded in advance;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic sparsity adjustment module includes a dynamic coding module, which is specifically configured to:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not sparsely coding each feature map;
the neural network computing array module is specifically configured to:
when the marks exist in each feature map, not computing the elements corresponding to the marks in each feature map.
6. The accelerator according to claim 5, wherein the dynamic sparsity adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic coding module, and a dynamic sparsity control module;
the on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer in the convolutional neural network;
the dynamic sparsity control module is used for comparing the density of each feature map output by the on-line density identification module with the plurality of preset thresholds;
and the dynamic coding module is used for sparsely coding each feature map in the output temporary register module according to the comparison result.
7. The accelerator of claim 5, wherein the dynamic coding module is further configured to:
if the pre-computed density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not sparsely coding each convolution kernel.
8. The accelerator of claim 7, wherein the neural network computing array module is specifically configured to:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
PCT/CN2018/095365 WO2019196223A1 (en) | 2018-04-08 | 2018-07-12 | Acceleration method and accelerator used for convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510063A CN108510063A (en) | 2018-09-07 |
CN108510063B (en) | 2020-03-20
Family
ID=63380995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810306577.3A Active CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108510063B (en) |
WO (1) | WO2019196223A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389043B (en) * | 2018-09-10 | 2021-11-23 | 中国人民解放军陆军工程大学 | Crowd density estimation method for aerial picture of unmanned aerial vehicle |
CN109409518B (en) * | 2018-10-11 | 2021-05-04 | 北京旷视科技有限公司 | Neural network model processing method and device and terminal |
CN109784484A (en) * | 2019-01-31 | 2019-05-21 | 深兰科技(上海)有限公司 | Neural network accelerated method, device, neural network accelerate chip and storage medium |
CN110097172B (en) * | 2019-03-18 | 2021-10-29 | 中国科学院计算技术研究所 | Convolutional neural network data processing method and device based on Winograd convolutional operation |
CN109858575B (en) * | 2019-03-19 | 2024-01-05 | 苏州市爱生生物技术有限公司 | Data classification method based on convolutional neural network |
CN110443357B (en) * | 2019-08-07 | 2020-09-15 | 上海燧原智能科技有限公司 | Convolutional neural network calculation optimization method and device, computer equipment and medium |
CN110909801B (en) * | 2019-11-26 | 2020-10-09 | 山东师范大学 | Data classification method, system, medium and device based on convolutional neural network |
CN111291230B (en) * | 2020-02-06 | 2023-09-15 | 北京奇艺世纪科技有限公司 | Feature processing method, device, electronic equipment and computer readable storage medium |
CN111401554B (en) * | 2020-03-12 | 2023-03-24 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization |
CN113537465A (en) * | 2021-07-07 | 2021-10-22 | 深圳市易成自动驾驶技术有限公司 | LSTM model optimization method, accelerator, device and medium |
WO2023164855A1 (en) * | 2022-03-03 | 2023-09-07 | Intel Corporation | Apparatus and method for 3d dynamic sparse convolution |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184362B (en) * | 2015-08-21 | 2018-02-02 | 中国科学院自动化研究所 | The acceleration of the depth convolutional neural networks quantified based on parameter and compression method |
US10380479B2 (en) * | 2015-10-08 | 2019-08-13 | International Business Machines Corporation | Acceleration of convolutional neural network training using stochastic perforation |
CN107689948B (en) * | 2016-08-22 | 2020-09-01 | 赛灵思公司 | Efficient data access management device applied to neural network hardware acceleration system |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
CN107609641B (en) * | 2017-08-30 | 2020-07-03 | 清华大学 | Sparse neural network architecture and implementation method thereof |
Also Published As
Publication number | Publication date |
---|---|
WO2019196223A1 (en) | 2019-10-17 |
CN108510063A (en) | 2018-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||