CN111091183A - Neural network acceleration system and method - Google Patents

Neural network acceleration system and method Download PDF

Info

Publication number
CN111091183A
CN111091183A (application CN201911304163.8A; granted as CN111091183B)
Authority
CN
China
Prior art keywords
output
feature map
data
module
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911304163.8A
Other languages
Chinese (zh)
Other versions
CN111091183B (en)
Inventor
李远超
蔡权雄
牛昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to CN201911304163.8A priority Critical patent/CN111091183B/en
Publication of CN111091183A publication Critical patent/CN111091183A/en
Application granted granted Critical
Publication of CN111091183B publication Critical patent/CN111091183B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiment of the invention discloses a neural network acceleration system and method, wherein the system comprises: a data processing module for converting input data of the convolutional neural network calculation from floating point numbers to fixed point numbers; a feature map splitting module for splitting the input data into a plurality of computation feature maps qd_j according to a first preset rule; a first calculation module for calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i; an accumulation module for sequentially accumulating all the first output feature maps qo_i to obtain a second output feature map; and a second calculation module for processing the second output feature map according to a third preset rule to obtain output data. By converting floating point numbers into fixed point numbers, the convolutional neural network calculation requires fewer logic resources without affecting calculation accuracy, the storage resources occupied are greatly reduced, and the data transmission speed is increased.

Description

Neural network acceleration system and method
Technical Field
The embodiment of the invention relates to a neural network technology, in particular to a neural network acceleration system and a neural network acceleration method.
Background
Convolutional neural networks have developed significantly over the past few years and are now fundamental tools for many intelligent systems. However, as the accuracy of image classification, image recognition and similar tasks improves, the computational complexity of convolutional neural networks and their consumption of storage resources keep increasing. Convolutional neural network acceleration has therefore become a hot research topic.
For hardware implementation of convolutional neural networks, a number of FPGA- or ASIC-based accelerators have been proposed in recent years. These accelerator designs optimize the convolutional neural network from different aspects, such as optimizing its computational resources, optimizing data output, or optimizing computational resources together with the access latency of off-chip memory.
However, these accelerator designs usually treat the convolutional neural network algorithm as a black box and optimize only the hardware structure, which easily reduces the accuracy of the convolutional neural network calculation after hardware acceleration.
Disclosure of Invention
In view of this, embodiments of the present invention provide a neural network acceleration system and method to reduce logic resources required for neural network computation and improve data transmission speed.
In a first aspect, an embodiment of the present invention provides a neural network acceleration system, including:
a data processing module for converting input data of the convolutional neural network calculation from floating point numbers to fixed point numbers;
a feature map splitting module for splitting the input data into a plurality of computation feature maps qd_j according to a first preset rule;
a first calculation module for calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i;
an accumulation module for sequentially accumulating all the first output feature maps qo_i to obtain a second output feature map;
and a second calculation module for processing the second output feature map according to a third preset rule to obtain output data.
Further, the first calculation module comprises:
a weight memory for storing the weight qw;
a convolution calculation unit for performing convolution calculation on the computation feature map qd_j and the weight qw to obtain the first part qo_i^(1) of a first output feature map;
a branch addition tree unit for calculating the computation feature map qd_j according to a fourth preset rule to obtain the second part qo_i^(2) of the first output feature map;
a first output feature map calculation unit for subtracting the second part qo_i^(2) from the first part qo_i^(1) to obtain the first output feature map qo_i.
Further, the data processing module is further configured to convert the weight qw stored in the weight memory into a fixed point number.
Further, the feature map splitting module is specifically configured to:
splitting the input data into a plurality of computation feature maps qd_j comprising a 3 × 3 matrix data structure according to a preset step.
Further, the second calculation module includes:
a bias module for adding a preset bias parameter to the second output feature map to obtain an output bias feature map;
and a quantization module for calculating the output bias feature map with a preset quantization parameter to obtain output data.
Further, the data processing module comprises:
the first data processing unit is used for converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;
and the second data processing unit is used for converting the signed fixed point number into an unsigned fixed point number.
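As an illustrative sketch only (the patent states the two-step signed-then-unsigned conversion but not its formula; the bit widths, scale and the offset of 128 used to remove the sign are assumptions):

```python
import numpy as np

def to_signed_fixed(x, scale):
    """Step 1: float -> signed 8-bit fixed point (assumed round-and-clip rule)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def to_unsigned(q_signed):
    """Step 2: signed -> unsigned 8-bit by shifting the zero point (assumed offset of 128)."""
    return (q_signed.astype(np.int16) + 128).astype(np.uint8)

x = np.array([-0.5, 0.0, 0.25], dtype=np.float32)
q = to_unsigned(to_signed_fixed(x, scale=1.0 / 128))
```

The shift in step 2 keeps the full value range representable while letting the hardware work with unsigned arithmetic.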
In a second aspect, an embodiment of the present invention provides a neural network acceleration method, including:
converting input data calculated by the convolutional neural network from floating point number to fixed point number;
splitting the input data into a plurality of computation feature maps qd_j according to a first preset rule;
calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i;
sequentially accumulating all the first output feature maps qo_i to obtain a second output feature map; and processing the second output feature map according to a third preset rule to obtain output data.
Further, the calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i comprises:
obtaining the weight qw corresponding to the computation feature map qd_j;
performing convolution calculation on the computation feature map qd_j and the weight qw to obtain the first part qo_i^(1) of a first output feature map;
calculating the computation feature map qd_j according to a fourth preset rule to obtain the second part qo_i^(2) of the first output feature map;
subtracting the second part qo_i^(2) from the first part qo_i^(1) to obtain the first output feature map qo_i.
Further, the processing the second output feature map according to a third preset rule to obtain output data includes:
adding a bias parameter to the second output characteristic diagram to obtain an output bias characteristic diagram;
and calculating the output bias characteristic diagram and the quantization parameter to obtain output data.
Further, the converting the input data calculated by the convolutional neural network from a floating point number to a fixed point number includes:
converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;
the signed fixed-point number is converted into an unsigned fixed-point number.
In the embodiment of the invention, a data processing module converts input data of the convolutional neural network calculation from floating point numbers to fixed point numbers; a feature map splitting module splits the input data into a plurality of computation feature maps qd_j according to a first preset rule; a first calculation module calculates each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i; an accumulation module sequentially accumulates all the first output feature maps qo_i to obtain a second output feature map; and a second calculation module processes the second output feature map according to a third preset rule to obtain output data. By converting floating point numbers into fixed point numbers, the convolutional neural network calculation requires fewer logic resources without affecting calculation accuracy, the storage resources occupied are greatly reduced, and the data transmission speed is increased.
Drawings
Fig. 1 is a schematic structural diagram of a neural network acceleration system according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an input feature map according to an embodiment of the present invention;
fig. 3 is a schematic diagram of splitting an input feature map into calculation feature maps according to a first embodiment of the present invention;
fig. 4 is a schematic structural diagram of a neural network acceleration system according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a neural network acceleration system according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of a neural network acceleration system according to a fourth embodiment of the present invention;
fig. 7 is a schematic flowchart of a neural network acceleration method according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first computing module may be referred to as a second computing module, and similarly, a second computing module may be referred to as a first computing module, without departing from the scope of the present application. The first computing module and the second computing module are both computing modules, but are not the same computing module. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Example one
Fig. 1 is a schematic structural diagram of a neural network acceleration system according to an embodiment of the present invention, which is applicable to the calculation of a convolutional neural network. As shown in fig. 1, a neural network acceleration system according to a first embodiment of the present invention includes: the system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500.
The data processing module 100 is configured to convert the input data of the convolutional neural network calculation from floating point numbers to fixed point numbers;
the feature map splitting module 200 is configured to split the input data into a plurality of computation feature maps qd_j according to a first preset rule;
the first calculation module 300 is configured to calculate each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i;
the accumulation module 400 is configured to sequentially accumulate all the first output feature maps qo_i to obtain a second output feature map;
the second calculation module 500 is configured to process the second output feature map according to a third preset rule to obtain output data.
Specifically, there are two general ways to represent data (real numbers) in machine language: floating point numbers and fixed point numbers. When a fixed point number expresses a real number, the position of the decimal point is fixed: the decimal point is not stored in the machine but agreed at a fixed position in advance, and once determined, its position cannot be changed. The data range represented by a fixed point number is therefore limited, and the corresponding memory space (bits) occupied is small. A floating point number expresses a real number in scientific notation, i.e., with a mantissa, a radix, an exponent, and a sign representing positive or negative; for example, the real number 123.45 is represented as the floating point number 1.2345 × 10^2, where 1.2345 is the mantissa, 10 is the radix, and 2 is the exponent. A floating point number achieves a floating decimal point through the exponent, so it can flexibly express a large range of data, and correspondingly occupies a large memory space. The convolutional neural network is a neural network that simulates the human brain in order to realize machine learning technology similar to artificial intelligence, and its data usually adopts the floating point representation.
The data processing module 100 converts the input data of the convolutional neural network calculation from floating point numbers to fixed point numbers; the converted fixed point numbers are unsigned 8-bit integers, which reduces the consumption of hardware logic resources in the convolutional neural network calculation.
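The patent does not give the exact conversion rule, so the following is only a sketch of one common way to map floats to unsigned 8-bit fixed point: an affine quantization with an assumed scale and zero point, whose inverse shows why the accuracy loss stays bounded:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map float values to unsigned 8-bit fixed point: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Approximate inverse mapping back to float."""
    return (q.astype(np.int32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 127, 128   # assumed parameters covering roughly [-1, 1]
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

Each value is recovered to within half a quantization step, which is why the fixed-point calculation need not noticeably affect accuracy.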
The input data of the convolutional neural network is a multilayer three-dimensional matrix data structure composed of a plurality of data, with a rows, b columns and c layers; in convolutional neural network calculation, this multilayer three-dimensional matrix data structure is generally called the input feature map. As shown in fig. 2, an input feature map comprising 6 rows, 6 columns and 3 layers is shown; its size is 6 × 6 × 3 and it contains 108 data in total (for convenience of description, the data in fig. 2 are shown as simple integers).
When performing calculation, the convolutional neural network does not directly perform calculation on all data of the input feature map at the same time, but performs convolution calculation by taking data of d rows and e columns each time, and a two-dimensional matrix data structure including d rows and e columns extracted from the input feature map is called a calculation feature map.
The specific working process of the feature map splitting module 200 is as follows: a sampling window of d rows and e columns slides over the input feature map, and the data it covers form one computation feature map; when the window has moved right to the right boundary of the input feature map, it returns to the left boundary and moves down one row to continue sampling, until the last data of each layer of the input feature map has been taken. The number of columns the window moves right each time is called the step; the smaller the step, the higher the calculation accuracy of the convolutional neural network, and the larger the corresponding amount of calculation.
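The sliding-window process above can be sketched for one layer as follows (the 3 × 3 window and step of 1 match the example later in this section; the function name is illustrative):

```python
import numpy as np

def split_feature_map(layer, window=3, stride=1):
    """Slide a window x window frame over one 2-D layer, left-to-right then down,
    collecting each covered patch as a computation feature map."""
    rows, cols = layer.shape
    maps = []
    for r in range(0, rows - window + 1, stride):
        for c in range(0, cols - window + 1, stride):
            maps.append(layer[r:r + window, c:c + window])
    return maps

layer = np.arange(36).reshape(6, 6)   # one 6 x 6 layer of the input feature map
maps = split_feature_map(layer)       # a 6 x 6 layer yields 16 computation feature maps
```

With a smaller stride the window positions overlap more, which is exactly the accuracy/compute trade-off described above.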
The feature map splitting module 200 splits the input data into a plurality of computation feature maps qd_j according to a first preset rule. Optionally, the feature map splitting module 200 may split the input feature map into a plurality of computation feature maps qd_j comprising a 3 × 3 matrix data structure according to a preset step. For example, if the input feature map size is 6 × 6 × 3, the computation feature map size is 3 × 3 and the preset step is 1, the feature map splitting module 200 splits each layer of the input feature map into 16 computation feature maps of size 3 × 3, so the input feature map is split into 48 computation feature maps qd_1 ~ qd_48, which can also be written as qd_j^c, where c denotes the layer of the input feature map from which the computation feature map is taken; here c takes the values 1, 2 and 3. Referring to fig. 3, the feature map splitting module 200 splits the first layer of the input feature map into 16 computation feature maps qd_1^1 ~ qd_16^1 of size 3 × 3.
When the first calculation module 300 performs calculation, it calculates all the computation feature maps qd_1^1 ~ qd_16^1 contained in the first layer of the input feature map to obtain the first first output feature map qo_1, then calculates all the computation feature maps qd_1^2 ~ qd_16^2 contained in the second layer to obtain the second first output feature map qo_2, and then calculates all the computation feature maps qd_1^3 ~ qd_16^3 contained in the third layer to obtain the third first output feature map qo_3. It can thus be seen that one first output feature map is obtained after the first calculation module 300 processes one layer of the input feature map, and the number of first output feature maps equals the number of layers of the input feature map.
The accumulation module 400 sequentially accumulates all the first output feature maps qo_i output by the first calculation module 300 to obtain the second output feature map. The accumulation module 400 adopts a First In First Out (FIFO) buffer structure: the FIFO buffers the first first output feature map qo_1; when the first calculation module 300 outputs the second first output feature map qo_2, the accumulation module 400 first adds qo_2 to qo_1 and then buffers the result qo_1 + qo_2 in the FIFO; when the first calculation module 300 outputs the third first output feature map qo_3, the accumulation module 400 first calculates qo_1 + qo_2 + qo_3 and then buffers that result in the FIFO.
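A behavioral sketch of this running accumulation (a software deque stands in for the hardware FIFO; all names and data are illustrative):

```python
import numpy as np
from collections import deque

def accumulate(first_output_maps):
    """Running accumulation of per-layer first output feature maps, FIFO-style:
    pop the buffered partial sum, add the new map, buffer the result back."""
    fifo = deque()
    for qo in first_output_maps:
        partial = fifo.pop() if fifo else np.zeros_like(qo)
        fifo.append(partial + qo)
    return fifo.pop()   # the second output feature map

qo_maps = [np.full((4, 4), v) for v in (1, 2, 3)]
second_output = accumulate(qo_maps)   # every element is 1 + 2 + 3
```

Buffering only the running sum means the FIFO never needs to hold more than one feature map at a time, regardless of the number of input layers.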
The second calculation module 500 performs bias and quantization processing on the data in the second output feature map to obtain the final output data of the neural network acceleration system; the output data is still an unsigned 8-bit fixed point number.
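This bias-and-quantization step can be sketched as follows; the concrete requantization rule is not given in this section, so the scale-and-zero-point form below is an assumption, and all parameter values are illustrative:

```python
import numpy as np

def bias_and_quantize(second_output, bias, scale, zero_point):
    """Add a preset bias, then requantize back to unsigned 8-bit output data."""
    biased = second_output.astype(np.int64) + bias   # the output bias feature map
    q = np.round(biased * scale) + zero_point        # assumed requantization rule
    return np.clip(q, 0, 255).astype(np.uint8)

acc = np.array([[1000, -200], [300, 50]])            # wide accumulator values
out = bias_and_quantize(acc, bias=24, scale=0.05, zero_point=128)
```

The point of the step is to bring the wide accumulator values back into the same unsigned 8-bit format as the input, so the next layer can consume them directly.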
In the neural network acceleration system provided by the first embodiment of the invention, a data processing module converts input data of the convolutional neural network calculation from floating point numbers to fixed point numbers; a feature map splitting module splits the input data into a plurality of computation feature maps qd_j according to a first preset rule; a first calculation module calculates each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i; an accumulation module sequentially accumulates all the first output feature maps qo_i to obtain a second output feature map; and a second calculation module processes the second output feature map according to a third preset rule to obtain output data. By converting floating point numbers into fixed point numbers, the convolutional neural network calculation requires fewer logic resources without affecting calculation accuracy, the storage resources occupied are greatly reduced, and the data transmission speed is increased.
Example two
Fig. 4 is a schematic structural diagram of a neural network acceleration system according to a second embodiment of the present invention, which further refines the first calculation module of the foregoing embodiment. As shown in fig. 4, the neural network acceleration system according to the second embodiment includes: a data processing module 100, a feature map splitting module 200, a first calculation module 300, an accumulation module 400 and a second calculation module 500, wherein the first calculation module 300 comprises: a weight memory 310, a convolution calculation unit 320, a branch addition tree unit 330 and a first output feature map calculation unit 340.
The weight memory 310 is used for storing the weight qw.
The convolution calculation unit 320 is used for performing convolution calculation on the computation feature map qd_j and the weight qw to obtain the first part qo_i^(1) of a first output feature map.
The branch addition tree unit 330 is used for calculating the computation feature map qd_j according to a fourth preset rule to obtain the second part qo_i^(2) of the first output feature map.
The first output feature map calculation unit 340 is used for subtracting the second part qo_i^(2) from the first part qo_i^(1) to obtain the first output feature map qo_i.
Specifically, the convolution calculation of the convolutional neural network is actually a multiply-add operation between input data and weight data. The input data are represented by the plurality of computation feature maps split from the input feature map, and correspondingly the weight data also participate in the calculation. The weight is a three-dimensional matrix data structure composed of d rows, e columns and c layers, denoted qw; one layer of the weight (a two-dimensional matrix data structure of d rows and e columns) is denoted qw_c, where c is the index of the layer. When performing calculation, one layer of data of the input computation feature map is calculated with the corresponding layer of the weight to obtain one layer of output data (i.e., the first part of a first output feature map).
Before calculation, the weight qw stored in the weight memory 310 is also processed by the data processing module 100, which converts the weight qw from a floating point number to an unsigned 8-bit integer.
The size of the weight qw determines the size of the computation feature map qd_j, i.e., the feature map splitting module 200 splits the input data into the plurality of computation feature maps qd_j according to the size of the weight qw and the preset step.
The convolution calculation unit 320 performs multiply-add calculation on the plurality of computation feature maps qd_j and the weight qw to obtain the first part qo_i^(1) of a first output feature map. As shown in fig. 3, taking the first layer of the 6 × 6 × 3 input feature map as an example, with the weight qw of size 3 × 3 and a preset step of 1, the feature map splitting module 200 splits the first layer of the input feature map into 16 computation feature maps qd_1^1 ~ qd_16^1 of size 3 × 3. A multiply-add operation between one computation feature map and the first layer qw_1 of the weight qw yields one convolution output data; the convolution calculation unit 320 performs multiply-add calculation on the computation feature maps qd_1^1 ~ qd_16^1 and the first layer qw_1 of the weight qw to obtain a convolution output feature map of size 4 × 4, referred to as the first part qo_1^(1) of the first output feature map.
The branch addition tree unit 330 accumulates all the data in one computation feature map and then multiplies the sum by the quantization parameter Z_w to obtain one branch output data; calculating all the computation feature maps qd_1^1 ~ qd_16^1 in this way yields a branch output feature map of size 4 × 4, referred to as the second part qo_1^(2) of the first output feature map.
The first output feature map calculation unit 340 subtracts the second part qo_1^(2) from the first part qo_1^(1) to obtain the first first output feature map qo_1.
Similarly, the convolution calculation unit 320 performs multiply-add calculation on the computation feature maps qd_1^2 ~ qd_16^2 split from the second layer of the input feature map and the second layer qw_2 of the weight qw to obtain the first part qo_2^(1) of the second first output feature map; the branch addition tree unit 330 calculates all the computation feature maps qd_1^2 ~ qd_16^2 to obtain the second part qo_2^(2) of the second first output feature map; and the first output feature map calculation unit 340 subtracts qo_2^(2) from qo_2^(1) to obtain the second first output feature map qo_2. The convolution calculation unit 320 performs multiply-add calculation on the computation feature maps qd_1^3 ~ qd_16^3 split from the third layer of the input feature map and the third layer qw_3 of the weight qw to obtain the first part qo_3^(1) of the third first output feature map; the branch addition tree unit 330 calculates all the computation feature maps qd_1^3 ~ qd_16^3 to obtain the second part qo_3^(2) of the third first output feature map; and the first output feature map calculation unit 340 subtracts qo_3^(2) from qo_3^(1) to obtain the third first output feature map qo_3.
The output data of the first output feature map calculation unit 340 (i.e., the data in the first output feature maps qo_i) can be expressed by formula (2-1):

$$qo_j^c = \sum_{k=1}^{N} qd_{j,k}^c \cdot qw_k^c - Z_w \sum_{k=1}^{N} qd_{j,k}^c \qquad (2\text{-}1)$$

where qo_j^c represents the jth data in the cth first output feature map, Z_w is a preset parameter, N is the total number of data in the jth computation feature map qd_j, qd_{j,k}^c represents the kth data of the jth computation feature map qd_j split from the cth layer of the input feature map, and qw_k^c represents the kth data of the cth layer of the weights. In the present application the computation feature map and the weights are of size 3 × 3, so N = 9. Since each computation feature map of a layer of the input feature map is calculated by the convolution calculation unit 320 and the branch addition tree unit 330 into one data of the first output feature map, the number of output data qo_j of the first output feature map calculation unit 340 is the same as the number of computation feature maps qd_j.
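As an illustrative sketch (the function and variable names below are assumptions, not the patented hardware), the split that formula (2-1) describes can be checked in a few lines: the quantized convolution separates into a plain multiply-add term and a term equal to Z_w times the sum of the patch data, which is what the convolution calculation unit 320 and the branch addition tree unit 330 compute respectively:

```python
import numpy as np

def first_output_value(qd_patch, qw, z_w):
    """One datum of a first output feature map per formula (2-1).

    qd_patch : 3x3 computation feature map qd_j (one layer, quantized)
    qw       : 3x3 quantized weight for the same layer
    z_w      : preset parameter Z_w (zero point of the weights)
    """
    qd = qd_patch.astype(np.int64)
    first_part = int((qd * qw).sum())   # convolution calculation unit 320
    second_part = int(z_w * qd.sum())   # branch addition tree unit 330
    return first_part - second_part     # first output feature map unit 340

# The split agrees with convolving against zero-point-shifted weights:
rng = np.random.default_rng(0)
qd = rng.integers(0, 256, (3, 3))
qw = rng.integers(0, 256, (3, 3))
z_w = 128
assert first_output_value(qd, qw, z_w) == int((qd * (qw - z_w)).sum())
```

The closing assertion confirms the decomposition is algebraically identical to convolving with the zero-point-shifted weights (qw − Z_w), which is why splitting the module loses no accuracy.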
The accumulation module 400 sequentially accumulates the first output feature maps qo_1 to qo_3 output by the first calculation module 300 to obtain a second output feature map.
The output data of the accumulation module 400 (i.e., the data in the second output feature map) can be expressed by formula (2-2):

$$qe_j = \sum_{c=1}^{C} qo_j^c \qquad (2\text{-}2)$$

where qe_j represents the jth data in the second output feature map, qo_j^c represents the jth data in the cth first output feature map, and C represents the total number of first output feature maps; C = 3 in this embodiment.
The second calculation module 500 performs bias and quantization processing on the data in the second output characteristic diagram, so as to obtain final output data of the neural network acceleration system.
In the neural network acceleration system provided by the second embodiment of the present invention, the convolution calculation unit and the branch addition tree unit divide the calculation of the first calculation module into two parts, and the first output feature map calculation unit combines the output of the convolution calculation unit with the output of the branch addition tree unit to obtain the final output of the first calculation module. The first calculation module is thus decomposed into a combination of two simple multiply-add units, which simplifies the calculation process and accelerates the calculation.
EXAMPLE III
Fig. 5 is a schematic structural diagram of a neural network acceleration system according to a third embodiment of the present invention, which further refines the second calculation module of the foregoing embodiment. As shown in fig. 5, the neural network acceleration system according to the third embodiment of the present invention includes: a data processing module 100, a feature map splitting module 200, a first calculation module 300, an accumulation module 400 and a second calculation module 500, wherein the first calculation module 300 includes: a weight memory 310, a convolution calculation unit 320, a branch addition tree unit 330 and a first output feature map calculation unit 340; and the second calculation module 500 includes: a bias module 510 and a quantization module 520.
The offset module 510 is configured to add a preset offset parameter to the second output characteristic map to obtain an output offset characteristic map.
Specifically, the output data of the bias module 510 (i.e., the data in the output bias feature map) can be expressed by formula (3-1):

$$qb_j = qe_j + q_{bias} \qquad (3\text{-}1)$$

where qb_j represents the jth data in the output bias feature map, qe_j represents the jth data in the second output feature map, and q_bias is a preset bias parameter.
The quantization module 520 is configured to calculate the output bias feature map with a preset quantization parameter to obtain the output data. The final output data is still an unsigned 8-bit integer.
Specifically, the output data of the quantization module 520 (i.e., the final output data) can be expressed by formula (3-2):

$$Q_j = \mathrm{clamp}\left(0,\ 2^{n}-1,\ \mathrm{round}(M \cdot qb_j) + Z_o\right) \qquad (3\text{-}2)$$

where Q_j is the jth output datum, which can be regarded as the result of quantizing the jth data qb_j of the output bias feature map, Z_o is a first preset quantization parameter, M is a second preset quantization parameter, and clamp is the function of formula (4-4), which keeps the result an unsigned 8-bit integer (n = 8).
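A minimal sketch of the bias and quantization steps for one datum, assuming a standard round-and-clamp requantization (the exact rounding used by the hardware is not spelled out in the text; all names are illustrative):

```python
def bias_and_quantize(qe, q_bias, m, z_o, n=8):
    """Bias module 510 followed by quantization module 520, one datum.

    qe     : datum of the second output feature map
    q_bias : preset bias parameter of formula (3-1)
    m, z_o : preset quantization parameters of formula (3-2)
    """
    qb = qe + q_bias                      # (3-1): output bias feature map
    q = round(m * qb) + z_o               # rescale to the output scale
    return max(0, min(2 ** n - 1, q))     # keep an unsigned 8-bit integer

print(bias_and_quantize(qe=1000, q_bias=24, m=0.05, z_o=10))  # → 61
```

The final clamp is what guarantees the system's output stays an unsigned 8-bit integer regardless of the accumulated magnitude.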
Optionally, a convolutional neural network generally involves a plurality of convolution calculations, that is, the first calculation module 300 generally includes a plurality of convolution calculation units 320, and the weights corresponding to the convolution calculation units 320 are not necessarily the same. When the first calculation module 300 includes a plurality of convolution calculation units 320, the accumulation module 400 therefore outputs a plurality of second output feature maps, and the preset bias parameter corresponding to each second output feature map differs, so the preset bias parameter of the bias module 510 should be set according to the second output feature map being processed.
In the embodiment of the present invention, the bias module and the quantization module complete the subsequent calculation on the second output feature map, applying affine quantization to the neural network acceleration system and reducing the impact of the optimized hardware structure of the neural network acceleration system on calculation accuracy.
Example four
Fig. 6 is a schematic structural diagram of a neural network acceleration system according to a fourth embodiment of the present invention, which is a further refinement of the data processing module in the foregoing embodiment. As shown in fig. 6, a neural network acceleration system according to a fourth embodiment of the present invention includes: the data processing system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500, wherein the first calculating module 300 comprises: a weight memory 310, a convolution calculating unit 320, a branch addition tree unit 330 and a first output characteristic diagram calculating unit 340; the second calculation module 500 includes: a bias module 510 and a quantization module 520; the data processing module 100 includes: a first data processing unit 110 and a second data processing unit 120.
The first data processing unit 110 is configured to convert input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers.
Specifically, the first data processing unit 110 converts a floating point number into a signed fixed point number according to formula (4-1):

$$q_{int} = \mathrm{round}\left(\frac{r}{S}\right) + Z \qquad (4\text{-}1)$$

where round(x) rounds the value of the data x, r is a datum of the convolutional neural network in floating point form, q_int is the signed fixed point number obtained by converting the floating point number, and Z is the zero point of q_int, i.e., the value q_int takes when r = 0. S is a conversion parameter, calculated by formula (4-2):

$$S = \frac{r_{max} - r_{min}}{2^{n} - 1} \qquad (4\text{-}2)$$

where r_max and r_min are the maximum and minimum values of the floating point data, and n is the conversion precision, representing the bit width of the converted q_int; here n = 8.
The second data processing unit 120 is adapted to convert signed fixed point numbers into unsigned fixed point numbers.
Specifically, the second data processing unit 120 converts the signed fixed point number into the unsigned fixed point number according to equation (4-3).
$$q = \mathrm{clamp}\left(0,\ 2^{n}-1,\ q_{int}\right) \qquad (4\text{-}3)$$
where the clamp function is defined by formula (4-4):

$$\mathrm{clamp}(a, b, x) = \begin{cases} a, & x < a \\ x, & a \le x \le b \\ b, & x > b \end{cases} \qquad (4\text{-}4)$$
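The data processing pipeline of formulas (4-1) to (4-4) can be sketched as follows; the helper names and the choice of value range are assumptions for illustration:

```python
def clamp(a, b, x):
    """Formula (4-4): limit x to the interval [a, b]."""
    return a if x < a else b if x > b else x

def float_to_unsigned_fixed(r, s, z, n=8):
    """Formulas (4-1) and (4-3): float r -> unsigned n-bit fixed point."""
    q_int = round(r / s) + z            # (4-1): first data processing unit
    return clamp(0, 2 ** n - 1, q_int)  # (4-3): second data processing unit

# Conversion parameter S per (4-2), assuming the data range [-1.0, 1.0]:
r_min, r_max = -1.0, 1.0
s = (r_max - r_min) / (2 ** 8 - 1)
z = round(-r_min / s)                   # zero point: r = 0 maps to z
print(float_to_unsigned_fixed(0.0, s, z))  # → 128
print(float_to_unsigned_fixed(9.9, s, z))  # clamped to 255
```

Out-of-range floats saturate at the clamp rather than wrapping, so the unsigned 8-bit representation never overflows.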
In the neural network acceleration system provided by the fourth embodiment of the present invention, the first data processing unit converts the input data of the convolutional neural network from floating point numbers into signed fixed point numbers, and the second data processing unit converts the signed fixed point numbers into unsigned fixed point numbers, which reduces the storage space occupied by the data calculated by the neural network acceleration system and the consumption of its hardware logic resources.
EXAMPLE five
Fig. 7 is a schematic flow chart of a neural network acceleration method provided by the fifth embodiment of the present invention. The method is applicable to the calculation of a convolutional neural network and can be implemented by the neural network acceleration system provided by any embodiment of the present invention; for content not described in detail in the fifth embodiment, reference may be made to the description in any system embodiment of the present invention.
As shown in fig. 7, a neural network acceleration method according to a fifth embodiment of the present invention includes:
and S710, converting the input data calculated by the convolutional neural network from a floating point number to a fixed point number.
Specifically, machine languages generally represent data (real numbers) in two ways: floating point numbers and fixed point numbers. A fixed point number fixes the position of the decimal point; the point is not stored in the machine but agreed at a fixed position in advance, and once determined it cannot be changed, so the range of data a fixed point number can express is limited while the memory space (bits) it occupies is small. A floating point number expresses a real number in scientific notation, i.e., with a mantissa, a radix, an exponent, and a sign representing positive or negative; for example, the floating point representation of the real number 123.45 is 1.2345 × 10², where 1.2345 is the mantissa, 10 is the radix, and 2 is the exponent. The exponent lets the decimal point float, so floating point numbers can flexibly express a large range of data but occupy correspondingly more memory. A convolutional neural network imitates the human brain to realize a machine learning technique approaching artificial intelligence, and its data are usually expressed as floating point numbers.
The input data calculated by the convolutional neural network is converted from floating point numbers into fixed point numbers, and the converted fixed point numbers are unsigned 8-bit integers, which reduces the consumption of hardware logic resources in the calculation of the convolutional neural network.
Further, step S710 includes S711 to S712, specifically:
s711, converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;
and S712, converting the signed fixed point number into an unsigned fixed point number.
S720, splitting the input data into a plurality of computation feature maps qd_j according to a first preset rule.
Specifically, the input data is represented in the form of an input feature map, and splitting the input data into a plurality of computation feature maps qd_j according to the first preset rule comprises: splitting the input feature map into a plurality of computation feature maps qd_j comprising a 3 × 3 matrix data structure according to a preset step size. The input feature map is a three-dimensional matrix data structure comprising a rows, b columns and c layers; fig. 2 shows an input feature map of size 6 × 6 × 3. The computation feature map is a two-dimensional matrix data structure comprising d rows and e columns, and each layer of the input feature map can be split into a plurality of computation feature maps of size 3 × 3. A sampling window slides to the right across the input feature map to take out data; when the window reaches the right boundary of the input feature map, it returns to the left boundary of the input feature map, moves down by one row, and continues taking data to form computation feature maps, until the last data of each layer of the input feature map has been taken out. The number of columns by which the window moves right each time is called the step size: the smaller the step size, the higher the calculation precision of the convolutional neural network and the larger the corresponding calculation amount.
For example, if the input feature map is of size 6 × 6 × 3, the computation feature map is of size 3 × 3 and the preset step size is 1, the feature map splitting module 200 can split each layer of the input feature map into 16 computation feature maps of size 3 × 3, so the input feature map is split into 48 computation feature maps qd_1 to qd_48, which can also be written as qd_j^c, where c denotes the layer of the input feature map from which the computation feature map is taken; obviously c takes the values 1, 2 and 3. Fig. 3 illustrates the splitting of the first layer of the input feature map into 16 computation feature maps qd_1^1 to qd_16^1 of size 3 × 3.
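A sketch of the first preset rule, assuming NumPy and illustrative names: sliding a 3 × 3 window with step 1 over each layer of a 6 × 6 × 3 input feature map yields the 48 computation feature maps:

```python
import numpy as np

def split_feature_map(ifm, k=3, step=1):
    """Split each layer of the input feature map into k x k computation
    feature maps, scanning left to right, then down one row at a time."""
    layers, rows, cols = ifm.shape
    patches = []
    for c in range(layers):                         # one layer at a time
        for i in range(0, rows - k + 1, step):      # move the window down
            for j in range(0, cols - k + 1, step):  # ...and to the right
                patches.append(ifm[c, i:i + k, j:j + k])
    return patches

ifm = np.arange(3 * 6 * 6).reshape(3, 6, 6)  # a 6 x 6 x 3 input feature map
qd = split_feature_map(ifm)
print(len(qd))  # → 48: 16 computation feature maps per layer, 3 layers
```

With a 6 × 6 layer and step 1 there are (6 − 3 + 1)² = 16 window positions per layer, matching the count in the text.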
S730, calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i.

Specifically, first, all the computation feature maps qd_1^1 to qd_16^1 contained in the first layer of the input feature map are calculated to obtain the first-layer (or first) first output feature map qo_1; then all the computation feature maps qd_1^2 to qd_16^2 contained in the second layer of the input feature map are calculated to obtain the second-layer (or second) first output feature map qo_2; and then all the computation feature maps qd_1^3 to qd_16^3 contained in the third layer of the input feature map are calculated to obtain the third-layer (or third) first output feature map qo_3. Thus one first output feature map is obtained for each layer of the input feature map, and the number of first output feature maps equals the number of layers of the input feature map.
Further, step S730 includes S731 to S734, specifically:

S731, obtaining the weight qw corresponding to the computation feature map qd_j;

S732, performing convolution calculation on the computation feature map qd_j and the weight qw to obtain the first part qo_i^(1) of the first output feature map;

S733, calculating the computation feature map qd_j according to a fourth preset rule to obtain the second part qo_i^(2) of the first output feature map;

S734, subtracting the second part qo_i^(2) of the first output feature map from the first part qo_i^(1) to obtain the first output feature map qo_i.
S740, sequentially superimposing all the first output feature maps qo_i to obtain a second output feature map.
Specifically, the first output feature map qo_1 is first buffered in a FIFO; when the data of the second first output feature map qo_2 is obtained, qo_2 is added to qo_1 and the result qo_1 + qo_2 is buffered in the FIFO; when the data of the third first output feature map qo_3 is obtained, qo_1 + qo_2 + qo_3 is calculated and the result is buffered in the FIFO.
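The FIFO-based accumulation can be sketched as a running elementwise sum (illustrative names; feature maps shown as flat lists):

```python
from collections import deque

def accumulate(first_output_maps):
    """Accumulation module 400: fold each arriving first output feature
    map qo_i into a running sum buffered in a FIFO."""
    fifo = deque()
    for qo in first_output_maps:
        if not fifo:
            fifo.append(list(qo))                   # buffer qo_1 first
        else:
            prev = fifo.popleft()
            fifo.append([a + b for a, b in zip(prev, qo)])
    return fifo.popleft()                           # second output feature map

qo1, qo2, qo3 = [1, 2], [10, 20], [100, 200]
print(accumulate([qo1, qo2, qo3]))  # → [111, 222]
```

Only the running sum ever sits in the FIFO, so the buffer depth stays at one feature map regardless of how many layers are accumulated.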
And S750, processing the second output characteristic diagram according to a third preset rule to obtain output data.
Specifically, the data in the second output feature map are subjected to bias and quantization processing to obtain the final output data of the neural network acceleration system; the output data is still an unsigned 8-bit fixed point number.
Further, the step S750 includes S751 to S752, specifically:
s751, adding a bias parameter to the second output characteristic diagram to obtain an output bias characteristic diagram;
and S752, calculating the output bias characteristic diagram and the quantization parameter to obtain output data.
In the neural network acceleration method provided by the fifth embodiment of the present invention, the input data calculated by the convolutional neural network is converted from floating point numbers into fixed point numbers; the input data is split into a plurality of computation feature maps qd_j according to a first preset rule; each computation feature map qd_j is calculated according to a second preset rule to obtain a plurality of first output feature maps qo_i; all the first output feature maps qo_i are sequentially superimposed to obtain a second output feature map; and the second output feature map is processed according to a third preset rule to obtain the output data. By converting floating point numbers into fixed point numbers, the calculation of the convolutional neural network requires fewer logic resources without affecting calculation accuracy, the storage resources occupied are greatly reduced, and data transmission is accelerated.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A neural network acceleration system, comprising:
the data processing module is used for converting input data calculated by the convolutional neural network from floating point numbers to fixed point numbers;
a feature map splitting module for splitting the input data into a plurality of computation feature maps qd according to a first preset rulej
A first calculation module for calculating the characteristic graph qd for each according to a second preset ruleiCalculating to obtain a plurality of first output characteristic graphs qoi
An accumulation module for sequentially comparing all the first output characteristic graphs qoiAccumulating to obtain a second output characteristic diagram;
and the second calculation module is used for processing the second output characteristic diagram according to a third preset rule to obtain output data.
2. The system of claim 1, wherein the first calculation module comprises:
a weight memory, used for storing a weight qw;
a convolution calculation unit, used for performing convolution calculation on the computation feature map qd_j and the weight qw to obtain a first part qo_i^(1) of a first output feature map;
a branch addition tree unit, used for calculating the computation feature map qd_j according to a fourth preset rule to obtain a second part qo_i^(2) of the first output feature map;
a first output feature map calculation unit, used for subtracting the second part qo_i^(2) of the first output feature map from the first part qo_i^(1) to obtain the first output feature map qo_i.
3. The system of claim 2, wherein the data processing module is further configured to convert the weights qw stored in the weight memory into fixed-point numbers.
4. The system of claim 1, wherein the feature map splitting module is specifically configured to:
splitting the input data into a plurality of computation feature maps qd_j comprising a 3 × 3 matrix data structure according to a preset step size.
5. The system of claim 1, wherein the second computing module comprises:
the offset module is used for adding a preset offset parameter to the second output characteristic diagram to obtain an output offset characteristic diagram;
and the quantization module is used for calculating the output bias characteristic diagram and a preset quantization parameter to obtain output data.
6. The system of claim 1, wherein the data processing module comprises:
the first data processing unit is used for converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;
and the second data processing unit is used for converting the signed fixed point number into an unsigned fixed point number.
7. A neural network acceleration method, comprising:
converting input data calculated by a convolutional neural network from floating point numbers into fixed point numbers;
splitting the input data into a plurality of computation feature maps qd_j according to a first preset rule;
calculating each computation feature map qd_j according to a second preset rule to obtain a plurality of first output feature maps qo_i;
sequentially superimposing all the first output feature maps qo_i to obtain a second output feature map; and
processing the second output feature map according to a third preset rule to obtain output data.
8. The method according to claim 7, wherein calculating each computation feature map qd_j according to the second preset rule to obtain a plurality of first output feature maps qo_i comprises:
obtaining a weight qw corresponding to the computation feature map qd_j;
performing convolution calculation on the computation feature map qd_j and the weight qw to obtain a first part qo_i^(1) of a first output feature map;
calculating the computation feature map qd_j according to a fourth preset rule to obtain a second part qo_i^(2) of the first output feature map; and
subtracting the second part qo_i^(2) of the first output feature map from the first part qo_i^(1) to obtain the first output feature map qo_i.
9. The method of claim 7, wherein the processing the second output feature map according to a third predetermined rule to obtain output data comprises:
adding a bias parameter to the second output characteristic diagram to obtain an output bias characteristic diagram;
and calculating the output bias characteristic diagram and the quantization parameter to obtain output data.
10. The method of claim 7, wherein converting the input data computed by the convolutional neural network from a floating point number to a fixed point number comprises:
converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;
the signed fixed-point number is converted into an unsigned fixed-point number.
CN201911304163.8A 2019-12-17 2019-12-17 Neural network acceleration system and method Active CN111091183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911304163.8A CN111091183B (en) 2019-12-17 2019-12-17 Neural network acceleration system and method


Publications (2)

Publication Number Publication Date
CN111091183A true CN111091183A (en) 2020-05-01
CN111091183B CN111091183B (en) 2023-06-13

Family

ID=70395071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911304163.8A Active CN111091183B (en) 2019-12-17 2019-12-17 Neural network acceleration system and method

Country Status (1)

Country Link
CN (1) CN111091183B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180247180A1 (en) * 2015-08-21 2018-08-30 Institute Of Automation, Chinese Academy Of Sciences Deep convolutional neural network acceleration and compression method based on parameter quantification
CN109063825A (en) * 2018-08-01 2018-12-21 清华大学 Convolutional neural networks accelerator


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737193A (en) * 2020-08-03 2020-10-02 深圳鲲云信息科技有限公司 Data storage method, device, equipment and storage medium
CN111737193B (en) * 2020-08-03 2020-12-08 深圳鲲云信息科技有限公司 Data storage method, device, equipment and storage medium
CN112232499A (en) * 2020-10-13 2021-01-15 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Convolutional neural network accelerator
CN115994561A (en) * 2023-03-22 2023-04-21 山东云海国创云计算装备产业创新中心有限公司 Convolutional neural network acceleration method, system, storage medium, device and equipment

Also Published As

Publication number Publication date
CN111091183B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN111684473B (en) Improving performance of neural network arrays
CN112740171B (en) Multiplication and accumulation circuit
JP7349835B2 (en) Method and apparatus for processing parameters in neural networks
CN107239829B (en) Method for optimizing artificial neural network
CN110852416B (en) CNN hardware acceleration computing method and system based on low-precision floating point data representation form
CN107688855B (en) Hierarchical quantization method and device for complex neural network
CN111091183A (en) Neural network acceleration system and method
US11775611B2 (en) Piecewise quantization for neural networks
CN110852434B (en) CNN quantization method, forward calculation method and hardware device based on low-precision floating point number
US20220004884A1 (en) Convolutional Neural Network Computing Acceleration Method and Apparatus, Device, and Medium
CN112088354B (en) Block floating point computation using shared exponents
CN112673383A (en) Data representation of dynamic precision in neural network cores
WO2020098368A1 (en) Adaptive quantization method and apparatus, device and medium
KR102655950B1 (en) High speed processing method of neural network and apparatus using thereof
US20220036189A1 (en) Methods, systems, and media for random semi-structured row-wise pruning in neural networks
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN112734020B (en) Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN114626516A (en) Neural network acceleration system based on floating point quantization of logarithmic block
CN113052299B (en) Neural network memory computing device based on lower communication bound and acceleration method
KR102340412B1 (en) Log-quantized mac for stochastic computing and accelerator comprising the same
CN112561050A (en) Neural network model training method and device
Wong et al. Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic
CN111492369A (en) Residual quantization of shift weights in artificial neural networks
CN113487012B (en) FPGA-oriented deep convolutional neural network accelerator and design method
CN112836793B (en) Floating point separable convolution calculation accelerating device, system and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant