CN111091183A

CN111091183A - Neural network acceleration system and method

Info

Publication number: CN111091183A
Application number: CN201911304163.8A
Authority: CN
Inventors: 李远超; 蔡权雄; 牛昕宇
Original assignee: Shenzhen Corerain Technologies Co Ltd
Current assignee: Shenzhen Corerain Technologies Co Ltd
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2020-05-01
Anticipated expiration: 2039-12-17
Also published as: CN111091183B

Abstract

The embodiment of the invention discloses a neural network acceleration system and a method, wherein the system comprises: the data processing module is used for converting input data calculated by the convolutional neural network from floating point numbers to fixed point numbers; a feature map splitting module for splitting the input data into a plurality of computation feature maps qd according to a first preset rule_j(ii) a A first calculation module for calculating the characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i(ii) a An accumulation module for sequentially comparing all the first output characteristic graphs qo_iAccumulating to obtain a second output characteristic diagram; and the second calculation module is used for processing the second output characteristic diagram according to a third preset rule to obtain output data. By converting floating point number into fixed point number, the calculation of the convolutional neural network does not affect the accuracy of the calculation while needing less logic resources, the occupied storage resources are greatly reduced, and data transmission is realizedThe output speed is accelerated.

Description

Neural network acceleration system and method

Technical Field

The embodiment of the invention relates to a neural network technology, in particular to a neural network acceleration system and a neural network acceleration method.

Background

Convolutional neural networks have evolved significantly over the past few years and are now the fundamental tools for many intelligent systems. However, in order to improve the accuracy of image classification, image recognition, and the like, the computational complexity of the convolutional neural network and the consumption of storage resources are increasing. Therefore, convolutional neural network acceleration has become a hot topic.

For hardware implementation of convolutional neural networks, a collection of FPGA or ASIC based accelerators have been proposed in recent years. The design of these accelerators optimizes the convolutional neural network from different aspects, such as optimizing the computational resources of the convolutional neural network, optimizing the output of data, optimizing the computational resources and the access latency of off-chip memory.

However, the design of these accelerators usually takes the convolutional neural network algorithm as a small black box, and only the hardware structure is optimized, which easily causes the accuracy of the convolutional neural network calculation after hardware acceleration to decrease.

Disclosure of Invention

In view of this, embodiments of the present invention provide a neural network acceleration system and method to reduce logic resources required for neural network computation and improve data transmission speed.

In a first aspect, an embodiment of the present invention provides a neural network acceleration system, including:

the data processing module is used for converting input data calculated by the convolutional neural network from floating point numbers to fixed point numbers;

a feature map splitting module for splitting the input data into a plurality of computation feature maps qd according to a first preset rule_j；

A first calculation module for calculating the characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i；

An accumulation module for sequentially comparing all the first output characteristic graphs qo_iAccumulating to obtain a second output characteristic diagram;

and the second calculation module is used for processing the second output characteristic diagram according to a third preset rule to obtain output data.

Further, the first calculation module comprises:

the weight memory is used for storing the weight qw;

a convolution calculation unit for calculating the feature map qd according to the calculation_jPerforming convolution calculation with the weight qw to obtain a first part of a first output characteristic diagram

A branch addition tree unit for calculating the feature graph qd according to a fourth preset rule pair_jCalculating to obtain the second part of the first output characteristic diagram

A first output feature map calculation unit for calculating a first part of the first output feature map

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

Further, the data processing module is further configured to convert the weight qw stored in the weight memory into a fixed point number.

Further, the feature map splitting module is specifically configured to:

splitting input data into a plurality of computation characteristic graphs qd comprising 3-by-3 matrix data structures according to a preset step length_j。

Further, the second calculation module includes:

the offset module is used for adding a preset offset parameter to the second output characteristic diagram to obtain an output offset characteristic diagram;

and the quantization module is used for calculating the output bias characteristic diagram and a preset quantization parameter to obtain output data.

Further, the data processing module comprises:

the first data processing unit is used for converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;

and the second data processing unit is used for converting the signed fixed point number into an unsigned fixed point number.

In a second aspect, an embodiment of the present invention provides a neural network acceleration method, including:

converting input data calculated by the convolutional neural network from floating point number to fixed point number;

splitting input data into a plurality of computation characteristic graphs qd according to a first preset rule_j；

Calculating a characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i；

Sequentially comparing all the first output characteristic graphs qo_iOverlapping to obtain a second output characteristic diagram; and processing the second output characteristic diagram according to a third preset rule to obtain output data.

Further, the calculating the feature map qd for each of the calculation feature maps according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_iThe method comprises the following steps:

obtaining the computation characteristic graph qd_jThe corresponding weight value qw;

according to the calculated characteristic diagram qd_jPerforming convolution calculation with the weight qw to obtain a first part of a first output characteristic diagram

According to a fourth preset rule, calculating the feature graph qd_jCalculating to obtain the second part of the first output characteristic diagram

A first part of the first output characteristic diagram

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

Further, the processing the second output feature map according to a third preset rule to obtain output data includes:

adding a bias parameter to the second output characteristic diagram to obtain an output bias characteristic diagram;

and calculating the output bias characteristic diagram and the quantization parameter to obtain output data.

Further, the converting the input data calculated by the convolutional neural network from a floating point number to a fixed point number includes:

converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;

the signed fixed-point number is converted into an unsigned fixed-point number.

The embodiment of the invention is used for converting input data calculated by a convolutional neural network from a floating point number to a fixed point number through a data processing module; a feature map splitting module for splitting the input data into a plurality of computation feature maps qd according to a first preset rule_j(ii) a A first calculation module for calculating the characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i(ii) a An accumulation module for sequentially comparing all the first output characteristic graphs qo_iAccumulating to obtain a second output characteristic diagram; and the second calculation module is used for processing the second output characteristic diagram according to a third preset rule to obtain output data. By converting the floating point number into the fixed point number, the calculation of the convolutional neural network does not affect the calculation accuracy while needing less logic resources, the occupied storage resources are greatly reduced, and the data transmission speed is accelerated.

Drawings

Fig. 1 is a schematic structural diagram of a neural network acceleration system according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an input feature according to an embodiment of the present invention;

fig. 3 is a schematic diagram of splitting an input feature map into calculation feature maps according to a first embodiment of the present invention;

fig. 4 is a schematic structural diagram of a neural network acceleration system according to a second embodiment of the present invention;

fig. 5 is a schematic structural diagram of a neural network acceleration system according to a third embodiment of the present invention;

fig. 6 is a schematic structural diagram of a neural network acceleration system according to a fourth embodiment of the present invention;

fig. 7 is a schematic flowchart of a neural network acceleration method according to a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.

Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first computing module may be referred to as a second computing module, and similarly, a second computing module may be referred to as a first computing module, without departing from the scope of the present application. The first computing module and the second computing module are both computing modules, but are not the same computing module. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Example one

Fig. 1 is a schematic structural diagram of a neural network acceleration system according to an embodiment of the present invention, which is applicable to the calculation of a convolutional neural network. As shown in fig. 1, a neural network acceleration system according to a first embodiment of the present invention includes: the system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500.

The data processing module 100 is configured to convert input data calculated by the convolutional neural network from a floating point number to a fixed point number;

the feature map splitting module 200 is configured to split the input data into a plurality of computation feature maps qd according to a first preset rule_j；

The first calculation module 300 is used for calculating the feature map qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i；

The accumulation module 400 is configured to sequentially compare all the first output feature maps qo_iAccumulating to obtain a second output characteristic diagram;

the second calculating module 500 is configured to process the second output feature map according to a third preset rule to obtain output data.

Specifically, there are two general methods for representing data (real numbers) in machine language: floating point numbers and fixed point numbers. When fixed-point numbers express real numbers, the positions of the decimal points are fixed, and the decimal points are not expressed in the machine but are agreed at fixed positions in advance, and once the decimal points are determinedThe position of the point can not be changed, so that the data range represented by the fixed point number is limited, and the corresponding occupied memory space (bit) is small. Floating point numbers use scientific notation to express real numbers, i.e., a mantissa, a radix, an exponent, and a sign representing positive or negative, e.g., a floating point number representing real number 123.45 is 1.2345x10²Where 1.2345 is the mantissa, 10 is the radix, and 2 is the exponent. The floating point number achieves the effect of floating decimal point through the exponent number, so that large-range data can be flexibly expressed, and correspondingly, the occupied memory space is large. The convolutional neural network is a neural network simulating the human brain in order to realize the machine learning technology similar to artificial intelligence, and the data usually adopts the expression form of floating point numbers.

The data processing module 100 converts the input data calculated by the convolutional neural network from floating point numbers to fixed point numbers, and the converted fixed point numbers are unsigned 8bit integers, so that the consumption of hardware logic resources in the calculation of the convolutional neural network is reduced.

The input data of the convolutional neural network is a multilayer three-dimensional matrix data structure composed of a plurality of data, including a rows, b columns and c layers, and in the calculation of the convolutional neural network, the multilayer three-dimensional matrix data structure is generally called as an input feature map. As shown in fig. 2, an input signature comprising 6 rows, 6 columns and 3 layers is shown, which is sized 6 × 3, and has a total of 108 data (for convenience of description, the data in fig. 2 are shown in the form of simple integers).

When performing calculation, the convolutional neural network does not directly perform calculation on all data of the input feature map at the same time, but performs convolution calculation by taking data of d rows and e columns each time, and a two-dimensional matrix data structure including d rows and e columns extracted from the input feature map is called a calculation feature map.

The specific working process of the feature map splitting module 200 is as follows: and when the number-taking frame is rightwards moved to the right boundary of the input feature map, returning to the left boundary of the input feature map again and moving downwards by one row to continue number-taking to form the calculation feature map until the last data of each layer of the input feature map is taken out. The number of columns shifted to the right of the number taking frame is called a step length, and the smaller the step length is, the smaller the number of columns shifted to the right of the number taking frame is, the higher the calculation accuracy of the convolutional neural network is, and the larger the corresponding calculation amount is.

The feature map splitting module 200 splits the input data into a plurality of computation feature maps qd according to a first preset rule_jAlternatively, the feature map splitting module 200 may split the input feature map into a plurality of computation feature maps qd including a 3 × 3 matrix data structure according to a preset step size_j. For example, the input feature map size is 6 × 3, the calculated feature map size is 3 × 3, and the preset step size is 1, the feature map splitting module 200 may split each layer of the input feature map into 16 calculated feature maps with the size of 3 × 3, and the input feature map may be split into 48 calculated feature maps qd₁～qd₄₈Can also be described as

Where c represents the number of layers of the input feature map where the feature map is calculated, and obviously, the value of c is 1, 2, and 3. Referring to fig. 3, the feature map splitting module 200 is shown splitting the first layer of the input feature map into 16 computed feature maps of size 3 x 3

When the first calculation module 300 performs calculation, all the calculation feature maps included in the first layer of the input feature map are calculated

Calculating to obtain a first output characteristic diagram qo₁Then, for all the calculated feature maps included in the second layer of the input feature map

Calculating to obtain a second first output characteristicGraph qo₂Then, for all the calculated feature maps included in the third layer of the input feature map

Calculating to obtain a third first output characteristic diagram qo₃Therefore, it can be seen that a first output feature map can be obtained after one layer of the input feature map is calculated by the first calculating module 300, and the number of the first output feature maps is equal to the number of layers of the input feature map.

The accumulation module 400 sequentially outputs all the first output characteristic maps qo to the first calculation module 300_iAnd accumulating to obtain a second output characteristic diagram. The accumulation module 400 adopts a First Input First Output (FIFO) buffer structure, and First Output characteristic diagram qo is buffered by using the FIFO buffer₁When the first computing module 300 outputs the second first output characteristic map qo₂When the data is stored, the accumulation module 400 first outputs the first output characteristic graph qo₂And a second first output profile qo₁Add, then q o₁+qo₂The result of (2) is buffered in FIFO; when the first computing module 300 outputs the third first output characteristic map qo₃When the data is stored, the accumulation module 400 calculates qo first₁+qo₂+qo₃As a result of (2), qo is further reduced₁+qo₂+qo₃The result of (2) is buffered in a FIFO.

The second calculation module 500 performs offset and quantization processing on the data in the second output characteristic diagram, so as to obtain the final output data of the neural network acceleration system, wherein the output data is still an unsigned 8-bit fixed point number.

The neural network acceleration system provided by the first embodiment of the invention is used for converting input data calculated by a convolutional neural network from a floating point number to a fixed point number through a data processing module; a feature map splitting module for splitting the input data into a plurality of computation feature maps qd according to a first preset rule_j(ii) a A first calculation module for calculating the characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i(ii) a An accumulation module for sequentially comparing allFirst output characteristic diagram qo_iAccumulating to obtain a second output characteristic diagram; and the second calculation module is used for processing the second output characteristic diagram according to a third preset rule to obtain output data. By converting the floating point number into the fixed point number, the calculation of the convolutional neural network does not affect the calculation accuracy while needing less logic resources, the occupied storage resources are greatly reduced, and the data transmission speed is accelerated.

Example two

Fig. 4 is a schematic structural diagram of a neural network acceleration system according to a second embodiment of the present invention, which is a further refinement of the first calculation module in the foregoing embodiment. As shown in fig. 4, a neural network acceleration system according to a second embodiment of the present invention includes: the data processing system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500, wherein the first calculating module 300 comprises: a weight memory 310, a convolution calculation unit 320, a branch addition tree unit 330 and a first output feature map calculation unit 340.

The weight memory 310 is used for storing the weight qw;

the convolution calculating unit 320 is used for calculating the feature map qd according to the calculation_jPerforming convolution calculation with the weight qw to obtain a first part of a first output characteristic diagram

The branch addition tree unit 330 is configured to calculate the feature map qd according to a fourth preset rule pair_jCalculating to obtain the second part of the first output characteristic diagram

The first output feature map calculation unit 340 is used for calculating a first part of the first output feature map

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

Specifically, the convolution calculation of the convolutional neural network is actually a multiply-add operation between input data and weight data, the input data is represented by a plurality of calculation feature maps divided by an input feature map, correspondingly, the weight data is also represented by the weight data during calculation, the weight is a three-dimensional matrix data structure composed of d rows, e columns and c layers, the weight is recorded as qw, and then one layer of the weight (a two-dimensional matrix data structure including d rows, e columns) is recorded as qw_c(where, c represents the number of layers of the two-dimensional matrix data structure including d rows and e columns) when performing calculation, one layer of data of the input calculation graph and the data of the layer corresponding to the weight are calculated to obtain one layer of output data (i.e. the first part of a first output characteristic graph).

Before calculation, the weight qw stored in the weight memory 310 is still processed by the data processing module 100, and the weight qw is converted from a floating point number to an unsigned 8-bit integer by the data processing module 100.

The size of the weight value qw determines the calculation characteristic graph qd_jI.e. the feature map splitting module 200 splits the input data into a plurality of calculated feature maps qd according to the size of the weight qw and the preset step length_j。

The convolution calculation unit 320 calculates the feature maps qd for a plurality of features_jAnd the weight value qw is subjected to multiplication and addition calculation to obtain a first part of a first output characteristic diagram

As shown in fig. 3, taking the first layer in the input feature map of 6 × 3 as an example, the weight qw is 3 × 3, the preset step size is 1, and the feature map splitting module 200 splits the first layer of the input feature map into 16 calculated feature maps of 3 × 3 size

The first layer qw of each computation feature graph and weight qw₁Performing multiply-add operation to obtain a convolution output data, volumeProduct computation unit 320 pairs computed feature maps

And a first level qw of weights qw₁Performing multiply-add calculation to obtain a convolution output feature map with size of 4 × 4, and referring the convolution output feature map as the first part of the first output feature map

Branch adder tree unit 330 accumulates all data in a computed profile and then multiplies it by quantization parameter Z_wIf a branch output data is obtained, then the characteristic diagram is calculated

All the calculations are carried out to obtain a 4 x 4 size output profile of the branch, which is referred to as the second part of the first output profile

The first output feature map calculation unit 340 calculates a first portion of the first output feature map

And a second portion of the first output profile

Subtracting to obtain a first output characteristic diagram qo₁。

Similarly, the convolution computation unit 320 computes a feature map for the second layer split of the input feature map

And a second level qw of weights qw₂Performing multiply-add calculation to obtain a first part of a second first output characteristic diagram

Branch addition tree unit 330 pairs computation feature maps

Are all calculated to obtain a second part of a second first output characteristic diagram

The first output profile calculation unit 340 calculates a first portion of the second first output profile

And a second portion of a second first output profile

Subtracting to obtain a second first output characteristic diagram qo₂. Convolution computation unit 320 computes feature map for the third split level of the input feature map

And a third level qw of weights qw₃Performing multiply-add calculation to obtain the first part of the third first output characteristic diagram

Branch addition tree unit 330 pairs computation feature maps

Are all calculated to obtain a second part of a third first output characteristic diagram

The first output feature map calculation unit 340 calculates a first part of the third first output feature map

And a second portion of the third first output profile

Subtracting to obtain a third first output characteristic diagram qo₃。

The output data of the first output profile calculation unit 340 (i.e., the first output profile qo)_iThe data in (1) can be represented by the formula (2-1).

Wherein the content of the first and second substances,

representing the jth data in the c-th first output signature. Z_wIs a preset parameter, N is the jth calculated characteristic diagram qd_jThe total number of data of (a) is,

j-th computation feature graph qd representing the c-th split of the input feature graph_jThe number k of the first data of (2),

and representing the kth data of the c layer in the weight value. In the present application, the magnitude of the calculated feature map or weight is 3 × 3, so N is 9. Since one of the calculated feature maps in the input feature map of the first layer is calculated by the convolution calculating unit 320 and the branch addition tree unit 330 to obtain one data in the first output feature map, the output data qo of the first output feature map calculating unit 340_jQuantity of and computation of feature maps qd_jThe number of (2) is the same.

The accumulation module 400 sequentially outputs the first output characteristic map qo to the first calculation module 300₁～qo₃And accumulating to obtain a second output characteristic diagram.

The output data of the accumulation module 400 (i.e., the data in the second output profile) may be represented by equation (2-2).

Wherein qe_jIn the graph representing the second output characteristicThe (j) th data of (2),

the j-th data in the C-th first output characteristic diagram is shown, C represents the total number of the first output characteristic diagrams, and C is 3 in the embodiment.

The second calculation module 500 performs bias and quantization processing on the data in the second output characteristic diagram, so as to obtain final output data of the neural network acceleration system.

The neural network acceleration system provided by the second embodiment of the invention divides the calculation of the first calculation module into two parts through the convolution calculation unit and the branch addition tree unit, and finally adds the output result of the convolution calculation unit and the output result of the branch addition tree unit through the first output characteristic diagram calculation unit to obtain the final output result of the first calculation unit, so that the first calculation module is divided into the combination of two simple multiplication and addition calculation modules, the calculation process is simplified, and the calculation speed is accelerated.

EXAMPLE III

Fig. 5 is a schematic structural diagram of a neural network acceleration system according to a third embodiment of the present invention, which is a further refinement of the second calculation module in the foregoing embodiment. As shown in fig. 5, a neural network acceleration system according to a second embodiment of the present invention includes: the data processing system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500, wherein the first calculating module 300 comprises: a weight memory 310, a convolution calculating unit 320, a branch addition tree unit 330 and a first output characteristic diagram calculating unit 340; the second calculation module 500 includes: a bias module 510 and a quantization module 520.

The offset module 510 is configured to add a preset offset parameter to the second output characteristic map to obtain an output offset characteristic map.

Specifically, the output data of the bias module 510 (i.e., the data in the output bias profile) may be represented by equation (3-1).

Wherein, qb_jRepresenting the jth data, qe, in the output bias profile_jRepresenting the jth data, q, in the second output characteristic diagram_biasIs a preset bias parameter.

The quantization module 520 is configured to calculate the output bias characteristic map and a preset quantization parameter to obtain output data. The final output data is still an unsigned 8bit integer.

Specifically, the output data of the quantization module 520 (i.e., the final output data) may be represented by equation (3-2).

Wherein Q is_jThe j-th output data may be regarded as output data obtained by quantizing the j-th data in the output bias characteristic diagram, Z_oIs a first predetermined quantization parameter, and M is a second predetermined quantization parameter.

Optionally, the convolutional neural network generally includes a plurality of convolutional calculation units, that is, the first calculation module 300 generally includes a plurality of convolutional calculation units 320, and weights corresponding to each of the convolutional calculation units 320 are not necessarily the same, so when the convolutional neural network includes a plurality of convolutional calculation units 320, the accumulation module 400 outputs a plurality of second output feature maps, and preset offset parameters corresponding to each of the second output feature maps are different, so that the preset offset parameters of the offset module 510 should be set according to the second output feature maps.

According to the embodiment of the invention, the subsequent calculation of the second output characteristic diagram is completed through the bias module and the quantization module, the affine quantization is applied to the neural network acceleration system, and the influence of the hardware structure of the neural network acceleration system on the calculation accuracy after optimization is reduced.

Example four

Fig. 6 is a schematic structural diagram of a neural network acceleration system according to a fourth embodiment of the present invention, which is a further refinement of the data processing module in the foregoing embodiment. As shown in fig. 6, a neural network acceleration system according to a fourth embodiment of the present invention includes: the data processing system comprises a data processing module 100, a feature map splitting module 200, a first calculating module 300, an accumulating module 400 and a second calculating module 500, wherein the first calculating module 300 comprises: a weight memory 310, a convolution calculating unit 320, a branch addition tree unit 330 and a first output characteristic diagram calculating unit 340; the second calculation module 500 includes: a bias module 510 and a quantization module 520; the data processing module 100 includes: a first data processing unit 110 and a second data processing unit 120.

The first data processing unit 110 is configured to convert input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers.

Specifically, the first data processing unit 110 converts the floating-point number into a signed fixed-point number according to equation (4-1),

wherein round (x) represents rounding the value of data x, r is the data of the convolution neural network when the input data is a floating point number, and q is the floating point number_intFor signed fixed-point numbers after conversion of floating-point numbers, Z being q_intZero point data of, i.e. q_intWhen the content is equal to 0, the content,

s is a conversion parameter, and S is calculated from the formula (4-2)

Wherein n is the conversion accuracy and represents the converted q_intWhere n is 8.

The second data processing unit 120 is adapted to convert signed fixed point numbers into unsigned fixed point numbers.

Specifically, the second data processing unit 120 converts the signed fixed point number into the unsigned fixed point number according to equation (4-3).

q＝clamp(0,2ⁿ-1,q_int) (4-3)

Wherein, the use method of the clamp function is shown as the formula (4-4).

In the neural network acceleration system provided by the fourth embodiment of the present invention, the first data processing unit converts the input data of the convolutional neural network from floating point numbers to signed fixed point numbers, and the second data processing unit converts the signed fixed point numbers to unsigned fixed point numbers, so that the response of the storage space occupied by the data calculated by the neural network acceleration system is reduced, and the consumption of hardware logic resources of the neural network acceleration system is reduced.

EXAMPLE five

Fig. 7 is a schematic flow chart of a neural network acceleration method provided in the fifth embodiment of the present invention, which is applicable to calculation of a convolutional neural network, and the method can be implemented by a neural network acceleration system provided in any embodiment of the present invention, and reference may be made to the description in any system embodiment of the present invention for content that is not described in detail in the fifth embodiment of the present invention.

As shown in fig. 7, a neural network acceleration method according to a fifth embodiment of the present invention includes:

and S710, converting the input data calculated by the convolutional neural network from a floating point number to a fixed point number.

Specifically, there are two general methods for representing data (real numbers) in machine language: floating point numbers and fixed point numbers. When the fixed point number expresses the real number, the position of the decimal point is fixed, the decimal point is not expressed in a machine, but is appointed at the fixed position in advance, and once the position of the decimal point is determined, the position cannot be changed, so that the data range expressed by the fixed point number is limited, and the corresponding occupied memory space (bit) is small. Floating point numbers use scientific notation to express real numbers, i.e., a mantissa, a radix, an exponent, and a sign representing positive or negative, e.g., a floating point number representing real number 123.45 is 1.2345x10²Where 1.2345 is the mantissa, 10 is the radix, and 2 is the exponent. The floating-point number achieves the effect of floating decimal point through the exponent number, thereby flexibly expressing large-range data and phasesThe occupied memory space is large. The convolutional neural network is a convolutional neural network simulating the human brain so as to realize the machine learning technology similar to artificial intelligence, and the data of the convolutional neural network usually adopts a floating point number expression form.

The input data calculated by the convolutional neural network is converted into fixed point numbers from floating point numbers, and the converted fixed point numbers are unsigned 8bit integers, so that the consumption of hardware logic resources in the calculation of the convolutional neural network is reduced.

Further, step S710 includes S711 to S711, specifically:

s711, converting input data calculated by the convolutional neural network from floating point numbers to signed fixed point numbers;

and S712, converting the signed fixed point number into an unsigned fixed point number.

S720, splitting the input data into a plurality of computation characteristic graphs qd according to a first preset rule_j。

Specifically, the input data is represented in the form of an input feature map, and the input data is split into a plurality of computation feature maps qd according to a first preset rule_jThe method comprises the following steps: splitting the input feature map into a plurality of computation feature maps qd comprising a 3 x 3 matrix data structure according to a preset step size_j. The input signature is a three-dimensional matrix data structure including a rows, b columns and c layers, as shown in fig. 2, and is an input signature with a size of 6 × 3. The computed feature map is a two-dimensional matrix data structure comprising d rows and e columns, and each layer of the input feature map can be split into a plurality of computed feature maps of 3 × 3 size. And when the number-taking frame is rightwards moved to the right boundary of the input feature map, returning the sitting boundary of the input feature map again and moving downwards by one row to continue number-taking to form the calculation feature map until the last data of each layer of the input feature map is taken out. The number of columns of the right shift of the number taking frame is called step length, and the smaller the step length is, the smaller the number of columns of the right shift of the number taking frame is, and the calculation precision of the convolutional neural network isThe higher the corresponding calculation amount is.

For example, the input feature map size is 6 × 3, the calculated feature map size is 3 × 3, and the preset step size is 1, the feature map splitting module 200 may split each layer of the input feature map into 16 calculated feature maps with the size of 3 × 3, and the input feature map may be split into 48 calculated feature maps qd₁～qd₄₈Can also be described as

Where c represents the number of layers of the input feature map where the feature map is calculated, and obviously, the value of c is 1, 2, and 3. FIG. 3 illustrates the splitting of the first layer of input signatures into 16 computed signatures of size 3 x 3

S730, calculating each feature graph qd according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i。

Specifically, first, all the calculated feature maps included in the first layer of the input feature map are calculated

Calculating to obtain a first layer (or a first) first output characteristic diagram qo₁Then, for all the calculated feature maps included in the second layer of the input feature map

Calculating to obtain a first output characteristic diagram qo of a second layer (or a second layer)₂Then, for all the calculated feature maps included in the third layer of the input feature map

Calculating to obtain a third layer (or a third layer) first output characteristic diagram qo₃Therefore, it can be seen that a first output feature map can be obtained after one layer of the input feature map is calculated, the number of the first output feature maps and the layer of the input feature mapThe numbers are equal.

Further, the step S730 includes S731 to S734, specifically:

s731, obtaining the computation characteristic graph qd_jThe corresponding weight value qw;

s732, calculating the feature map qd_jPerforming convolution calculation with the weight qw to obtain a first part of a first output characteristic diagram

S733, calculating the feature graph qd according to a fourth preset rule_jCalculating to obtain the second part of the first output characteristic diagram

S734, outputting the first part of the first output characteristic diagram

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

S740, sequentially comparing all the first output characteristic graphs qo_iAnd overlapping to obtain a second output characteristic diagram.

Specifically, the first output characteristic graph qo is buffered by using FIFO firstly₁When a second first output characteristic diagram qo is obtained₂When the data is stored, the first output characteristic graph qo is firstly obtained₂And a second first output profile qo₁Add, then q o₁+qo₂The result of (2) is buffered in FIFO; when a third first output characteristic diagram qo is obtained₃When data of (2) is being calculated qo₁+qo₂+qo₃As a result of (2), qo is further reduced₁+qo₂+qo₃The result of (2) is buffered in a FIFO.

And S750, processing the second output characteristic diagram according to a third preset rule to obtain output data.

Specifically, the data in the second output characteristic diagram is subjected to offset and quantization processing, so that final output data of the neural network acceleration system is obtained, and the output data is still an unsigned 8-bit fixed point number.

Further, the step S750 includes S751 to S752, specifically:

s751, adding a bias parameter to the second output characteristic diagram to obtain an output bias characteristic diagram;

and S752, calculating the output bias characteristic diagram and the quantization parameter to obtain output data.

In the neural network acceleration method provided by the fifth embodiment of the invention, the input data calculated by the convolutional neural network is converted into fixed point numbers from floating point numbers; splitting input data into a plurality of computation characteristic graphs qd according to a first preset rule_j(ii) a Calculating a characteristic graph qd for each according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_i(ii) a Sequentially comparing all the first output characteristic graphs qo_iOverlapping to obtain a second output characteristic diagram; and processing the second output characteristic diagram according to a third preset rule to obtain output data. By converting the floating point number into the fixed point number, the calculation of the convolutional neural network does not affect the calculation accuracy while needing less logic resources, the occupied storage resources are greatly reduced, and the data transmission speed is accelerated.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A neural network acceleration system, comprising:

A first calculation module for calculating the characteristic graph qd for each according to a second preset rule_iCalculating to obtain a plurality of first output characteristic graphs qo_i；

2. The system of claim 1, wherein the first computing module comprises:

the weight memory is used for storing the weight qw;

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

3. The system of claim 2, wherein the data processing module is further configured to convert the weights qw stored in the weight memory into fixed-point numbers.

4. The system of claim 1, wherein the feature map splitting module is specifically configured to:

5. The system of claim 1, wherein the second computing module comprises:

6. The system of claim 1, wherein the data processing module comprises:

7. A neural network acceleration method, comprising:

8. The method according to claim 7, wherein said computing a feature map qd for each of said computed feature maps according to a second preset rule_jCalculating to obtain a plurality of first output characteristic graphs qo_iThe method comprises the following steps:

A first part of the first output characteristic diagram

And a second portion of the first output profile

Subtracting to obtain the first output characteristic diagram qo_i。

9. The method of claim 7, wherein the processing the second output feature map according to a third predetermined rule to obtain output data comprises:

10. The method of claim 7, wherein converting the input data computed by the convolutional neural network from a floating point number to a fixed point number comprises:

the signed fixed-point number is converted into an unsigned fixed-point number.