CN114265696A - Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network - Google Patents


Info

Publication number
CN114265696A
CN114265696A
Authority
CN
China
Prior art keywords
pooling
data
selector
output
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111632969.7A
Other languages
Chinese (zh)
Inventor
王晓峰
周辉
盖一帆
赵雄波
蒋彭龙
李悦
吴松龄
费亚男
李超然
吴敏
杨庆军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute filed Critical Beijing Aerospace Automatic Control Research Institute
Priority to CN202111632969.7A priority Critical patent/CN114265696A/en
Publication of CN114265696A publication Critical patent/CN114265696A/en
Pending legal-status Critical Current


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a pooling device and a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network. The pooling device comprises a first selector, a second selector, a comparator, a constant register and a pooling register. The first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector, and the output end of the comparator is connected to the second selector. The first input end of the first selector is connected with the constant register, the second input end is connected with an external pooling cache so that data can be read from the pooling cache, and the third input end is connected with the output end of the pooling register. The first output end of the second selector serves as the output end for the final pooling result, the second output end is connected with the external pooling cache so that data can be written into the pooling cache, and the third output end is connected with the input end of the pooling register. The invention realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices.

Description

Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network
Technical Field
The invention belongs to the technical field of deep learning acceleration circuits, and particularly relates to a pooling device and a pooling acceleration circuit for a maximum pooling layer of a convolutional neural network.
Background
Deep learning techniques represented by convolutional neural networks (CNNs) are achieving ever better performance in fields such as image classification and object detection. As new-generation aerospace systems develop towards greater intelligence, many application requirements have emerged for using CNNs in target detection, fault identification and similar tasks. However, conventional information processing methods based on general-purpose processors find it increasingly difficult to meet these application requirements, for several reasons: first, CNN-based deep learning algorithms have an enormous volume of computation and parameters, and require high-compute-power equipment for real-time inference; second, the heat dissipation conditions inside and outside a spacecraft are harsh, and high-power computing equipment cannot be allowed to operate for long periods; finally, most applications in the aerospace field are hard real-time scenarios, which low-compute-power devices can hardly satisfy. The general solution to these problems is to design a CNN-oriented hardware acceleration unit based on an FPGA; a fully usable acceleration unit design usually needs to include multiple functions such as convolution, pooling, nonlinear activation, preprocessing, post-processing and ShortCut connections. In order to implement a CNN acceleration unit that is as versatile and efficient as possible within limited FPGA resources, the most common computing functions must be implemented with as little resource overhead as possible. The invention discloses a simplified design method for a high-data-throughput maximum pooling layer hardware acceleration circuit that covers the computational requirements of common CNN algorithms.
Disclosure of Invention
In view of the foregoing analysis, the present invention aims to disclose a pooling device and a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, implementing a simplified, high-data-throughput hardware acceleration circuit for the maximum pooling layer that covers the computational requirements of common CNN algorithms.
The invention discloses a pooling device for the maximum pooling layer of a convolutional neural network, comprising a first selector S1, a second selector S2, a comparator, a constant register and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
Further, an external control unit determines the position of the input feature data within the pooling window according to the configuration information of the current pooling layer and the position of the current input feature data in the feature map; according to that position, it outputs a control command to the first selector for input gating control and to the second selector for output gating control, thereby realizing the pooling calculation of the pooling window.
Further, when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector.
Further, when the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
Further, when the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2; the data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
Further, when the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory for storage.
Further, when the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
The invention also discloses a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, comprising a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n of the pooling devices described above;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used for supporting parallel data reading and writing of the n pooling devices.
Further, the pooling cache caches one row of pooled intermediate results for each pooling device; the depth of the pooling cache is greater than or equal to Wmax/S', where Wmax is the maximum supported feature map width and S' is the corresponding pooling step size.
Further, the circuit also comprises a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data is the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data of the 4-dimensional tensor is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
The invention can realize at least one of the following beneficial effects:
the invention realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices;
efficient computation of maximum pooling layers of any window shape and size in a CNN is realized;
the logical complexity of this part of the circuit is reduced and the FPGA logic resource usage is lowered, so that the data throughput closely matches the data bandwidth.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a schematic diagram of the composition and connections of the pooling device in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the connection of the pooling acceleration circuit in one embodiment of the present invention;
fig. 3 is a schematic diagram of a maximum pooling process with a pooling window size of 2 x 2 in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the principles of the invention.
One embodiment of the present invention discloses a pooling device for the maximum pooling layer of a convolutional neural network, as shown in fig. 1, comprising: a first selector S1, a second selector S2, a comparator, a constant register, and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
Specifically, the constant register stores the minimum value representable in the data bit width of the current feature map; for example, if the feature map is a signed integer represented in 8 bits, the minimum value is 0x80 (that is, -128). The pooling register is used to hold intermediate results of the pooling calculation.
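For signed 8-bit feature data, the two's-complement minimum can be verified with a quick check (a Python sketch for illustration only; the patent specifies hardware, and the assumption here is int8 feature data):

```python
# Two's-complement minimum of a signed 8-bit integer
int8_min = -(1 << 7)            # -128
assert int8_min == -128
assert int8_min & 0xFF == 0x80  # its 8-bit encoding is 0x80
```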
Specifically, the external control unit determines the position, within the pooling window, of the input feature data entering the pooling calculation according to the configuration information of the current pooling layer combined with the position (x, y) of the current input feature data in the feature map; according to that position it outputs a control command to the first selector for input gating control and to the second selector S2 for output gating control, thereby realizing the pooling calculation of the pooling window. The configuration information comprises the height Hf of the feature map, the width Wf of the feature map, the number of channels N of the feature map, the height Hw of the pooling window, the width Ww of the pooling window, and the pooling step size S.
Based on the determined position of the input feature data within the pooling window, the gating control performed by the external control unit comprises:
1) when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector S2.
2) When the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
3) When the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2.
The data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
4) When the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory (such as DDR) for storage.
5) When the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
According to the structure and workflow of the pooling device above, the pooling window input to the pooling device for pooling calculation can have any shape and size; edge padding of any size of the feature map is also supported, but the pooling step size must match the pooling window size.
The case in which the pooling window is incomplete at the edge of the feature map is handled specially according to the same design concept.
The invention also discloses a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network which, as shown in figure 2, comprises a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n pooling devices as described above;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used to support parallel data reading and writing by the n pooling devices. If the bit width of the feature data is 8 bits, the width of the pooling cache is n × 8 bits.
In this embodiment, the pooling cache only needs to cache one row of pooled intermediate results; for example, if the maximum feature map width supported by the current design is Wmax and the corresponding pooling step size is S', the depth of the cache must be greater than or equal to Wmax/S'.
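As a quick sizing sketch, the cache-depth bound can be computed directly (the example widths below are illustrative assumptions, not values from the patent):

```python
def cache_depth(w_max, stride):
    """Minimum pooling-cache depth: ceil(Wmax / S')."""
    # Ceiling division guards widths that are not a multiple of the step size.
    return -(-w_max // stride)

# e.g. a 416-wide feature map with step size 2 needs a depth of at least 208
```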
Specifically, in order to improve the continuity of loading data from the external DDR storage, and thereby further improve the effective bandwidth of the system during operation, the pooling circuit of this embodiment further includes a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as: Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data can be represented as the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
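The reordering performed by the data ordering adjuster can be illustrated with a small sketch (Python for illustration only; the function name reorder is an assumption — the patent describes a hardware component):

```python
def reorder(fin, n):
    """Reorder Fin[N][Hin][Win] into Fin[N//n][Hin][Win][n] (nested lists)."""
    N = len(fin)
    h_in, w_in = len(fin[0]), len(fin[0][0])
    assert N % n == 0, "channel count must be a multiple of the device count n"
    # The innermost dimension interleaves n consecutive channels, so the n
    # pooling devices can each read their channel from one contiguous word.
    return [[[[fin[g * n + c][h][w] for c in range(n)]
              for w in range(w_in)]
             for h in range(h_in)]
            for g in range(N // n)]
```

Stored lowest dimension first, the n channel values at each (h, w) become adjacent in DDR, which is what makes the parallel load by n pooling devices contiguous.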
A specific scheme in this embodiment illustrates the maximum pooling process with a pooling window size of 2 × 2, as shown in fig. 3. In fig. 3, the parameters of the maximum pooling layer are set as follows: the height of the feature map is Hin, the width of the feature map is Win, the number of channels of the feature map is N, the pooling window size is 2 × 2, and the pooling step size is 2.
In this embodiment, the number of the pooling devices is n, i.e., the feature maps of n channels can be processed simultaneously.
During the maximum pooling with a pooling window size of 2 × 2, the data ordering adjuster reorders the feature data from its default ordering Fin[N][Hin][Win] into the 4-dimensional tensor Fin[N/n][Hin][Win][n]; the data is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
The implementation process of the present invention is described by taking the pooling device of the first channel as an example; the data of the first channel are shown in fig. 3. When the first number, 7, enters the pooling device, it is compared with the minimum value stored in the constant register, and 7 is stored in the pooling register as the maximum value; when 8 enters the pooling device, it is compared with the 7 in the register, and 8 is stored in the pooling cache as the maximum value. The first row of the feature map is processed in the same way.
When the first feature value of the second row, 0, enters the pooling device, it is compared with the 8 in the pooling cache, and 8 is stored in the pooling register as the maximum value; when the last number of the pooling window, 5, enters the pooling device, it is compared with the 8 in the register, and 8 is output off-chip as the pooling result of this pooling window.
The complete feature map is processed in the same way. The other channels work on exactly the same principle, with each pooling device independently performing the pooling operation on the feature data of its own channel.
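The streamed per-window behavior described above can be cross-checked against a plain reference implementation of 2 × 2 maximum pooling with step size 2 (a Python sketch for verification only, not part of the patented circuit):

```python
def maxpool2x2(fmap):
    """fmap: H x W nested lists with H, W even; returns the (H/2) x (W/2) result."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

For the window containing 7, 8 / 0, 5 from the example, maxpool2x2([[7, 8], [0, 5]]) gives [[8]], agreeing with the result the pooling device streams out.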
In summary, this embodiment realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices; it realizes efficient computation of maximum pooling layers of any window shape and size in a CNN; and it reduces the logical complexity of this part of the circuit and lowers the FPGA logic resource usage, so that the data throughput closely matches the data bandwidth.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed herein falls within the scope of the present invention.

Claims (10)

1. A pooling device for the maximum pooling layer of a convolutional neural network, comprising a first selector S1, a second selector S2, a comparator, a constant register, and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
2. The pooling device of claim 1, wherein an external control unit determines the position of the input feature data within the pooling window according to the configuration information of the current pooling layer combined with the position of the current input feature data in the feature map; and, according to that position, outputs a control command to the first selector for input gating control and to the second selector for output gating control, thereby realizing the pooling calculation of the pooling window.
3. The pooling device of claim 2, wherein, when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector.
4. The pooling device of claim 2, wherein, when the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
5. The pooling device of claim 2, wherein, when the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2; the data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
6. The pooling device of claim 2, wherein, when the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory for storage.
7. The pooling device of claim 2, wherein, when the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
8. A pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, characterized by comprising a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n pooling devices according to any one of claims 1-7;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used to support parallel data reading and writing by the n pooling devices.
9. The pooling acceleration circuit of claim 8, wherein the pooling cache caches one row of pooled intermediate results for each pooling device; the depth of the pooling cache is greater than or equal to Wmax/S', where Wmax is the maximum supported feature map width and S' is the corresponding pooling step size.
10. The pooling acceleration circuit of claim 8, further comprising a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data is the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data of the 4-dimensional tensor is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
CN202111632969.7A 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network Pending CN114265696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632969.7A CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111632969.7A CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Publications (1)

Publication Number Publication Date
CN114265696A true CN114265696A (en) 2022-04-01

Family

ID=80831248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632969.7A Pending CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Country Status (1)

Country Link
CN (1) CN114265696A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049885A (en) * 2022-08-16 2022-09-13 之江实验室 Storage and calculation integrated convolutional neural network image classification device and method
WO2024119862A1 (en) * 2022-12-05 2024-06-13 北京航天自动控制研究所 Neural network acceleration system


Similar Documents

Publication Publication Date Title
CN114265696A (en) Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
US11243895B2 (en) Data pre-processing method and device, and related computer device and storage medium
WO2020119318A1 (en) Self-adaptive selection and design method for convolutional-layer hardware accelerator
KR20240105502A (en) Exploiting input data sparsity in neural network compute units
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
CN112001294A (en) YOLACT + + based vehicle body surface damage detection and mask generation method and storage device
CN111753962B (en) Adder, multiplier, convolution layer structure, processor and accelerator
CN113743587B (en) Convolutional neural network pooling calculation method, system and storage medium
CN102567254B (en) The method that adopts dma controller to carry out data normalization processing
Gong et al. Research and implementation of multi-object tracking based on vision DSP
CN111898752B (en) Apparatus and method for performing LSTM neural network operations
CN112200310B (en) Intelligent processor, data processing method and storage medium
Yu et al. A memory-efficient hardware architecture for deformable convolutional networks
Gelashvili et al. L3 fusion: Fast transformed convolutions on CPUs
CN112101538B (en) Graphic neural network hardware computing system and method based on memory computing
CN115204373A (en) Design method for fast convolution and cache mode of convolutional neural network
CN111191780B (en) Averaging pooling accumulation circuit, device and method
CN112905239B (en) Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment
US20180039858A1 (en) Image recognition apparatus, image recognition system, and image recognition method
Lyu et al. FLNA: An energy-efficient point cloud feature learning accelerator with dataflow decoupling
US20230073835A1 (en) Structured Pruning of Vision Transformer
CN114880775B (en) Feasible domain searching method and device based on active learning Kriging model
US20230168809A1 (en) Intelligence processor device and method for reducing memory bandwidth
CN114936636A (en) General lightweight convolutional neural network acceleration method based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination