CN114265696A - Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network - Google Patents


Info

Publication number
CN114265696A
CN114265696A
Authority
CN
China
Prior art keywords
pooling
data
selector
output
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111632969.7A
Other languages
Chinese (zh)
Inventor
王晓峰
周辉
盖一帆
赵雄波
蒋彭龙
李悦
吴松龄
费亚男
李超然
吴敏
杨庆军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute filed Critical Beijing Aerospace Automatic Control Research Institute
Priority to CN202111632969.7A priority Critical patent/CN114265696A/en
Publication of CN114265696A publication Critical patent/CN114265696A/en
Pending legal-status Critical Current


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a pooling device and a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network. The pooling device comprises a first selector, a second selector, a comparator, a constant register and a pooling register. The first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector, and the output end of the comparator is connected to the second selector. The first input end of the first selector is connected with the constant register, the second input end is connected with an external pooling cache so that data can be read from the pooling cache, and the third input end is connected with the output end of the pooling register. The first output end of the second selector serves as the output end for the final pooling result, the second output end is connected with the external pooling cache so that data can be written into the pooling cache, and the third output end is connected with the input end of the pooling register. The invention realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices.

Description

Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network
Technical Field
The invention belongs to the technical field of deep learning acceleration circuits, and particularly relates to a pooling device and a pooling acceleration circuit for a maximum pooling layer of a convolutional neural network.
Background
Deep learning techniques represented by convolutional neural networks (CNNs) are achieving ever better performance in fields such as image classification and object detection. As new-generation aerospace systems develop towards greater intelligence, many application requirements have emerged for using CNNs in target detection, fault identification and similar tasks. However, conventional information processing methods based on general-purpose processors find it increasingly difficult to meet these application requirements, for several reasons: first, CNN-based deep learning algorithms have an enormous volume of computation and parameters, and require high-compute-power equipment for real-time inference; second, the heat dissipation conditions inside and outside a spacecraft are harsh, and high-power computing equipment cannot be allowed to operate for long periods; finally, most applications in the aerospace field are hard real-time scenarios, which low-compute-power devices can hardly satisfy. The general solution to these problems is to design a CNN-oriented hardware acceleration unit based on an FPGA; a fully usable acceleration unit design usually needs to include multiple functions such as convolution, pooling, nonlinear activation, preprocessing, post-processing and ShortCut connections. In order to implement a CNN acceleration unit that is as versatile and efficient as possible within limited FPGA resources, the most common computing functions must be implemented with as little resource overhead as possible. The invention discloses a simplified design method for a high-data-throughput maximum pooling layer hardware acceleration circuit that covers the computational requirements of common CNN algorithms.
Disclosure of Invention
In view of the foregoing analysis, the present invention aims to disclose a pooling device and a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, implementing a simplified, high-data-throughput hardware acceleration circuit for the maximum pooling layer that covers the computational requirements of common CNN algorithms.
The invention discloses a pooling device for the maximum pooling layer of a convolutional neural network, comprising a first selector S1, a second selector S2, a comparator, a constant register and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
Further, an external control unit determines the position of the input feature data within the pooling window according to the configuration information of the current pooling layer and the position of the current input feature data in the feature map; according to that position, it outputs a control command to the first selector for input gating control and to the second selector for output gating control, thereby realizing the pooling calculation of the pooling window.
Further, when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector.
Further, when the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
Further, when the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2; the data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
Further, when the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory for storage.
Further, when the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
The invention also discloses a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, comprising a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n of the pooling devices described above;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used for supporting parallel data reading and writing of the n pooling devices.
Further, the pooling cache caches one row of pooled intermediate results for each pooling device; the depth of the pooling cache is greater than or equal to Wmax/S', where Wmax is the maximum supported feature map width and S' is the corresponding pooling step size.
Further, the circuit also comprises a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data is the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data of the 4-dimensional tensor is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
The invention can realize at least one of the following beneficial effects:
the invention realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices;
efficient computation of maximum pooling layers of any window shape and size in a CNN is realized;
the logical complexity of this part of the circuit is reduced and the FPGA logic resource usage is lowered, so that the data throughput closely matches the data bandwidth.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a schematic diagram of the composition and connections of the pooling device in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the connection of the pooling acceleration circuit in one embodiment of the present invention;
fig. 3 is a schematic diagram of a maximum pooling process with a pooling window size of 2 x 2 in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the principles of the invention.
One embodiment of the present invention discloses a pooling device for the maximum pooling layer of a convolutional neural network, as shown in fig. 1, comprising: a first selector S1, a second selector S2, a comparator, a constant register, and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
Specifically, the constant register stores the minimum value representable in the data bit width of the current feature map; for example, if the feature map is a signed integer represented in 8 bits, the minimum value is 0x80 (that is, -128). The pooling register is used to hold intermediate results of the pooling calculation.
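For signed 8-bit feature data, the two's-complement minimum can be verified with a quick check (a Python sketch for illustration only; the patent specifies hardware, and the assumption here is int8 feature data):

```python
# Two's-complement minimum of a signed 8-bit integer
int8_min = -(1 << 7)            # -128
assert int8_min == -128
assert int8_min & 0xFF == 0x80  # its 8-bit encoding is 0x80
```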
Specifically, the external control unit determines the position, within the pooling window, of the input feature data entering the pooling calculation according to the configuration information of the current pooling layer combined with the position (x, y) of the current input feature data in the feature map; according to that position it outputs a control command to the first selector for input gating control and to the second selector S2 for output gating control, thereby realizing the pooling calculation of the pooling window. The configuration information comprises the height Hf of the feature map, the width Wf of the feature map, the number of channels N of the feature map, the height Hw of the pooling window, the width Ww of the pooling window, and the pooling step size S.
Based on the determined position of the input feature data within the pooling window, the gating control performed by the external control unit comprises:
1) when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector S2.
2) When the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
3) When the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2.
The data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
4) When the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory (such as DDR) for storage.
5) When the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
According to the structure and workflow of the pooling device above, the pooling window input to the pooling device for pooling calculation can have any shape and size; edge padding of any size of the feature map is also supported, but the pooling step size must match the pooling window size.
The case in which the pooling window is incomplete at the edge of the feature map is handled specially according to the same design concept.
The invention also discloses a pooling acceleration circuit for the maximum pooling layer of a convolutional neural network which, as shown in figure 2, comprises a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n pooling devices as described above;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used to support parallel data reading and writing by the n pooling devices. If the bit width of the feature data is 8 bits, the width of the pooling cache is n × 8 bits.
In this embodiment, the pooling cache only needs to cache one row of pooled intermediate results; for example, if the maximum feature map width supported by the current design is Wmax and the corresponding pooling step size is S', the depth of the cache must be greater than or equal to Wmax/S'.
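As a quick sizing sketch, the cache-depth bound can be computed directly (the example widths below are illustrative assumptions, not values from the patent):

```python
def cache_depth(w_max, stride):
    """Minimum pooling-cache depth: ceil(Wmax / S')."""
    # Ceiling division guards widths that are not a multiple of the step size.
    return -(-w_max // stride)

# e.g. a 416-wide feature map with step size 2 needs a depth of at least 208
```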
Specifically, in order to improve the continuity of loading data from the external DDR storage, and thereby further improve the effective bandwidth of the system during operation, the pooling circuit of this embodiment further includes a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as: Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data can be represented as the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
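The reordering performed by the data ordering adjuster can be illustrated with a small sketch (Python for illustration only; the function name reorder is an assumption — the patent describes a hardware component):

```python
def reorder(fin, n):
    """Reorder Fin[N][Hin][Win] into Fin[N//n][Hin][Win][n] (nested lists)."""
    N = len(fin)
    h_in, w_in = len(fin[0]), len(fin[0][0])
    assert N % n == 0, "channel count must be a multiple of the device count n"
    # The innermost dimension interleaves n consecutive channels, so the n
    # pooling devices can each read their channel from one contiguous word.
    return [[[[fin[g * n + c][h][w] for c in range(n)]
              for w in range(w_in)]
             for h in range(h_in)]
            for g in range(N // n)]
```

Stored lowest dimension first, the n channel values at each (h, w) become adjacent in DDR, which is what makes the parallel load by n pooling devices contiguous.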
A specific scheme in this embodiment illustrates the maximum pooling process with a pooling window size of 2 × 2, as shown in fig. 3. In fig. 3, the parameters of the maximum pooling layer are set as follows: the height of the feature map is Hin, the width of the feature map is Win, the number of channels of the feature map is N, the pooling window size is 2 × 2, and the pooling step size is 2.
In this embodiment, the number of the pooling devices is n, i.e., the feature maps of n channels can be processed simultaneously.
During the maximum pooling with a pooling window size of 2 × 2, the data ordering adjuster reorders the feature data from its default ordering Fin[N][Hin][Win] into the 4-dimensional tensor Fin[N/n][Hin][Win][n]; the data is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
The implementation process of the present invention is described by taking the pooling device of the first channel as an example; the data of the first channel are shown in fig. 3. When the first number, 7, enters the pooling device, it is compared with the minimum value stored in the constant register, and 7 is stored in the pooling register as the maximum value; when 8 enters the pooling device, it is compared with the 7 in the register, and 8 is stored in the pooling cache as the maximum value. The first row of the feature map is processed in the same way.
When the first feature value of the second row, 0, enters the pooling device, it is compared with the 8 in the pooling cache, and 8 is stored in the pooling register as the maximum value; when the last number of the pooling window, 5, enters the pooling device, it is compared with the 8 in the register, and 8 is output off-chip as the pooling result of this pooling window.
The complete feature map is processed in the same way. The other channels work on exactly the same principle, with each pooling device independently performing the pooling operation on the feature data of its own channel.
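The streamed per-window behavior described above can be cross-checked against a plain reference implementation of 2 × 2 maximum pooling with step size 2 (a Python sketch for verification only, not part of the patented circuit):

```python
def maxpool2x2(fmap):
    """fmap: H x W nested lists with H, W even; returns the (H/2) x (W/2) result."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

For the window containing 7, 8 / 0, 5 from the example, maxpool2x2([[7, 8], [0, 5]]) gives [[8]], agreeing with the result the pooling device streams out.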
In summary, this embodiment realizes efficient computation of the maximum pooling layer of common CNNs with as little FPGA resource consumption as possible, thereby addressing the real-time and power-consumption problems encountered when CNNs are deployed in embedded devices; it realizes efficient computation of maximum pooling layers of any window shape and size in a CNN; and it reduces the logical complexity of this part of the circuit and lowers the FPGA logic resource usage, so that the data throughput closely matches the data bandwidth.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed herein falls within the scope of the present invention.

Claims (10)

1. A pooling device for the maximum pooling layer of a convolutional neural network, comprising a first selector S1, a second selector S2, a comparator, a constant register, and a pooling register;
the first input end of the comparator receives the feature data in the pooling window, the second input end is connected to the output of the first selector S1, and the maximum value obtained by the comparison is output to the second selector S2;
a first input end of the first selector S1 is connected with the output end of the constant register, a second input end is connected with an external pooling cache so that data can be read from the pooling cache, and a third input end is connected with the output end of the pooling register; under the control of an external instruction, the data at one of the input ends is selected and output to the second input end of the comparator;
a first output end of the second selector S2 serves as the output end for the final pooling result, a second output end is connected with an external pooling cache so that data can be written into the pooling cache, and a third output end is connected with the input end of the pooling register; under the control of an external instruction, one output end is selected to output the comparison result of the comparator.
2. The pooling device of claim 1, wherein an external control unit determines the position of the input feature data within the pooling window according to the configuration information of the current pooling layer combined with the position of the current input feature data in the feature map; and, according to that position, outputs a control command to the first selector for input gating control and to the second selector for output gating control, thereby realizing the pooling calculation of the pooling window.
3. The pooling device of claim 2, wherein, when the input feature data is the first data in the pooling window, the control unit controls the first selector S1 to gate the first input terminal and the second selector S2 to gate the third output terminal;
the data in the constant register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output to the pooling register through the second selector.
4. The pooling device of claim 2, wherein, when the input feature data is the last data of a row of the pooling window, but not the last data of the entire pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the second output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is output through the second selector S2 to the external pooling cache for storage.
5. The pooling device of claim 2, wherein, when the input feature data is the first data of a row of the pooling window, but not the first data of the entire pooling window, the control unit controls the first selector S1 to gate the second input terminal and the second selector S2 to gate the third output terminal;
the data at the corresponding address in the pooling cache is selected and compared with the input data in the comparator, and the resulting maximum value is output to the pooling register through the second selector S2; the data at the corresponding address in the pooling cache is the pooling result of the previous row, stored there in advance.
6. The pooling device of claim 2, wherein, when the input feature data is the last data of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the first output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value, as the pooling result of this pooling window, is output through the second selector S2 to an external memory for storage.
7. The pooling device of claim 2, wherein, when the input feature data lies in neither the first nor the last column of the pooling window, the control unit controls the first selector S1 to gate the third input terminal and the second selector S2 to gate the third output terminal;
the data in the pooling register is selected and output to the comparator for comparison with the input data, and the resulting maximum value is stored back into the pooling register through the second selector S2.
8. A pooling acceleration circuit for the maximum pooling layer of a convolutional neural network, characterized by comprising a control unit, a pooling device array and a pooling cache;
the pooling device array comprises n pooling devices according to any one of claims 1-7;
the control unit is connected with the n pooling devices and controls the calculation of each pooling device according to the position, within the pooling window, of the feature data input to that pooling device;
the pooling cache is connected with the n pooling devices and is used to support parallel data reading and writing by the n pooling devices.
9. The pooling acceleration circuit of claim 8, wherein the pooling cache caches one row of pooled intermediate results for each pooling device; the depth of the pooling cache is greater than or equal to Wmax/S', where Wmax is the maximum supported feature map width and S' is the corresponding pooling step size.
10. The pooling acceleration circuit of claim 8, further comprising a data ordering adjuster;
the data ordering adjuster reorders the feature data so that it can be loaded sequentially during calculation;
the default ordering of the feature data is expressed as Fin[N][Hin][Win];
after adjustment by the data ordering adjuster, the feature data is the 4-dimensional tensor Fin[N/n][Hin][Win][n], where Hin is the height of the feature map, Win is the width of the feature map, N is the number of channels of the feature map, and n is the number of pooling devices;
the feature data of the 4-dimensional tensor is stored contiguously in the external DDR in order from the lowest dimension to the highest, and is fetched when the pooling acceleration circuit performs its calculation.
CN202111632969.7A 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network Pending CN114265696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632969.7A CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111632969.7A CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Publications (1)

Publication Number Publication Date
CN114265696A true CN114265696A (en) 2022-04-01

Family

ID=80831248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632969.7A Pending CN114265696A (en) 2021-12-28 2021-12-28 Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network

Country Status (1)

Country Link
CN (1) CN114265696A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049885A (en) * 2022-08-16 2022-09-13 之江实验室 Storage and calculation integrated convolutional neural network image classification device and method
WO2024119862A1 (en) * 2022-12-05 2024-06-13 北京航天自动控制研究所 Neural network acceleration system


Similar Documents

Publication Publication Date Title
CN114265696A (en) Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
US11243895B2 (en) Data pre-processing method and device, and related computer device and storage medium
WO2020119318A1 (en) Self-adaptive selection and design method for convolutional-layer hardware accelerator
KR20240105502A (en) Exploiting input data sparsity in neural network compute units
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
CN112001294A (en) YOLACT + + based vehicle body surface damage detection and mask generation method and storage device
CN111753962B (en) Adder, multiplier, convolution layer structure, processor and accelerator
CN113743587B (en) Convolutional neural network pooling calculation method, system and storage medium
CN102567254B (en) The method that adopts dma controller to carry out data normalization processing
Gong et al. Research and implementation of multi-object tracking based on vision DSP
CN111898752B (en) Apparatus and method for performing LSTM neural network operations
CN112200310B (en) Intelligent processor, data processing method and storage medium
Yu et al. A memory-efficient hardware architecture for deformable convolutional networks
Gelashvili et al. L3 fusion: Fast transformed convolutions on CPUs
CN112101538B (en) Graphic neural network hardware computing system and method based on memory computing
CN115204373A (en) Design method for fast convolution and cache mode of convolutional neural network
CN111191780B (en) Averaging pooling accumulation circuit, device and method
CN112905239B (en) Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment
US20180039858A1 (en) Image recognition apparatus, image recognition system, and image recognition method
Lyu et al. FLNA: An energy-efficient point cloud feature learning accelerator with dataflow decoupling
US20230073835A1 (en) Structured Pruning of Vision Transformer
CN114880775B (en) Feasible domain searching method and device based on active learning Kriging model
US20230168809A1 (en) Intelligence processor device and method for reducing memory bandwidth
CN114936636A (en) General lightweight convolutional neural network acceleration method based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination