CN113138957A - Chip for neural network inference and method for accelerating neural network inference - Google Patents
- Publication number
- CN113138957A (application number CN202110336218.4A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- storage
- bit
- convolution
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/781—On-chip cache; Off-chip memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7817—Specially adapted for signal processing, e.g. Harvard architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the field of artificial intelligence and provides a chip for neural network inference and a method for accelerating neural network inference. The chip comprises a storage computation unit containing a plurality of storage computation arrays with different input lengths; each array deploys the convolution kernels whose parameter counts correspond to its input length. By providing storage computation arrays of different input lengths within the storage computation unit to match the pruned convolution kernels, the invention reduces power consumption while making maximal use of computing resources, so that both high computational-resource utilization and low power consumption can be achieved.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a chip for neural network inference and a method for accelerating neural network inference.
Background
A convolutional neural network algorithm contains a large number of parameters. A model obtained with conventional training methods is typically hundreds of megabytes in size, and its inference consumes substantial computing resources; embedded platforms, whose hardware resources are scarce, cannot bear such a storage burden.
To accelerate inference of convolutional neural network algorithms on chips, various model lightweighting methods have been developed. Network pruning is a widely used method for deep neural network compression. For example, the Optimal Brain Damage (OBD) method treats every weight in the network as an individual parameter and, under diagonal, extremal, and quadratic assumptions, uses the second derivative to approximate parameter saliency during optimization, removing unimportant weights to improve the network's accuracy and generalization. Because single-weight pruning is unstructured, the pruned parameters yield a sparse network with irregular convolution kernels, which makes it difficult for a chip to achieve both high computational-resource utilization and low power consumption.
Disclosure of Invention
The invention aims to provide a chip for neural network inference and a method for accelerating neural network inference, so as to solve the problem that a chip can hardly achieve both high computational-resource utilization and low power consumption.
In order to achieve the above object, one aspect of the present invention provides a chip for neural network inference comprising a storage computation unit, where the storage computation unit includes a plurality of storage computation arrays with different input lengths, and the storage computation arrays are used to deploy the convolution kernels whose parameter counts correspond to the arrays' input lengths.
Further, the convolution kernels deployed on the storage compute array are pruned and clustered.
Further, the storage and computation unit comprises four storage and computation arrays, and the input lengths of the four storage and computation arrays are 1 bit, 3 bits, 6 bits and 9 bits respectively.
Further, the 1-bit storage computation array is used for deploying convolution kernels of 1-bit parameters;
the 3-bit storage computing array is used for deploying convolution kernels of 2-bit or 3-bit parameters;
the 6-bit storage computing array is used for deploying convolution kernels with parameters from 4 bits to 6 bits;
the 9-bit storage compute array is used to deploy convolution kernels of 7-bit to 9-bit parameters.
Furthermore, each storage computation array corresponds to a convolution kernel, and a plurality of storage computation arrays perform parallel operation.
Another aspect of the present invention provides a method for accelerating neural network inference based on the chip for neural network inference described above, the method comprising:
pruning and clustering the convolution kernel parameters of each layer of the convolutional neural network; and
allocating the clustered convolution kernels to the storage computation arrays of the chip that correspond to the kernels' parameter bit counts.
Further, the pruning and clustering of the convolution kernel parameters of each layer of the convolutional neural network includes: pruning the convolution kernel parameters of each layer of the convolutional neural network; quantizing the pruned convolution kernel parameters of each layer; and clustering the quantized convolution kernel parameters of each layer.
Further, the pruning of the convolution kernel parameters of each layer of the convolutional neural network includes:
obtaining the parameter values of each layer's convolution kernels and pruning away the parameters that are smaller than a preset threshold.
Further, the allocating the clustered convolution kernels to the storage computation array of the chip for neural network inference corresponding to the parameter bits of the convolution kernels includes:
allocating the convolution kernel of the 1-bit parameter to a 1-bit storage calculation array;
allocating convolution kernels of 2-bit or 3-bit parameters to a 3-bit storage computation array;
allocating the convolution kernel with 4-bit to 6-bit parameters to a 6-bit storage calculation array;
the convolution kernel with 7-bit to 9-bit parameters is assigned to a 9-bit storage compute array.
The present invention also provides a storage medium having computer program instructions stored thereon which, when executed, implement the above method for accelerating neural network inference.
According to the chip for neural network inference, storage computation arrays of different input lengths are provided within the storage computation unit to match the pruned convolution kernels; this reduces power consumption while making maximal use of computing resources, so that both high computational-resource utilization and low power consumption can be achieved.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a block diagram of a memory unit of a chip for neural network inference provided by an embodiment of the present invention;
FIGS. 2-5 are exemplary diagrams of a memory compute array of a chip for neural network inference provided by one embodiment of the present invention;
FIG. 6 is an exemplary diagram of a convolution kernel corresponding to the storage compute array shown in FIGS. 2-5;
FIG. 7 is a flow chart of a method for accelerating neural network inference provided by an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Typically, convolutional neural networks (CNNs) are implemented with 3 × 3 convolution kernels. To accelerate neural network inference, chips for inference operations are designed to maximize the parallelism of convolution kernel operations. After pruning, however, the input dimensionality of the neural network operation units is no longer fixed, because the number of parameters cut from the convolution kernels differs from layer to layer (i.e., the kernels' input lengths vary). If operation units of identical bit width are used for every convolution kernel during inference, computing resources cannot be maximally utilized, power consumption increases, and inference efficiency suffers.
Fig. 1 is a block diagram of the storage computation unit of a chip for neural network inference provided in an embodiment of the present invention. The embodiment provides a compute-in-memory chip for neural network inference comprising a storage computation unit. As shown in Fig. 1, the storage computation unit includes a plurality of storage computation arrays with different input lengths, and the arrays deploy the convolution kernels whose parameter counts correspond to their input lengths. The convolution kernels deployed on each array have been pruned and clustered. The chip of this embodiment thus reduces power consumption while making maximal use of computing resources, meeting the requirements of high computational-resource utilization and low power consumption.
Figs. 2-5 are exemplary diagrams of the storage computation arrays of a chip for neural network inference provided by an embodiment of the invention. The storage computation unit of this embodiment includes four storage computation arrays with input lengths of 1 bit, 3 bits, 6 bits, and 9 bits: Fig. 2 shows the array with an input length of 1 bit, Fig. 3 the array with an input length of 3 bits, Fig. 4 the array with an input length of 6 bits, and Fig. 5 the array with an input length of 9 bits. The 1-bit array deploys convolution kernels with 1-bit parameters; the 3-bit array deploys kernels with 2-bit or 3-bit parameters; the 6-bit array deploys kernels with 4-bit to 6-bit parameters; and the 9-bit array deploys kernels with 7-bit to 9-bit parameters. Fig. 6 is an exemplary diagram of the convolution kernels corresponding to the storage computation arrays of Figs. 2-5: from left to right, the first kernel corresponds to the 1-bit array of Fig. 2, the second to the 3-bit array of Fig. 3, the third to the 6-bit array of Fig. 4, and the fourth to the 9-bit array of Fig. 5. During convolutional neural network inference, each pruned and quantized convolution kernel is allocated to its corresponding storage computation array, i.e., each array holds one convolution kernel, and the arrays operate in parallel.
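The dispatch rule of this embodiment (a kernel goes to the smallest array whose input length covers its retained-parameter count) can be sketched in Python; the function name is hypothetical, and only the 1/3/6/9 input lengths come from the embodiment:

```python
def select_array_length(n_params: int) -> int:
    """Return the input length (1, 3, 6, or 9) of the smallest
    storage computation array that fits a pruned 3x3 kernel
    with n_params retained parameters."""
    for length in (1, 3, 6, 9):  # array input lengths of this embodiment
        if n_params <= length:
            return length
    raise ValueError("a 3x3 kernel has at most 9 parameters")
```

For example, a kernel with 5 retained parameters would be deployed on the 6-bit array.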
According to the storage and calculation integrated chip for neural network inference, storage and calculation arrays corresponding to different input lengths are additionally arranged in the storage and calculation unit to be matched with the pruned convolution kernels, so that the power consumption is reduced, the computing resources are maximally utilized, and the requirements of computing resource utilization rate and low power consumption can be met.
FIG. 7 is a flow chart of a method for accelerating neural network inference provided by an embodiment of the present invention. As shown in fig. 7, the method for accelerating neural network inference provided in this embodiment is based on the above chip for neural network inference, and includes the following steps:
and S1, pruning and clustering the convolution kernel parameters of each layer of the convolution neural network.
In a specific embodiment, pruning and clustering comprise the following sub-steps:
s11, pruning the parameters of each layer of convolution kernel of the convolutional neural network, for example, obtaining parameter values of each layer of convolution kernel of the convolutional neural network, and pruning the parameters of each layer of convolution kernel that are smaller than a preset threshold.
S12, quantize the pruned convolution kernel parameters of each layer; for example, apply int8 quantization to the parameters retained after pruning, converting the convolution (multiply-accumulate) operations from 32-bit float to int8 to reduce the amount of computation.
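A symmetric per-tensor scheme is one way to realize the int8 quantization of step S12 (the patent only says "int8", so the scale computation below is an assumption):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 quantization: map float32 weights to int8
    using a single per-tensor scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale
```

Convolutions can then be carried out as int8 multiply-accumulates, with the scale applied once to the accumulated result.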
S13, cluster the quantized convolution kernel parameters of each layer, grouping the retained parameters kernel by kernel into cluster0, cluster1, cluster2, and cluster3, where cluster0 holds the kernels with the fewest retained parameters and cluster3 those with the most.
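Step S13 can be sketched as grouping kernels by retained-parameter count; the 1/3/6/9 cluster boundaries below are inferred from this embodiment's array sizes, since the patent does not spell out the clustering criterion:

```python
def cluster_kernels(kernels):
    """Group pruned kernels (flat sequences of weights, zeros = pruned)
    into cluster0..cluster3: 1 parameter -> cluster0, 2-3 -> cluster1,
    4-6 -> cluster2, 7-9 -> cluster3."""
    bounds = (1, 3, 6, 9)
    clusters = {i: [] for i in range(4)}
    for kernel in kernels:
        n = sum(1 for w in kernel if w != 0)
        idx = next(i for i, b in enumerate(bounds) if n <= b)
        clusters[idx].append(kernel)
    return clusters
```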
S2, allocate the clustered convolution kernels to the storage computation arrays of the chip that correspond to the kernels' parameter counts. Specifically, a convolution kernel with 1-bit parameters is allocated to the 1-bit storage computation array; kernels with 2-bit or 3-bit parameters to the 3-bit array; kernels with 4-bit to 6-bit parameters to the 6-bit array; and kernels with 7-bit to 9-bit parameters to the 9-bit array. For example, cluster0, cluster1, cluster2, and cluster3 are input into the 1-bit, 3-bit, 6-bit, and 9-bit storage computation arrays, respectively.
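With the clusters formed, step S2 reduces to a fixed mapping from cluster index to array input length (a sketch; the function name and dictionary literal are illustrative, not the patent's implementation):

```python
def dispatch(clusters):
    """Route cluster0..cluster3 to the 1-, 3-, 6-, and 9-bit
    storage computation arrays, respectively."""
    array_bits = {0: 1, 1: 3, 2: 6, 3: 9}
    return {array_bits[i]: kernels for i, kernels in clusters.items()}
```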
According to the method for accelerating neural network inference, the pruned and clustered convolution kernels are deployed to storage computation arrays of matching input length, which makes maximal use of computing resources, reduces power consumption, and accelerates neural network inference.
Embodiments of the present invention also provide a machine-readable storage medium having stored thereon computer program instructions which, when executed, implement the above-described method of accelerating neural network inference.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (10)
1. A chip for neural network inference comprises a storage and computation unit, and is characterized in that the storage and computation unit comprises a plurality of storage and computation arrays with different input lengths, and the plurality of storage and computation arrays are used for deploying convolution kernels corresponding to the input lengths of the storage and computation arrays.
2. The chip for neural network inference of claim 1, wherein the convolution kernels deployed on the storage compute array are pruned and clustered.
3. The chip for neural network inference according to claim 1, wherein said memory computation unit includes four memory computation arrays, and the input lengths of the four memory computation arrays are 1 bit, 3 bits, 6 bits, and 9 bits, respectively.
4. The chip for neural network inference of claim 3, wherein the 1-bit storage computation array is used to deploy convolution kernels of 1-bit parameters;
the 3-bit storage computing array is used for deploying convolution kernels of 2-bit or 3-bit parameters;
the 6-bit storage computing array is used for deploying convolution kernels with parameters from 4 bits to 6 bits;
the 9-bit storage compute array is used to deploy convolution kernels of 7-bit to 9-bit parameters.
5. The chip for neural network inference of claim 1, wherein each said storage compute array corresponds to a convolution kernel, and a plurality of said storage compute arrays operate in parallel.
6. A method for accelerating neural network inference, based on the chip for neural network inference of claim 1, the method comprising:
pruning and clustering the convolution kernel parameters of each layer of the convolutional neural network;
and distributing the clustered convolution kernels to a storage calculation array of the chip for neural network inference, which corresponds to the parameter bits of the convolution kernels.
7. The method for accelerating neural network inference according to claim 6, wherein said pruning and clustering of the convolution kernel parameters of each layer of the convolutional neural network comprises:
pruning the convolution kernel parameters of each layer of the convolutional neural network;
quantizing the pruned convolution kernel parameters of each layer; and
clustering the quantized convolution kernel parameters of each layer.
8. The method for accelerating neural network inference according to claim 7, wherein said pruning of the convolution kernel parameters of each layer of the convolutional neural network comprises:
obtaining the parameter values of each layer's convolution kernels and pruning away the parameters that are smaller than a preset threshold.
9. The method for accelerating neural network inference as claimed in claim 6, wherein said assigning the clustered convolution kernels to a storage computation array of the chip for neural network inference corresponding to parameter bits of the convolution kernels comprises:
allocating the convolution kernel of the 1-bit parameter to a 1-bit storage calculation array;
allocating convolution kernels of 2-bit or 3-bit parameters to a 3-bit storage computation array;
allocating the convolution kernel with 4-bit to 6-bit parameters to a 6-bit storage calculation array;
the convolution kernel with 7-bit to 9-bit parameters is assigned to a 9-bit storage compute array.
10. A storage medium having computer program instructions stored thereon that, when executed, implement the method of accelerating neural network inference of any of claims 6-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110336218.4A CN113138957A (en) | 2021-03-29 | 2021-03-29 | Chip for neural network inference and method for accelerating neural network inference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110336218.4A CN113138957A (en) | 2021-03-29 | 2021-03-29 | Chip for neural network inference and method for accelerating neural network inference |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113138957A true CN113138957A (en) | 2021-07-20 |
Family
ID=76810135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110336218.4A Pending CN113138957A (en) | 2021-03-29 | 2021-03-29 | Chip for neural network inference and method for accelerating neural network inference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113138957A (en) |
- 2021-03-29: application CN202110336218.4A filed in China; published as CN113138957A, status Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256640A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | Convolutional neural networks implementation method |
CN109409512A (en) * | 2018-09-27 | 2019-03-01 | 西安交通大学 | A kind of neural computing unit, computing array and its construction method of flexibly configurable |
CN109635944A (en) * | 2018-12-24 | 2019-04-16 | 西安交通大学 | A kind of sparse convolution neural network accelerator and implementation method |
WO2020216227A1 (en) * | 2019-04-24 | 2020-10-29 | 华为技术有限公司 | Image classification method and apparatus, and data processing method and apparatus |
CN110222818A (en) * | 2019-05-13 | 2019-09-10 | 西安交通大学 | A kind of more bank ranks intertexture reading/writing methods for the storage of convolutional neural networks data |
CN111985602A (en) * | 2019-05-24 | 2020-11-24 | 华为技术有限公司 | Neural network computing device, method and computing device |
CN110334799A (en) * | 2019-07-12 | 2019-10-15 | 电子科技大学 | Integrated ANN Reasoning and training accelerator and its operation method are calculated based on depositing |
CN112418388A (en) * | 2019-08-23 | 2021-02-26 | 中兴通讯股份有限公司 | Method and device for realizing deep convolutional neural network processing |
CN110991608A (en) * | 2019-11-25 | 2020-04-10 | 合肥恒烁半导体有限公司 | Convolutional neural network quantitative calculation method and system |
CN111242277A (en) * | 2019-12-27 | 2020-06-05 | 中国电子科技集团公司第五十二研究所 | Convolutional neural network accelerator supporting sparse pruning and based on FPGA design |
CN112395247A (en) * | 2020-11-18 | 2021-02-23 | 北京灵汐科技有限公司 | Data processing method and storage and calculation integrated chip |
Non-Patent Citations (1)
Title |
---|
LI Yongbo; WANG Qin; JIANG Jianfei: "Design of a sparse convolutional neural network accelerator" (稀疏卷积神经网络加速器设计), Microelectronics & Computer, no. 06, pages 34 - 38 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |