CN113138957A - Chip for neural network inference and method for accelerating neural network inference - Google Patents

Info

Publication number
CN113138957A
Authority
CN
China
Prior art keywords
neural network
storage
bit
convolution
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110336218.4A
Other languages
Chinese (zh)
Inventor
聂玉虎
林龙
崔文朋
史存存
刘瑞
王岳
郑哲
万能
汪晓
章海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Global Energy Interconnection Research Institute
Beijing Smartchip Microelectronics Technology Co Ltd
Overhaul Branch of State Grid Anhui Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Global Energy Interconnection Research Institute
Beijing Smartchip Microelectronics Technology Co Ltd
Overhaul Branch of State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, Global Energy Interconnection Research Institute, Beijing Smartchip Microelectronics Technology Co Ltd, Overhaul Branch of State Grid Anhui Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110336218.4A priority Critical patent/CN113138957A/en
Publication of CN113138957A publication Critical patent/CN113138957A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/781 On-chip cache; Off-chip memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/7817 Specially adapted for signal processing, e.g. Harvard architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of artificial intelligence and provides a chip for neural network inference and a method for accelerating neural network inference. The chip for neural network inference comprises a storage-compute unit that includes a plurality of storage-compute arrays with different input lengths, and the plurality of storage-compute arrays are used to deploy convolution kernels corresponding to the input lengths of the storage-compute arrays. By providing storage-compute arrays with different input lengths in the storage-compute unit to match the pruned convolution kernels, the invention reduces power consumption while making maximal use of computing resources, so that high compute-resource utilization and low power consumption are achieved together.

Description

Chip for neural network inference and method for accelerating neural network inference
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a chip for neural network inference and a method for accelerating neural network inference.
Background
The convolutional neural network algorithm contains a large number of parameters: a model obtained by conventional training is typically very large, often hundreds of megabytes, and its inference consumes substantial computing resources, a storage and computation burden that embedded platforms with scarce hardware resources cannot bear.
To accelerate inference of convolutional neural network algorithms on a chip, various model light-weighting methods have been developed. Network pruning is a widely used method for deep neural network compression. For example, the Optimal Brain Damage (OBD) method treats every weight in the network as an individual parameter and, under the diagonal, extremal and quadratic assumptions, uses second derivatives to approximate parameter saliency during optimization, removing unimportant weights to improve the accuracy and generalization of the network. However, because single-weight pruning is unstructured, it yields a sparse network with irregular convolution kernels, which makes it difficult for a chip to achieve both high compute-resource utilization and low power consumption.
Disclosure of Invention
The invention aims to provide a chip for neural network inference and a method for accelerating neural network inference, so as to solve the problem that a chip can hardly achieve both high compute-resource utilization and low power consumption.
To achieve the above object, one aspect of the present invention provides a chip for neural network inference comprising a storage-compute unit, where the storage-compute unit includes a plurality of storage-compute arrays with different input lengths, and the plurality of storage-compute arrays are used to deploy convolution kernels corresponding to the input lengths of the storage-compute arrays.
Further, the convolution kernels deployed on the storage-compute arrays have been pruned and clustered.
Further, the storage-compute unit comprises four storage-compute arrays, and the input lengths of the four storage-compute arrays are 1 bit, 3 bits, 6 bits and 9 bits, respectively.
Further, the 1-bit storage-compute array is used to deploy convolution kernels with 1-bit parameters;
the 3-bit storage-compute array is used to deploy convolution kernels with 2-bit or 3-bit parameters;
the 6-bit storage-compute array is used to deploy convolution kernels with 4-bit to 6-bit parameters;
and the 9-bit storage-compute array is used to deploy convolution kernels with 7-bit to 9-bit parameters.
Furthermore, each storage-compute array corresponds to one convolution kernel, and the plurality of storage-compute arrays operate in parallel.
Another aspect of the present invention provides a method for accelerating neural network inference based on the chip for neural network inference described above, the method comprising:
pruning and clustering the convolution kernel parameters of each layer of the convolutional neural network; and
allocating the clustered convolution kernels to the storage-compute arrays of the chip for neural network inference that correspond to the parameter bits of the convolution kernels.
Further, the pruning and clustering of the convolution kernel parameters of each layer of the convolutional neural network includes: pruning the convolution kernel parameters of each layer of the convolutional neural network; quantizing the pruned convolution kernel parameters of each layer; and clustering the quantized convolution kernel parameters of each layer.
Further, the pruning of the convolution kernel parameters of each layer of the convolutional neural network includes:
obtaining the parameter values of each layer's convolution kernels and cutting off the parameters of each convolution kernel that are smaller than a preset threshold.
Further, the allocating of the clustered convolution kernels to the storage-compute arrays of the chip for neural network inference that correspond to the parameter bits of the convolution kernels includes:
allocating convolution kernels with 1-bit parameters to the 1-bit storage-compute array;
allocating convolution kernels with 2-bit or 3-bit parameters to the 3-bit storage-compute array;
allocating convolution kernels with 4-bit to 6-bit parameters to the 6-bit storage-compute array;
and allocating convolution kernels with 7-bit to 9-bit parameters to the 9-bit storage-compute array.
The present invention also provides a storage medium having stored thereon computer program instructions which, when executed, implement the method of accelerating neural network inference described above.
According to the chip for neural network inference of the present invention, storage-compute arrays with different input lengths are provided in the storage-compute unit to match the pruned convolution kernels, which reduces power consumption while making maximal use of computing resources, so that high compute-resource utilization and low power consumption are achieved together.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a block diagram of a storage-compute unit of a chip for neural network inference provided by an embodiment of the present invention;
FIGS. 2-5 are exemplary diagrams of the storage-compute arrays of a chip for neural network inference provided by an embodiment of the present invention;
FIG. 6 is an exemplary diagram of convolution kernels corresponding to the storage-compute arrays shown in FIGS. 2-5;
FIG. 7 is a flow chart of a method for accelerating neural network inference provided by an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Typically, convolutional neural networks (CNNs) use 3 × 3 convolution kernels. To accelerate neural network inference, chips for inference are designed to maximize the parallelism of convolution kernel operations. After pruning, however, the input dimension of the operation unit is no longer fixed, because the number of parameters retained in the convolution kernels of different convolution layers varies (i.e., the input length of the convolution kernels is not fixed). If every convolution kernel is computed with operation units of the same width, computing resources cannot be fully utilized, power consumption increases, and inference efficiency suffers.
Fig. 1 is a block diagram of the storage-compute unit of a chip for neural network inference provided by an embodiment of the present invention. The embodiment provides a storage-compute integrated (compute-in-memory) chip for neural network inference that comprises a storage-compute unit. As shown in Fig. 1, the storage-compute unit includes a plurality of storage-compute arrays with different input lengths, and the storage-compute arrays are used to deploy convolution kernels corresponding to their input lengths. The convolution kernels deployed on each storage-compute array have been pruned and clustered. The chip provided by this embodiment reduces power consumption while making maximal use of computing resources, meeting the requirements of both compute-resource utilization and low power consumption.
Figs. 2-5 are exemplary diagrams of the storage-compute arrays of a chip for neural network inference provided by an embodiment of the present invention. The storage-compute unit of this embodiment includes four storage-compute arrays with input lengths of 1 bit, 3 bits, 6 bits and 9 bits, respectively: Fig. 2 shows the array with an input length of 1 bit, Fig. 3 the array with an input length of 3 bits, Fig. 4 the array with an input length of 6 bits, and Fig. 5 the array with an input length of 9 bits. The 1-bit storage-compute array is used to deploy convolution kernels with 1-bit parameters; the 3-bit array deploys kernels with 2-bit or 3-bit parameters; the 6-bit array deploys kernels with 4-bit to 6-bit parameters; and the 9-bit array deploys kernels with 7-bit to 9-bit parameters. Fig. 6 shows example convolution kernels corresponding to the arrays of Figs. 2-5: from left to right, the first kernel corresponds to the 1-bit array of Fig. 2, the second to the 3-bit array of Fig. 3, the third to the 6-bit array of Fig. 4, and the fourth to the 9-bit array of Fig. 5. During convolutional neural network inference, each pruned and quantized convolution kernel is assigned to its corresponding storage-compute array, i.e. each storage-compute array corresponds to one convolution kernel, and the storage-compute arrays operate in parallel.
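The following Python sketch is purely illustrative (an assumption for exposition, not the patented circuit): it models one storage-compute array as a multiply-accumulate over only the retained weights of a pruned 3 × 3 kernel, so a kernel deployed on the 3-input array of Fig. 3 costs three multiplications where a fixed 9-input unit would cost nine. The helper name array_conv_step and the int8 data layout are hypothetical.

```python
import numpy as np

def array_conv_step(weights, positions, window):
    """Model one storage-compute array of input length L = len(weights):
    multiply-accumulate only the L retained int8 weights against the
    matching pixels of a flattened 3x3 input window."""
    w = np.asarray(weights, dtype=np.int32)
    x = np.asarray(window, dtype=np.int32).ravel()[positions]
    return int(np.dot(w, x))

# A kernel with two surviving weights, deployed on the 3-input array:
weights = np.array([5, -3], dtype=np.int8)     # retained weights
positions = np.array([1, 8])                   # their indices in the 3x3 window
window = np.arange(9, dtype=np.int8)           # one 3x3 input patch, flattened
print(array_conv_step(weights, positions, window))   # 5*1 + (-3)*8 = -19
```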
According to the storage-compute integrated chip for neural network inference of this embodiment, storage-compute arrays with different input lengths are provided in the storage-compute unit to match the pruned convolution kernels, which reduces power consumption while making maximal use of computing resources, so that high compute-resource utilization and low power consumption are achieved together.
FIG. 7 is a flowchart of a method for accelerating neural network inference provided by an embodiment of the present invention. As shown in FIG. 7, the method for accelerating neural network inference provided in this embodiment is based on the above chip for neural network inference and includes the following steps:
S1, pruning and clustering the convolution kernel parameters of each layer of the convolutional neural network.
In a specific embodiment, pruning and clustering comprise the following sub-steps:
S11, pruning the convolution kernel parameters of each layer of the convolutional neural network, for example, obtaining the parameter values of each layer's convolution kernels and cutting off the parameters of each kernel that are smaller than a preset threshold.
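A minimal sketch of this pruning sub-step, assuming magnitude-based thresholding on float32 kernel tensors of shape (out_channels, in_channels, 3, 3); the function name and the threshold value are hypothetical and not fixed by the patent:

```python
import numpy as np

def prune_kernels(kernels: np.ndarray, threshold: float) -> np.ndarray:
    """Zero every convolution kernel parameter whose magnitude falls
    below the preset threshold, keeping the array shape unchanged."""
    pruned = kernels.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

# Example: prune one layer's kernels with a preset threshold of 0.05.
layer_kernels = (np.random.randn(16, 8, 3, 3) * 0.1).astype(np.float32)
sparse_kernels = prune_kernels(layer_kernels, threshold=0.05)
```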
S12, quantizing the pruned convolution kernel parameters of each layer, for example, applying int8 quantization to the parameters retained after pruning, so that float32 convolution operations (multiply-add instructions) are converted into int8 convolution operations and the amount of computation is reduced.
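The description only states that float32 multiply-add is replaced by int8 convolution; one common way to realise this, shown here as a hedged sketch, is symmetric per-layer quantization with a single scale factor (the scale formula and function name are assumptions):

```python
import numpy as np

def quantize_int8(pruned_kernels: np.ndarray):
    """Symmetric per-layer int8 quantization: map float32 weights to
    int8 codes plus one scale factor used to dequantize results later."""
    max_abs = float(np.max(np.abs(pruned_kernels)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(pruned_kernels / scale), -128, 127).astype(np.int8)
    return q, scale

# sparse_kernels comes from the pruning sketch above (step S11).
q_kernels, scale = quantize_int8(sparse_kernels)
```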
S13, clustering the quantized convolution kernel parameters of each layer, i.e. clustering the retained parameters one by one to obtain cluster0, cluster1, cluster2 and cluster3, where cluster0 corresponds to the fewest retained parameters and cluster3 to the most.
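A sketch of this clustering sub-step under the assumption that kernels are grouped by how many parameters survive pruning, with bucket boundaries matching the four-array embodiment (1, 3, 6, 9); the patent does not specify the clustering algorithm, so this grouping rule is illustrative:

```python
import numpy as np

def cluster_kernels(q_kernels: np.ndarray) -> dict:
    """Group quantized 3x3 kernels (shape (num_kernels, 3, 3)) into
    cluster0..cluster3 by the number of retained (non-zero) parameters."""
    bounds = [1, 3, 6, 9]                      # upper bounds for cluster0..cluster3
    clusters = {f"cluster{i}": [] for i in range(4)}
    for kernel in q_kernels:
        n = int(np.count_nonzero(kernel))
        idx = next(i for i, b in enumerate(bounds) if n <= b)
        clusters[f"cluster{idx}"].append(kernel)
    return clusters

# q_kernels comes from the quantization sketch above (step S12).
clusters = cluster_kernels(q_kernels.reshape(-1, 3, 3))
```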
S2, allocating the clustered convolution kernels to the storage-compute arrays of the chip for neural network inference that correspond to the parameter bits of the convolution kernels. Specifically, convolution kernels with 1-bit parameters are allocated to the 1-bit storage-compute array; kernels with 2-bit or 3-bit parameters to the 3-bit array; kernels with 4-bit to 6-bit parameters to the 6-bit array; and kernels with 7-bit to 9-bit parameters to the 9-bit array. For example, cluster0, cluster1, cluster2 and cluster3 are fed into the 1-bit, 3-bit, 6-bit and 9-bit storage-compute arrays, respectively.
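Continuing the hypothetical cluster0 to cluster3 grouping from the sketch above, a software analogue of this dispatch rule routes each cluster to the array whose input length matches its parameter count, so the multiply-accumulate width per kernel equals the array's input length rather than always nine:

```python
CLUSTER_TO_ARRAY = {"cluster0": 1, "cluster1": 3, "cluster2": 6, "cluster3": 9}

def dispatch(clusters: dict) -> dict:
    """Return {array input length: kernels deployed on that array}."""
    deployment = {length: [] for length in (1, 3, 6, 9)}
    for name, kernels in clusters.items():
        deployment[CLUSTER_TO_ARRAY[name]].extend(kernels)
    return deployment

# clusters comes from the clustering sketch above (step S13).
deployment = dispatch(clusters)
for length, kernels in deployment.items():
    print(f"{length}-bit array: {len(kernels)} kernels")
```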
According to the method for accelerating neural network inference of this embodiment, the pruned and clustered convolution kernels of the trained network are deployed to storage-compute arrays of matching input lengths, which makes maximal use of computing resources, reduces power consumption and accelerates neural network inference.
Embodiments of the present invention also provide a machine-readable storage medium having stored thereon computer program instructions which, when executed, implement the above-described method of accelerating neural network inference.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments without departing from the spirit and scope of the invention, and such modifications and equivalents are intended to be covered by the claims.

Claims (10)

1. A chip for neural network inference, comprising a storage-compute unit, characterized in that the storage-compute unit comprises a plurality of storage-compute arrays with different input lengths, and the plurality of storage-compute arrays are used to deploy convolution kernels corresponding to the input lengths of the storage-compute arrays.
2. The chip for neural network inference of claim 1, wherein the convolution kernels deployed on the storage-compute arrays are pruned and clustered.
3. The chip for neural network inference of claim 1, wherein the storage-compute unit comprises four storage-compute arrays, and the input lengths of the four storage-compute arrays are 1 bit, 3 bits, 6 bits and 9 bits, respectively.
4. The chip for neural network inference of claim 3, wherein the 1-bit storage-compute array is used to deploy convolution kernels with 1-bit parameters;
the 3-bit storage-compute array is used to deploy convolution kernels with 2-bit or 3-bit parameters;
the 6-bit storage-compute array is used to deploy convolution kernels with 4-bit to 6-bit parameters;
and the 9-bit storage-compute array is used to deploy convolution kernels with 7-bit to 9-bit parameters.
5. The chip for neural network inference of claim 1, wherein each storage-compute array corresponds to one convolution kernel, and the plurality of storage-compute arrays operate in parallel.
6. A method for accelerating neural network inference based on the chip for neural network inference of claim 1, the method comprising:
pruning and clustering the convolution kernel parameters of each layer of the convolutional neural network; and
allocating the clustered convolution kernels to the storage-compute arrays of the chip for neural network inference that correspond to the parameter bits of the convolution kernels.
7. The method for accelerating neural network inference of claim 6, wherein the pruning and clustering of the convolution kernel parameters of each layer of the convolutional neural network comprises:
pruning the convolution kernel parameters of each layer of the convolutional neural network;
quantizing the pruned convolution kernel parameters of each layer; and
clustering the quantized convolution kernel parameters of each layer.
8. The method for accelerating neural network inference of claim 7, wherein the pruning of the convolution kernel parameters of each layer of the convolutional neural network comprises:
obtaining the parameter values of each layer's convolution kernels and cutting off the parameters of each convolution kernel that are smaller than a preset threshold.
9. The method for accelerating neural network inference of claim 6, wherein the allocating of the clustered convolution kernels to the storage-compute arrays of the chip for neural network inference that correspond to the parameter bits of the convolution kernels comprises:
allocating convolution kernels with 1-bit parameters to the 1-bit storage-compute array;
allocating convolution kernels with 2-bit or 3-bit parameters to the 3-bit storage-compute array;
allocating convolution kernels with 4-bit to 6-bit parameters to the 6-bit storage-compute array;
and allocating convolution kernels with 7-bit to 9-bit parameters to the 9-bit storage-compute array.
10. A storage medium having computer program instructions stored thereon that, when executed, implement the method of accelerating neural network inference of any of claims 6-9.
CN202110336218.4A 2021-03-29 2021-03-29 Chip for neural network inference and method for accelerating neural network inference Pending CN113138957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110336218.4A CN113138957A (en) 2021-03-29 2021-03-29 Chip for neural network inference and method for accelerating neural network inference

Publications (1)

Publication Number Publication Date
CN113138957A true CN113138957A (en) 2021-07-20

Family

ID=76810135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110336218.4A Pending CN113138957A (en) 2021-03-29 2021-03-29 Chip for neural network inference and method for accelerating neural network inference

Country Status (1)

Country Link
CN (1) CN113138957A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256640A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 Convolutional neural networks implementation method
CN109409512A (en) * 2018-09-27 2019-03-01 西安交通大学 A kind of neural computing unit, computing array and its construction method of flexibly configurable
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method
WO2020216227A1 (en) * 2019-04-24 2020-10-29 华为技术有限公司 Image classification method and apparatus, and data processing method and apparatus
CN110222818A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of more bank ranks intertexture reading/writing methods for the storage of convolutional neural networks data
CN111985602A (en) * 2019-05-24 2020-11-24 华为技术有限公司 Neural network computing device, method and computing device
CN110334799A (en) * 2019-07-12 2019-10-15 电子科技大学 Integrated ANN Reasoning and training accelerator and its operation method are calculated based on depositing
CN112418388A (en) * 2019-08-23 2021-02-26 中兴通讯股份有限公司 Method and device for realizing deep convolutional neural network processing
CN110991608A (en) * 2019-11-25 2020-04-10 合肥恒烁半导体有限公司 Convolutional neural network quantitative calculation method and system
CN111242277A (en) * 2019-12-27 2020-06-05 中国电子科技集团公司第五十二研究所 Convolutional neural network accelerator supporting sparse pruning and based on FPGA design
CN112395247A (en) * 2020-11-18 2021-02-23 北京灵汐科技有限公司 Data processing method and storage and calculation integrated chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李永博; 王琴; 蒋剑飞: "Design of a sparse convolutional neural network accelerator" [稀疏卷积神经网络加速器设计], Microelectronics & Computer (微电子学与计算机), no. 06, pages 34-38 *

Similar Documents

Publication Publication Date Title
CN110097186B (en) Neural network heterogeneous quantitative training method
CN109445935B (en) Self-adaptive configuration method of high-performance big data analysis system in cloud computing environment
CN111814973B (en) Memory computing system suitable for neural ordinary differential equation network computing
US11928599B2 (en) Method and device for model compression of neural network
CN112272102B (en) Method and device for unloading and scheduling edge network service
CN112153145A (en) Method and device for unloading calculation tasks facing Internet of vehicles in 5G edge environment
CN116644804B (en) Distributed training system, neural network model training method, device and medium
CN113283587B (en) Winograd convolution operation acceleration method and acceleration module
CN115776524A (en) Internet of things mass data multistage scheduling transmission system for intelligent manufacturing
CN114580636A (en) Neural network lightweight deployment method based on three-target joint optimization
CN112861996A (en) Deep neural network model compression method and device, electronic equipment and storage medium
CN110263917B (en) Neural network compression method and device
CN116502691A (en) Deep convolutional neural network mixed precision quantization method applied to FPGA
CN111860867A (en) Model training method and system for hybrid heterogeneous system and related device
CN116962176B (en) Data processing method, device and system of distributed cluster and storage medium
CN113886092A (en) Computation graph execution method and device and related equipment
CN117521752A (en) Neural network acceleration method and system based on FPGA
CN113138957A (en) Chip for neural network inference and method for accelerating neural network inference
CN111860810A (en) Neural network operation method, device and equipment based on FPGA
CN116302481B (en) Resource allocation method and system based on sparse knowledge graph link prediction
CN112612601A (en) Intelligent model training method and system for distributed image recognition
CN115130672B (en) Software and hardware collaborative optimization convolutional neural network calculation method and device
CN114065923A (en) Compression method, system and accelerating device of convolutional neural network
CN111260049A (en) Neural network implementation method based on domestic embedded system
CN110728372A (en) Cluster design method and cluster architecture for dynamic loading of artificial intelligence model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination