CN112598129A - Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator - Google Patents
- Publication number: CN112598129A
- Application number: CN202110236303.3A
- Authority: CN (China)
- Prior art keywords: pruning, neural network, ReRAM, actor, decision
- Prior art date: 2021-03-03
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention provides an adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator, comprising a DDPG agent and the ReRAM neural network accelerator. The DDPG agent consists of a behavior decision module Actor and a judgment module Critic; the Actor makes pruning decisions for the neural network. The ReRAM neural network accelerator maps the model formed under the pruning decision generated by the Actor and feeds the performance parameters of the mapped model back to the Critic as signals; the performance parameters include the energy consumption, latency, and model accuracy measured on the simulator. The Critic updates the reward function value according to the fed-back performance parameters and guides the Actor's pruning decision in the next stage. Using the reinforcement-learning DDPG agent, the method finds the most efficient pruning scheme that best matches the hardware and user requirements, thereby improving latency and energy-consumption performance on the hardware while preserving accuracy.
Description
Technical Field
The invention relates to the field of artificial intelligence in computer science, and in particular to an adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator.
Background
Deep neural networks have driven major advances in computer vision, natural language processing, robotics, and related fields, and with the growth of mobile Internet-of-Things platforms their deployment on IoT devices is expanding rapidly. Because neural networks are compute-intensive and move large amounts of data, running them can incur high energy consumption and high latency, while IoT platforms offer only limited computational resources and energy budgets; IoT devices therefore need more efficient neural network mapping schemes to reduce energy consumption and latency. Thanks to its very low leakage, high-density storage, and in-memory computing characteristics, resistive random access memory (ReRAM) enables neural network accelerators that address these limitations of IoT devices. On the other hand, today's increasingly large and highly sparse neural network models waste resources and increase latency; pruning a model before mapping it onto a ReRAM neural network accelerator can greatly reduce the model size and, with it, hardware energy consumption and application latency. However, when the hardware specification and type of the ReRAM neural network accelerator differ, or users impose different levels of latency and energy requirements, traditional deep-learning pruning schemes cannot perceive these changes and always produce the same pruning scheme. This makes model mapping on the ReRAM accelerator hardware inefficient and constrains the performance advantages of the ReRAM neural network accelerator.
Disclosure of Invention
To explore the mapping of convolutional neural networks onto a ReRAM neural network accelerator more efficiently according to the requirements of mobile-device users, the invention provides an adjustable, intelligent hardware-aware pruning and mapping framework. Feedback that the reinforcement-learning agent obtains from the ReRAM accelerator hardware (such as latency, energy consumption, and energy efficiency) replaces signals that cannot express performance on the hardware accelerator (such as model size or the number of floating-point operations), and a deep deterministic policy gradient (DDPG) agent searches for and decides the pruning strategy. This determines a pruning strategy better suited to the ReRAM neural network accelerator, reduces the latency and energy consumption of the pruned and mapped neural network model on the hardware accelerator, allows wearable mobile IoT devices to run deep-learning applications under limited resources, and, when hardware and user requirements for latency and energy consumption differ, finds the pruning and mapping scheme best suited to those requirements.
The technical scheme adopted by the invention is as follows:
a pruning and mapping framework based on adjustable hardware perception of a ReRAM neural network accelerator comprises a DDPG agent and the ReRAM neural network accelerator; the DDPG agent consists of a behavior decision module Actor and a judgment module Critic, wherein the Actor is used for making pruning decisions on a neural network model;
the ReRAM neural network accelerator is used for mapping the model formed under the pruning decision generated by the Actor, and for feeding the performance parameters of the mapped model back to the Critic as signals; the performance parameters comprise the energy consumption, latency, and model accuracy of the ReRAM neural network accelerator;
the Critic is used for updating the reward function value according to the fed-back performance parameters, evaluating the Actor's performance, and guiding the Actor's pruning decision in the next stage so that the reward function value converges.
the value of the reward function is selected according to the requirements of the userReward1 (energy consumption) and/orReward2 (deferred) update, the actual performance in hardware has been achieved:
Reward1=-Error×log(Energy)
Reward2=-Error×log(Latency)
wherein the content of the first and second substances,Error=1-accuracy,accuracyin order to be a measure of the accuracy of the model,Energyfor the power consumption performance of the ReRAM neural network accelerator,Latencyis the delay performance of the ReRAM neural network accelerator.
Further, the behavior decision module Actor makes the pruning decision on the neural network model as follows:
the Actor takes as input the state parameter s_k characterizing the k-th layer of the neural network and outputs a sparsity rate; the neural network model is then compressed layer by layer with a compression algorithm according to each layer's sparsity rate. That is, the current layer is compressed using a specified compression algorithm (e.g., channel pruning); the agent then moves to the next layer k+1 and receives state s_{k+1}, until the last layer is completed.
Further, the state parameter s_k is characterized by 8 features:

(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k-1})

where k is the layer index; type is the kind of layer; in_channels is the number of input channels; out_channels is the number of output channels; stride is the convolution stride; kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, rescaled into [0, 1] before being passed to the Actor; a_{k-1} is the pruning action taken for the previous layer, expressed as a compression rate.
Traditional deep-learning pruning optimization schemes guide the reinforcement-learning agent's pruning decisions with the number of floating-point operations or the model size as the pruning signal. When the hardware specification and type of the ReRAM neural network accelerator differ, or users impose different levels of latency and energy requirements, such schemes cannot perceive these changes and produce the same pruning scheme, leading to inefficient model mapping on the ReRAM accelerator hardware. The invention provides an adjustable, intelligent hardware-aware pruning scheme and mapping framework. Unlike the original approach of mapping the neural network directly onto the ReRAM neural network accelerator, the framework uses a deep deterministic policy gradient (DDPG) in reinforcement learning to search for and decide the pruning strategy, and, according to the user's requirements, selects actual hardware performance figures (such as latency and energy consumption) to feed back to the reinforcement-learning agent. After hardware-aware pruning driven by the user's requirements, the model is mapped onto the ReRAM neural network accelerator, greatly reducing the latency and energy consumption of the neural network model running on the accelerator and improving mapping performance.
Drawings
FIG. 1 is an overall block diagram of the hardware-aware pruning and mapping framework of the present invention;
FIG. 2 is a flow chart of an experiment;
Fig. 3 is a histogram comparing the pruning strategies searched under three NeuroSim simulator hardware configurations (configuration 2, configuration 3, and configuration 4) when VGG-16 uses the latency-aware scheme.
Detailed Description
FIG. 1 is an overall block diagram of the hardware-aware pruning and mapping framework of the present invention. As shown, it comprises a DDPG agent and a ReRAM neural network accelerator. The ReRAM neural network accelerator comprises a plurality of processing elements (PEs), each consisting of a crossbar array of ReRAM cells, an on-chip cache, a nonlinear activation processing unit, an analog-to-digital converter, and other peripheral circuits (only the crossbar array, on-chip cache, and nonlinear activation unit are drawn in the figure). The DDPG agent consists of the behavior decision module Actor and the judgment module Critic. The entire pruning and mapping framework contains two levels. In the first level, the Actor of the DDPG agent makes pruning decisions from the first layer to the last layer of the neural network model according to hardware feedback; in the second level, the model formed under the pruning decision is mapped onto the ReRAM neural network accelerator, and the performance parameters of the mapped model are fed back as signals to the Critic in the first level. The Critic evaluates the Actor's performance, updates the reward function value for the given hardware type and user requirements, and guides the Actor's next-stage pruning decision. After a certain number of episodes the reward function value converges and the system has found an optimal pruning decision scheme; the pruned model is then exported as a CKPT model according to the hardware-aware pruning strategy, and the CKPT model is fine-tuned to preserve accuracy.
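As a rough, illustrative sketch of the second-level mapping, the snippet below estimates how many fixed-size crossbar arrays one layer's weight matrix occupies. The 128×128 array size and the two-cells-per-weight figure are hypothetical choices for illustration, not values taken from this document:

```python
import math

def crossbars_needed(rows, cols, xbar=128, cells_per_weight=2):
    """Estimate how many xbar-by-xbar ReRAM crossbar arrays a weight
    matrix with `rows` x `cols` weights occupies.

    cells_per_weight models a weight whose bit-width spans several
    ReRAM cells; both it and the array size are illustrative values.
    """
    tiles_rows = math.ceil(rows / xbar)
    tiles_cols = math.ceil(cols * cells_per_weight / xbar)
    return tiles_rows * tiles_cols

# A 3x3 convolution with 64 input and 128 output channels unrolls to a
# (3*3*64) x 128 weight matrix; on 128x128 arrays with 2 cells/weight
# it needs ceil(576/128) * ceil(256/128) = 5 * 2 = 10 crossbars.
n_xbars = crossbars_needed(3 * 3 * 64, 128)
```

Pruning channels shrinks `rows` and `cols`, which is why channel retention directly reduces the number of occupied crossbars and hence energy and latency.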
As a preferred scheme, the Actor of the DDPG agent makes pruning decisions from the first layer to the last layer of the neural network model according to the performance-parameter feedback as follows:
the Actor receives from the environment the state parameter s_k of the k-th layer of the neural network model and outputs a sparsity rate; the current layer is compressed with the specified compression algorithm according to that layer's sparsity rate a_k; the agent then moves to the next layer k+1 and receives state s_{k+1}.
The state parameter s_k of each layer k is characterized by 8 features:

(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k-1}) (1)

where k is the layer index; type is the type of layer (convolutional or fully connected, denoted 0 and 1 respectively); in_channels is the number of input channels; out_channels is the number of output channels; stride is the convolution stride; kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, rescaled into [0, 1] before being passed to the Actor; a_{k-1} is the pruning action taken for the previous layer.
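A minimal sketch of assembling this 8-feature state vector. Dividing flops[k] by the network's total FLOPs is an assumption — the text only says the value is rescaled into [0, 1] without fixing the scaling:

```python
def layer_state(k, layer_type, in_channels, out_channels, stride,
                kernelsize, flops_k, flops_total, prev_action):
    """Build the 8-feature state s_k of formula (1).

    layer_type: 0 = convolutional, 1 = fully connected (as in the text).
    flops_k is rescaled into [0, 1]; dividing by the network's total
    FLOPs is one plausible choice -- the text does not fix it.
    prev_action is a_{k-1}, the previous layer's compression rate.
    """
    return (k, layer_type, in_channels, out_channels, stride,
            kernelsize, flops_k / flops_total, prev_action)

# State for a hypothetical 3rd layer: 3x3 conv, 64 -> 128 channels,
# stride 1, 30% of the network's FLOPs, previous action 0.5.
s3 = layer_state(3, 0, 64, 128, 1, 3, 1.8e8, 6.0e8, 0.5)
```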
To achieve a finer-grained pruning decision, the pruning action of the behavior decision module Actor is expressed as a compression rate, a_k ∈ (0, 1]: using a channel pruning algorithm, the raw output is rounded to the nearest fraction that yields an integer number of channels, this fraction is taken as a_k, and the current layer is compressed with a_k.
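This rounding step could look like the following sketch; the floor of one retained channel is an added assumption so a layer is never pruned away entirely:

```python
def round_action(raw, out_channels):
    """Round the Actor's raw output in (0, 1] to the nearest fraction
    that keeps an integer number of channels.

    Keeping at least one channel is an assumption added here so the
    layer is never pruned to zero channels.
    """
    kept = max(1, round(raw * out_channels))
    return kept / out_channels, kept

# A raw action of 0.37 on a 64-channel layer keeps 24 channels,
# giving the feasible compression rate a_k = 24/64 = 0.375.
a_k, kept = round_action(0.37, 64)
```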
After the pruning decision for the final layer is completed, the accuracy and the hardware performance parameter (latency or energy consumption) are evaluated on the ReRAM neural network accelerator with a validation set, and the reward function value is computed and returned to the judgment module Critic.
The accuracy computed this way is close to the fine-tuned accuracy, so the reward function can be evaluated without fine-tuning, allowing fast search.
The reward function value is updated via Reward1 and/or Reward2:

Reward1 = -Error × log(Energy)
Reward2 = -Error × log(Latency)

where Error = 1 - accuracy, accuracy is the model accuracy, Energy is the energy-consumption performance of the ReRAM neural network accelerator, and Latency is its latency performance.
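A direct transcription of the two reward formulas; the text does not state the logarithm base, so the natural logarithm is assumed here:

```python
import math

def reward(accuracy, energy=None, latency=None):
    """Compute Reward1 / Reward2 from the formulas above.

    accuracy is the model accuracy in (0, 1]; energy and latency are
    the figures fed back from the accelerator (or its simulator).
    The logarithm base is not stated in the text; natural log assumed.
    """
    error = 1.0 - accuracy
    rewards = {}
    if energy is not None:
        rewards["Reward1"] = -error * math.log(energy)
    if latency is not None:
        rewards["Reward2"] = -error * math.log(latency)
    return rewards

# Higher accuracy and lower energy both raise Reward1:
r_better = reward(0.93, energy=1.0e7)["Reward1"]
r_worse = reward(0.90, energy=2.0e7)["Reward1"]
```

Only the quantity the user cares about is passed in, which is exactly the adjustability described in the next paragraph.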
When energy support on an IoT device is very limited and the user's latency demand is not urgent (i.e., the user emphasizes energy performance in the pruning mapping), the reward function considers the post-pruning accuracy and the energy figure measured on the ReRAM neural network accelerator hardware. Conversely, when energy support is sufficient and the user's latency demand is high (i.e., the user emphasizes latency performance), the reward function considers the post-pruning accuracy and the latency figure measured on the hardware. The method can therefore design the pruning scheme and mapping framework more efficiently and accurately for different hardware and user requirements.
In addition, the configuration of the ReRAM neural network accelerator itself can be adjusted, so that hardware awareness yields the pruning scheme and mapping framework under the optimal configuration and user requirements.
The invention is further illustrated below with reference to specific examples, in which the following experiments are carried out:
experimental configuration:
(1) Operating system: Ubuntu 18.04.3 LTS;
(2) CPU: 8-core Intel(R) Xeon(R) Gold 6126 @ 2.60 GHz, with 32 GB DRAM;
(3) GPU: Tesla V100, 32 GB video memory;
(4) Storage: 512 GB SK hynix SC311 SATA SSD; Western Digital WDC WD40EZRZ-75G HDD;
Neural network model configuration:
(1) Neural network models: CIFAR10, Plain20, and VGG-16; their structures are shown in Table 1.
Table 1: neural network models and their structures
(2) Dataset: CIFAR-10, comprising 60000 32×32 color images in 10 classes of 6000 images each, of which 50000 are used for training and 10000 for testing;
(3) Batch size: 1024 images/batch (CIFAR10, Plain20), 512 images/batch (VGG-16);
(4) Number of training rounds: 70 epochs;
(5) Number of fine-tuning rounds: 50 epochs;
ReRAM neural network accelerator configuration:
Experiments were performed using NeuroSim, a simulator of the ReRAM neural network accelerator; its configuration is shown in Table 2.
Table 2: NeuroSim simulator configuration
Experimental procedure
FIG. 2 is a flow chart of the whole pruning experiment. The method comprises the following steps:
Step one: when a user writes the neural network model code, it is input to the pruning and mapping framework of the invention, pre-trained, and stored to obtain a CKPT file;
Step two: an optimal pruning strategy is searched out using reinforcement learning with hardware awareness;
Step three: pruning is carried out according to the optimal pruning strategy obtained by reinforcement learning in step two, and the pruned parameters are stored in the CKPT file;
Step four: to ensure the post-pruning accuracy, the parameters of the CKPT model are fine-tuned;
Step five: simulation on the NeuroSim simulator yields the post-pruning accuracy, completing the final pruning-strategy mapping.
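The search of step two can be sketched as the following skeleton. A real implementation would use a DDPG Actor-Critic to propose and refine per-layer compression rates; uniform random sampling stands in for the Actor here purely to keep the example self-contained and runnable:

```python
import random

def search_pruning_policy(num_layers, evaluate, episodes=100):
    """Skeleton of the hardware-aware pruning-strategy search.

    `evaluate` maps a list of per-layer compression rates to a scalar
    reward (e.g. Reward2 computed from simulator latency feedback).
    A DDPG Actor-Critic would propose and refine the rates; uniform
    random sampling stands in for the Actor to keep this runnable.
    """
    best_policy, best_reward = None, float("-inf")
    for _ in range(episodes):
        # Actor stand-in: one compression rate in (0, 1] per layer.
        policy = [random.uniform(0.1, 1.0) for _ in range(num_layers)]
        r = evaluate(policy)          # hardware / simulator feedback
        if r > best_reward:
            best_policy, best_reward = policy, r
    return best_policy, best_reward

# Toy evaluator peaking at a compression rate of 0.5 for every layer:
policy, r = search_pruning_policy(5, lambda p: -sum((a - 0.5) ** 2 for a in p))
```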
The final test results are:
Under the original direct neural-network mapping scheme (with the number of floating-point operations or the model size as the signal guiding the reinforcement-learning agent's pruning decisions), under the corresponding hardware configurations the simulator latencies are 957150.6 ns/image, 4830571.6 ns/image, and 1026976.8 ns/image for CIFAR10 (configuration 2), Plain20 (configuration 1), and VGG-16 (configuration 2), and the simulator energy consumptions are 23488814 pJ, 15816979.0 pJ, and 8058001.0 pJ respectively. With the hardware-aware latency-aware scheme, the latency performance of the three models improves by 57.358%, 6.771%, and 38.017% respectively, and the Top-5 accuracy improves by 0.210%, 0.190%, and 0.290%. With the energy-aware scheme, the energy performance of the three models improves by 76.833%, 5.615%, and 38.425%, and the Top-5 accuracy improves by 0.270%, 0.230%, and 0.420%. The hardware-aware pruning scheme and mapping framework thus improve the performance the user cares about while preserving accuracy.
Table 3 shows the pruning strategies searched under three NeuroSim simulator hardware configurations (configuration 2, configuration 3, and configuration 4) when VGG-16 uses the latency-aware scheme; the list in each pruning strategy gives the channel retention of each group of filters.
Table 3: pruning strategies searched under three NeuroSim simulator hardware configurations when VGG-16 uses the latency-aware scheme
Fig. 3 is a histogram comparing the pruning strategies searched under the three NeuroSim simulator hardware configurations (configuration 2, configuration 3, and configuration 4) when VGG-16 uses the latency-aware scheme. The abscissa is the filter group index and the ordinate is the channel retention rate under the pruning strategy. The histogram shows that the channel-retention distribution across filter groups follows different trends under different hardware configurations, which confirms the need for hardware-aware strategies when facing different hardware configurations.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Obvious variations or modifications may be made without departing from the scope of the invention.
Claims (3)
1. A pruning and mapping framework based on adjustable hardware perception of a ReRAM neural network accelerator, characterized by comprising a DDPG agent and the ReRAM neural network accelerator; the DDPG agent consists of a behavior decision module Actor and a judgment module Critic, wherein the Actor is used for making pruning decisions on a neural network model;
the ReRAM neural network accelerator is used for mapping the model formed under the pruning decision generated by the Actor, and for feeding the performance parameters of the mapped model back to the Critic as signals; the performance parameters comprise the energy consumption, latency, and model accuracy of the ReRAM neural network accelerator;
the Critic is used for updating the reward function value according to the fed-back performance parameters, evaluating the Actor's performance, and guiding the Actor's pruning decision in the next stage so that the reward function value converges;
the value of the reward function is passedReward1 and/orReward2, updating:
Reward1=-Error×log(Energy)
Reward2=-Error×log(Latency)
wherein the content of the first and second substances,Error=1-accuracy,accuracyin order to be a measure of the accuracy of the model,Energyfor the power consumption performance of the ReRAM neural network accelerator,Latencyis the delay performance of the ReRAM neural network accelerator.
2. The pruning and mapping framework of claim 1, wherein the behavior decision module Actor makes the pruning decision on the neural network model by:
taking as input the state parameter s_k characterizing the k-th layer of the neural network, outputting a sparsity rate, and compressing the neural network model layer by layer with a compression algorithm according to each layer's sparsity rate.
3. The pruning and mapping framework of claim 2, wherein the state parameter s_k is characterized by 8 features:

(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k-1})

where k is the layer index; type is the kind of layer; in_channels is the number of input channels; out_channels is the number of output channels; stride is the convolution stride; kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, rescaled into [0, 1] before being passed to the Actor; a_{k-1} is the pruning action taken for the previous layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110236303.3A CN112598129A (en) | 2021-03-03 | 2021-03-03 | Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112598129A true CN112598129A (en) | 2021-04-02 |
Family
ID=75210318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110236303.3A Pending CN112598129A (en) | 2021-03-03 | 2021-03-03 | Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112598129A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101534A (en) * | 2019-06-17 | 2020-12-18 | 英特尔公司 | Reconfigurable memory compression techniques for deep neural networks |
CN111600851A (en) * | 2020-04-27 | 2020-08-28 | 浙江工业大学 | Feature filtering defense method for deep reinforcement learning model |
CN111340227A (en) * | 2020-05-15 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and device for compressing business prediction model through reinforcement learning model |
Non-Patent Citations (2)
Title |
---|
KUAN WANG et al.: "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", 2019 IEEE CVPR * |
YIHUI HE et al.: "AMC: AutoML for Model Compression and Acceleration on Mobile Devices", ECCV 2018 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113392969A (en) * | 2021-05-14 | 2021-09-14 | 宁波物栖科技有限公司 | Model pruning method for reducing power consumption of CNN accelerator based on ReRAM |
CN113392969B (en) * | 2021-05-14 | 2022-05-03 | 宁波物栖科技有限公司 | Model pruning method for reducing power consumption of CNN accelerator based on ReRAM |
CN114240192A (en) * | 2021-12-21 | 2022-03-25 | 特斯联科技集团有限公司 | Equipment optimization configuration method and system for park energy efficiency improvement based on reinforcement learning |
CN114240192B (en) * | 2021-12-21 | 2022-06-24 | 特斯联科技集团有限公司 | Equipment optimization configuration method and system for park energy efficiency improvement based on reinforcement learning |
CN116069512A (en) * | 2023-03-23 | 2023-05-05 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
CN116069512B (en) * | 2023-03-23 | 2023-08-04 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210402 |