CN112598129A - Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator - Google Patents

Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator

Info

Publication number
CN112598129A
CN112598129A
Authority
CN
China
Prior art keywords
pruning
neural network
reram
actor
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110236303.3A
Other languages
Chinese (zh)
Inventor
何水兵
杨斯凌
陈伟剑
陈平
陈帅犇
银燕龙
任祖杰
曾令仿
杨弢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202110236303.3A priority Critical patent/CN112598129A/en
Publication of CN112598129A publication Critical patent/CN112598129A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator, comprising a DDPG agent and the ReRAM neural network accelerator. The DDPG agent consists of a behavior decision module Actor and a Critic judgment module; the Actor makes pruning decisions on the neural network. The ReRAM neural network accelerator maps the model formed under the pruning decision generated by the Actor, and feeds the performance parameters of the mapped model under that decision back to the Critic as signals; the performance parameters include the simulator's energy consumption, latency, and model accuracy. The Critic judgment module updates the reward-function value according to the fed-back performance parameters and guides the Actor's next-stage pruning decision. Using the reinforcement-learning DDPG agent, the method derives the most efficient pruning scheme best matched to the hardware and to user requirements, improving latency and energy-consumption performance on the hardware while preserving accuracy.

Description

Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator
Technical Field
The invention relates to the field of artificial intelligence in computer science, and in particular to an adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator.
Background
Deep neural networks have driven major advances in computer vision, natural language processing, robotics, and related fields, and with the growth of mobile Internet-of-Things platforms their deployment on IoT devices is expanding rapidly. Because neural networks are compute-intensive and move large volumes of data, deploying them can incur high energy consumption and high latency, while IoT platforms have limited computational resources and energy budgets; IoT devices therefore need more efficient neural network mapping schemes that reduce energy consumption and latency. Thanks to its very low leakage, high-density storage, and in-memory computing characteristics, resistive random-access memory (ReRAM) enables neural network accelerators that address these IoT constraints. At the same time, today's highly sparse neural network models keep growing, causing substantial unnecessary resource waste and increased latency; pruning a model before mapping it onto the ReRAM neural network accelerator can greatly reduce the model size and lower both hardware energy consumption and application latency. However, when the hardware specification and type of the ReRAM neural network accelerator differ, or users demand different levels of latency and energy consumption, traditional deep-learning pruning schemes cannot perceive the changed hardware or user requirements and produce the same pruning scheme regardless, leading to inefficient model mapping on the ReRAM accelerator hardware and constraining the accelerator's performance advantages.
Disclosure of Invention
To explore the mapping of a convolutional neural network onto a ReRAM neural network accelerator more efficiently according to the requirements of mobile-device users, the invention provides an adjustable, intelligent hardware-aware pruning and mapping framework. Feedback obtained by a reinforcement-learning agent from the ReRAM accelerator hardware (e.g., latency, energy consumption, energy efficiency) replaces signals that cannot express performance on the hardware accelerator (e.g., model size, number of floating-point operations), and a deep deterministic policy gradient (DDPG) is used to search for and decide the pruning strategy. This determines a pruning strategy better suited to the ReRAM neural network accelerator, reduces the latency and energy consumption of the pruned and mapped neural network model on the hardware accelerator, enables wearable mobile IoT devices to run deep-learning applications under limited resources, and, when hardware and user requirements for latency and energy consumption differ, finds the pruning and mapping framework best suited to those requirements.
The technical scheme adopted by the invention is as follows:
An adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator comprises a DDPG agent and the ReRAM neural network accelerator. The DDPG agent consists of a behavior decision module Actor and a Critic judgment module; the Actor is used for making pruning decisions on a neural network model.
The ReRAM neural network accelerator is used for mapping the model formed under a pruning decision generated by the behavior decision module Actor, and for feeding the performance parameters of the mapped model under that pruning decision back to the Critic judgment module as signals; the performance parameters comprise the energy consumption, latency, and model accuracy of the ReRAM neural network accelerator.
The Critic judgment module is used for updating the reward-function value according to the fed-back performance parameters, evaluating the performance of the behavior decision module Actor, and guiding the Actor's next-stage pruning decision so that the reward-function value converges.
the value of the reward function is selected according to the requirements of the userReward1 (energy consumption) and/orReward2 (deferred) update, the actual performance in hardware has been achieved:
Reward1=-Error×log(Energy)
Reward2=-Error×log(Latency)
where Error = 1 − accuracy, accuracy is the model accuracy, Energy is the energy-consumption performance of the ReRAM neural network accelerator, and Latency is the delay performance of the ReRAM neural network accelerator.
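The two reward formulas above can be sketched directly in code. This is an illustrative helper, not part of the patent; the only logic it encodes is Reward1 = -Error × log(Energy) and Reward2 = -Error × log(Latency) with Error = 1 − accuracy:

```python
import math

def reward1(accuracy: float, energy: float) -> float:
    """Energy-aware reward: Reward1 = -Error * log(Energy), Error = 1 - accuracy."""
    return -(1.0 - accuracy) * math.log(energy)

def reward2(accuracy: float, latency: float) -> float:
    """Latency-aware reward: Reward2 = -Error * log(Latency)."""
    return -(1.0 - accuracy) * math.log(latency)
```

Both rewards increase as accuracy rises and as the measured energy or latency falls, which is what lets the Critic steer the Actor toward hardware-friendly pruning decisions.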
Further, the behavior decision module Actor makes a pruning decision on the neural network model as follows:
according to the input state parameter s_k characterizing the k-th layer of the neural network, the Actor outputs a sparsity ratio, and the neural network model is compressed layer by layer with a compression algorithm according to each layer's sparsity ratio. Namely: the current layer is compressed using a specified compression algorithm (e.g., channel pruning). The agent then moves to the next layer k+1 and receives state s_{k+1}, until the last layer is completed.
Further, the state parameter s_k is characterized by 8 features:
(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k−1})
where k is the layer index, type is the layer kind, in_channels is the number of input channels, out_channels is the number of output channels, stride is the convolution stride, and kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, scaled into [0, 1] before being passed to the behavior decision module Actor; a_{k−1} is the pruning action taken at the previous layer, expressed as a compression rate.
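The 8-feature state vector above can be assembled as follows. This is a sketch: the function name is invented for illustration, and dividing by the network's largest per-layer FLOP count is an assumed normalization, since the patent only says flops[k] is scaled into [0, 1]:

```python
def layer_state(k, layer_type, in_channels, out_channels, stride,
                kernel_size, flops_k, max_flops, prev_action):
    """Assemble the 8-feature state s_k for the Actor.

    layer_type is a numeric code for the layer kind (0 = convolutional,
    1 = fully connected, following the detailed description below).
    flops_k / max_flops scales the FLOP count into [0, 1]; the choice of
    max_flops as the scaling constant is an assumption.
    prev_action is a_{k-1}, the previous layer's compression rate.
    """
    return (k, layer_type, in_channels, out_channels, stride,
            kernel_size, flops_k / max_flops, prev_action)
```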
Traditional deep-learning pruning optimization schemes guide the reinforcement-learning agent's pruning decisions using the number of floating-point operations or the model size as the pruning signal. Consequently, when the hardware specification and type of the ReRAM neural network accelerator differ, or users demand different levels of latency and energy consumption, such schemes cannot perceive the changed hardware or user requirements and produce the same pruning scheme, leading to inefficient model mapping on the ReRAM accelerator hardware. The invention provides an adjustable, intelligent hardware-aware pruning scheme and mapping framework. Unlike the original approach of mapping the neural network directly onto the ReRAM neural network accelerator, the framework uses the deep deterministic policy gradient (DDPG) in reinforcement learning to search for and decide the pruning strategy, and feeds actual hardware performance (e.g., latency, energy consumption) back to the reinforcement-learning agent according to the user's requirements. After hardware-aware pruning according to user requirements, the model is mapped onto the ReRAM neural network accelerator, which greatly reduces the latency and energy consumption of the neural network model on the accelerator and improves mapping performance.
Drawings
FIG. 1 is an overall block diagram of the hardware-aware pruning and mapping framework of the present invention;
FIG. 2 is a flow chart of an experiment;
fig. 3 is a histogram comparing pruning strategies searched under the NeuroSim hardware configurations of three types of simulators, i.e., configuration 2, configuration 3, and configuration 4, under the delay perception scheme adopted by the VGG-16.
Detailed Description
FIG. 1 is an overall block diagram of the hardware-aware pruning and mapping framework of the present invention. As shown, it includes a DDPG agent and a ReRAM neural network accelerator. The ReRAM neural network accelerator comprises a plurality of processing elements (PEs), each consisting of a crossbar array of ReRAM cells, an on-chip cache, a nonlinear activation processing unit, an analog-to-digital converter, and other peripheral circuits (only the crossbar array, on-chip cache, and nonlinear activation processing unit are drawn in the figure). The DDPG agent consists of a behavior decision module Actor and a Critic judgment module. The whole pruning and mapping framework contains two levels. In the first level, the Actor of the DDPG agent makes pruning decisions, from the first layer to the last, on the neural network model according to hardware feedback; in the second level, the model formed under the pruning decision is mapped onto the ReRAM neural network accelerator, and the performance parameters of the mapped model under that pruning scheme are fed back as signals to the Critic judgment module in the first-level DDPG agent. The Critic judgment module evaluates the performance of the Actor, updates the reward-function value for the given hardware type and user requirements, and guides the Actor's next-stage pruning decision. After a certain number of epochs the reward-function value converges and the system finds an optimal pruning decision scheme; a CKPT model is then exported after pruning according to the hardware-aware pruning strategy, and the CKPT model is fine-tuned to guarantee accuracy.
As a preferred scheme, the behavior decision module Actor of the DDPG agent makes pruning decisions from the first layer to the last on the neural network model according to the performance-parameter feedback as follows:
the Actor receives from the environment the state parameter s_k of the k-th layer of the neural network model, outputs a sparsity ratio a_k, compresses the current layer with the specified compression algorithm according to a_k, and then the agent moves to the next layer k+1 and receives state s_{k+1}.
For each layer k, the state parameter s_k is characterized by 8 features:
(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k−1}) (1)
where k is the layer index; type is the layer kind (convolutional layers and fully connected layers, denoted 0 and 1, respectively); in_channels is the number of input channels; out_channels is the number of output channels; stride is the convolution stride; kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, scaled into [0, 1] before being passed to the behavior decision module Actor; a_{k−1} is the pruning action taken at the previous layer.
To realize finer-grained pruning decisions, the pruning action of the behavior decision module Actor is expressed as a compression rate, a_k ∈ (0, 1]. That is, using a channel pruning algorithm, the action is rounded to the nearest fraction that ultimately yields an integer number of channels, taken as a_k, and the current layer is compressed with a_k.
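One reasonable way to implement the rounding step above is sketched here; the patent does not spell out tie-breaking or a floor, so keeping at least one channel is an assumption:

```python
def round_action_to_channels(a_k: float, n_channels: int) -> float:
    """Round a continuous action a_k in (0, 1] to the nearest fraction
    that keeps an integer number of channels (at least one), as channel
    pruning requires."""
    kept = max(1, round(a_k * n_channels))  # integer channel count
    return kept / n_channels                # back to a compression rate
```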
After the pruning decision for the final layer is completed, the accuracy and the hardware performance parameters (latency or energy consumption) are evaluated on the ReRAM neural network accelerator using a validation set, and the reward-function value is computed and returned to the Critic judgment module.
The accuracy computed in this way approximates the fine-tuned accuracy, so the reward function can be evaluated without fine-tuning, enabling fast search.
The reward-function value is updated via Reward1 and/or Reward2:
Reward1=-Error×log(Energy)
Reward2=-Error×log(Latency)
where Error = 1 − accuracy, accuracy is the model accuracy, Energy is the energy-consumption performance of the ReRAM neural network accelerator, and Latency is the delay performance of the ReRAM neural network accelerator.
When energy support is very limited in an IoT device and the user's demand for latency is not urgent (i.e., the user places more emphasis on energy performance in the pruning mapping), the reward function may consider post-pruning accuracy together with the energy performance measured on the ReRAM neural network accelerator hardware. Conversely, when an IoT device has sufficient energy support and the user's latency demand is high (i.e., the user emphasizes latency performance), the reward function may consider post-pruning accuracy together with the latency performance measured on the hardware. The method can therefore design the pruning scheme and mapping framework more efficiently and accurately for different hardware and user requirements.
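The adjustability described above amounts to choosing between Reward1 and Reward2 per deployment. A minimal sketch, with the `emphasis` knob being a hypothetical name introduced here for illustration:

```python
import math

def select_reward(accuracy, energy, latency, emphasis="latency"):
    """Pick the reward per the user's emphasis: "energy" for
    energy-limited IoT devices (Reward1), "latency" for
    latency-critical ones (Reward2)."""
    error = 1.0 - accuracy
    if emphasis == "energy":
        return -error * math.log(energy)    # Reward1
    return -error * math.log(latency)       # Reward2
```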
In addition, the configuration of the ReRAM neural network accelerator can be adjusted, hardware perception is achieved, and a pruning scheme and a mapping framework under the optimal configuration and user requirements are obtained.
The invention is further illustrated below with reference to a specific example, in which the following experiments were carried out:
experimental configuration:
(1) Operating system: Ubuntu 18.04.3 LTS;
(2) CPU: 8-core Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, with 32GB DRAM;
(3) GPU: Tesla V100, 32GB video memory;
(4) Storage: 512GB SK hynix SC311 SATA SSD; Western Digital WDC WD40EZRZ-75G HDD;
configuring a neural network model:
(1) Neural network models: CIFAR10, Plain20, and VGG-16; their structures are shown in Table 1.
TABLE 1 neural network model and structural representation thereof
[Table 1 is reproduced as an image in the original and is not shown here.]
(2) Data set: cifar10, comprising 60000 color images, 32 × 32 in size, divided into 10 classes of 6000 images each, wherein 50000 images were used for training and 10000 images were used for testing;
(3) batch size: 1024 pictures/batch (CIFAR10, Plain20), 512 pictures/batch (VGG-16);
(4) Number of training rounds: 70 epochs;
(5) Number of fine-tuning (finetune) rounds: 50 epochs;
ReRAM neural network accelerator configuration:
experiments were performed using the simulator NeuroSim of the ReRAM neural network accelerator, the configuration of which is shown in table 2.
TABLE 2 simulator NeuroSim configuration
[Table 2 is reproduced as an image in the original and is not shown here.]
Experimental procedure
FIG. 2 is a flow chart of the whole pruning experiment, which comprises the following steps:
Step 1: the user writes the neural network model code and inputs it into the pruning and mapping framework of the invention; the model is pre-trained and saved, producing a CKPT file;
Step 2: an optimal pruning strategy is searched out using reinforcement learning and hardware awareness;
Step 3: pruning is performed according to the optimal pruning strategy obtained by reinforcement learning in step 2, and the pruned parameters are saved to a CKPT file;
Step 4: to guarantee post-pruning accuracy, the parameters of the CKPT model are fine-tuned;
Step 5: simulation on the NeuroSim simulator yields the post-pruning accuracy, completing the final pruning-strategy mapping.
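The five experimental steps can be sketched as one pipeline. All method names here (`pretrain`, `search`, `prune`, `finetune`, `evaluate`) are hypothetical stand-ins for the framework's and NeuroSim's interfaces; none come from the patent:

```python
def run_pruning_flow(model, framework, simulator):
    """Five-step flow from FIG. 2, under assumed interface names."""
    ckpt = framework.pretrain(model)               # step 1: pretrain, save CKPT
    strategy = framework.search(ckpt, simulator)   # step 2: RL hardware-aware search
    pruned = framework.prune(ckpt, strategy)       # step 3: prune with best strategy
    tuned = framework.finetune(pruned)             # step 4: fine-tune for accuracy
    return simulator.evaluate(tuned)               # step 5: simulate, get accuracy
```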
The final test results are:
Under the original direct neural-network mapping scheme (using the number of floating-point operations or the model size as the signal guiding the reinforcement-learning agent's pruning decision), with the corresponding hardware configurations, the hardware-simulator latencies for CIFAR10 (configuration 2), Plain20 (configuration 1), and VGG-16 (configuration 2) are 957150.6 ns/image, 4830571.6 ns/image, and 1026976.8 ns/image respectively, and the simulator energy consumptions are 23488814 pJ, 15816979.0 pJ, and 8058001.0 pJ respectively. With the hardware-aware latency-perception scheme, the latency performance of the three models improves by 57.358%, 6.771%, and 38.017% respectively, and the Top-5 accuracy improves by 0.210%, 0.190%, and 0.290% respectively. With the energy-consumption-perception scheme, the energy performance of the three models improves by 76.833%, 5.615%, and 38.425% respectively, and the Top-5 accuracy improves by 0.270%, 0.230%, and 0.420% respectively. The hardware-aware pruning scheme and mapping framework thus improve the performance the user cares about while maintaining accuracy.
Table 3 shows the pruning strategies searched under three NeuroSim hardware configurations (configuration 2, configuration 3, and configuration 4) when VGG-16 uses the latency-perception scheme. The list in each pruning strategy represents the channel retention of each group of filters.
Table 3 pruning strategy searched under three simulator NeuroSim hardware configurations under delay perception scheme adopted by VGG-16
[Table 3 is reproduced as an image in the original and is not shown here.]
Fig. 3 is a histogram comparing the pruning strategies searched under the three NeuroSim hardware configurations (configuration 2, configuration 3, and configuration 4) when VGG-16 uses the latency-perception scheme. The abscissa is the filter group index and the ordinate is the channel retention rate under the pruning strategy. The histogram shows that the channel-retention distribution across filter groups follows different trends under different hardware configurations, which confirms the need for hardware-aware strategies when facing different hardware configurations.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively. Obvious variations or modifications derived from the invention remain within its scope of protection.

Claims (3)

1. An adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator, characterized by comprising a DDPG agent and the ReRAM neural network accelerator; the DDPG agent consists of a behavior decision module Actor and a Critic judgment module, wherein the behavior decision module Actor is used for making pruning decisions on a neural network model;
the ReRAM neural network accelerator is used for mapping the model formed under the pruning decision generated by the behavior decision module Actor, and for feeding the performance parameters of the mapped model under that pruning decision back to the Critic judgment module as signals; the performance parameters comprise the energy consumption, latency, and model accuracy of the ReRAM neural network accelerator;
the Critic judgment module is used for updating the reward-function value according to the fed-back performance parameters, evaluating the performance of the behavior decision module Actor, and guiding the Actor's next-stage pruning decision so that the reward-function value converges;
the value of the reward function is passedReward1 and/orReward2, updating:
Reward1=-Error×log(Energy)
Reward2=-Error×log(Latency)
where Error = 1 − accuracy, accuracy is the model accuracy, Energy is the energy-consumption performance of the ReRAM neural network accelerator, and Latency is the delay performance of the ReRAM neural network accelerator.
2. The pruning and mapping framework of claim 1, wherein the behavior decision module Actor is configured to make a pruning decision for the neural network model by:
the behavior decision module Actor is used for characterizing the second time according to the inputkState parameters of a layer neural networks k Output the sparse rate, and according to each layerThe sparsity ratio uses a compression algorithm to compress the neural network model layer by layer.
3. The pruning and mapping framework of claim 2, wherein the state parameter s_k is characterized by 8 features:
(k, type, in_channels, out_channels, stride, kernelsize, flops[k], a_{k−1})
where k is the layer index, type is the layer kind, in_channels is the number of input channels, out_channels is the number of output channels, stride is the convolution stride, and kernelsize is the convolution kernel length, so the convolution kernel size is in_channels × out_channels × kernelsize × stride; flops[k] is the number of floating-point operations of the k-th layer, scaled into [0, 1] before being passed to the behavior decision module Actor; a_{k−1} is the pruning action taken at the previous layer.
CN202110236303.3A 2021-03-03 2021-03-03 Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator Pending CN112598129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236303.3A CN112598129A (en) 2021-03-03 2021-03-03 Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236303.3A CN112598129A (en) 2021-03-03 2021-03-03 Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator

Publications (1)

Publication Number Publication Date
CN112598129A true CN112598129A (en) 2021-04-02

Family

ID=75210318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236303.3A Pending CN112598129A (en) 2021-03-03 2021-03-03 Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator

Country Status (1)

Country Link
CN (1) CN112598129A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392969A (en) * 2021-05-14 2021-09-14 宁波物栖科技有限公司 Model pruning method for reducing power consumption of CNN accelerator based on ReRAM
CN114240192A (en) * 2021-12-21 2022-03-25 特斯联科技集团有限公司 Equipment optimization configuration method and system for park energy efficiency improvement based on reinforcement learning
CN116069512A (en) * 2023-03-23 2023-05-05 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model
CN111600851A (en) * 2020-04-27 2020-08-28 浙江工业大学 Feature filtering defense method for deep reinforcement learning model
CN112101534A (en) * 2019-06-17 2020-12-18 英特尔公司 Reconfigurable memory compression techniques for deep neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101534A (en) * 2019-06-17 2020-12-18 英特尔公司 Reconfigurable memory compression techniques for deep neural networks
CN111600851A (en) * 2020-04-27 2020-08-28 浙江工业大学 Feature filtering defense method for deep reinforcement learning model
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KUAN WANG et al.: "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", 2019 IEEE CVPR *
YIHUI HE et al.: "AMC: AutoML for Model Compression and Acceleration on Mobile Devices", ECCV 2018 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392969A (en) * 2021-05-14 2021-09-14 宁波物栖科技有限公司 Model pruning method for reducing power consumption of CNN accelerator based on ReRAM
CN113392969B (en) * 2021-05-14 2022-05-03 宁波物栖科技有限公司 Model pruning method for reducing power consumption of CNN accelerator based on ReRAM
CN114240192A (en) * 2021-12-21 2022-03-25 特斯联科技集团有限公司 Equipment optimization configuration method and system for park energy efficiency improvement based on reinforcement learning
CN114240192B (en) * 2021-12-21 2022-06-24 特斯联科技集团有限公司 Equipment optimization configuration method and system for park energy efficiency improvement based on reinforcement learning
CN116069512A (en) * 2023-03-23 2023-05-05 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning
CN116069512B (en) * 2023-03-23 2023-08-04 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN110378468B (en) Neural network accelerator based on structured pruning and low bit quantization
Sohoni et al. Low-memory neural network training: A technical report
CN112598129A (en) Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator
Yuan et al. High performance CNN accelerators based on hardware and algorithm co-optimization
CN110413255B (en) Artificial neural network adjusting method and device
CN111079899A (en) Neural network model compression method, system, device and medium
WO2020238237A1 (en) Power exponent quantization-based neural network compression method
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN114677548B (en) Neural network image classification system and method based on resistive random access memory
Deng et al. Reduced-precision memory value approximation for deep learning
TW202022798A (en) Method of processing convolution neural network
CN117574976B (en) Large language model software and hardware collaborative quantization acceleration calculation method and system
Struharik et al. Conna–compressed cnn hardware accelerator
CN111563160A (en) Text automatic summarization method, device, medium and equipment based on global semantics
CN114970853A (en) Cross-range quantization convolutional neural network compression method
Guan et al. Recursive binary neural network training model for efficient usage of on-chip memory
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN112528598B (en) Automatic text abstract evaluation method based on pre-training language model and information theory
CN113240090A (en) Image processing model generation method, image processing device and electronic equipment
CN117151178A (en) FPGA-oriented CNN customized network quantification acceleration method
CN112183744A (en) Neural network pruning method and device
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
TW202145078A (en) Computing method with dynamic minibatch sizes and computing system and computer-readable storage media for performing the same
CN109829054A (en) A kind of file classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402