CN117639792A - Deep learning model compression method based on code table clustering

Deep learning model compression method based on code table clustering

Info

Publication number
CN117639792A
Authority
CN
China
Prior art keywords
weight
code table
index
deep learning
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311590503.4A
Other languages
Chinese (zh)
Inventor
黄科杰
邓军灿
沈海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202311590503.4A priority Critical patent/CN117639792A/en
Publication of CN117639792A publication Critical patent/CN117639792A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a deep learning model compression method based on code table clustering, belonging to the field of model compression in deep learning. The method obtains a code table and indices for the model weights using a code table clustering algorithm and reconstructs the compressed weights from them. By exploiting the repetitiveness of the weight vectors of deep learning models, the method derives a code table and indices with low memory occupation, achieves an extremely high model compression rate, reduces the memory required to store the model, and maintains good model performance.

Description

Deep learning model compression method based on code table clustering
Technical Field
The invention relates to the field of model compression in deep learning, in particular to a deep learning model compression method based on code table clustering.
Background
Deep learning has made remarkable progress in the past few years and has become a core technology in fields such as computer vision, natural language processing, and speech recognition. However, deep learning models typically have a large number of parameters and complex structures, which leads to huge computational resource consumption and high memory usage. As deep learning applications continue to extend to resource-constrained mobile devices and edge computing devices, model compression techniques become particularly important. With the popularity of Internet of Things devices and intelligent mobile terminals and the rise of edge computing, the need to run complex deep learning models on low-power hardware with limited computing capability has increased dramatically. In these scenarios, a model must greatly reduce its computational and memory requirements while maintaining high performance. In addition, in data centers and cloud services, model compression can significantly reduce storage and transmission costs, reduce energy consumption, and improve the scalability and cost effectiveness of the system.
The current technical paths for deep learning model compression mainly comprise the following two types. Weight pruning: the storage requirement of the model is reduced by identifying and removing redundant neurons or connections in the neural network; weight pruning may be unstructured (weights removed at the level of individual parameters) or structured (weights removed at the level of layers or channels). Model quantization: by reducing the numerical precision of network weights and activations (e.g., from 32-bit floating-point numbers to fixed-point numbers with a lower bit width), model size and computational complexity can be significantly reduced. Both pruning and quantization are lossy compression methods, so at higher compression rates the prediction performance may degrade due to excessive information loss.
Disclosure of Invention
To address the severe loss of model performance that existing compression methods suffer at higher compression rates, the invention provides a deep learning model compression method based on code table clustering, in which the model weights are reconstructed from a code table and indices with low memory occupation so as to achieve equivalent or even better model inference performance.
The technical solution adopted to solve the above technical problem is as follows:
The invention first provides a deep learning model compression method based on code table clustering, which comprises the following steps:
step S1: extracting weights of a linear layer and a convolution layer in a deep learning model, and splitting the weights according to the direction of an input channel so as to obtain a series of weight vectors; the length of the weight vector obtained by segmentation is defined as V;
step S2: setting a code table for each weight respectively, wherein the size of the code table is K×V and the number of codewords in the code table is K; a code table clustering algorithm is used to cluster the weight vectors and obtain the finally updated code table, and in the clustering process each weight vector is assigned an index, namely the position in the code table of the codeword closest to that weight vector;
step S3: saving the code table and index which are clustered and other uncompressed data in the original deep learning model as a compressed model; when running the compressed model, for each weight, the index corresponding to its weight vector is used to retrieve the corresponding codeword, and the compressed weight with the same size as the original weight is reconstructed by using these codewords.
As a preferred scheme of the present invention, the weight size of the linear layer in step S1 is [output channel number, input channel number], and splitting along the input channel yields (output channel number × input channel number / V) weight vectors; the weight size of the convolution layer is [output channel number, input channel number, convolution kernel height, convolution kernel width], the convolution layer weight is first reshaped to the size [output channel number, input channel number × convolution kernel height × convolution kernel width], and splitting along the input channel then yields (output channel number × input channel number × convolution kernel height × convolution kernel width / V) weight vectors.
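For illustration, a minimal PyTorch sketch of this splitting step is given below; the function name and shapes are illustrative assumptions, and it assumes V divides the flattened input-channel dimension.

```python
import torch

def split_into_vectors(weight: torch.Tensor, V: int) -> torch.Tensor:
    """Split a linear or convolution weight into length-V vectors along
    the input-channel direction (step S1)."""
    if weight.dim() == 2:
        # linear layer: [out_channels, in_channels]
        out_ch, in_ch = weight.shape
        assert in_ch % V == 0, "V must divide the number of input channels"
        return weight.reshape(out_ch * in_ch // V, V)
    if weight.dim() == 4:
        # conv layer: [out_channels, in_channels, kernel_h, kernel_w]
        out_ch = weight.shape[0]
        flat = weight.reshape(out_ch, -1)          # [out_ch, in_ch * kh * kw]
        assert flat.shape[1] % V == 0, "V must divide in_ch * kh * kw"
        return flat.reshape(out_ch * flat.shape[1] // V, V)
    raise ValueError("expected a 2-D linear or 4-D convolution weight")

# e.g. a hypothetical 4096 x 4096 linear weight with V = 4
vectors = split_into_vectors(torch.randn(4096, 4096), V=4)   # -> shape [4194304, 4]
```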
As a preferred scheme of the invention, the weight vector clustering in the step S2 is performed by using a code table clustering algorithm, and the specific process is as follows:
(2.1) randomly selecting K weight vectors as initial code words of the code table;
(2.2) calculating the Euclidean distance from each weight vector to each codeword; finding, for each weight vector, the codeword with the shortest distance to it, and assigning the index of that codeword to the weight vector, wherein the index of a codeword is its position in the code table;
the calculated Euclidean distance formula is:
wherein W is m For the mth weight direction in the weightsAmount of C k For the kth codeword, d (W) m ,C k ) Is W m And C k Is used for the distance between euclidean distance(s),and->Respectively W m And C k Is the i-th value of (2);
(2.3) averaging all weight vectors assigned the index of the same codeword, and using the average as the updated value of the codeword corresponding to that index; the averaging formula is:

\hat{C}_k = \frac{1}{|W \in C_k|} \sum_{W_m \in C_k} W_m

wherein W \in C_k denotes the weight vectors assigned the index of the same codeword, |W \in C_k| is the number of such weight vectors, and \hat{C}_k is the updated value of the codeword corresponding to the index;
and (2.4) repeating the steps (2.2) - (2.3) until the code table and the index are not updated any more, and completing the code table clustering algorithm to obtain the finally updated code table.
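For illustration, the following is a minimal PyTorch sketch of steps (2.1)-(2.4), assuming the length-V weight vectors of one weight have already been stacked into a single tensor; the function name, chunk size, and iteration cap are illustrative choices rather than part of the invention.

```python
import torch

def codebook_cluster(vectors: torch.Tensor, K: int, max_iters: int = 100):
    """Code table clustering: k-means over length-V weight vectors with
    Euclidean distance. Returns the code table [K, V] and one index per vector."""
    N = vectors.shape[0]
    # (2.1) randomly pick K weight vectors as the initial codewords
    codebook = vectors[torch.randperm(N)[:K]].clone()
    indices = None

    for _ in range(max_iters):
        # (2.2) nearest codeword for every weight vector (chunked to bound memory)
        new_indices = torch.empty(N, dtype=torch.long, device=vectors.device)
        for s in range(0, N, 65536):
            chunk = vectors[s:s + 65536]
            new_indices[s:s + 65536] = torch.cdist(chunk, codebook).argmin(dim=1)

        # (2.4) stop once the assignment (and hence the code table) no longer changes
        if indices is not None and torch.equal(new_indices, indices):
            break
        indices = new_indices

        # (2.3) each codeword becomes the mean of the vectors assigned to it
        sums = torch.zeros_like(codebook)
        counts = torch.zeros(K, device=vectors.device, dtype=vectors.dtype)
        sums.index_add_(0, indices, vectors)
        counts.index_add_(0, indices,
                          torch.ones(N, device=vectors.device, dtype=vectors.dtype))
        nonempty = counts > 0
        codebook[nonempty] = sums[nonempty] / counts[nonempty].unsqueeze(1)

    return codebook, indices

# e.g. codebook, indices = codebook_cluster(vectors, K=32768)
```

With K = 32768 each index fits in 15 bits, which is where the saving over storing every length-V vector in full precision comes from.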
As a preferred embodiment of the present invention, one code table is used for each weight in step S2.
As a preferred solution of the present invention, in step S3, the codeword corresponding to the weight vector index is used to reconstruct the compressed weight with the same size as the original weight, where the formula is:
W' = C[I]

wherein W' is the compressed weight, C is the code table, and I is the index matrix corresponding to all the weight vectors.
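A short PyTorch sketch of this reconstruction (the function name and shape argument are illustrative):

```python
import torch

def reconstruct_weight(codebook: torch.Tensor, indices: torch.Tensor,
                       original_shape) -> torch.Tensor:
    """Step S3, W' = C[I]: look up the codeword for every weight-vector index
    and reshape the result back to the original weight size."""
    return codebook[indices].reshape(original_shape)

# e.g. w_reconstructed = reconstruct_weight(codebook, indices, (4096, 4096))
```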
As a preferable scheme of the invention, the deep learning model to be compressed is the large language model LLaMA-7B; the linear layer weight size of the model is [output channel number, input channel number], and splitting along the input channel yields (output channel number × input channel number / V) weight vectors; the convolution layer weight size of the model is [output channel number, input channel number, convolution kernel height, convolution kernel width]; the convolution layer weight is first reshaped to the size [output channel number, input channel number × convolution kernel height × convolution kernel width], and splitting along the input channel then yields (output channel number × input channel number × convolution kernel height × convolution kernel width / V) weight vectors.
The beneficial effects of the invention are as follows:
1) The deep learning model compression method based on code table clustering provided by the invention reconstructs the model weights from a code table and indices, thereby reducing the memory occupied by the original weights of the model. Because the code table size is flexible, the invention allows a user to adjust the balance between compression rate and performance as desired, meaning that the degree of compression can be tailored to the specific application scenario and performance requirements. The method is applicable not only to the weights of linear layers but can also be extended to convolution layers, so that various deep learning models can be compressed; this is useful for compressing complex models composed of multiple types of layers.
2) In the deep learning model compression method based on code table clustering provided by the invention, all steps from weight extraction and clustering to weight reconstruction are performed automatically, greatly reducing the need for manual intervention and the possibility of human error.
3) The deep learning model compression method based on code table clustering provided by the invention reconstructs the model weights from a code table and indices with lower memory occupation. The weight vectors of current deep learning models exhibit a certain degree of repetitiveness: repeated vectors can be represented by the same shared vector, each occurrence pointing to it through the same index, and a number of such shared vectors form the code table. Because the code table and the indices occupy less memory than the weights themselves, the model compression method based on code table clustering can achieve a higher compression rate. By exploiting this repetitiveness of the weight vectors, a model whose weights are reconstructed from the code table and indices can avoid severe performance loss even under substantial compression.
Drawings
FIG. 1 is a flow chart of a deep learning model compression method based on code table clustering.
FIG. 2 is a schematic diagram of weights split into weight vectors according to input channels.
FIG. 3 is a schematic diagram of a code table clustering algorithm.
FIG. 4 is a schematic diagram of reconstructing model weights using a code table and an index.
Fig. 5 is a text generation result of an original model and a model compressed by the method of the present invention.
FIG. 6 is a comparison of the performance of the different methods.
Detailed Description
The invention is further illustrated and described below in connection with specific embodiments. The described embodiments are merely exemplary of the present disclosure and do not limit the scope. The technical features of the embodiments of the invention can be combined correspondingly on the premise of no mutual conflict.
The flow chart of the deep learning model compression method based on the code table clustering shown in fig. 1 comprises the following steps:
step S1: extracting weights of a linear layer and a convolution layer in a deep learning model, and splitting the weights according to the direction of an input channel so as to obtain a series of weight vectors; the length of the weight vector obtained by segmentation is defined as V;
step S2: setting a code table for each weight respectively, wherein the size of the code table is K×V, and using a code table clustering algorithm to cluster the weight vectors; the specific process of the weight vector clustering is as follows:
(2.1) randomly selecting K weight vectors as initial code words of the code table;
(2.2) calculating the Euclidean distance from each weight vector to each codeword; finding, for each weight vector, the codeword with the shortest distance to it, and assigning the index of that codeword to the weight vector, wherein the index of a codeword is its position in the code table;
(2.3) averaging all weight vectors assigned with indexes of the same codeword as updated values of the corresponding codeword of the index;
(2.4) repeating the steps (2.2) - (2.3) until the code table and the index are not updated any more, and completing the code table clustering algorithm;
step S3: saving the code table and index which are clustered and other uncompressed data in the original deep learning model as a compressed model; when the compressed model is run, for each weight, the index corresponding to its weight vector is used to retrieve the corresponding codeword, and the compressed weight with the same size as the original weight is reconstructed by using these codewords.
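As an illustration of how the compressed artifacts might be packaged and later expanded at run time, the following sketch reuses the split_into_vectors, codebook_cluster, and reconstruct_weight functions sketched earlier in this description; the dictionary layout, file name, and dtype choices are assumptions for illustration, not part of the invention, and the int32 index tensor would need a packed 15-bit format to realize the full storage saving.

```python
import torch

def compress_model_weights(model, K=32768, V=4):
    """Replace every linear/conv weight by a (code table, indices) pair (steps S1-S2)."""
    compressed = {}
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            vectors = split_into_vectors(module.weight.data, V)   # step S1 sketch above
            codebook, indices = codebook_cluster(vectors, K)      # step S2 sketch above
            compressed[name] = {
                "codebook": codebook.half(),          # K x V codewords
                "indices": indices.to(torch.int32),   # one index per weight vector
                "shape": tuple(module.weight.shape),
            }
    return compressed

def load_compressed_weights(model, compressed):
    """Step S3: rebuild each weight from its code table and indices before running the model."""
    for name, module in model.named_modules():
        if name in compressed:
            entry = compressed[name]
            w = reconstruct_weight(entry["codebook"].float(),
                                   entry["indices"].long(), entry["shape"])
            module.weight.data.copy_(w)

# e.g.
# compressed = compress_model_weights(model)
# torch.save(compressed, "compressed_model.pt")   # the saved code tables and indices
```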
A schematic diagram of step S1 is shown in fig. 2, and is described in detail as follows: in step S1, the pre-trained large language model LLaMA-7B is used as the model to be compressed, and its memory occupation is 12.38 GB. The wikitext2 and ptb data sets are used as test sets, the batch size during verification is 4, and the single text length of the verification set is 128. The weight size of a linear layer of the model is generally [output channel number, input channel number]; splitting along the input channel into vectors of length V yields (output channel number × input channel number / V) weight vectors. The weight size of a convolution layer of the model is generally [output channel number, input channel number, convolution kernel height, convolution kernel width]; the convolution layer weight is first reshaped to the size [output channel number, input channel number × convolution kernel height × convolution kernel width], and splitting along the input channel then yields (output channel number × input channel number × convolution kernel height × convolution kernel width / V) weight vectors. The length V of the weight vectors obtained by splitting is set to 4.
A schematic diagram of step S2 is shown in fig. 3, and is described in detail as follows: each weight uses a code table with the size of K x V, wherein K and V are the number of code words and the code word length in the code table respectively, the code word length is the same as the weight vector length, and the number of code words K is set to 32768. The Euclidean distance formula from each weight vector to each codeword is:
d(W_m, C_k) = \sqrt{ \sum_{i=1}^{V} (W_m^i - C_k^i)^2 }

wherein W_m is the m-th weight vector in the weight, C_k is the k-th codeword, d(W_m, C_k) is the Euclidean distance between W_m and C_k, and W_m^i and C_k^i are the i-th values of W_m and C_k respectively.
The formula for averaging all weight vectors with the same index is:
\hat{C}_k = \frac{1}{|W \in C_k|} \sum_{W_m \in C_k} W_m

wherein W \in C_k denotes the weight vectors assigned the index of the same codeword, |W \in C_k| is the number of such weight vectors, and \hat{C}_k is the updated value of the corresponding codeword.
A schematic diagram of step S3 is shown in fig. 4, and is described in detail as follows: using the code word corresponding to the weight vector index to replace the value of the original weight vector to obtain the compressed weight, wherein the memory occupation of the compressed weight is as follows:
M_{W'} = \frac{(C_{out} \cdot C_{in} / V) \cdot \log_2 K + K \cdot V \cdot T}{8}

wherein, for the linear layer, C_in is the number of input channels, and for the convolution layer, C_in is the number of input channels × convolution kernel height × convolution kernel width; C_out is the number of output channels; K and V are the number of codewords and the codeword length in the code table respectively; T is the bit width of the data storage format of the model's uncompressed weight; and M_{W'} is the memory occupation of the compressed weight, in bytes (B).
And finally, performing performance test on the compressed model, calculating the compression rate, and verifying the compression effect of the model. The compression rate formula before and after weight compression is:
r = 1 - \frac{M_{W'}}{M_W} = 1 - \frac{(C_{out} \cdot C_{in} / V) \cdot \log_2 K + K \cdot V \cdot T}{C_{out} \cdot C_{in} \cdot T}

wherein M_W is the memory occupation of the weight before compression; for the linear layer, C_in is the number of input channels, and for the convolution layer, C_in is the number of input channels × convolution kernel height × convolution kernel width; C_out is the number of output channels; and T is the bit width of the data storage format of the model's uncompressed weight.
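As a hedged worked example of this accounting (one log2(K)-bit index per length-V vector plus the K×V code table itself), the arithmetic for a single, hypothetical 4096×4096 half-precision linear weight with K = 32768 and V = 4 is:

```python
import math

# Hypothetical single linear weight at LLaMA-7B scale: 4096 x 4096, 16-bit values
C_out, C_in, T = 4096, 4096, 16        # T: bit width of the uncompressed format
K, V = 32768, 4                        # code table: 32768 codewords of length 4

original_bits   = C_out * C_in * T                      # the raw weight
index_bits      = (C_out * C_in // V) * math.log2(K)    # one 15-bit index per vector
codebook_bits   = K * V * T                             # the code table itself
compressed_bits = index_bits + codebook_bits

print(f"original:         {original_bits / 8 / 2**20:.2f} MiB")        # ~32.00 MiB
print(f"compressed:       {compressed_bits / 8 / 2**20:.2f} MiB")      # ~ 7.75 MiB
print(f"compression rate: {1 - compressed_bits / original_bits:.2%}")  # ~75.8 %
```

This per-layer figure is of the same order as the 75.91% overall compression rate reported for the whole model below.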
The embodiment of the invention evaluates the technical effect of the invention both qualitatively and quantitatively. The qualitative evaluation mainly uses intuitive visual inspection to judge the quality of the text generated by the model. This involves scrutinizing the generated text for logical discontinuities or unreasonable content, such as whether an argument is self-contradictory or a statement contains an obvious logical flaw, and checking whether the text merely provides surface information without going into the various aspects of the problem in depth. The qualitative assessment also checks the fluency and naturalness of the generated text, for example whether the language can be easily understood and accepted by humans, whether the sentence structure is reasonable, and whether the text is clear and unambiguous. It further checks whether the text is creative and distinctive, for example whether it can offer a novel perspective or solution rather than merely repeating common or stale viewpoints.
To evaluate performance quantitatively, perplexity (PPL) is used. PPL quantifies the difference between the generated text and real text; a smaller PPL value indicates better language-model performance and hence better compression performance. PPL is the exponential form of the cross-entropy loss: a large language model assigns probabilities to a sequence of words w_1, w_2, ..., w_N, and the perplexity is calculated as:

PPL(W) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_1, w_2, \ldots, w_{i-1}) \right)

wherein W is the whole word sequence, w_i is the i-th word in the sequence, p(w_i | w_1, w_2, ..., w_{i-1}) is the conditional probability the model assigns to the i-th word given all preceding words, N is the total number of words in the sequence, log is the natural logarithm, and exp is the exponential function with base e.
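For illustration, a minimal PyTorch sketch of this PPL computation (the function name and tensor shapes are illustrative; it assumes one next-token prediction per position):

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """PPL = exp(-1/N * sum_i log p(w_i | w_1..w_{i-1})): the exponential of the
    mean cross-entropy, so lower is better.
    logits: [N, vocab_size], one prediction per position; targets: [N]."""
    mean_nll = F.cross_entropy(logits, targets, reduction="mean")  # -1/N * sum log p
    return torch.exp(mean_nll).item()

# e.g., for a causal LM, score position i+1 with the logits produced at position i:
# ppl = perplexity(all_logits[:-1], token_ids[1:])
```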
The qualitative evaluation results of the invention are as follows: in the embodiment, a code table of size 32768×4 is used to compress each linear layer weight and each convolution layer weight of the pre-trained large language model LLaMA-7B; the memory occupation of the original model is 12.38 GB, the memory occupation of the compressed model is 2.98 GB, and the compression rate is 75.91%. The text generated by the original model is shown in the upper half of fig. 5, and the text generated by the compressed model is shown in the lower half of fig. 5. As can be seen from fig. 5, the compressed model of the embodiment can still generate reasonable and substantive text at a high compression rate, which shows that the method provided by the invention can effectively compress the model while maintaining its performance.
The quantitative evaluation results of the invention are as follows: in the embodiment, the linear layer weights and convolution layer weights of the pre-trained large language model LLaMA-7B are each compressed using a code table of size 32768×4; the memory occupation of the original model is 12.38 GB, the memory occupation of the compressed model is 2.98 GB, and the compression rate is 75.91%. The perplexity (PPL) of the compressed model is 14.4 on the wikitext2 data set and 59.0 on the ptb data set. As shown in the comparison with other methods in fig. 6, the method obtains a lower PPL at the same or even higher compression rate, i.e., it produces text closer to real-world text, which indicates that the compression performance of the method is better.
The foregoing describes specific embodiments of the present invention with reference to specific examples; these contents are explanations of the present invention, and all technical solutions falling within the concept of the present invention belong to its protection scope.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (8)

1. A deep learning model compression method based on code table clustering is characterized by comprising the following steps:
step S1: extracting weights of a linear layer and a convolution layer in a deep learning model, and splitting the weights according to the direction of an input channel so as to obtain a series of weight vectors; the length of the weight vector obtained by segmentation is defined as V;
step S2: setting a code table for each weight respectively, wherein the size of the code table is K×V and the number of codewords in the code table is K; a code table clustering algorithm is used to cluster the weight vectors and obtain the finally updated code table, and in the clustering process each weight vector is assigned an index, namely the position in the code table of the codeword closest to that weight vector;
step S3: saving the code table and index which are clustered and other uncompressed data in the original deep learning model as a compressed model; when running the compressed model, for each weight, the index corresponding to its weight vector is used to retrieve the corresponding codeword, and the compressed weight with the same size as the original weight is reconstructed by using these codewords.
2. The deep learning model compression method based on code table clustering according to claim 1, wherein the weight size of the linear layer in step S1 is [number of output channels, number of input channels], and splitting along the input channel yields (number of output channels × number of input channels / V) weight vectors; the weight size of the convolution layer is [number of output channels, number of input channels, convolution kernel height, convolution kernel width], the convolution layer weight is first reshaped to the size [number of output channels, number of input channels × convolution kernel height × convolution kernel width], and splitting along the input channel then yields (number of output channels × number of input channels × convolution kernel height × convolution kernel width / V) weight vectors.
3. The deep learning model compression method based on code table clustering according to claim 1, wherein the weight vector clustering in step S2 is performed by using a code table clustering algorithm, and the specific process is as follows:
(2.1) randomly selecting K weight vectors as initial code words of the code table;
(2.2) calculating the Euclidean distance from each weight vector to each codeword; finding, for each weight vector, the codeword with the shortest distance to it, and assigning the index of that codeword to the weight vector, wherein the index of a codeword is its position in the code table;
(2.3) averaging all weight vectors assigned with indexes of the same codeword as an updated value of the codeword corresponding to the index;
and (2.4) repeating the steps (2.2) - (2.3) until the code table and the index are not updated any more, and completing the code table clustering algorithm to obtain the finally updated code table.
4. The deep learning model compression method based on code table clustering according to claim 1, wherein each weight in step S2 uses one code table.
5. The deep learning model compression method based on code table clustering according to claim 3, wherein the Euclidean distance formula calculated in step S2 is:

d(W_m, C_k) = \sqrt{ \sum_{i=1}^{V} (W_m^i - C_k^i)^2 }

wherein W_m is the m-th weight vector in the weight, C_k is the k-th codeword, d(W_m, C_k) is the Euclidean distance between W_m and C_k, and W_m^i and C_k^i are the i-th values of W_m and C_k respectively.
6. The deep learning model compression method based on code table clustering according to claim 3, wherein in step S2, a formula for averaging all weight vectors assigned to indexes of the same codeword is:
\hat{C}_k = \frac{1}{|W \in C_k|} \sum_{W_m \in C_k} W_m

wherein W \in C_k denotes the weight vectors assigned the index of the same codeword, |W \in C_k| is the number of such weight vectors, and \hat{C}_k is the updated value of the codeword corresponding to the index.
7. The deep learning model compression method based on code table clustering according to claim 1, wherein in step S3, the compressed weight having the same size as the original weight is reconstructed by using the codeword corresponding to the weight vector index, where the formula is:
W' = C[I]

wherein W' is the compressed weight, C is the code table, and I is the index matrix corresponding to all the weight vectors.
8. The deep learning model compression method based on code table clustering according to claim 1, wherein the deep learning model to be compressed is the large language model LLaMA-7B.
CN202311590503.4A 2023-11-27 2023-11-27 Deep learning model compression method based on code table clustering Pending CN117639792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311590503.4A CN117639792A (en) 2023-11-27 2023-11-27 Deep learning model compression method based on code table clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311590503.4A CN117639792A (en) 2023-11-27 2023-11-27 Deep learning model compression method based on code table clustering

Publications (1)

Publication Number Publication Date
CN117639792A true CN117639792A (en) 2024-03-01

Family

ID=90022756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311590503.4A Pending CN117639792A (en) 2023-11-27 2023-11-27 Deep learning model compression method based on code table clustering

Country Status (1)

Country Link
CN (1) CN117639792A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01232829A (en) * 1988-03-12 1989-09-18 Graphics Commun Technol:Kk Method and apparatus for learning type multi-stage vector quantization
JPH0895599A (en) * 1994-05-06 1996-04-12 Nippon Telegr & Teleph Corp <Ntt> Encoding method and decoding method of signal and encoder and decoder using the same
US20040221192A1 (en) * 2003-04-30 2004-11-04 Giovanni Motta Method and system for minimizing the length of a defect list for a storage device
WO2016199330A1 (en) * 2015-06-12 2016-12-15 パナソニックIpマネジメント株式会社 Image coding method, image decoding method, image coding device and image decoding device
US20220164652A1 (en) * 2019-02-15 2022-05-26 Nokia Technologies Oy Apparatus and a method for neural network compression
CN113748605A (en) * 2019-03-18 2021-12-03 弗劳恩霍夫应用研究促进协会 Method and apparatus for compressing parameters of neural network
CN113595993A (en) * 2021-07-12 2021-11-02 广东工业大学 Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN115514374A (en) * 2022-09-20 2022-12-23 国网浙江省电力有限公司嘉兴供电公司 Compression method for PMU (phasor measurement Unit) measurement data of universal microgrid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHU LI: "Digital Information Feature Compression Method Based on Weighted Trust Vector", 2020 13th International Conference on Intelligent Computation Technology and Automation (ICICTA), 31 December 2021 (2021-12-31), pages 1-4 *
LI Jiaying: "Hyperspectral image compression algorithm based on reducing mapping prediction residuals", Computer Engineering & Science, 31 December 2020 (2020-12-31), pages 825-834 *

Similar Documents

Publication Publication Date Title
CN112994701B (en) Data compression method, device, electronic equipment and computer readable medium
CN109886406A (en) A kind of complex convolution neural network compression method based on depth-compression
CN113434683B (en) Text classification method, device, medium and electronic equipment
EP3740912A1 (en) Data compression by local entropy encoding
CN113204674B (en) Video-paragraph retrieval method and system based on local-overall graph inference network
CN113539273B (en) Voice recognition method and device, computer equipment and storage medium
CN111008517A (en) Tensor decomposition technology-based neural language model compression method
Huai et al. Zerobn: Learning compact neural networks for latency-critical edge systems
CN113971735A (en) Depth image clustering method, system, device, medium and terminal
CN110020721A (en) A kind of target detection deep learning network optimized approach based on compression of parameters
Huang et al. An automatic and efficient BERT pruning for edge AI systems
CN115563314A (en) Knowledge graph representation learning method for multi-source information fusion enhancement
CN114861907A (en) Data calculation method, device, storage medium and equipment
CN114880538A (en) Attribute graph community detection method based on self-supervision
CN113449849B (en) Learning type text hash method based on self-encoder
CN110929532A (en) Data processing method, device, equipment and storage medium
Yan et al. Micronet for efficient language modeling
CN116018589A (en) Method and system for product quantization based matrix compression
CN113392868A (en) Model training method, related device, equipment and storage medium
CN117639792A (en) Deep learning model compression method based on code table clustering
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
CN112116062B (en) Nonlinear compression method of multi-layer perceptron based on tensor string decomposition
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
Horváth et al. Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
CN111797984A (en) Quantification and hardware acceleration method and device for multitask neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination