CN114663536B - Image compression method and device - Google Patents
- Publication number
- Publication CN114663536B (application CN202210118720.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- compressed
- hidden variable
- module
- compression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T9/002 — Image coding using neural networks
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06T7/11 — Region-based segmentation
- G06T2207/10024 — Color image
- G06T2207/20021 — Dividing image into blocks, subimages or windows
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention provides an image compression method and device, wherein the method comprises the following steps: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block; the method of the invention introduces a Transformer module in the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving the image compression efficiency.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an image compression method and device.
Background
Image compression is the application of data compression techniques to digital images, the purpose of which is to reduce redundant information in the image data, thereby efficiently storing and transmitting the data, i.e. to achieve the best image quality at a given bit rate or compression ratio.
In the prior art, the encoder and decoder for an image compression task are generally designed based on a convolutional neural network. However, a CNN-based compression process cannot capture the semantic information of an image, while a global attention mechanism performs poorly in the image compression task because of how it exploits the spatial redundancy information of the image; as a result, image compression efficiency is low.
Disclosure of Invention
The image compression method and device provided by the invention are used to remedy the defect in the prior art that, when the encoder and decoder for an image compression task are designed based on a convolutional neural network, the semantic information of the image cannot be captured and the rate-distortion performance of image compression is therefore poor; the invention improves image compression efficiency.
The invention provides an image compression method, which comprises the following steps:
acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and obtaining a compressed image according to the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module.
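The four claimed steps can be sketched structurally as follows. This is a minimal Python illustration in which `encode`, `entropy_model` and `decode` are hypothetical placeholders for the pre-stored target encoder, entropy model and target decoder, not the patent's actual networks:

```python
def preprocess(image, block_size):
    """Split an H x W image (nested lists) into non-overlapping
    block_size x block_size blocks, row-major order (the preprocessing rule)."""
    H, W = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(block_size) for dj in range(block_size)]
            for i in range(0, H, block_size)
            for j in range(0, W, block_size)]

def compress(image, encode, entropy_model, decode, block_size=2):
    blocks = preprocess(image, block_size)   # step 1-2: divide into image blocks
    y1 = encode(blocks)                      # target encoder -> first hidden variable
    y2 = entropy_model(y1)                   # entropy model -> second hidden variable
    return decode(y2)                        # target decoder -> compressed image blocks
```

With identity stand-ins for the three models, `compress` simply round-trips the block sequence, which makes the data flow of the symmetric architecture visible.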
According to an image compression method provided by the present invention, the method further comprises:
inputting the first hidden variable into the entropy model, obtaining the mean value and the variance of each element in the first hidden variable, and simulating the normal distribution of the first hidden variable according to the mean value and the variance of each element to obtain a probability distribution function; arithmetically encoding the first hidden variable based on the probability distribution function to obtain a target bit stream; arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable; and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
The global loss L is calculated using the following formula:
L=R+λD
wherein, λ is a hyper-parameter, R is a bit stream size obtained by compression, and D is a distortion term; and acquiring a target image compression model according to the global loss.
Training the image compression model based on the BP algorithm, and adjusting the bit stream size R and the distortion term D to reduce the global loss L so as to obtain a target hyper-parameter; and training the image compression model according to the target hyper-parameter to obtain the target image compression model.
And normalizing the image to be compressed, and equally dividing the processed image into a plurality of image blocks according to a fixed division area.
The Transformer module includes a window-based attention layer, a multi-layer perceptron, and a normalization layer.
The present invention also provides an image compression apparatus, comprising:
the image acquisition module is used for acquiring an image to be compressed; the encoding module is used for dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputting all the image blocks into a pre-stored target encoder so as to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; the conversion module is used for inputting the first hidden variable into a pre-stored entropy model so as to obtain a second hidden variable; and the decoding module is used for inputting the second hidden variable into a pre-stored target decoder so as to obtain compressed image blocks and obtain a compressed image according to the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the image compression method as described in any of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the image compression method as described in any one of the above.
The invention provides an image compression method and device, which comprises the steps of firstly obtaining an image to be compressed; then dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; finally, inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block; the method of the invention introduces a Transformer module in the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving the image compression efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of an image compression method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a target encoder according to an embodiment of the present invention;
FIG. 3 is a block diagram of a target decoder according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of obtaining a second hidden variable according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Transformer module according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an image compression apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An image compression method provided by the embodiment of the invention is described below with reference to fig. 1, and includes:
Step 101, acquiring an image to be compressed. It can be understood that image compression is an application of data compression technology to digital images; its purpose is to reduce redundant information in image data so that the data can be stored and transmitted in a more efficient format. When the amount of image data is too large, the storage, transmission and processing of image information become very difficult, so the image to be processed needs to be compressed before the compressed image can be used effectively. In this embodiment, one million RGB color images are randomly extracted from the source database as the images to be compressed; the required image data to be compressed can also be acquired from other data acquisition platforms.
Step 102, dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks into a pre-stored target encoder to obtain a first hidden variable; wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module.
It can be understood that redundant information in the image to be compressed needs to be removed to obtain the compressed image. This embodiment therefore first normalizes the acquired RGB color images to be compressed with image processing software and unifies the image sizes; each image is then divided into a plurality of fixed-size image blocks in a fixed arrangement order to obtain an image block sequence; finally, the image block sequence is input into the target encoder for encoding. During training, the encoder maps each image block to the parameters of the probability distribution obeyed by the hidden variable, and sampling from this probability distribution yields the first hidden variable; the first hidden variable is obtained by quantizing (rounding) the output features of the image blocks after they pass through the target encoder. As shown in fig. 2, the main structure of the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module. The linear embedding layer is composed of a multilayer perceptron (MLP) and performs a linear transformation on the channel data of each pixel in the image; the Transformer module is composed of a window-based attention layer, a multilayer perceptron and a normalization layer, with 2, 6 and 2 Transformer modules in the successive stages; and the block merging module is composed of a multilayer perceptron and a normalization layer and performs a down-sampling operation on the image features.
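The down-sampling performed by the block merging module can be illustrated with a minimal sketch of 2 × 2 patch merging, in which each 2 × 2 neighborhood of feature vectors is concatenated, halving the spatial resolution and quadrupling the channel count. The MLP and normalization layer that follow merging in the patent's module are omitted here:

```python
def patch_merge(feat):
    """feat: H x W grid of channel vectors (lists). Concatenate each 2x2
    neighborhood into one vector -> (H/2) x (W/2) grid, 4x the channels."""
    H, W = len(feat), len(feat[0])
    merged = []
    for i in range(0, H, 2):
        row = []
        for j in range(0, W, 2):
            # list + list concatenates the four neighboring channel vectors
            row.append(feat[i][j] + feat[i][j + 1] + feat[i + 1][j] + feat[i + 1][j + 1])
        merged.append(row)
    return merged
```

In a real encoder a linear projection would then reduce the concatenated channels (typically from 4C to 2C); here only the spatial merging itself is shown.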
And 103, inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable.
It can be understood that, because the first hidden variable is obtained by quantizing (rounding) the output features of the image blocks after they pass through the target encoder, the rounding causes a quantization loss. To compensate for this quantization loss, this embodiment inputs the obtained first hidden variable into a pre-stored entropy model to obtain a second hidden variable, which compensates for the quantization loss of the first hidden variable. Specifically, this embodiment constructs a channel-wise autoregressive entropy model based on a convolutional neural network, inputs the first hidden variable into the entropy model to obtain the probability distribution function of each element in the first hidden variable, and then obtains the second hidden variable through arithmetic encoding and decoding based on that probability distribution function.
Step 104, inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block; wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module.
It can be understood that decoding is the inverse process of encoding: the obtained second hidden variable is input into the target decoder and parsed to obtain the compressed image blocks corresponding to the RGB image, and the compressed image blocks are then spliced back into a complete image according to the ordering information of the image blocks in the above embodiment. As shown in fig. 3, the main structure of the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module. The de-embedding layer is composed of a multilayer perceptron (MLP); the Transformer module is composed of a window-based attention layer, a multilayer perceptron and a normalization layer; and the block splitting module is composed of a multilayer perceptron and a normalization layer and performs an up-sampling operation on the image features.
The method of the invention introduces a Transformer module in the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving the image compression efficiency.
Optionally, the first hidden variable is input into the entropy model, a mean value and a variance of each element in the first hidden variable are obtained, and normal distribution of the first hidden variable is simulated according to the mean value and the variance of each element to obtain a probability distribution function; arithmetically encoding the first hidden variable based on the probability distribution function to obtain a target bit stream; arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable; and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
Specifically, as shown in fig. 4, in this embodiment the image to be compressed is normalized and each image is divided into a plurality of image blocks; the image blocks are arranged in a fixed order into an image block sequence, which is input into the Transformer-based target encoder for training to obtain the first hidden variable ŷ. The first hidden variable ŷ is then input into the channel-wise autoregressive entropy model constructed based on a convolutional neural network to obtain the mean μ and variance σ corresponding to each element in ŷ, and a Gaussian distribution is simulated for each element from μ and σ to obtain the probability distribution function of that element. According to this probability distribution function, arithmetic coding losslessly compresses the quantized first hidden variable ŷ into a bit stream, which is a binary character string. Finally, arithmetic decoding parses the bit stream back into the quantized third hidden variable ŷ′ according to the same probability distribution function, while the quantization residual r of the entropy model's prediction of the hidden variable is obtained through a loss formula, so that the second hidden variable ȳ = ŷ′ + r can be obtained.
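The per-element probability model and the residual compensation described above can be sketched as follows. The arithmetic coder itself is omitted; the unit-interval Gaussian probability shown is a standard formulation for discretized latents assumed here, not code from the patent:

```python
import math

def quantize(y):
    """Rounding the encoder output gives the first hidden variable
    (and introduces the quantization loss discussed in the text)."""
    return [round(v) for v in y]

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def element_probability(q, mu, sigma):
    """Probability mass the entropy model assigns to the quantized value q:
    the Gaussian mass of the unit interval [q - 0.5, q + 0.5]."""
    return gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)

def compensate(y_decoded, residual):
    """Second hidden variable: losslessly decoded latent plus the
    entropy model's predicted quantization residual."""
    return [v + r for v, r in zip(y_decoded, residual)]
```

The arithmetic coder would consume `element_probability` per element to emit the bit stream; since arithmetic coding is lossless, the decoded latent equals the quantized one, and only the residual step changes the values.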
It should be noted that the third hidden variable is obtained by lossless arithmetic coding and decoding the first hidden variable, and the values of the third hidden variable and the first hidden variable are the same.
The embodiment provides a method for inputting the hidden variables output by the encoder into the entropy model and respectively obtaining new hidden variables through arithmetic coding and decoding technologies, so that quantization residual errors of quantized hidden variables are made up, and image distortion can be reduced.
Optionally, the global loss L is calculated using the following formula:
L=R+λD
wherein, λ is a hyper-parameter, R is a bit stream size obtained by compression, and D is a distortion term; and acquiring a target image compression model according to the global loss.
Specifically, λ is a hyper-parameter used to control the bit rate and compression quality, thereby generating a rate-distortion curve. The rate term R is calculated as

R = E_{x∼p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ) − log₂ p_{ẑ}(ẑ)]

where ẑ is the super hidden variable of the entropy model, the prior information used to obtain the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of ŷ under its normal distribution conditioned on the prior information ẑ, so that −log₂ p_{ŷ|ẑ}(ŷ|ẑ) represents the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of ẑ under its normal distribution, so that −log₂ p_{ẑ}(ẑ) represents the information entropy of ẑ; E_{x∼p_x}[·] denotes the expectation of the bracketed expression over images x drawn from the distribution p_x; x is the image to be compressed and x̂ is the compressed image. D is the distortion term, representing the magnitude of the difference between the original image x and the reconstructed image x̂; the common evaluation standard is the mean square error (MSE). This embodiment determines a suitable hyper-parameter λ by calculating the global loss L of the image compression model, and obtains the target image compression model using the target hyper-parameter λ.
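Under the MSE distortion assumption, the global loss L = R + λD can be evaluated numerically with a small sketch; the probability list stands in for the entropy model's per-element likelihoods, so this only illustrates the formula, not the model:

```python
import math

def mse(x, x_hat):
    """D: mean square error between original and reconstructed pixels."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def rate_bits(probabilities):
    """R: expected code length in bits, -sum log2 p over coded symbols."""
    return -sum(math.log2(p) for p in probabilities)

def global_loss(probabilities, x, x_hat, lam):
    """L = R + lambda * D, the rate-distortion objective."""
    return rate_bits(probabilities) + lam * mse(x, x_hat)
```

With equiprobable symbols of p = 0.5, each symbol costs exactly one bit, which makes the trade-off controlled by λ easy to see by hand.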
The method provided by the embodiment provides a method for calculating the global loss L of an image compression model, and the required hyper-parameter lambda is determined by reducing L so as to obtain image compression models with different bit rates or reconstruction quality requirements.
Optionally, an image compression model is trained based on the BP algorithm, and the bit stream size R and the distortion term D are adjusted to reduce the global loss L so as to obtain a target hyper-parameter; the image compression model is then trained according to the target hyper-parameter to obtain the target image compression model.
Specifically, this embodiment adopts the back-propagation algorithm and stochastic gradient descent to reduce the global loss L when training the image compression model, and the final image compression model is obtained through multiple iterations of training. For example, the values of the hyper-parameter λ set in this embodiment are {0.0018, 0.0035, 0.0067, 0.0130, 0.025, 0.0483}, yielding a plurality of image compression models suited to different scenes. Different image compression models are selected for different bit rate or reconstruction quality requirements: for a scene with a high reconstruction quality requirement, a larger λ such as 0.0483 is selected, while for a scene with a low bit rate requirement, a smaller λ such as 0.0018 is selected.
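The selection rule stated in the embodiment reduces to a trivial lookup over the listed λ values; the helper name here is illustrative, not from the patent:

```python
# The embodiment's hyper-parameter set (each lambda yields one trained model).
LAMBDAS = [0.0018, 0.0035, 0.0067, 0.0130, 0.025, 0.0483]

def pick_lambda(high_quality):
    """Larger lambda weights distortion more (high reconstruction quality);
    smaller lambda favors a low bit rate."""
    return max(LAMBDAS) if high_quality else min(LAMBDAS)
```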
The embodiment provides a method for determining a hyper-parameter lambda by using a BP training algorithm so as to meet image compression models with different bit rates or reconstruction quality requirements.
Optionally, the image to be compressed is normalized, and the processed image is equally divided into a plurality of image blocks according to a fixed division area.
Specifically, this embodiment takes the one million acquired RGB color images as the images to be compressed and normalizes them to obtain RGB color images of uniform size, each with dimensions 768 × 512 × 3. Each image is divided into image blocks of unit size 2 along the length (768) and width (512) dimensions, and the resulting 98,304 image blocks per image are then sent to the target encoder for training.
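The block count follows directly from the stated dimensions; a one-line check reproduces the 98,304 figure for a 768 × 512 image divided with unit size 2:

```python
def num_blocks(height, width, unit):
    """Number of non-overlapping unit x unit blocks in a height x width image."""
    assert height % unit == 0 and width % unit == 0, "image must divide evenly"
    return (height // unit) * (width // unit)

# (768 / 2) * (512 / 2) = 384 * 256 = 98304 blocks per image
```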
The embodiment provides a data preprocessing and image block dividing method, which provides convenience for inputting an image block into a target encoder to obtain a hidden variable in a subsequent process.
Optionally, the transform module includes a window-based attention layer, a multi-layer perceptron, and a normalization layer.
As shown in fig. 5, the Transformer module is composed of window-based attention layers (W-MSA, SW-MSA), a multilayer perceptron (MLP) and a normalization layer (LN). W-MSA and SW-MSA are used in pairs: if the L-th layer uses W-MSA, the (L+1)-th layer uses SW-MSA. Comparing the left and right diagrams, it can be seen that the windows are shifted; the shifted windows allow previously adjacent but separate windows to communicate with each other, solving the problem that information cannot be exchanged between different windows.
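The effect of the shifted window can be demonstrated with a minimal sketch: after a cyclic shift of the token grid (as in Swin-style SW-MSA), a plain window partition groups tokens that previously belonged to different windows. The attention computation itself is omitted:

```python
def cyclic_shift(grid, shift):
    """Roll the H x W token grid by `shift` in both spatial dimensions."""
    H, W = len(grid), len(grid[0])
    return [[grid[(i + shift) % H][(j + shift) % W] for j in range(W)]
            for i in range(H)]

def window_partition(grid, win):
    """Split the grid into non-overlapping win x win windows (token lists)."""
    H, W = len(grid), len(grid[0])
    return [[grid[i + di][j + dj] for di in range(win) for dj in range(win)]
            for i in range(0, H, win) for j in range(0, W, win)]
```

On a 4 × 4 grid with 2 × 2 windows and shift 1, the first window after shifting contains one token from each of the four original windows, which is exactly the cross-window communication the paired W-MSA/SW-MSA layers provide.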
The method provided by the embodiment provides a composition structure of a Transformer module, and introduces an attention mechanism into each window, so that the attention mechanism focuses more on local structure information of an input image, namely, correlation between spatial adjacent elements, thereby overcoming the difficulty of lacking semantic information in image compression and improving the distortion rate performance of image compression.
An image compression apparatus provided by an embodiment of the present invention is described with reference to fig. 6, and an image compression apparatus described below and an image compression method described above may be referred to in correspondence with each other.
The invention provides an image compression device, comprising:
an image acquisition module 601, configured to acquire an image to be compressed; an encoding module 602, configured to divide the image to be compressed into a plurality of image blocks based on a preprocessing rule and input all the image blocks into a pre-stored target encoder to obtain a first hidden variable, the target encoder comprising a linear embedding layer module, a Transformer module and a block merging module; a conversion module 603, configured to input the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and a decoding module 604, configured to input the second hidden variable into a pre-stored target decoder to obtain compressed image blocks and obtain a compressed image according to the compressed image blocks, the target decoder comprising a de-embedding layer module, the Transformer module and a block splitting module.
The invention provides an image compression device, which first obtains an image to be compressed through the image acquisition module 601; the encoding module 602 then divides the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputs all the image blocks into a pre-stored target encoder to obtain a first hidden variable; the conversion module 603 inputs the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; finally, the decoding module 604 inputs the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and a compressed image is obtained according to the compressed image blocks. The device of the invention introduces a Transformer module into the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving image compression efficiency.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor) 710, a communication Interface (Communications Interface) 720, a memory (memory) 730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a method of image compression, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image compression method provided by the above methods, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. An image compression method, comprising:
acquiring an image to be compressed;
dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module, and the preprocessing rule comprises: normalizing the image to be compressed and equally dividing the processed image into a plurality of image blocks of the same size according to a fixed division area;
inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable;
inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module;
after obtaining the compressed image from the compressed image block, the method further comprises:
the global loss L is calculated using the following formula:
L=R+λD;
wherein λ is a hyper-parameter used for obtaining a rate-distortion curve by controlling the bit rate and the compression quality;
R is the size of the bit stream obtained by compression, and R is calculated by the following formula:
R = E_{x~p_x}[-log₂ p_{ŷ|ẑ}(ŷ|ẑ)] + E_{x~p_x}[-log₂ p_{ẑ}(ẑ)];
wherein x is the image to be compressed, ŷ is the first hidden variable, and ẑ is the hyper hidden variable in the entropy model used for obtaining the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of the normal distribution of ŷ under the condition of the prior information ẑ, and E_{x~p_x}[-log₂ p_{ŷ|ẑ}(ŷ|ẑ)] is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the distribution of the prior information ẑ, and E_{x~p_x}[-log₂ p_{ẑ}(ẑ)] is the information entropy of ẑ; E_{x~p_x}[·] denotes the expected value of the bracketed expression when x follows its distribution p_x;
D is a distortion term used for representing the difference between the compressed image and the image to be compressed, and D is calculated by the following formula:
D = E_{x~p_x}[d(x, x̂)];
wherein x̂ is the compressed image and d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
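The global loss above combines a rate term R (bits spent on the quantized latents) and a distortion term D weighted by λ. A minimal NumPy sketch of that computation follows; the use of mean squared error for d(x, x̂) and all function names are illustrative assumptions, since the claim leaves d abstract.

```python
import numpy as np

def rate_term(likelihood_y, likelihood_z):
    """R: total expected code length in bits of the two quantized latents,
    i.e. -log2 of the probabilities the entropy model assigns to them."""
    return float(-(np.log2(likelihood_y).sum() + np.log2(likelihood_z).sum()))

def distortion_term(x, x_hat):
    """D: here mean squared error (one common choice of d(x, x_hat))."""
    return float(np.mean((x - x_hat) ** 2))

def global_loss(x, x_hat, likelihood_y, likelihood_z, lam):
    """L = R + lam * D, the rate-distortion objective."""
    return rate_term(likelihood_y, likelihood_z) + lam * distortion_term(x, x_hat)

# Each latent element assigned probability 0.5 costs exactly 1 bit.
x = np.zeros((2, 2))
x_hat = np.ones((2, 2))
L = global_loss(x, x_hat, np.full(4, 0.5), np.full(2, 0.5), lam=0.01)
print(L)  # 6 bits + 0.01 * 1.0 = 6.01
```

Raising λ makes the optimizer favor low distortion at the cost of a larger bit stream, which is how sweeping λ traces out the rate-distortion curve.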
2. The image compression method according to claim 1, wherein inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable specifically comprises:
inputting the first hidden variable into the entropy model, acquiring the mean value and the variance of each element in the first hidden variable, and simulating the normal distribution of the first hidden variable according to the mean value and the variance of each element to acquire a probability distribution function;
performing arithmetic coding on the first hidden variable based on the probability distribution function to obtain a target bit stream;
arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable;
and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
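For the arithmetic coding step in claim 2, the coder needs the probability mass the fitted normal distribution assigns to each quantized latent value. A minimal stdlib sketch of that per-bin probability, assuming unit-width quantization bins (an assumption; the patent does not state the bin width):

```python
import math

def gaussian_bin_probability(y_hat, mu, sigma):
    """Probability mass of N(mu, sigma^2) on the quantization bin
    [y_hat - 0.5, y_hat + 0.5]: the value an arithmetic coder would
    use for the rounded latent element."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    return cdf(y_hat + 0.5) - cdf(y_hat - 0.5)

p = gaussian_bin_probability(0.0, 0.0, 1.0)
print(round(p, 4))  # mass of a standard normal on [-0.5, 0.5]
```

The mean and variance here are exactly the per-element statistics the entropy model predicts; the bin probabilities over all integer values sum to one, so they form a valid coding distribution.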
3. The image compression method of claim 1, wherein obtaining a target image compression model based on the global loss comprises:
training an image compression model based on a BP algorithm, and adjusting the bit stream size R and the distortion term D to reduce the global loss L so as to obtain a target hyper-parameter;
and training the image compression model according to the target hyper-parameter to obtain the target image compression model.
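Claim 3's training procedure minimizes L = R + λD by gradient descent. The toy sketch below illustrates only the trade-off mechanism, not the patent's network training: R(q) = q and D(q) = 1/q are assumed toy models of a scalar "quality" knob q, where more quality costs more bits but reduces distortion.

```python
# Toy gradient descent on L(q) = R(q) + lam * D(q) with the assumed
# models R(q) = q and D(q) = 1/q. The closed-form optimum is q = sqrt(lam),
# so larger lam settles at higher quality (more bits, less distortion).
def train(lam, lr=0.01, steps=5000):
    q = 1.0
    for _ in range(steps):
        grad = 1.0 - lam / (q * q)  # dL/dq for L = q + lam / q
        q -= lr * grad
    return q

q_small = train(lam=0.25)  # converges toward sqrt(0.25) = 0.5
q_large = train(lam=4.0)   # converges toward sqrt(4.0)  = 2.0
print(round(q_small, 3), round(q_large, 3))
```

Sweeping λ and recording the resulting (R, D) pairs is what produces the rate-distortion curve mentioned in claim 1.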
4. The image compression method of claim 1, wherein the Transformer module comprises a window-based attention layer, a multi-layer perceptron, and a normalization layer.
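The three components of claim 4 can be sketched as follows. This is a minimal single-head NumPy illustration under stated simplifications: no shifted windows, no relative position bias, ReLU instead of GELU, and random illustrative weights; Swin-style blocks in practice add all of these.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalization layer: zero mean, unit variance per token."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def window_attention(x, w_qkv, w_out):
    """Single-head self-attention restricted to one window of tokens."""
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v @ w_out

def transformer_block(windows, w_qkv, w_out, w1, w2):
    """Window-based attention + MLP, each preceded by LayerNorm,
    each wrapped in a residual connection."""
    out = []
    for x in windows:  # tokens only attend within their own window
        x = x + window_attention(layer_norm(x), w_qkv, w_out)
        x = x + np.maximum(layer_norm(x) @ w1, 0.0) @ w2  # 2-layer ReLU MLP
        out.append(x)
    return np.stack(out)

rng = np.random.default_rng(0)
d = 8
windows = rng.normal(size=(2, 4, d))          # 2 windows of 4 tokens each
w_qkv = rng.normal(size=(d, 3 * d)) * 0.1
w_out = rng.normal(size=(d, d)) * 0.1
w1 = rng.normal(size=(d, 2 * d)) * 0.1
w2 = rng.normal(size=(2 * d, d)) * 0.1
y = transformer_block(windows, w_qkv, w_out, w1, w2)
print(y.shape)  # (2, 4, 8): shape preserved through the block
```

Because attention is computed per window, each window's output depends only on its own tokens, which is what keeps the attention cost linear in the number of windows.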
5. An image compression apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be compressed;
the encoding module is used for dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module;
the encoding module is specifically configured to perform normalization processing on the image to be compressed, and equally divide the processed image into a plurality of image blocks of the same size according to a fixed division area;
the conversion module is used for inputting the first hidden variable into a pre-stored entropy model so as to obtain a second hidden variable;
the decoding module is used for inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module;
the decoding module is further configured to, after the compressed image is obtained from the compressed image block, calculate a global loss L using the following formula:
L=R+λD;
wherein λ is a hyper-parameter used for obtaining a rate-distortion curve by controlling the bit rate and the compression quality;
R is the size of the bit stream obtained by compression, and R is calculated by the following formula:
R = E_{x~p_x}[-log₂ p_{ŷ|ẑ}(ŷ|ẑ)] + E_{x~p_x}[-log₂ p_{ẑ}(ẑ)];
wherein x is the image to be compressed, x̂ is the compressed image, ŷ is the first hidden variable, and ẑ is the hyper hidden variable in the entropy model used for obtaining the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of the normal distribution of ŷ under the condition of the prior information ẑ, and E_{x~p_x}[-log₂ p_{ŷ|ẑ}(ŷ|ẑ)] is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the distribution of the prior information ẑ, and E_{x~p_x}[-log₂ p_{ẑ}(ẑ)] is the information entropy of ẑ; E_{x~p_x}[·] denotes the expected value of the bracketed expression when x follows its distribution p_x;
D is a distortion term used for representing the difference between the compressed image and the image to be compressed, and is calculated as D = E_{x~p_x}[d(x, x̂)], wherein x̂ is the compressed image and d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image compression method according to any of claims 1 to 4 are implemented when the processor executes the program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the image compression method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210118720.2A CN114663536B (en) | 2022-02-08 | 2022-02-08 | Image compression method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114663536A CN114663536A (en) | 2022-06-24 |
CN114663536B true CN114663536B (en) | 2022-12-06 |
Family
ID=82025927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210118720.2A Active CN114663536B (en) | 2022-02-08 | 2022-02-08 | Image compression method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663536B (en) |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102238376B (en) * | 2010-04-28 | 2014-04-23 | 鸿富锦精密工业(深圳)有限公司 | Image processing system and method |
WO2014079036A1 (en) * | 2012-11-23 | 2014-05-30 | 华为技术有限公司 | Image compression method and image processing apparatus |
CN108650509B (en) * | 2018-04-04 | 2020-08-18 | 浙江工业大学 | Multi-scale self-adaptive approximate lossless coding and decoding method and system |
US11335034B2 (en) * | 2019-01-16 | 2022-05-17 | Disney Enterprises, Inc. | Systems and methods for image compression at multiple, different bitrates |
CN111986278B (en) * | 2019-05-22 | 2024-02-06 | 富士通株式会社 | Image encoding device, probability model generating device, and image compression system |
CN113259676B (en) * | 2020-02-10 | 2023-01-17 | 北京大学 | Image compression method and device based on deep learning |
CN112036292B (en) * | 2020-08-27 | 2024-06-04 | 平安科技(深圳)有限公司 | Word recognition method and device based on neural network and readable storage medium |
CN113313777B (en) * | 2021-07-29 | 2021-12-21 | 杭州博雅鸿图视频技术有限公司 | Image compression processing method and device, computer equipment and storage medium |
CN113709455B (en) * | 2021-09-27 | 2023-10-24 | 北京交通大学 | Multi-level image compression method using transducer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||