CN114663536B - Image compression method and device - Google Patents

Image compression method and device

Info

Publication number
CN114663536B
CN114663536B (application CN202210118720.2A)
Authority
CN
China
Prior art keywords
image
compressed
hidden variable
module
compression
Prior art date
Legal status (an assumption, not a legal conclusion): Active
Application number
CN202210118720.2A
Other languages
Chinese (zh)
Other versions
CN114663536A (en)
Inventor
张兆翔
宋纯锋
邹仁杰
Current Assignee (listing may be inaccurate): Institute of Automation, Chinese Academy of Sciences
Original Assignee: Institute of Automation, Chinese Academy of Sciences
Priority date (an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by the Institute of Automation, Chinese Academy of Sciences
Priority to CN202210118720.2A
Publication of CN114663536A
Application granted
Publication of CN114663536B
Legal status: Active

Classifications

    • G06T9/002 Image coding using neural networks
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T7/11 Region-based segmentation
    • G06T2207/10024 Color image
    • G06T2207/20021 Dividing image into blocks, subimages or windows


Abstract

The invention provides an image compression method and device, wherein the method comprises the following steps: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks into a pre-stored target encoder to obtain a first hidden variable; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, from which the compressed image is obtained. The method introduces a Transformer module into the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving image compression efficiency.

Description

Image compression method and device
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an image compression method and device.
Background
Image compression is the application of data compression techniques to digital images; its purpose is to reduce redundant information in the image data so that the data can be stored and transmitted efficiently, i.e. to achieve the best possible image quality at a given bit rate or compression ratio.
In the prior art, the encoder and decoder are generally designed based on a convolutional neural network to perform the image compression task. However, a convolution-based image compression pipeline cannot capture the semantic information of an image, and a global attention mechanism performs poorly in this task because image compression chiefly exploits the spatial redundancy between neighboring pixels; as a result, image compression efficiency is low.
Disclosure of Invention
The image compression method and device provided by the invention are intended to remedy a defect of the prior art, namely the poor rate-distortion performance caused by the inability of convolutional encoder-decoder designs to capture the semantic information of an image, and thereby improve image compression efficiency.
The invention provides an image compression method, which comprises the following steps:
acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks and obtaining a compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module.
According to an image compression method provided by the present invention, the method further comprises:
inputting the first hidden variable into the entropy model, obtaining the mean value and the variance of each element in the first hidden variable, and simulating the normal distribution of the first hidden variable according to the mean value and the variance of each element to obtain a probability distribution function; arithmetically encoding the first hidden variable based on the probability distribution function to obtain a target bit stream; arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable; and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
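The residual compensation described above can be illustrated with a minimal numerical sketch. The idealized residual predictor below, which recovers the quantization error exactly, is an assumption for illustration only; the real entropy model can only approximate the residual:

```python
import numpy as np

# Illustrative only: an idealized residual predictor recovers the exact
# quantization error; the invention's entropy model learns to approximate it.
y_true = np.array([0.3, -1.2, 2.7])      # encoder features before rounding
y_first = np.round(y_true)               # first hidden variable (quantized)
y_third = y_first.copy()                 # third hidden variable: the lossless
                                         # arithmetic encode/decode round trip
r = y_true - y_first                     # quantization residual to be predicted
y_second = y_third + r                   # second hidden variable
print(y_second)                          # recovers y_true exactly in this sketch
```

With a perfect residual, the second hidden variable coincides with the unquantized encoder output, which is the loss the compensation step aims to reduce.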
Optionally, the global loss L is calculated using the following formula:
L=R+λD
wherein λ is a hyper-parameter, R is the size of the bit stream obtained by compression, and D is the distortion term; a target image compression model is acquired according to the global loss.
Optionally, an image compression model is trained based on the BP algorithm, and the bit stream size R and the distortion term D are adjusted to reduce the global loss L so as to obtain a target hyper-parameter; the image compression model is then trained according to the target hyper-parameter to obtain the target image compression model.
Optionally, the image to be compressed is normalized, and the processed image is equally divided into a plurality of image blocks according to a fixed division area.
Optionally, the Transformer module includes a window-based attention layer, a multi-layer perceptron and a normalization layer.
The present invention also provides an image compression apparatus, comprising:
the image acquisition module is used for acquiring an image to be compressed; the encoding module is used for dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputting all the image blocks into a pre-stored target encoder to acquire a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; the conversion module is used for inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and the decoding module is used for inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks and obtain a compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the image compression method as described in any of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the image compression method as described in any one of the above.
The invention provides an image compression method and device, which comprises the steps of firstly obtaining an image to be compressed; then dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; finally, inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block; the method of the invention introduces a Transformer module in the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving the image compression efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of an image compression method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a target encoder according to an embodiment of the present invention;
FIG. 3 is a block diagram of a target decoder according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of obtaining a second hidden variable according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Transformer module according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an image compression apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An image compression method provided by the embodiment of the invention is described below with reference to fig. 1, and includes:
step 101, obtaining an image to be compressed.
It can be understood that image compression is an application of data compression technology to digital images; its purpose is to reduce redundant information in the image data so that the data can be stored and transmitted in a more efficient format. When the amount of image data is too large, the storage, transmission and processing of image information become very difficult, so the image to be processed needs to be compressed before it can be used effectively. In this embodiment, one million RGB color images are randomly extracted from the source database as the images to be compressed; the required image data to be compressed can also be acquired from other data acquisition platforms.
102, dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable; wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module.
It can be understood that, since the redundant information in the image to be compressed needs to be removed to obtain the compressed image, this embodiment first normalizes the acquired RGB color images to be compressed using image processing software and unifies their sizes; each image is then divided into a plurality of image blocks of fixed size in a fixed arrangement order to obtain an image block sequence; finally, the image block sequence is input into the target encoder for encoding. During training the encoder maps the image blocks to the parameters of the probability distribution obeyed by the hidden variable, which is then sampled; the first hidden variable is obtained by quantizing and rounding the features that the trained target encoder outputs for the image blocks. As shown in fig. 2, the main structure of the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module. The linear embedding layer is composed of a multilayer perceptron (MLP) and performs a linear transformation on the channel data of each pixel; the Transformer module is composed of a window-based attention layer, a multilayer perceptron and a normalization layer, with 2, 6 and 2 Transformer blocks in the successive stages; the block merging module is composed of a multilayer perceptron and a normalization layer and performs a down-sampling operation on the image features.
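As a rough illustration of how the patch embedding and the block merging modules shrink the token grid, the following sketch assumes a 2 × 2 patch size and one 2× merge between each pair of the three Transformer stages; the exact stage layout is an assumption drawn from the description above, not the patent's code:

```python
def encoder_grid_sizes(height, width, patch=2, stages=3):
    """Token-grid resolution through the encoder: a patch embedding first,
    then one 2x block-merging downsample between successive stages.
    The stage layout is an assumption for illustration."""
    h, w = height // patch, width // patch   # after the linear embedding layer
    sizes = [(h, w)]
    for _ in range(stages - 1):              # each block merging module halves H and W
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(encoder_grid_sizes(768, 512))  # [(384, 256), (192, 128), (96, 64)]
```

For a 768 × 512 input, the hidden-variable grid after all merges would be 96 × 64 under these assumptions.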
And 103, inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable.
It can be understood that the first hidden variable is obtained by quantizing and rounding the features output by the target encoder, and this rounding causes a quantization loss. To compensate for the quantization loss of the image blocks after passing through the target encoder, this embodiment inputs the obtained first hidden variable into a pre-stored entropy model to obtain a second hidden variable, which compensates for the quantization loss of the first hidden variable. Specifically, a channel-by-channel autoregressive entropy model is constructed based on a convolutional neural network; the first hidden variable is input into the entropy model to obtain the probability distribution function of each of its elements, and the second hidden variable is then obtained via arithmetic encoding and decoding based on that probability distribution function.
Step 104, inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block; wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module.
It can be understood that decoding is the inverse process of encoding: the obtained second hidden variable is input into the target decoder for parsing to obtain the compressed image blocks corresponding to the RGB image, and the compressed image blocks are then spliced back into a complete image according to the image block ordering information of the above embodiment. As shown in fig. 3, the main structure of the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module. The de-embedding layer is composed of a multilayer perceptron (MLP); the Transformer module is composed of a window-based attention layer, a multilayer perceptron and a normalization layer; the block splitting module is composed of a multilayer perceptron and a normalization layer and performs an up-sampling operation on the image features.
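The splicing of decoded blocks back into a complete image is the exact inverse of the block division performed at the encoder. A minimal numpy sketch of this round trip follows; the helper names are illustrative, not the patent's:

```python
import numpy as np

def split_blocks(img, patch=2):
    """Divide an (H, W, C) image into a row-major sequence of flattened
    patch x patch blocks (encoder-side preprocessing)."""
    h, w, c = img.shape
    b = img.reshape(h // patch, patch, w // patch, patch, c)
    return b.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def merge_blocks(seq, h, w, patch=2, c=3):
    """Inverse operation: splice the decoded block sequence back into an
    (H, W, C) image using the same row-major ordering."""
    b = seq.reshape(h // patch, w // patch, patch, patch, c)
    return b.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

img = np.random.rand(8, 6, 3)
restored = merge_blocks(split_blocks(img), 8, 6)
print(np.allclose(restored, img))  # True: merge is the exact inverse of split
```

Because both sides use the same fixed ordering, no extra metadata beyond the image dimensions is needed to reassemble the blocks.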
The method of the invention introduces a Transformer module in the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving the image compression efficiency.
Optionally, the first hidden variable is input into the entropy model, a mean value and a variance of each element in the first hidden variable are obtained, and normal distribution of the first hidden variable is simulated according to the mean value and the variance of each element to obtain a probability distribution function; arithmetically encoding the first hidden variable based on the probability distribution function to obtain a target bit stream; arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable; and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
Specifically, as shown in fig. 4, in this embodiment the image to be compressed is normalized and each image is divided into a plurality of image blocks; the image blocks are then arranged in a fixed order into an image block sequence, which is input into the Transformer-based target encoder for training to obtain a first hidden variable ŷ. The first hidden variable ŷ is next input into a channel-by-channel autoregressive entropy model constructed on a convolutional neural network to obtain the mean μ and variance σ of each element of ŷ; a Gaussian distribution is fitted to each element from μ and σ to obtain its probability distribution function, and arithmetic coding losslessly compresses the quantized first hidden variable ŷ into a bit stream, i.e. a binary character string, according to the probability distribution function. Finally, arithmetic decoding parses the bit stream back into a quantized third hidden variable y according to the probability distribution function, the quantization residual r of the hidden variable y predicted by the channel-by-channel autoregressive entropy model is obtained through the loss formula, and the second hidden variable is obtained as y + r. It should be noted that the third hidden variable is obtained by lossless arithmetic coding and decoding of the first hidden variable, so the two have identical values.
This embodiment provides a method in which the hidden variable output by the encoder is input into the entropy model and a new hidden variable is obtained through arithmetic coding and decoding, so that the quantization residual of the quantized hidden variable is compensated and image distortion can be reduced.
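The link between the Gaussian parameters predicted by the entropy model and the size of the arithmetic-coded bit stream can be sketched as follows. This is a generic illustration of bin-probability code lengths, not the patent's implementation:

```python
import math

def element_bits(y, mu, sigma):
    """Estimated arithmetic-code length (bits) for one quantized latent element
    under the Gaussian N(mu, sigma^2) fitted by the entropy model: the coder
    assigns the element the probability mass of its unit-width quantization bin."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = cdf(y + 0.5) - cdf(y - 0.5)     # P(y - 0.5 < Y < y + 0.5)
    return -math.log2(max(p, 1e-12))    # ideal code length, clipped for stability

# Elements near the predicted mean are cheap to code; outliers cost more bits.
print(element_bits(0.0, 0.0, 1.0), element_bits(3.0, 0.0, 1.0))
```

The more accurately the entropy model predicts μ and σ, the more probability mass falls in the correct bin and the shorter the bit stream, which is why the rate term rewards a good entropy model.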
Optionally, the global loss L is calculated using the following formula:
L=R+λD
wherein λ is a hyper-parameter, R is the size of the bit stream obtained by compression, and D is the distortion term; a target image compression model is acquired according to the global loss.
Specifically, λ is a hyper-parameter used to control the bit rate and compression quality, so as to trace out a rate-distortion curve. The rate term R is calculated as:

R = E_{x∼p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ)] + E_{x∼p_x}[−log₂ p_ẑ(ẑ)]

wherein ẑ is the hyper hidden variable of the entropy model, the prior information used to estimate the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of ŷ under its normal distribution conditioned on the prior information ẑ, so the first term is the conditional entropy of ŷ; p_ẑ(ẑ) is the probability value of the prior information ẑ under its normal distribution, so the second term is the information entropy of ẑ; E_{x∼p_x}[·] denotes the expected value of the bracketed expression over images x drawn from the distribution p_x, x denotes the image to be compressed, and x̂ denotes the compressed image. D is the distortion term, representing the magnitude of the difference between the compressed image and the image to be compressed:

D = E_{x∼p_x}[d(x, x̂)]

wherein d(x, x̂) measures the difference between the original image x and the reconstructed image x̂; the common evaluation criterion is the Mean Squared Error (MSE). The embodiment determines a suitable hyper-parameter λ by calculating the global loss L of the image compression model, and obtains the target image compression model using the target hyper-parameter λ.
This embodiment provides a method for calculating the global loss L of an image compression model; the required hyper-parameter λ is determined by reducing L, so as to obtain image compression models for different bit rate or reconstruction quality requirements.
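The global loss above can be sketched directly. Using MSE for the distortion and passing a pre-computed rate value are simplifying assumptions for illustration:

```python
def global_loss(rate_bits, x, x_hat, lam):
    """L = R + lambda * D, with D taken as the mean squared error between the
    original and reconstructed pixels; a sketch of the training objective,
    with the rate term pre-computed for simplicity."""
    d = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return rate_bits + lam * d

x = [0.2, 0.5, 0.8]          # toy original pixel values
x_hat = [0.2, 0.4, 0.9]      # toy reconstruction
# A larger lambda penalizes the same distortion more heavily.
print(global_loss(100.0, x, x_hat, 0.0018) < global_loss(100.0, x, x_hat, 0.0483))  # True
```

Minimizing L therefore trades bit stream size against reconstruction error, with λ setting the exchange rate between the two.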
Optionally, an image compression model is trained based on the BP algorithm, and the bit stream size R and the distortion term D are adjusted to reduce the global loss L so as to obtain a target hyper-parameter; the image compression model is then trained according to the target hyper-parameter to obtain the target image compression model.
Specifically, this embodiment uses the back-propagation algorithm and stochastic gradient descent to reduce the global loss L when training the image compression model, and the final image compression model is obtained through multiple training iterations. For example, the hyper-parameter λ is set to values in {0.0018, 0.0035, 0.0067, 0.0130, 0.025, 0.0483}, yielding a family of image compression models suitable for different scenarios. Different image compression models are selected for different bit rate or reconstruction quality requirements: a larger λ, such as 0.0483, is chosen for scenarios demanding high reconstruction quality, and a smaller λ, such as 0.0018, is chosen for scenarios demanding a low bit rate.
The embodiment provides a method for determining a hyper-parameter lambda by using a BP training algorithm so as to meet image compression models with different bit rates or reconstruction quality requirements.
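A hypothetical selection rule over the λ values listed in this embodiment might look like the following; the function and its policy are illustrative assumptions, not part of the invention:

```python
# Hypothetical rule: larger lambda -> distortion weighted more heavily ->
# higher reconstruction quality (at a higher bit rate); smaller lambda ->
# lower bit rate (at lower quality). Values are from the embodiment above.
LAMBDAS = (0.0018, 0.0035, 0.0067, 0.0130, 0.025, 0.0483)

def pick_lambda(prefer_quality: bool) -> float:
    return max(LAMBDAS) if prefer_quality else min(LAMBDAS)

print(pick_lambda(True), pick_lambda(False))  # 0.0483 0.0018
```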
Optionally, the image to be compressed is normalized, and the processed image is equally divided into a plurality of image blocks according to a fixed division area.
Specifically, in this embodiment one million acquired RGB color images are used as the images to be compressed; they are normalized so that all images have the same size, each with dimensions 768 × 512 × 3. Each image is divided into 2 × 2 image blocks along the length (768) and width (512) dimensions, and the resulting 98,304 image blocks per image are sent to the target encoder for training.
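The block count in this embodiment can be checked with a one-line computation:

```python
def block_count(height, width, patch=2):
    """Number of non-overlapping patch x patch blocks an image divides into."""
    return (height // patch) * (width // patch)

# A 768 x 512 image with 2 x 2 blocks: (768 / 2) * (512 / 2) = 98,304 blocks.
print(block_count(768, 512))  # 98304
```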
The embodiment provides a data preprocessing and image block dividing method, which provides convenience for inputting an image block into a target encoder to obtain a hidden variable in a subsequent process.
Optionally, the Transformer module includes a window-based attention layer, a multi-layer perceptron and a normalization layer.
As shown in fig. 5, the Transformer module is composed of a window-based attention layer (W-MSA, SW-MSA), a multi-layer perceptron (MLP) and a normalization layer (LN). W-MSA and SW-MSA are used in pairs: if layer L uses W-MSA, then layer L + 1 uses SW-MSA. Comparing the left and right diagrams, it can be seen that the windows are shifted; the shifted windows allow previously adjacent windows to exchange information, solving the problem that no information can flow between different windows.
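The pairing of plain and shifted windows can be sketched in numpy. This omits the attention computation itself and the boundary masking that a full implementation needs, and the window size of 4 is an arbitrary choice for illustration:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows;
    W-MSA computes attention independently inside each window."""
    H, W, C = x.shape
    v = x.reshape(H // ws, ws, W // ws, ws, C)
    return v.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def cyclic_shift(x, ws):
    """Roll the map by ws // 2 before partitioning (SW-MSA), so elements near
    the borders of the previous layer's windows share a window in this layer."""
    return np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))

feat = np.arange(64, dtype=float).reshape(8, 8, 1)
wins = window_partition(feat, ws=4)                           # layer L: W-MSA
shifted_wins = window_partition(cyclic_shift(feat, 4), ws=4)  # layer L+1: SW-MSA
print(wins.shape)  # (4, 16, 1)
```

Because the shifted partition groups different elements than the plain one, alternating the two lets information propagate across window boundaries over successive layers.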
This embodiment gives the composition of the Transformer module and introduces an attention mechanism within each window, so that the attention focuses more on the local structural information of the input image, i.e. the correlation between spatially adjacent elements. This overcomes the lack of semantic information in image compression and improves the rate-distortion performance of image compression.
An image compression apparatus provided by an embodiment of the present invention is described with reference to fig. 6, and an image compression apparatus described below and an image compression method described above may be referred to in correspondence with each other.
The invention provides an image compression device, comprising:
an image acquisition module 601, configured to acquire an image to be compressed; an encoding module 602, configured to divide the image to be compressed into a plurality of image blocks based on a preprocessing rule and input all the image blocks into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; a conversion module 603, configured to input the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and a decoding module 604, configured to input the second hidden variable into a pre-stored target decoder to obtain compressed image blocks and obtain a compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module.
The invention provides an image compression device which first acquires an image to be compressed through the image acquisition module 601; the encoding module 602 then divides the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputs all the image blocks into a pre-stored target encoder to obtain a first hidden variable; the conversion module 603 inputs the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; finally, the decoding module 604 inputs the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and a compressed image is obtained from the compressed image blocks. The device introduces a Transformer module into the image compression task and adopts a symmetric processing architecture to encode and decode the image, thereby improving image compression efficiency.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor) 710, a communication Interface (Communications Interface) 720, a memory (memory) 730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a method of image compression, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image compression method provided by the above methods, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises an embedding layer removing module, a Transformer module and a block splitting module.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. An image compression method, comprising:
acquiring an image to be compressed;
dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; normalizing the image to be compressed, and equally dividing the processed image into a plurality of image blocks according to a fixed division area, wherein the image blocks have the same size;
inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable;
inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module;
after obtaining the compressed image from the compressed image block, the method further comprises:
the global loss L is calculated using the following formula:
L=R+λD;
wherein λ is a hyper-parameter, and λ is used for obtaining a rate-distortion curve by controlling the compression bit rate and compression quality;
R is the bit stream size obtained by compression, and the calculation formula of R is as follows:
R = E_{x~px}[ −log2 p_{ŷ|ẑ}(ŷ|ẑ) − log2 p_{ẑ}(ẑ) ];
wherein x is the image to be compressed, ŷ is the first hidden variable, and ẑ is a hyper-hidden variable in the entropy model used for obtaining the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of ŷ under its normal distribution conditioned on the prior information ẑ, and −log2 p_{ŷ|ẑ}(ŷ|ẑ) is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the prior information ẑ under its normal distribution, and −log2 p_{ẑ}(ẑ) is the information entropy of the prior information ẑ; E_{x~px}[·] is the expected value of the expression within the brackets with x drawn from its distribution px;
D is a distortion term used for representing the difference between the compressed image and the image to be compressed, and the calculation formula of D is as follows:
D = E_{x~px}[ d(x, x̂) ];
wherein x̂ is the compressed image and d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
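The global loss L = R + λD of claim 1 can be sketched numerically as follows. This NumPy illustration assumes unit-width quantization bins, a standard-normal model for the hyper-hidden variable, and mean squared error standing in for the distortion d(x, x̂) — all simplifying stand-ins for the learned entropy model:

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(v):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

def rate_bits(q, mean, scale):
    """-log2 probability of quantized symbols q under N(mean, scale^2),
    integrating the density over each unit-width quantization bin."""
    bits = 0.0
    for v, m, s in zip(q.ravel(), mean.ravel(), scale.ravel()):
        p = normal_cdf((v + 0.5 - m) / s) - normal_cdf((v - 0.5 - m) / s)
        bits -= np.log2(max(p, 1e-12))
    return bits

def global_loss(x, x_hat, y_hat, y_mean, y_scale, z_hat, lam):
    """L = R + lam * D: R sums the conditional bits of the first hidden
    variable y_hat (its mean/scale coming from the hyper-prior) and the
    bits of the hyper-hidden variable z_hat (modelled as N(0, 1) here
    purely for illustration); D is mean squared error."""
    R = rate_bits(y_hat, y_mean, y_scale)
    R += rate_bits(z_hat, np.zeros_like(z_hat), np.ones_like(z_hat))
    D = float(np.mean((x - x_hat) ** 2))
    return R + lam * D

x = np.zeros((2, 2)); x_hat = np.full((2, 2), 0.5)
y_hat = np.array([0.0, 1.0]); y_mean = np.zeros(2); y_scale = np.ones(2)
z_hat = np.array([0.0])
loss = global_loss(x, x_hat, y_hat, y_mean, y_scale, z_hat, lam=0.01)
```

Symbols far from their predicted mean cost more bits, which is exactly the pressure that drives the encoder toward hidden variables the entropy model can predict well.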
2. The image compression method according to claim 1, wherein inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable specifically comprises:
inputting the first hidden variable into the entropy model, obtaining the mean and variance of each element in the first hidden variable, and modelling the normal distribution of the first hidden variable according to the mean and variance of each element to obtain a probability distribution function;
performing arithmetic coding on the first hidden variable based on the probability distribution function to obtain a target bit stream;
arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable;
and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
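The quantization and residual-correction flow of claim 2 can be sketched as below. Rounding stands in for the arithmetic encode/decode round trip (which is lossless given the shared probability distribution function), and `predict_residual` is a hypothetical placeholder for the entropy model's learned residual estimate:

```python
import numpy as np

def entropy_bottleneck(y, predict_residual):
    """Claim-2 flow with rounding standing in for arithmetic coding:
    encode/decode under the shared probability distribution function is
    lossless, so the decoded third hidden variable equals the quantized
    symbols; a residual estimate from the entropy model is then added
    back to form the second hidden variable."""
    y3 = np.round(y)                 # third hidden variable (after decode)
    residual = predict_residual(y3)  # hypothetical learned residual estimate
    y2 = y3 + residual               # second hidden variable
    return y2, y3

y = np.array([0.4, -1.2, 2.6])
# a zero predictor is used only to keep the sketch self-contained
y2, y3 = entropy_bottleneck(y, lambda q: np.zeros_like(q))
```

With a trained predictor, the recovered residual would partially compensate the quantization error before the second hidden variable reaches the target decoder.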
3. The image compression method of claim 1, wherein obtaining a target image compression model based on the global loss comprises:
training an image compression model based on the back-propagation (BP) algorithm, and adjusting the bit stream size R and the distortion term D to reduce the global loss L so as to obtain a target hyper-parameter;
and training the image compression model according to the target hyper-parameter to obtain the target image compression model.
4. The image compression method according to claim 1, wherein the Transformer module comprises a window-based attention layer, a multi-layer perceptron, and a normalization layer.
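The window-based attention layer of claim 4 can be illustrated with a dependency-free sketch. Identity Q/K/V projections are assumed for brevity, and the multi-layer perceptron and normalization layer of a full Transformer block are omitted:

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, window):
    """Single-head self-attention restricted to non-overlapping windows of
    `window` tokens; identity Q/K/V projections keep the sketch small (a
    real block adds learned projections, an MLP and normalization)."""
    n, d = x.shape
    assert n % window == 0
    out = np.empty_like(x)
    for s in range(0, n, window):
        win = x[s:s + window]                  # (window, d) tokens
        scores = win @ win.T / np.sqrt(d)      # scaled dot-product logits
        out[s:s + window] = softmax(scores) @ win
    return out

tokens = np.random.RandomState(0).randn(8, 4)
attended = window_attention(tokens, 4)
```

Because attention is computed only within each window, tokens in one window cannot influence outputs in another — the property that keeps the layer's cost linear in the number of image blocks.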
5. An image compression apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be compressed;
the encoding module is used for dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module;
the encoding module is specifically configured to perform normalization processing on the image to be compressed, and equally divide the processed image into a plurality of image blocks according to a fixed division area, wherein the image blocks have the same size;
the conversion module is used for inputting the first hidden variable into a pre-stored entropy model so as to obtain a second hidden variable;
the decoding module is used for inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module;
the decoding module is further configured to, after the compressed image is obtained from the compressed image block, calculate a global loss L using the following formula:
L=R+λD;
wherein λ is a hyper-parameter, and λ is used for obtaining a rate-distortion curve by controlling the compression bit rate and compression quality;
R is the bit stream size obtained by compression, and the calculation formula of R is as follows:
R = E_{x~px}[ −log2 p_{ŷ|ẑ}(ŷ|ẑ) − log2 p_{ẑ}(ẑ) ];
wherein ŷ is the first hidden variable, and ẑ is a hyper-hidden variable in the entropy model used for obtaining the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of ŷ under its normal distribution conditioned on the prior information ẑ, and −log2 p_{ŷ|ẑ}(ŷ|ẑ) is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the prior information ẑ under its normal distribution, and −log2 p_{ẑ}(ẑ) is the information entropy of the prior information ẑ; E_{x~px}[·] is the expected value of the expression within the brackets with x drawn from its distribution px; x is the image to be compressed, and x̂ is the compressed image;
D is a distortion term used for representing the difference between the compressed image and the image to be compressed, and the calculation formula of D is as follows:
D = E_{x~px}[ d(x, x̂) ];
wherein d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image compression method according to any of claims 1 to 4 are implemented when the processor executes the program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the image compression method according to any one of claims 1 to 4.
CN202210118720.2A 2022-02-08 2022-02-08 Image compression method and device Active CN114663536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210118720.2A CN114663536B (en) 2022-02-08 2022-02-08 Image compression method and device


Publications (2)

Publication Number Publication Date
CN114663536A CN114663536A (en) 2022-06-24
CN114663536B (en) 2022-12-06

Family

ID=82025927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210118720.2A Active CN114663536B (en) 2022-02-08 2022-02-08 Image compression method and device

Country Status (1)

Country Link
CN (1) CN114663536B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238376B (en) * 2010-04-28 2014-04-23 鸿富锦精密工业(深圳)有限公司 Image processing system and method
WO2014079036A1 (en) * 2012-11-23 2014-05-30 华为技术有限公司 Image compression method and image processing apparatus
CN108650509B (en) * 2018-04-04 2020-08-18 浙江工业大学 Multi-scale self-adaptive approximate lossless coding and decoding method and system
US11335034B2 (en) * 2019-01-16 2022-05-17 Disney Enterprises, Inc. Systems and methods for image compression at multiple, different bitrates
CN111986278B (en) * 2019-05-22 2024-02-06 富士通株式会社 Image encoding device, probability model generating device, and image compression system
CN113259676B (en) * 2020-02-10 2023-01-17 北京大学 Image compression method and device based on deep learning
CN112036292B (en) * 2020-08-27 2024-06-04 平安科技(深圳)有限公司 Word recognition method and device based on neural network and readable storage medium
CN113313777B (en) * 2021-07-29 2021-12-21 杭州博雅鸿图视频技术有限公司 Image compression processing method and device, computer equipment and storage medium
CN113709455B (en) * 2021-09-27 2023-10-24 北京交通大学 Multi-level image compression method using transducer



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant