WO2023093377A1 - Encoding/decoding method and electronic device - Google Patents

Encoding/decoding method and electronic device

Info

Publication number: WO2023093377A1
Authority: WO (WIPO PCT)
Application number: PCT/CN2022/125944
Prior art keywords: feature, feature map, decoded, group, channels
Other languages: English (en), French (fr)
Inventors: Shi Yibo (师一博), Wang Jing (王晶), Zhao Yin (赵寅)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023093377A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 - Incoming video signal characteristics or properties
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 - The unit being an image region, e.g. an object
    • H04N 19/172 - The region being a picture, frame or field
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • The embodiments of the present application relate to the field of data processing, and in particular, to an encoding/decoding method and an electronic device.
  • An AI (Artificial Intelligence) image compression algorithm is an image compression algorithm implemented based on deep learning.
  • Traditional image compression technologies include, for example, JPEG (Joint Photographic Experts Group) and BPG (Better Portable Graphics).
  • Some AI image compression algorithms predict the entropy estimation feature of a channel in the feature map matrix based on the information of all channels of the feature map matrix. However, the correlation between the channels of the feature map matrix is low; therefore, if the information of all channels of the feature map matrix is fused, a large amount of unusable information is introduced, which affects the efficiency of encoding and decoding.
  • The present application provides an encoding/decoding method and an electronic device.
  • An embodiment of the present application provides an encoding method, and the method includes: first, acquiring an image to be encoded; next, generating a first feature map matrix based on the image to be encoded, where the first feature map matrix includes the first feature maps of c channels and c is a positive integer; subsequently, performing intra-group fusion on the feature map group consisting of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group, where k is a positive integer smaller than c; and finally, determining a probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature, and encoding the first feature map matrix according to the probability distribution to obtain a code stream.
  • In this way, the entropy estimation feature is determined by intra-group fusion of a feature map group composed of the feature maps of only some channels. Compared with determining the entropy estimation feature by fusing the feature maps of all channels, this reduces the introduction of invalid information, which in turn reduces the encoding computing power and improves encoding efficiency.
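As a concrete illustration of this grouping idea, the sketch below (a minimal sketch, assuming numpy, a fixed group size k that divides c, and function names of our own invention, not from the application) splits a c-channel feature map matrix into N = c/k feature map groups:

```python
# A minimal sketch of channel grouping, assuming k divides c.
import numpy as np

def make_groups(feature_maps: np.ndarray, k: int) -> list[np.ndarray]:
    """feature_maps: array of shape (c, h, w). Returns N = c // k feature
    map groups, each of shape (k, h, w)."""
    c = feature_maps.shape[0]
    assert c % k == 0, "this sketch assumes k divides c"
    return [feature_maps[i:i + k] for i in range(0, c, k)]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # c = 8 channels of 16x16 feature maps
groups = make_groups(x, k=2)           # N = 4 feature map groups
print(len(groups), groups[0].shape)    # 4 (2, 16, 16)
```

Each group is then fused on its own, so the entropy estimation for one group never touches the channels of another group.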
  • In a possible manner, from the first feature maps of the c channels, the first feature maps of k channels may be used to form a feature map group; N feature map groups may thus be obtained, where N is an integer greater than 1 and is determined according to c and k.
  • N groups of first entropy estimation features can be obtained.
  • In this way, the amount of invalid information introduced can be reduced, thereby reducing the encoding computing power and improving the encoding efficiency.
  • the quality of the reconstructed image can also be improved.
  • The N groups of first entropy estimation features are combined to obtain the first entropy estimation feature corresponding to the first feature map matrix; according to the first entropy estimation feature corresponding to the first feature map matrix, the probability distribution of the first feature map matrix is determined.
  • The number of channels k included in each feature map group may be the same or different, and this application is not limited thereto.
  • different feature map groups may contain the first feature map of the same channel.
  • different feature map groups may contain first feature maps of different channels.
  • In a possible implementation, performing intra-group fusion on the feature map group composed of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: using the autoregressive weight matrix corresponding to the feature map group to extract local spatial information from the feature map group, so as to obtain the first entropy estimation feature corresponding to the feature map group.
  • M is the total number of output channels of the autoregressive model. The autoregressive weight matrix corresponding to the i-th feature map group covers M_i output channels, where ks1*ks2 represents the size of the convolution kernel of the autoregressive model, and ks1 may or may not be equal to ks2, which is not limited in this application. That is to say, among the M_i output channels, each output channel corresponds to k weight maps of size ks1*ks2.
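A minimal shape illustration of this weight layout (the array layout and names are our assumption, inferred from the description above):

```python
import numpy as np

# Assumed layout for group i: M_i output channels, each holding k weight
# maps of size ks1 x ks2 -> one array of shape (M_i, k, ks1, ks2).
M_i, k, ks1, ks2 = 3, 2, 5, 5
w_i = np.zeros((M_i, k, ks1, ks2))
print(w_i.shape)   # (3, 2, 5, 5): 3 output channels, 2 weight maps each
```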
  • In a possible implementation, the first feature map matrix includes a second feature map matrix and a third feature map matrix, where the second feature map matrix includes the second feature maps of c channels and the third feature map matrix includes the third feature maps of c channels. Performing intra-group fusion on the feature map group consisting of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: performing intra-group fusion on the feature map group composed of the second feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Determining the probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature includes: determining the probability distribution corresponding to the third feature map matrix according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Encoding the first feature map matrix according to the probability distribution to obtain a code stream includes: encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the code stream.
  • The second feature map matrix and the third feature map matrix can be obtained by spatially dividing the first feature map matrix.
  • the second feature map and the third feature map of each channel are added to obtain the first feature map of the channel.
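The text does not specify how the spatial division is performed; a complementary checkerboard split is one plausible division that satisfies the additivity just stated (the second and third feature maps of each channel sum to the first feature map). A sketch under that assumption:

```python
import numpy as np

def checkerboard_split(x: np.ndarray):
    """One plausible spatial division (our assumption, not mandated by the
    text): complementary checkerboard halves of x, shape (c, h, w)."""
    c, h, w = x.shape
    mask = (np.indices((h, w)).sum(axis=0) % 2 == 0)  # alternating positions
    second = x * mask    # second feature map matrix
    third = x * ~mask    # third feature map matrix
    return second, third

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
a, b = checkerboard_split(x)
assert np.allclose(a + b, x)   # per-channel sum recovers the first feature map
```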
  • In a possible implementation, the first feature map matrix includes a third feature map matrix, and the third feature map matrix includes the third feature maps of c channels. Performing intra-group fusion on the feature map group composed of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: performing intra-group fusion on the feature map group composed of the third feature maps of k channels to obtain the first entropy estimation feature corresponding to that feature map group. Determining the probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature includes: determining the probability distribution corresponding to the third feature map matrix according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Encoding the first feature map matrix according to the probability distribution to obtain a code stream includes: encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the code stream.
  • In a possible implementation, the method further includes: performing feature extraction on the second feature map matrix included in the first feature map matrix to obtain a fourth feature map matrix; determining the second entropy estimation feature according to the fourth feature map matrix; and determining the probability distribution corresponding to the second feature map matrix according to the second entropy estimation feature.
  • the method further includes: encoding the fourth feature map matrix to obtain a code stream. In this way, it is convenient for the decoding end to decode the second feature map matrix from the code stream.
  • An embodiment of the present application provides a decoding method, and the method includes: obtaining a code stream, decoding the feature values corresponding to the feature points of c channels from the code stream to obtain the first feature map matrix, where c is a positive integer; then, performing image reconstruction based on the first feature map matrix, and outputting the reconstructed image.
  • In a possible implementation, a decoded information group corresponding to the first feature point to be decoded is acquired, where the decoded information group includes the decoded information of the channel corresponding to the first feature point to be decoded and the decoded information of the other k-1 channels, and k is a positive integer less than c; the decoded information group is fused within the group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded.
  • According to the first entropy estimation feature, the probability distribution corresponding to the first feature point to be decoded is determined, and the first feature point to be decoded is then decoded according to the probability distribution to obtain the corresponding feature value, where the first feature point to be decoded is any feature point to be decoded.
  • In this way, the entropy estimation feature corresponding to the feature point to be decoded is determined from the decoded information of only some channels. Compared with determining the entropy estimation feature by fusing the decoded information of all channels, this reduces the introduction of invalid information, thereby reducing the decoding computing power and improving decoding efficiency.
  • the decoded information includes feature values corresponding to the decoded feature points.
  • In a possible implementation, performing intra-group fusion on the decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded includes: using the autoregressive weight matrix corresponding to the decoded information group to extract local spatial information from the decoded information group, so as to obtain the first entropy estimation feature corresponding to the first feature point to be decoded.
  • In a possible implementation, the feature points include feature points at a first preset position and feature points at a second preset position, and the first feature point to be decoded is a feature point at the first preset position. The method includes: decoding a fourth feature map matrix from the code stream, where the fourth feature map matrix includes features obtained by feature extraction from the feature values corresponding to the feature points at the second preset position in the first feature map matrix; and, for a second feature point to be decoded at the second preset position: determining a second entropy estimation feature corresponding to the second feature point to be decoded based on the fourth feature map matrix; determining a probability distribution corresponding to the second feature point to be decoded according to the second entropy estimation feature; and decoding the second feature point to be decoded according to the probability distribution to obtain the corresponding feature value.
  • The computing power required to determine the first entropy estimation feature is greater; therefore, determining the first entropy estimation feature only for the feature points at the first preset position can reduce the decoding computing power.
  • In a possible manner, the decoded information group includes the feature values corresponding to the decoded feature points at the second preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the second preset position in the other k-1 channels. In this way, parallel decoding can be performed on the first feature points to be decoded at the first preset position, thereby further improving decoding efficiency.
  • In a possible manner, the decoded information group includes the feature values corresponding to the decoded feature points at the first preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the first preset position in the other k-1 channels.
  • the second aspect and any implementation manner of the second aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • technical effects corresponding to the second aspect and any implementation manner of the second aspect reference may be made to the technical effects corresponding to the above-mentioned first aspect and any implementation manner of the first aspect, and details are not repeated here.
  • the embodiment of the present application provides an encoder, configured to implement the encoding method in the first aspect or any possible implementation manner of the first aspect.
  • the third aspect and any implementation manner of the third aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • For the technical effects corresponding to the third aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned first aspect and any one of its implementation manners; details are not repeated here.
  • the embodiment of the present application provides a decoder, configured to perform the decoding method in the second aspect or any possible implementation manner of the second aspect.
  • the fourth aspect and any implementation manner of the fourth aspect correspond to the second aspect and any implementation manner of the second aspect respectively.
  • For the technical effects corresponding to the fourth aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned second aspect and any one of its implementation manners; details are not repeated here.
  • An embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is coupled to the processor; the memory stores program instructions, and when the program instructions are executed by the processor, the electronic device executes the encoding method in the first aspect or any possible implementation manner of the first aspect.
  • the fifth aspect and any implementation manner of the fifth aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • the technical effects corresponding to the fifth aspect and any one of the implementation manners of the fifth aspect refer to the technical effects corresponding to the above-mentioned first aspect and any one of the implementation manners of the first aspect, and details are not repeated here.
  • An embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is coupled to the processor; the memory stores program instructions, and when the program instructions are executed by the processor, the electronic device executes the decoding method in the second aspect or any possible implementation manner of the second aspect.
  • the sixth aspect and any implementation manner of the sixth aspect correspond to the second aspect and any implementation manner of the second aspect respectively.
  • For the technical effects corresponding to the sixth aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned second aspect and any one of its implementation manners; details are not repeated here.
  • The embodiment of the present application provides a chip, including one or more interface circuits and one or more processors; the interface circuit is used to receive signals from the memory of the electronic device and send signals to the processor, where the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the electronic device is made to execute the encoding method in the first aspect or any possible implementation manner of the first aspect.
  • the seventh aspect and any implementation manner of the seventh aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • For the technical effects corresponding to the seventh aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned first aspect and any one of its implementation manners; details are not repeated here.
  • The embodiment of the present application provides a chip, including one or more interface circuits and one or more processors; the interface circuit is used to receive signals from the memory of the electronic device and send signals to the processor, where the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the electronic device is made to execute the decoding method in the second aspect or any possible implementation manner of the second aspect.
  • the eighth aspect and any implementation manner of the eighth aspect correspond to the second aspect and any implementation manner of the second aspect respectively.
  • For the technical effects corresponding to the eighth aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned second aspect and any one of its implementation manners; details are not repeated here.
  • The embodiment of the present application provides a computer storage medium; the computer-readable storage medium stores a computer program, and when the computer program runs on a computer or a processor, the computer or the processor is caused to execute the encoding method in the first aspect or any possible implementation manner of the first aspect.
  • the ninth aspect and any implementation manner of the ninth aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • For the technical effects corresponding to the ninth aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned first aspect and any one of its implementation manners; details are not repeated here.
  • The embodiment of the present application provides a computer storage medium; the computer-readable storage medium stores a computer program, and when the computer program runs on a computer or a processor, the computer or the processor is caused to execute the decoding method in the second aspect or any possible implementation manner of the second aspect.
  • the tenth aspect and any implementation manner of the tenth aspect correspond to the second aspect and any implementation manner of the second aspect respectively.
  • The embodiment of the present application provides a computer program product; the computer program product includes a software program, and when the software program is executed by a computer or a processor, the steps of the method in the first aspect or any possible implementation manner of the first aspect are executed.
  • the eleventh aspect and any implementation manner of the eleventh aspect correspond to the first aspect and any implementation manner of the first aspect respectively.
  • For the technical effects corresponding to the eleventh aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned first aspect and any one of its implementation manners; details are not repeated here.
  • The embodiment of the present application provides a computer program product; the computer program product includes a software program, and when the software program is executed by a computer or a processor, the steps of the method in the second aspect or any possible implementation manner of the second aspect are executed.
  • the twelfth aspect and any implementation manner of the twelfth aspect correspond to the second aspect and any implementation manner of the second aspect respectively.
  • For the technical effects corresponding to the twelfth aspect and any one of its implementation manners, refer to the technical effects corresponding to the above-mentioned second aspect and any one of its implementation manners; details are not repeated here.
  • Fig. 1 is a schematic diagram of an exemplary system framework structure;
  • Fig. 2 is a schematic diagram of an exemplary encoding process;
  • Fig. 3a is a schematic diagram of an exemplary feature map group;
  • Fig. 3b is a schematic diagram of an exemplary feature map group;
  • Fig. 3c is a schematic diagram of an exemplary intra-group fusion process;
  • Fig. 3d is a schematic diagram of an exemplary intra-group fusion process;
  • Fig. 4 is a schematic diagram of an exemplary decoding process;
  • Fig. 5a is a schematic diagram of an exemplary decoding process;
  • Fig. 5b is a schematic diagram of an exemplary decoded information group;
  • Fig. 5c is a schematic diagram of an exemplary intra-group fusion process;
  • Fig. 5d is a schematic diagram of an exemplary intra-group fusion process;
  • Fig. 5e is a schematic diagram of an exemplary compression effect;
  • Fig. 6 is a schematic structural diagram of an exemplary encoding and decoding framework;
  • Fig. 7 is a schematic diagram of an exemplary encoding process;
  • Fig. 8 is a schematic diagram of an exemplary feature map division process;
  • Fig. 9 is a schematic diagram of an exemplary decoding process;
  • Fig. 10 is a schematic diagram of an exemplary encoding process;
  • Fig. 11a is a schematic diagram of an exemplary decoding process;
  • Fig. 11b is a schematic diagram of an exemplary compression effect;
  • Fig. 12 is a schematic structural diagram of an exemplary apparatus.
  • The terms "first" and "second" in the description and claims of the embodiments of the present application are used to distinguish different objects, rather than to describe a specific order of objects. For example, a first target object and a second target object are used to distinguish different target objects, rather than to describe a specific order of the target objects.
  • Words such as "exemplary" or "for example" are used to represent examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
  • multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.
  • Fig. 1 is a schematic diagram of an exemplary system framework structure. It should be understood that the system shown in Fig. 1 is only an example, and the system of the present application may have more or fewer components than those shown in the figure, may combine two or more components, or may have different component configurations.
  • the various components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the image compression process may be as follows: the image to be encoded is input to the AI encoding unit, and after being processed by the AI encoding unit, the feature value and probability distribution corresponding to the feature points to be encoded are output. Then, the feature value and probability distribution corresponding to the feature point to be encoded are input to the entropy encoding unit, and the entropy encoding unit performs entropy encoding on the feature value of the feature point to be encoded according to the probability distribution corresponding to the feature point to be encoded, and outputs a code stream.
  • Illustratively, the image decompression process may be as follows: after the entropy decoding unit obtains the code stream, it performs entropy decoding on the feature points to be decoded according to the probability distributions predicted by the AI decoding unit for those feature points, and outputs the feature values corresponding to the decoded feature points to the AI decoding unit.
  • the AI decoding unit performs image reconstruction based on the feature values corresponding to the decoded feature points, and outputs the reconstructed image.
  • Entropy coding refers to coding that, following the principle of entropy, loses no information during the coding process. Entropy coding includes many kinds, such as Shannon coding, Huffman coding, and arithmetic coding; this application is not limited thereto.
  • Illustratively, the image to be encoded that is input to the AI encoding unit may be any one of a RAW (unprocessed) image, an RGB (Red, Green, Blue) image, or a YUV image (where "Y" denotes luminance (Luma) and "U" and "V" denote chrominance (Chroma)); this is not limited in the present application.
  • the compression process and the decompression process may be performed in the same electronic device, or may be performed in different electronic devices, which is not limited in the present application.
  • the AI encoding unit and the AI decoding unit may be set in an NPU (Neural network Processing Unit, embedded neural network processor) or a GPU (Graphics Processing Unit, graphics processing unit).
  • Exemplarily, the entropy encoding unit and the entropy decoding unit may be set in a CPU (Central Processing Unit).
  • the present application may be applied to compressing and decompressing an independent image, and may also be applied to compressing and decompressing multiple frames of images in a video sequence, which is not limited in the present application.
  • Illustratively, this application can be applied to a variety of scenarios, for example, scenarios where Huawei Cloud stores (or transmits) images (or videos), video surveillance scenarios, live broadcast scenarios, and so on; this application does not limit this.
  • Fig. 2 is a schematic diagram of an encoding process exemplarily shown.
  • the encoding end may obtain the image to be encoded, and then may refer to S202 to S205 to encode the image to be encoded to obtain a corresponding code stream.
  • the image to be encoded may be subjected to space transformation, and the image to be encoded may be transformed into another space, so as to reduce temporal redundancy and spatial redundancy of the image to be encoded, to obtain the first feature map matrix.
  • the first feature map matrix includes first feature maps of c channels, where c is a positive integer.
  • each first feature map may include h*w feature points.
  • an autoregressive model may be used to determine the first entropy estimation feature corresponding to the first feature map matrix.
  • Illustratively, the total number of input channels of the autoregressive model is c, and the total number of output channels is M, where M is a positive integer; M can be greater than, less than, or equal to c, and can be set according to requirements.
  • one input channel of the autoregressive model may correspond to at least one output channel
  • one output channel of the autoregressive model may correspond to at least one input channel.
  • Illustratively, the first feature maps of the c channels of the first feature map matrix can be input to the autoregressive model as c input channels.
  • Illustratively, the autoregressive model can use the first feature maps of k channels in the first feature map matrix to form a feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 determined according to k and c.
  • the number k of channels included in each feature map group may be the same or different, which is not limited in the present application.
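For the varying group sizes of Fig. 3b, a hypothetical helper (names and interface are ours) that assigns channels to groups of differing sizes might look as follows:

```python
def make_variable_groups(c: int, sizes: list[int]) -> list[range]:
    """Return per-group channel index ranges covering channels 0..c-1,
    one range per feature map group; group sizes may differ."""
    assert sum(sizes) == c, "group sizes must cover all c channels"
    groups, start = [], 0
    for k in sizes:
        groups.append(range(start, start + k))
        start += k
    return groups

print(make_variable_groups(6, [1, 2, 3]))
# [range(0, 1), range(1, 3), range(3, 6)] -> N = 3 feature map groups
```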
  • Fig. 3a is a schematic diagram of an exemplary feature map group. Wherein, a rectangle in Fig. 3a represents a first feature map. Among them, the number k of channels contained in each feature map group in Figure 3a is the same.
  • each feature map group includes 2 first feature maps.
  • Fig. 3b is a schematic diagram of an exemplary feature map group. Among them, a rectangle in Figure 3b represents a first feature map, and the number of channels k contained in each feature map group in Figure 3b is different.
  • For example, from the first feature maps of the c channels, the first feature map of 1 channel can be used to form feature map group 1, the first feature maps of 2 channels can be used to form feature map group 2, the first feature maps of 3 channels can be used to form feature map group 3, and so on.
  • It should be noted that Fig. 3a and Fig. 3b are only examples of the present application; k can also be set to other values according to requirements, which is not limited in the present application.
  • Illustratively, different feature map groups can contain the first feature map of the same channel; for example, feature map group 1 includes the first feature map of channel 1 and the first feature map of channel 2, and feature map group 2 can include the first feature map of channel 2, the first feature map of channel 3, and the first feature map of channel 4.
  • Different feature map groups may contain first feature maps of different channels, as shown in Fig. 3a and Fig. 3b; the present application does not limit this.
  • the fusion within a group may refer to fusing the feature maps of k channels in the feature map group, so that the first entropy estimation feature corresponding to the feature map group can be obtained.
  • the autoregressive model may perform intra-group fusion on the N feature map groups to obtain the first entropy estimation features respectively corresponding to the N feature map groups.
  • the determination of the first entropy estimation feature corresponding to the i-th feature map group will be described below as an example.
  • i is an integer between 1 and N, inclusive.
  • each output channel corresponds to k weight maps whose size is ks1*ks2.
  • For example, assume M_i = 1 and k = 2: the number of output channels corresponding to the i-th feature map group is 1, and that output channel corresponds to two weight maps of size ks1*ks2.
  • the autoregressive weight matrix corresponding to the i-th feature map group may be used to extract local spatial information from the i-th feature map group to obtain the first entropy estimation feature corresponding to the i-th feature map group.
  • Specifically, the weight maps of the j-th output channel corresponding to the i-th feature map group can be convolved with the first feature maps of the k channels in the i-th feature map group to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group at the j-th output channel. The first entropy estimation features of the i-th feature map group at the M_i output channels are merged to obtain the first entropy estimation feature corresponding to the i-th feature map group, where j is an integer between 1 and M_i, inclusive.
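A runnable sketch of this per-group fusion (assuming, as one plausible reading, that "fusing" the k convolution results means summing them; all names are ours):

```python
import numpy as np

def corr2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Plain 2-D cross-correlation with zero padding ('same' output size);
    assumes odd kernel dimensions."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for r in range(x.shape[0]):
        for s in range(x.shape[1]):
            out[r, s] = np.sum(xp[r:r + kh, s:s + kw] * w)
    return out

def fuse_group(group: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """group: (k, h, w); weights: (M_i, k, ks1, ks2). Output channel j
    convolves each of the k feature maps with its weight map and fuses
    (here: sums) the k convolution results."""
    M_i, k = weights.shape[0], group.shape[0]
    out = np.zeros((M_i,) + group.shape[1:])
    for j in range(M_i):
        out[j] = sum(corr2d_same(group[ch], weights[j, ch]) for ch in range(k))
    return out

rng = np.random.default_rng(1)
g = rng.standard_normal((2, 8, 8))        # one group: k = 2 channels
wts = rng.standard_normal((3, 2, 3, 3))   # M_i = 3 output channels
print(fuse_group(g, wts).shape)           # (3, 8, 8)
```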
  • Fig. 3c is a schematic diagram of an intra-group fusion process exemplarily shown.
  • the number of output channels corresponding to feature map group 1 in FIG. 3c(1) is 1, and output channel 1 corresponds to two weight maps: weight map 11 and weight map 12 .
  • The weight map 11 can be convolved with the first feature map of input channel 1 to obtain convolution result 11, and the weight map 12 can be convolved with the first feature map of input channel 2 to obtain convolution result 12. Then, convolution result 11 and convolution result 12 are fused to obtain the first entropy estimation feature of feature map group 1 at output channel 1. In this way, the first entropy estimation feature corresponding to feature map group 1 can be obtained.
  • the number of output channels corresponding to feature map group 1 in Figure 3c(2) is 2, and output channel 1 corresponds to two weight maps: weight map 11 and weight map 12, and output channel 2 corresponds to 2 weight maps: weight map 21 and weight map 22.
  • The weight map 11 can be convolved with the first feature map of input channel 1 to obtain convolution result 11, and the weight map 12 can be convolved with the first feature map of input channel 2 to obtain convolution result 12. Then, convolution result 11 and convolution result 12 are fused to obtain the first entropy estimation feature of feature map group 1 at output channel 1.
  • the weight map 21 can be used to convolve with the first feature map of the input channel 1 to obtain the convolution result 13
  • the weight map 22 can be used to convolve with the first feature map of the input channel 2 to obtain the convolution result 14 .
  • the convolution result 13 and the convolution result 14 are fused to obtain the first entropy estimation feature corresponding to the output channel 2 of the feature map group 1.
  • In another example in Fig. 3c, the number of output channels corresponding to feature map group 1 is 3: output channel 1 corresponds to two weight maps (weight map 11 and weight map 12), output channel 2 corresponds to two weight maps (weight map 21 and weight map 22), and output channel 3 corresponds to two weight maps (weight map 31 and weight map 32).
  • The weight map 11 can be convolved with the first feature map of input channel 1 to obtain convolution result 11, and the weight map 12 can be convolved with the first feature map of input channel 2 to obtain convolution result 12. Then, convolution result 11 and convolution result 12 are fused to obtain the first entropy estimation feature of feature map group 1 at output channel 1.
  • the weight map 21 can be used to perform convolution with the first feature map of input channel 1 to obtain the convolution result 13
  • the weight map 22 can be used to perform convolution with the first feature map of input channel 2 to obtain the convolution result 14 .
  • the convolution result 13 and the convolution result 14 are fused to obtain the first entropy estimation feature corresponding to the output channel 2 of the feature map group 1.
  • the weight map 31 can be used to perform convolution with the first feature map of input channel 1 to obtain the convolution result 15
  • the weight map 32 can be used to perform convolution with the first feature map of input channel 2 to obtain the convolution result 16 .
  • the convolution result 15 and the convolution result 16 are fused to obtain the first entropy estimation feature corresponding to the output channel 3 of the feature map group 1 .
  • The first entropy estimation features of feature map group 1 at output channel 1, output channel 2, and output channel 3 are merged to obtain the first entropy estimation feature corresponding to feature map group 1.
  • It should be noted that this application does not limit which weight map of an output channel is convolved with the first feature map of which channel in the feature map group to obtain the first entropy estimation feature of that output channel. For example, for feature map group 1, the weight map 12 can instead be convolved with the first feature map of input channel 1 to obtain convolution result 11, and the weight map 11 convolved with the first feature map of input channel 2 to obtain convolution result 12; convolution result 11 and convolution result 12 are then fused to obtain the first entropy estimation feature of feature map group 1 at output channel 1.
  • Similarly, the weight map 22 can be convolved with the first feature map of input channel 1 to obtain convolution result 13, and the weight map 21 convolved with the first feature map of input channel 2 to obtain convolution result 14; convolution result 13 and convolution result 14 are then fused to obtain the first entropy estimation feature of feature map group 1 at output channel 2. The weight map 32 can be convolved with the first feature map of input channel 1 to obtain convolution result 15, and the weight map 31 convolved with the first feature map of input channel 2 to obtain convolution result 16; convolution result 15 and convolution result 16 are then fused to obtain the first entropy estimation feature of feature map group 1 at output channel 3.
  • In a possible manner, the weight maps of the j-th output channel corresponding to the i-th feature map group can be used to extract spatial information from the ks1*ks2 areas centered on (g1, g2) in the first feature maps of the k channels in the i-th feature map group, so as to obtain the first entropy estimation feature of the g-th group of feature points in the i-th feature map group at the j-th output channel. The g-th group of feature points consists of the feature points located at (g1, g2) in the first feature maps of the k channels, where g1 is an integer between 1 and h (including 1 and h), g2 is an integer between 1 and w (including 1 and w), and g is an integer between 1 and h*w (including 1 and h*w).
  • (g1, g2) is the integer index of a position in the first feature map, where g1 and g2 represent the coordinate indexes in the horizontal and vertical directions, respectively.
  • Specifically, the weight maps of the j-th output channel corresponding to the i-th feature map group are convolved with the feature values of the feature points in the ks1*ks2 areas centered on (g1, g2) in the first feature maps of the k channels, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the g-th group of feature points at the j-th output channel.
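The same fusion can be written per feature-point position, matching the description above (again a sketch, with summation assumed as the fusion operation):

```python
import numpy as np

def entropy_feature_at(group, weights_j, g1, g2):
    """group: (k, h, w); weights_j: (k, ks1, ks2), the k weight maps of
    output channel j. Convolves the ks1 x ks2 window centered at (g1, g2)
    in each of the k feature maps with its weight map, then fuses (sums)
    the k convolution results into one entropy-feature value."""
    k, ks1, ks2 = weights_j.shape
    ph, pw = ks1 // 2, ks2 // 2
    padded = np.pad(group, ((0, 0), (ph, ph), (pw, pw)))
    windows = padded[:, g1:g1 + ks1, g2:g2 + ks2]   # (k, ks1, ks2)
    return float(np.sum(windows * weights_j))

rng = np.random.default_rng(2)
g = rng.standard_normal((2, 8, 8))      # k = 2 first feature maps
w_j = rng.standard_normal((2, 3, 3))    # 3*3 weight maps, as in Fig. 3d
print(entropy_feature_at(g, w_j, g1=4, g2=5))
```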
  • Fig. 3d is a schematic diagram of an intra-group fusion process exemplarily shown.
  • channel 1 and channel 2 in feature map group 1 correspond to input channel 1 and input channel 2 of the autoregressive model, respectively, and the number of output channels corresponding to feature map group 1 is 2.
  • Assume that the 13th group of feature points includes feature point A1 in the first feature map of input channel 1 and feature point A2 in the first feature map of input channel 2.
  • For the 13th group of feature points, a 3*3 weight map corresponding to output channel 1 can be convolved with the feature values of the feature points in the 3*3 area centered on feature point A1 in the first feature map of input channel 1 (the gray squares in Fig. 3d) to obtain convolution result 21, and another 3*3 weight map corresponding to output channel 1 can be convolved with the feature values of the feature points in the 3*3 area centered on feature point A2 in the first feature map of input channel 2 to obtain convolution result 22; convolution result 21 and convolution result 22 are then fused to obtain the first entropy estimation feature of the 13th group of feature points at output channel 1.
  • Similarly, a 3*3 weight map corresponding to output channel 2 can be convolved with the feature values of the feature points in the 3*3 area centered on feature point A1 in the first feature map of input channel 1 (the gray squares in Fig. 3d) to obtain convolution result 23, and another 3*3 weight map corresponding to output channel 2 can be convolved with the feature values of the feature points in the 3*3 area centered on feature point A2 in the first feature map of input channel 2 (the gray squares in Fig. 3d) to obtain convolution result 24; convolution result 23 and convolution result 24 are then fused to obtain the first entropy estimation feature of the 13th group of feature points at output channel 2.
  • the N feature map groups are respectively fused within the group to obtain the first entropy estimation features respectively corresponding to the N feature map groups, that is, to obtain N groups of first entropy estimation features.
  • the decoder predicts the first entropy estimation feature of the feature point to be decoded according to the feature value corresponding to the decoded feature point.
  • the feature points in the area of ks1*ks2 centered on the corresponding position of the feature point to be decoded in the first feature map include decoded feature points and undecoded feature points, and undecoded feature points cannot participate in the calculation.
  • Therefore, in the weight map, the weight value at each position corresponding to an undecoded feature point is 0.
  • the decoding end decodes the feature points of each channel according to a preset decoding sequence, so it can determine which positions in the weight map have a weight value of 0.
  • the preset decoding sequence can be set according to requirements, which is not limited in this application.
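For a raster (left-to-right, top-to-bottom) decoding order, one such zeroed weight pattern is the familiar causal mask; a sketch under the raster-order assumption:

```python
import numpy as np

def raster_causal_mask(ks1: int, ks2: int) -> np.ndarray:
    """Mask for a raster decoding order: the center position (the feature
    point being decoded) and every position after it in raster order are
    still undecoded, so their weights are forced to 0."""
    mask = np.ones((ks1, ks2))
    cy, cx = ks1 // 2, ks2 // 2
    mask[cy, cx:] = 0      # center and everything to its right
    mask[cy + 1:, :] = 0   # all rows below the center
    return mask

print(raster_causal_mask(3, 3))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```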
  • N groups of first entropy estimation features may be combined to obtain the first entropy estimation features corresponding to the first feature map matrix.
  • Illustratively, the first entropy estimation feature corresponding to the first feature map matrix belongs to R^(c2*h*w), where c2 = M.
  • the probability estimation may be performed according to the first entropy estimation feature corresponding to the first feature map matrix to obtain the probability distribution corresponding to the first feature map matrix.
  • Illustratively, the probability distribution belongs to R^(c*h*w*P); that is to say, the number of channels of the probability distribution is the same as the number of channels of the first feature map matrix, and each feature point in the first feature map matrix corresponds to P parameters (such as mean and variance), where P is an integer greater than 0; this is not limited by the present application.
  • feature extraction may be performed on the first feature map matrix to obtain a fifth feature map matrix; then, according to the fifth feature map matrix, the second entropy estimation feature corresponding to the first feature map matrix is determined.
  • probability estimation may be performed by combining the first entropy estimation feature corresponding to the first feature map matrix and the second entropy estimation feature corresponding to the first feature map matrix, to obtain a probability distribution corresponding to the first feature map matrix.
  • the first entropy estimation feature and the second entropy estimation feature may be aggregated (such as concatenated), and probability estimation may be performed according to the aggregation result to obtain a probability distribution corresponding to the first feature map matrix.
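As an illustration of how the P parameters per feature point can drive probability estimation, assume P = 2 Gaussian parameters (mean, sigma), a common construction in learned compression that we adopt here as an assumption rather than a quotation from the text; the probability mass of a quantized (integer) feature value is then a CDF difference:

```python
import math

def gaussian_pmf(value: int, mean: float, sigma: float) -> float:
    """P(v) = CDF(v + 0.5) - CDF(v - 0.5) under a Gaussian entropy model;
    mean and sigma would be produced by probability estimation over the
    aggregated first and second entropy estimation features."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))
    return cdf(value + 0.5) - cdf(value - 0.5)

print(round(gaussian_pmf(0, mean=0.2, sigma=1.0), 4))  # mass at value 0
```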
  • Illustratively, the first feature map matrix may be encoded according to the probability distribution corresponding to the first feature map matrix, to obtain the code stream corresponding to the image to be encoded.
  • the code stream corresponding to the image to be encoded may be stored, or the code stream corresponding to the image to be encoded may be transmitted to the decoding end.
  • the fifth feature map matrix when the fifth feature map matrix is determined, the fifth feature map matrix may be encoded to obtain a code stream corresponding to the fifth feature map matrix. Then the code stream corresponding to the fifth feature map matrix can be stored, and the code stream corresponding to the fifth feature map matrix can also be transmitted to the decoding end.
  • In this way, the amount of invalid information introduced can be reduced, thereby reducing the encoding computing power and improving the encoding efficiency. In addition, the quality of the reconstructed image can also be improved.
  • Fig. 4 is a schematic diagram of a decoding process exemplarily shown.
  • the decoding end can obtain the code stream, and then can decode the code stream, and refer to the following S402-S403.
  • S402. Decode the feature values corresponding to the feature points of c channels from the code stream to obtain a first feature map matrix.
  • Illustratively, the code stream may include the encoding information corresponding to each feature point in the first feature maps of the c channels, and the encoding information corresponding to each feature point can be decoded to obtain the feature value corresponding to that feature point; the feature values corresponding to all feature points form the first feature map matrix.
  • the decoding end may perform parallel decoding or serial decoding on the feature points of different channels.
  • The decoding end may decode the feature points of the same channel serially or in parallel, which is not limited in the present application.
  • Fig. 5a is a schematic diagram of an exemplary decoding process.
  • Fig. 5a shows a preset decoding sequence for feature points in a channel.
  • Fig. 5a is an example of a first feature map of one channel in the first feature map matrix, and the size of the first feature map is 10*10, wherein each square represents a feature point.
  • Illustratively, the decoder can decode each feature point in the first feature map serially in the order shown in Fig. 5a: the feature points in the first row are decoded from left to right; after all the feature points in one row are decoded, the feature points in the next row are decoded from left to right; and so on, until all the feature points in the first feature map matrix are decoded.
  • It should be noted that this application can also use preset decoding sequences other than the one shown in Fig. 5a to serially decode each feature point; the sequence can be set according to requirements, and this application does not limit this.
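The Fig. 5a order can be expressed as a simple generator (a sketch; the function name is ours):

```python
def raster_order(h: int, w: int):
    """Preset decoding order of Fig. 5a: first row left to right, then the
    second row, and so on (serial decoding within one channel)."""
    for row in range(h):
        for col in range(w):
            yield row, col

order = list(raster_order(10, 10))
print(order[:3], order[-1])   # [(0, 0), (0, 1), (0, 2)] (9, 9)
```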
  • all the feature points in the first feature map matrix may be sequentially determined as the first feature points to be decoded according to the preset decoding order, and then decoding may be performed on the first feature points to be decoded referring to S4021-S4024.
  • The decoded information group includes the decoded information of the channel corresponding to the first feature point to be decoded and the decoded information of the other k-1 channels, where k is a positive integer less than c.
  • the decoded information may include feature values corresponding to the decoded feature points.
  • the decoder may use the feature value corresponding to the feature point as an input to the autoregressive model.
  • the feature values of all decoded feature points of the same channel are input from the same channel of the autoregressive model.
  • the corresponding probability distribution may be determined according to the feature value corresponding to the decoded feature point.
  • Illustratively, the autoregressive model can use the decoded information of k channels (k is a positive integer less than c), that is, the feature values corresponding to the decoded feature points, to form a decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1 determined according to k and c.
  • The number k of channels included in each decoded information group may be the same or different, which is not limited in the present application.
  • Fig. 5b is a schematic diagram of an exemplary decoded information group. Wherein, it is assumed that the decoding end performs parallel decoding on c channels, and performs serial decoding on the feature points of each channel according to the preset decoding sequence in FIG. 5a.
  • a large square in Figure 5b represents a first feature map, a small square in the large square represents a feature point, a gray rectangle represents a decoded feature point, and a white rectangle represents an undecoded feature point.
  • each decoded information group includes 2 channels of decoded information.
  • the number k of channels included in each decoded information group may also be different, which is not limited in this application.
  • the way that the decoding end uses the decoded information of c channels to form N decoded information groups is the same as the way that the encoding end uses the first feature maps of c channels to form N feature map groups.
  • The channel where the first feature point to be decoded is located can be determined, and then, from the N decoded information groups, the decoded information group to which that channel belongs is determined.
  • the decoded information group to which the channel where the first feature point to be decoded belongs can be referred to as the i-th decoded information group.
  • the channels included in the i-th decoded information group are: the channel corresponding to the first feature point to be decoded and other k-1 channels; the i-th decoded information group includes the channel corresponding to the first feature point to be decoded The decoded information and the decoded information of other k-1 channels.
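A hypothetical helper for this group lookup, assuming the encoder used fixed-size groups of k channels as in the Fig. 5b example (the function name and interface are ours):

```python
def group_channels(channel: int, k: int) -> range:
    """Return the channels of the decoded information group that the given
    channel belongs to, mirroring the encoder's fixed-size grouping."""
    i = channel // k                  # index i of the decoded information group
    return range(i * k, (i + 1) * k)

print(group_channels(channel=5, k=2))   # range(4, 6): channels 4 and 5
```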
  • the autoregressive weight matrix corresponding to the i-th decoded information group may be used to extract local spatial information from the i-th decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded.
  • Specifically, the weight maps of the j-th output channel corresponding to the i-th decoded information group can be convolved with the decoded information of the k channels in the i-th decoded information group to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th decoded information group at the j-th output channel. The first entropy estimation features of the i-th decoded information group at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th decoded information group, where j is an integer between 1 and M_i, inclusive.
  • the first entropy estimation feature corresponding to the i-th decoded information group is the first entropy estimation feature corresponding to the first feature point to be decoded.
  • Fig. 5c is a schematic diagram of an intra-group fusion process exemplarily shown.
  • the number of output channels corresponding to decoded information group 1 in Figure 5c is 2, output channel 1 corresponds to 2 weight maps: weight map 11 and weight map 12, and output channel 2 corresponds to 2 weight maps: Weight Map 21 and Weight Map 22.
  • The weight map 11 can be convolved with the decoded information of input channel 1 to obtain convolution result 11, and the weight map 12 can be convolved with the decoded information of input channel 2 to obtain convolution result 12.
  • the convolution result 11 and the convolution result 12 are fused to obtain the first entropy estimation feature corresponding to the output channel 1 of the ith decoded information group.
  • the weight map 21 can be used to convolve with the decoded information of the input channel 1 to obtain the convolution result 13
  • the weight map 22 can be used to perform convolution with the decoded information of the input channel 2 to obtain the convolution result 14 . Then, the convolution result 13 and the convolution result 14 are fused to obtain the first entropy estimation feature corresponding to the output channel 2 of the ith decoded information group.
  • this application does not limit which weight map of the output channel is used to convolve with the decoded information of which channel in the decoded information group to obtain the first entropy estimation feature of the output channel .
  • For example, it is also possible to use the weight map 12 to perform convolution with the decoded information of input channel 1 to obtain convolution result 11, and to use the weight map 11 to perform convolution with the decoded information of input channel 2 to obtain convolution result 12; convolution result 11 and convolution result 12 are then fused to obtain the first entropy estimation feature of the i-th decoded information group at output channel 1.
  • similarly, the weight map 22 can be used to convolve with the decoded information of the input channel 1 to obtain the convolution result 13, and the weight map 21 can be used to convolve with the decoded information of the input channel 2 to obtain the convolution result 14; the convolution result 13 and the convolution result 14 are then fused to obtain the first entropy estimation feature corresponding to the output channel 2 of the i-th decoded information group.
  • the weight map of the j-th output channel corresponding to the i-th decoded information group can be used to extract spatial information from the decoded information in the ks1*ks2 area centered at (g1, g2) (the location corresponding to the first feature point to be decoded) in each of the k channels of the i-th decoded information group, so as to obtain, for the g-th group of feature points in the i-th decoded information group, the first entropy estimation feature corresponding to the j-th output channel. The g-th group of feature points consists of the k undecoded feature points located at (g1, g2) in the k channels contained in the i-th decoded information group, where g1 is an integer between 1 and h (including 1 and h), g2 is an integer between 1 and w (including 1 and w), and g is an integer between 1 and h*w (including 1 and h*w).
  • the first entropy estimation feature corresponding to the j-th output channel of the first feature point to be decoded in the i-th decoded information group can be obtained.
  • specifically, the weight map of the j-th output channel corresponding to the i-th decoded information group can be convolved with the decoded information in the ks1*ks2 area of each of the k channels to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the first feature point to be decoded in the i-th decoded information group corresponding to the j-th output channel.
  • Fig. 5d is a schematic diagram of an intra-group fusion process exemplarily shown.
  • channel 1 and channel 2 in the decoded information group 1 correspond to input channel 1 and input channel 2 of the autoregressive model respectively, and the number of output channels corresponding to the decoded information group 1 is 2.
  • the gray squares in input channel 1 and input channel 2 are decoded feature points, and the white squares are undecoded feature points.
  • a 3*3 weight map corresponding to output channel 1 can be convolved with the feature values of the decoded feature points in the 3*3 area centered on feature point A1 of input channel 1 (Area1 in Fig. 5d) to obtain the convolution result 21; another 3*3 weight map corresponding to output channel 1 can be convolved with the feature values of the decoded feature points in the 3*3 area centered on feature point A2 of input channel 2 (Area2 in Fig. 5d) to obtain the convolution result 22; the convolution result 21 and the convolution result 22 are then fused to obtain the first entropy estimation feature of the first feature point to be decoded A1 corresponding to output channel 1.
  • similarly, a 3*3 weight map corresponding to output channel 2 can be convolved with the feature values of the decoded feature points in the 3*3 area centered on feature point A1 of input channel 1 (Area1 in Fig. 5d) to obtain the convolution result 23; another 3*3 weight map corresponding to output channel 2 can be convolved with the feature values of the decoded feature points in the 3*3 area centered on feature point A2 of input channel 2 (Area2 in Fig. 5d) to obtain the convolution result 24; the convolution result 23 and the convolution result 24 are then fused to obtain the first entropy estimation feature of the first feature point to be decoded A1 corresponding to output channel 2.
  • the first entropy estimation feature corresponding to the first feature point to be decoded is determined.
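The per-point computation of Fig. 5d can be written out directly: a 3*3 window is taken around the point's position in each of the k channels, undecoded positions are masked to zero, and each window is multiplied by its weight map and summed. A minimal sketch, assuming PyTorch; the mask layout, position, and names are illustrative, not from the disclosure.

```python
import torch
import torch.nn.functional as F

h, w, ks = 8, 8, 3
decoded = torch.randn(2, h, w)        # input channels 1 and 2 of the group
mask = torch.zeros(2, h, w)           # 1 = decoded feature point, 0 = not yet
mask[:, :4, :] = 1.0                  # assume the first four rows are decoded

weights = torch.randn(2, 2, ks, ks)   # (M_i=2 outputs, k=2 inputs, 3, 3)

y, x = 4, 3                           # position of A1/A2 in every channel
pad = ks // 2
padded = F.pad((decoded * mask).unsqueeze(0), (pad, pad, pad, pad))
area = padded[:, :, y:y + ks, x:x + ks]   # the 3*3 areas Area1 and Area2

# Convolution at a single location = elementwise product + sum; summing over
# the k input channels fuses the k per-channel convolution results.
phi_A1 = (area * weights).sum(dim=(1, 2, 3))
print(phi_A1.shape)                   # torch.Size([2]): one value per output channel
```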
  • probability estimation may be performed according to the first entropy estimation feature corresponding to the first feature point to be decoded to obtain the probability distribution corresponding to the first feature point to be decoded.
  • the probability distribution corresponding to the first feature point to be decoded corresponds to a set of probability distribution parameters.
  • each group of probability distribution parameters may include at least one parameter, such as mean value, variance, etc., which is not limited in the present application.
  • the decoding end can extract the fifth feature map matrix from the code stream. Then, according to the fifth feature map matrix, the second entropy estimation features corresponding to all the feature points in the code stream can be determined.
  • the second entropy estimation feature corresponding to the first feature point to be decoded can be determined from the second entropy estimation features corresponding to all feature points in the code stream; probability estimation can then be performed by combining the first entropy estimation feature and the second entropy estimation feature corresponding to the first feature point to be decoded, to obtain the probability distribution corresponding to the first feature point to be decoded.
  • for example, the first entropy estimation feature and the second entropy estimation feature corresponding to the first feature point to be decoded can be aggregated (such as concatenated), and probability estimation is performed according to the aggregation result to obtain the probability distribution corresponding to the first feature point to be decoded.
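A possible shape for this aggregation step is sketched below: the two entropy estimation features are concatenated along the channel dimension and mapped to per-point distribution parameters. The layer sizes, the choice of Gaussian (mean, scale) parameters, and all names are assumptions for illustration, not the patent's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, M_psi, c = 4, 4, 2                 # illustrative feature widths
phi = torch.randn(1, M, 8, 8)         # first entropy estimation feature
psi = torch.randn(1, M_psi, 8, 8)     # second entropy estimation feature

aggregation = nn.Sequential(          # probability estimation from aggregation
    nn.Conv2d(M + M_psi, 16, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(16, 2 * c, kernel_size=1),   # P=2 parameters per feature point
)

params = aggregation(torch.cat([phi, psi], dim=1))   # aggregation = concatenation
mean, scale = params.chunk(2, dim=1)
scale = F.softplus(scale)             # keep the scale parameter positive
```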
  • the first feature point to be decoded can be decoded according to the probability distribution corresponding to the first feature point to be decoded to obtain the feature value corresponding to the first feature point to be decoded; at this time, the first feature point to be decoded becomes a decoded feature point.
  • the decoding end may perform spatial inverse transformation on the first feature map matrix to perform image reconstruction to obtain a reconstructed image.
  • in this way, determining the entropy estimation feature corresponding to a feature point to be decoded from the decoded information group of its channel, rather than from the decoded information of all channels, can reduce the introduction of invalid information, thereby reducing the decoding computing power and improving the decoding efficiency.
  • Fig. 5e is a schematic diagram showing an exemplary compression effect.
  • the ordinate in Fig. 5e is PSNR (Peak Signal to Noise Ratio) in dB (decibel), which can be used to characterize the quality of image reconstruction; the larger the PSNR, the higher the image reconstruction quality.
  • the abscissa is bits per pixel (the number of bits used to store each pixel; the smaller the value, the higher the compression), and the unit is BPP (bit/pixel).
  • the dotted curve is the relationship curve between the image reconstruction quality of the present application and the size of the code stream, and the solid curve is that of the prior art. Comparing the two curves, it can be seen that when the size of the code stream is the same, the image reconstruction quality of the compression/decompression scheme of the present application is higher.
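For reference, the two axes of Fig. 5e can be computed as follows, under the usual definitions (assumed here, since the text gives no formulas): PSNR for 8-bit images from the mean squared error, and bits per pixel from the code stream length.

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB; larger means better reconstruction.
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def bpp(stream_bytes: int, height: int, width: int) -> float:
    # Bits per pixel of the code stream; smaller means stronger compression.
    return stream_bytes * 8.0 / (height * width)
```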
  • in the following, the description takes as an example the case where the feature map matrix Y2 represents the first feature map matrix, the feature map matrix Z2 represents the fifth feature map matrix, the entropy estimation feature phi represents the first entropy estimation feature, and the entropy estimation feature psi represents the second entropy estimation feature.
  • FIG. 6 is a schematic structural diagram of an encoding and decoding framework exemplarily shown.
  • the encoding network, quantization unit D1, autoregressive unit, aggregation unit, super-encoding network, quantization unit D2, super-decoding network and probability estimation unit belong to the AI encoding unit in FIG. 1 .
  • the decoding network, autoregressive unit, aggregation unit, super decoding network and probability estimation unit belong to the AI decoding unit in FIG. 1 .
  • the entropy coding unit A1 and the entropy coding unit B1 belong to the entropy coding unit in FIG. 1 .
  • the entropy decoding unit A2 and the entropy decoding unit B2 belong to the entropy decoding unit in FIG. 1 .
  • the AI coding unit and the AI decoding unit can be jointly trained, so that each network and unit in the AI coding unit and the AI decoding unit can learn corresponding parameters.
  • the autoregressive unit, aggregation unit, super decoding network and probability estimation unit in the AI coding unit, and the autoregressive unit, aggregation unit, super decoding network and probability estimation unit in the AI decoding unit can be shared.
  • the encoding network can be used to perform spatial transformation on the image to be encoded, and transform the image to be encoded to another space.
  • the encoding network may be a convolutional neural network.
  • the super-encoding network can be used to extract features.
  • the super-encoding network may be a convolutional neural network.
  • the quantization unit (including the quantization unit D1 and the quantization unit D2 ) may be used to perform quantization processing.
  • the aggregation unit can be used to perform probability estimation based on entropy estimation features, and output probability distribution.
  • the aggregation unit may be a convolutional neural network.
  • the probability estimation unit can be used for probability estimation and output probability distribution.
  • the probability estimation unit C2 may be a discrete probability estimation unit.
  • the entropy encoding unit A1 may be configured to perform encoding according to the probability distribution determined by the aggregation unit, so as to reduce the statistical redundancy of the output features.
  • the entropy encoding unit B1 may be configured to perform encoding according to the probability distribution determined by the probability estimation unit, so as to reduce the statistical redundancy of the output features.
  • the entropy decoding unit A2 may be configured to perform decoding according to the probability distribution determined by the aggregation unit.
  • the entropy decoding unit B2 may be configured to perform decoding according to the probability distribution determined by the probability estimation unit.
  • the decoding network can be used to perform inverse spatial transformation on information obtained by entropy decoding, and output a reconstructed image.
  • the decoding network may be a convolutional neural network.
  • a superdecoding network may be used to determine features associated with entropy estimation.
  • the super-decoding network can be a convolutional neural network.
  • the autoregressive unit may include an autoregressive model, configured to determine entropy estimation features according to the autoregressive weight matrix.
  • the image to be encoded is input to the encoding network, the image to be encoded is transformed into another space through the encoding network, and the feature map matrix Y1 is output.
  • the feature map matrix Y1 is input to the quantization unit D1, the feature map matrix Y1 is quantized through the quantization unit D1, and the feature map matrix Y2 is output.
  • the quantization unit D1 may perform quantization processing on the feature value corresponding to each feature point in the feature map matrix Y1 according to a preset quantization step size to obtain a feature map matrix Y2 ∈ R^(c*h*w).
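A minimal sketch of quantization unit D1, assuming simple round-to-nearest with a preset step size (the disclosure does not fix the exact quantizer):

```python
import torch

def quantize(y1: torch.Tensor, step: float = 1.0) -> torch.Tensor:
    # Divide each feature value by the quantization step and round, producing
    # the discrete-valued feature map matrix Y2 of the same shape as Y1.
    return torch.round(y1 / step)
```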
  • the feature map matrix Y2 is input to the super-encoding network, and feature extraction is performed on the feature map matrix Y2 through the super-encoding network to obtain the feature map matrix Z1, and then the feature map matrix Z1 is input to the quantization unit D2. After quantization processing is performed on the feature map matrix Z1 by the quantization unit D2, the feature map matrix Z2 is output. Then, on the one hand, the feature map matrix Z2 is input to the probability estimation unit, processed by the probability estimation unit, and the probability distribution PB1 of each feature point in the feature map matrix Z2 is output to the entropy coding unit B1. On the other hand, the feature map matrix Z2 is input to the entropy encoding unit B1.
  • the entropy encoding unit B1 encodes the feature map matrix Z2 according to the probability distribution PB1, and outputs the code stream SB to the entropy decoding unit B2.
  • the probability estimation unit can predict the probability distribution PB2 of the feature points to be decoded in the code stream SB, and input the probability distribution PB2 to the entropy decoding unit B2.
  • the entropy decoding unit B2 can decode the feature points to be decoded in the code stream SB according to the probability distribution PB2, and output the feature map matrix Z2 to the super decoding network.
  • after the super-decoding network obtains the feature map matrix Z2, it can convert the feature map matrix Z2 into the entropy estimation feature psi, and input the entropy estimation feature psi to the aggregation unit.
  • the feature map matrix Y2 may be input to the autoregressive unit, the feature map matrix Y2 is processed through the autoregressive unit, and the entropy estimation feature phi is output to the aggregation unit. This process can refer to the description above, and will not be repeated here.
  • the aggregation unit can perform probability estimation based on the entropy estimation feature phi and the entropy estimation feature psi, predict the probability distribution PA1 corresponding to each feature point in the feature map matrix Y2, and input the probability distribution PA1 to the entropy encoding unit A1.
  • the entropy encoding unit A1 can encode each feature point in the feature map matrix Y2 according to the probability distribution PA1, and output a code stream SA. So far, the encoding of the image to be encoded is completed.
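The encoder-side wiring just described can be summarized as below. Every callable here is a placeholder standing in for the corresponding network or unit of Fig. 6; only the data flow follows the description, and the signatures are assumptions.

```python
def encode_image(x, enc, hyper_enc, hyper_dec, prob_unit, ar_unit, agg,
                 entropy_enc_a1, entropy_enc_b1, quantize):
    y1 = enc(x)                          # encoding network: spatial transform
    y2 = quantize(y1)                    # quantization unit D1
    z1 = hyper_enc(y2)                   # super-encoding network
    z2 = quantize(z1)                    # quantization unit D2
    pb1 = prob_unit(z2)                  # probability distribution PB1
    stream_sb = entropy_enc_b1(z2, pb1)  # code stream SB
    psi = hyper_dec(z2)                  # entropy estimation feature psi
    phi = ar_unit(y2)                    # entropy estimation feature phi
    pa1 = agg(phi, psi)                  # probability distribution PA1
    stream_sa = entropy_enc_a1(y2, pa1)  # code stream SA
    return stream_sa, stream_sb
```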
  • both the code stream SA obtained by encoding the feature map matrix Y2 and the code stream SB obtained by encoding the feature map matrix Z2 can be sent to the entropy decoding unit A2. Or, during decoding, the code stream SA and the code stream SB are obtained by the entropy decoding unit A2.
  • the decoding process can be as follows:
  • the entropy decoding unit A2 first decodes the feature map matrix Z2 from the code stream SB, and sends the feature map matrix Z2 to the super decoding network. Then, the super-decoding network converts the feature map matrix Z2 into an entropy estimation feature psi, which is output to the aggregation unit.
  • the code stream SA contains the encoding information of each feature point in the feature map matrix Y2; the entropy decoding unit A2 decodes the encoding information of each feature point in the code stream SA to obtain the feature value corresponding to each feature point, and thus the feature map matrix Y2 is obtained.
  • the entropy decoding unit A2 can input the feature values corresponding to the decoded feature points to the autoregressive unit, and the autoregressive unit determines the entropy estimation feature phi corresponding to the first feature point to be decoded (which can refer to the description above and will not be repeated here); the entropy estimation feature phi is then output to the aggregation unit. Then, the aggregation unit performs probability estimation based on the entropy estimation feature phi and the entropy estimation feature psi, predicts the probability distribution PA2 corresponding to the first feature point to be decoded, and inputs the probability distribution PA2 to the entropy decoding unit A2.
  • the entropy decoding unit A2 may decode the first feature point to be decoded according to the probability distribution PA2 corresponding to the first feature point to be decoded, to obtain the corresponding feature value.
  • the entropy decoding unit A2 can decode the code stream SA, output the feature map matrix Y2 to the decoding network, and perform spatial inverse transformation on the feature map matrix Y2 through the decoding network to obtain a reconstructed image.
  • the entropy decoding unit A2 may decode the feature points of different channels in parallel or serially.
  • the entropy decoding unit A2 can decode the feature points of the same channel serially or in parallel, which is not limited in this application.
  • the feature map matrix Y1 can also be input to the super-encoding network, and the feature map matrix Z2 can be obtained through the super-encoding network and the quantization unit D2, which is not limited in this application.
  • the networks and units in the dotted line box on the right side of Fig. 6 may also be other networks and other units, which can be set according to requirements; this application does not limit this.
  • the AI encoding unit, AI decoding unit, entropy encoding unit and entropy decoding unit of the present application may also include other networks and units for generating other entropy estimation features, and then input other entropy estimation features into the aggregation unit, the aggregation unit performs probability estimation according to the entropy estimation feature phi, the entropy estimation feature psi and other entropy estimation features to generate a probability distribution, which is not limited in the present application.
  • the AI encoding unit, AI decoding unit, entropy encoding unit, and entropy decoding unit of this application may not include the networks and units in the dotted line box on the right side of Figure 6, which can be set according to requirements; the embodiment of this application does not limit this.
  • when the AI encoding unit, AI decoding unit, entropy encoding unit, and entropy decoding unit do not include the networks and units in the dotted box on the right side of Figure 6, there is no need to generate the entropy estimation feature psi during the encoding and decoding process, and the aggregation unit can perform probability estimation according to the entropy estimation feature phi alone.
  • Fig. 7 is a schematic diagram of an encoding process exemplarily shown.
  • the first feature map matrix may include a second feature map matrix and a third feature map matrix, wherein the second feature map matrix includes second feature maps of c channels, and the third feature map matrix includes c channels of The third feature map.
  • the second feature map and the third feature map of each channel are added to obtain the first feature map of the channel.
  • the first feature map matrix may be space-divided to obtain the second feature map matrix and the third feature map matrix. It should be understood that other manners may also be used to determine the second feature map matrix and the third feature map matrix, which is not limited in the present application; the present application takes the space division of the first feature map matrix into the second feature map matrix and the third feature map matrix as an example for illustration.
  • performing spatial division on the first feature map matrix may refer to dividing the first feature map of each channel into a second feature map and a third feature map according to a preset division rule. In this way, a second feature map matrix including the second feature map of c channels, and a third feature map matrix including the third feature map of c channels can be obtained.
  • the preset division rule can be set according to requirements; for example, the feature points located at the second preset position in the first feature map are divided into feature points of the second feature map, and the feature points located at the first preset position in the first feature map are divided into feature points of the third feature map, which is not limited in the present application.
  • in the second feature map, the feature value corresponding to the feature point at the second preset position is the feature value corresponding to the feature point at the second preset position in the first feature map, and the feature value corresponding to the feature point at the first preset position is 0;
  • in the third feature map, the feature value corresponding to the feature point at the first preset position is the feature value corresponding to the feature point at the first preset position in the first feature map, and the feature value corresponding to the feature point at the second preset position is 0.
  • the first preset position and the second preset position can be set according to requirements; for example, assuming that the position of a feature point in the first feature map matrix is (wi, hi), the second preset position can be the positions where wi+hi is equal to an odd number, and the first preset position the positions where wi+hi is equal to an even number, which is not limited in this application.
  • alternatively, the second preset position can be the positions where wi is an odd number and the first preset position the positions where wi is an even number; or the second preset position can be the positions where hi is an odd number and the first preset position the positions where hi is an even number, which is not limited in this application.
  • FIG. 8 is a schematic diagram of a feature map division process exemplarily shown.
  • the size of the first feature map is 5*5. If the first preset position is the positions where wi+hi is an even number and the second preset position the positions where wi+hi is an odd number, then the 2nd and 4th feature points in the first row of the first feature map, the 1st, 3rd and 5th feature points in the second row, the 2nd and 4th feature points in the third row, the 1st, 3rd and 5th feature points in the fourth row, and the 2nd and 4th feature points in the fifth row can be determined as the second feature map.
  • the feature points at the first preset position are shown as gray squares in FIG. 7 .
  • the 1st, 3rd and 5th feature points in the first row of the first feature map, the 2nd and 4th feature points in the second row, the 1st, 3rd and 5th feature points in the third row, the 2nd and 4th feature points in the fourth row, and the 1st, 3rd and 5th feature points in the fifth row are determined as the third feature map.
  • the feature points at the second preset position are shown as gray squares in FIG. 7 .
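The space division just illustrated amounts to a checkerboard split. A minimal sketch, assuming the example rule above (second preset position where wi+hi is odd, first preset position where it is even, 1-indexed as in the figure); function names are illustrative.

```python
import torch

def checkerboard_split(first: torch.Tensor):
    """Split a (c, h, w) first feature map matrix into second/third matrices."""
    c, h, w = first.shape
    hi = torch.arange(1, h + 1).view(h, 1)
    wi = torch.arange(1, w + 1).view(1, w)
    second_pos = (wi + hi) % 2 == 1          # second preset position mask
    second = first * second_pos              # zeros at the first preset position
    third = first * ~second_pos              # zeros at the second preset position
    return second, third

def checkerboard_merge(second: torch.Tensor, third: torch.Tensor) -> torch.Tensor:
    # Adding the second and third feature maps of each channel restores
    # the first feature map, as described above.
    return second + third
```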
  • feature extraction can be performed on the second feature map matrix to obtain a fourth feature map matrix; then, according to the fourth feature map matrix, the second entropy estimation feature corresponding to the second feature map matrix is determined; the probability distribution corresponding to the second feature map matrix is then determined according to the second entropy estimation feature.
  • the encoding end may also encode the fourth feature map matrix to obtain a code stream corresponding to the fourth feature map matrix.
  • the probability distribution corresponding to the third feature map matrix can be determined according to the above method of determining the probability distribution corresponding to the first feature map matrix; refer to S705:
  • an autoregressive model may be used to determine the first entropy estimation feature corresponding to the third feature map matrix.
  • the second feature maps of the c channels of the second feature map matrix may be input to the autoregressive model as c channels of input.
  • the autoregressive model can use the second feature maps of k channels in the second feature map matrix to form a feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 and can be determined according to k and c.
  • the number k of channels contained in each feature map group may be the same or different, and this application is not limited thereto.
  • the autoregressive model performs intra-group fusion on the feature map group composed of the second feature maps of k channels, and obtains the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the same k channels; the first entropy estimation feature corresponding to the third feature map matrix is then determined.
  • the third feature map matrix does not need to be input to the autoregressive model, but the third feature maps of k channels in the third feature map matrix can still be regarded as forming a feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 and can be determined according to k and c.
  • the number k of channels included in each feature map group may be the same or different, which is not limited in the present application.
  • the feature map group composed of the second feature maps of k channels can be called feature map group A, and the feature map group composed of the third feature maps of k channels can be called feature map group B.
  • the N feature map groups A and N feature map groups B are in one-to-one correspondence, that is, each feature map group A and each feature map group B contain the same channels.
  • each output channel corresponds to k weight maps whose size is ks1*ks2.
  • for example, when k=2 and M_i=1, the number of output channels corresponding to the i-th feature map group A is 1, and the output channel corresponds to two weight maps of size ks1*ks2.
  • the autoregressive weight matrix corresponding to the i-th feature map group A can be used to extract the local spatial information of the i-th feature map group A to obtain the first entropy corresponding to the i-th feature map group B estimated features.
  • the weight maps of the j-th output channel corresponding to the i-th feature map group A can be convolved with the second feature maps of the k channels in the i-th feature map group A to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group B corresponding to the j-th output channel.
  • j is a number between 1 and M_i, including 1 and M_i. This can refer to the above description of "determining the first entropy estimation feature of the j-th output channel corresponding to the i-th feature map group", and will not be repeated here.
  • the manner of determining the first entropy estimation feature of each group of feature points in the i-th feature map group B in the j-th output channel can refer to the above description, and will not be repeated here.
  • the probability estimation may be performed according to the first entropy estimation feature corresponding to the third feature map matrix to obtain the probability distribution corresponding to the third feature map matrix.
  • the probability distribution ∈ R^(c*h*w*P); that is to say, the number of channels of the probability distribution is the same as the number of channels of the third feature map matrix, and each feature point in the third feature map matrix corresponds to P parameters (such as mean, variance, etc.), where P is an integer greater than 0, which is not limited by the present application.
  • feature extraction may be performed on the third feature map matrix to obtain a sixth feature map matrix; then, according to the sixth feature map matrix, the second entropy estimation feature corresponding to the third feature map matrix is determined.
  • probability estimation may be performed by combining the first entropy estimation feature corresponding to the third feature map matrix and the second entropy estimation feature corresponding to the third feature map matrix to obtain a probability distribution corresponding to the third feature map matrix.
  • the first entropy estimation feature and the second entropy estimation feature may be aggregated (such as concatenated), and probability estimation may be performed according to the aggregation result to obtain a probability distribution corresponding to the third feature map matrix.
  • when the sixth feature map matrix is determined, the sixth feature map matrix may be encoded to obtain a code stream corresponding to the sixth feature map matrix.
  • the code stream corresponding to the image to be encoded, the code stream corresponding to the fourth feature map matrix, and the code stream corresponding to the sixth feature map matrix can be stored; or, the code stream corresponding to the image to be encoded, the code stream corresponding to the fourth feature map matrix, and the code stream corresponding to the sixth feature map matrix can be sent to the decoding end.
  • Fig. 9 is a schematic diagram of a decoding process exemplarily shown.
  • the decoder can decode the fourth feature map matrix from the code stream; then, according to the fourth feature map matrix, decode from the code stream the feature values corresponding to the feature points to be decoded at the second preset position, and thus obtain the second feature map matrix.
  • all the feature points at the second preset position may be sequentially determined as the second feature points to be decoded, and then decoded for the second feature points to be decoded.
  • for the second feature point to be decoded: based on the fourth feature map matrix, determine the second entropy estimation feature corresponding to the second feature point to be decoded; according to the second entropy estimation feature corresponding to the second feature point to be decoded, determine the probability distribution corresponding to the second feature point to be decoded; and decode the second feature point to be decoded according to that probability distribution to obtain the corresponding feature value.
  • the eigenvalues corresponding to all the second to-be-decoded feature points at the second preset position can be obtained, and then the second feature map matrix can be obtained.
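One consequence of this design is that every second feature point to be decoded depends only on the fourth feature map matrix, not on other feature points, so their distributions can be produced in a single pass and the points decoded independently. A rough outline with placeholder callables (all names are assumptions):

```python
def decode_second_positions(stream, z4, hyper_dec, prob_unit, entropy_dec,
                            second_pos_mask):
    psi = hyper_dec(z4)           # second entropy estimation features, one pass
    params = prob_unit(psi)       # distributions for every feature point
    # The second preset positions have no dependence on one another, so the
    # entropy decoder may process them in parallel using the same params.
    return entropy_dec(stream, params, second_pos_mask)
```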
  • all the feature points at the first preset position may be sequentially determined as the first feature points to be decoded, and then decoded for the first feature points to be decoded; refer to the following: S9025 ⁇ S9028:
  • the decoded information group includes the feature values corresponding to the decoded feature points at the second preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the second preset position in the other k-1 channels, where k is a positive integer smaller than c.
  • similar to the encoding process, in which the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels is obtained, the second feature map matrix can be input into the autoregressive model, and the autoregressive model determines the first entropy estimation feature corresponding to the first feature point to be decoded according to the second feature map matrix.
  • the autoregressive model can use the feature values corresponding to the feature points at the second preset position in k (k is a positive integer less than c) channels (that is, k of the second feature maps of the c channels) to form a decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1 and can be determined according to k and c.
  • the channel where the first feature point to be decoded is located can be determined, and then, from the N decoded information groups, the decoded information group to which that channel belongs can be determined.
  • the decoded information group to which the channel of the first feature point to be decoded belongs may be referred to as the i-th decoded information group.
  • the channels included in the i-th decoded information group are: the channel corresponding to the first feature point to be decoded and k-1 other channels; the i-th decoded information group includes the second feature map of the channel corresponding to the first feature point to be decoded (the feature values corresponding to the feature points at the second preset position), and the second feature maps of the other k-1 channels (the feature values corresponding to the feature points at the second preset position).
  • the weight maps of the j-th output channel corresponding to the i-th decoded information group can be convolved with the second feature maps of the k channels in the i-th decoded information group to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th decoded information group corresponding to the j-th output channel.
  • the manner of determining the first entropy estimation feature of each group of feature points in the i-th decoded information group in the j-th output channel can refer to the above description, and will not be repeated here.
  • probability estimation may be performed according to the first entropy estimation feature corresponding to the first feature point to be decoded to obtain the probability distribution corresponding to the first feature point to be decoded.
  • the probability distribution corresponding to the first feature point to be decoded corresponds to a set of probability distribution parameters.
  • each group of probability distribution parameters may include at least one parameter, such as mean value, variance, etc., which is not limited in the present application.
  • the decoding end can extract the sixth feature map matrix from the code stream. Then, according to the sixth feature map matrix, the second entropy estimation features corresponding to all the feature points in the code stream can be determined.
  • the second entropy estimation feature corresponding to the first feature point to be decoded can be determined from the second entropy estimation features corresponding to all feature points in the code stream; probability estimation can then be performed by combining the first entropy estimation feature and the second entropy estimation feature corresponding to the first feature point to be decoded, to obtain the probability distribution corresponding to the first feature point to be decoded.
  • the first entropy estimation feature and the second entropy estimation feature may be aggregated (such as concatenated), and probability estimation is performed according to the aggregation result to obtain a probability distribution corresponding to the first feature point to be decoded.
  • the first feature point to be decoded can be decoded according to the probability distribution corresponding to the first feature point to be decoded to obtain the corresponding feature value, that is, the feature value corresponding to the feature point in the third feature map matrix.
  • a third feature map matrix can be obtained.
  • the second feature map matrix and the third feature map matrix can be superimposed by channel to obtain the first feature map matrix; image reconstruction is then performed based on the first feature map matrix to obtain the reconstructed image.
  • compared with determining the second entropy estimation feature, determining the first entropy estimation feature requires greater computing power; therefore, determining the first entropy estimation features for only part of the feature points to be decoded can further improve the decoding efficiency.
  • FIG. 10 is a schematic diagram of an exemplary encoding process.
  • the second feature map matrix includes second feature maps of c channels, and the third feature map matrix includes third feature maps of c channels.
  • the autoregressive weight matrix corresponding to the i-th feature map group B can be used to extract the local spatial information of the i-th feature map group B to obtain the first entropy estimation feature corresponding to the i-th feature map group B.
  • the weight maps of the j-th output channel corresponding to the i-th feature map group B can be convolved with the feature values corresponding to the decoded feature points at the first preset position in the k channels of the i-th feature map group B to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group B corresponding to the j-th output channel. The first entropy estimation features of the i-th feature map group B corresponding to the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th feature map group B.
  • j is a number between 1 and M_i, including 1 and M_i. This can refer to the above description of "determining the first entropy estimation feature of the j-th output channel corresponding to the i-th feature map group", and will not be repeated here.
  • Fig. 11a is a schematic diagram of a decoding process exemplarily shown.
  • S1102. Decode eigenvalues corresponding to feature points of c channels from the code stream to obtain a first feature map matrix.
  • the corresponding feature value can be determined by referring to the following S11021-S11028:
  • the decoded information group includes the feature values corresponding to the decoded feature points at the first preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the first preset position in the other k-1 channels, where k is a positive integer smaller than c.
  • similar to the encoding process, in which the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels is obtained, the feature values corresponding to the decoded feature points at the first preset position can be input into the autoregressive model, and the autoregressive model determines, according to these feature values, the first entropy estimation feature corresponding to the first feature point to be decoded.
  • the autoregressive model can use the feature values corresponding to the feature points at the first preset position in k (k is a positive integer less than c) channels to form a decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1, which can be determined according to k and c.
  • the channel where the first feature point to be decoded is located can be determined, and then, from the N decoded information groups, the decoded information group to which that channel belongs can be determined.
  • the decoded information group to which the channel of the first feature point to be decoded belongs may be referred to as the i-th decoded information group.
  • the channels included in the i-th decoded information group are: the channel corresponding to the first feature point to be decoded and k-1 other channels; the i-th decoded information group includes the feature values corresponding to the feature points at the first preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the feature points at the first preset position in the other k-1 channels.
  • intra-group fusion is performed on the i-th decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded.
  • in this way, the decoding computing power can be reduced and the decoding efficiency improved; moreover, compared with determining the second entropy estimation feature, determining the first entropy estimation feature requires greater computing power, so determining the first entropy estimation features for only part of the feature points to be decoded can further improve the decoding efficiency.
  • Fig. 11b is a schematic diagram showing an exemplary compression effect.
  • the ordinate in Fig. 11b is PSNR (Peak Signal to Noise Ratio) in dB (decibel), which can be used to characterize the quality of image reconstruction; the larger the PSNR, the higher the image reconstruction quality.
  • the abscissa is bits per pixel (the number of bits used to store each pixel; the smaller the value, the higher the compression), and the unit is BPP (bits per pixel).
  • the dotted curve is the relationship curve between the image reconstruction quality of the present application and the size of the code stream, and the solid curve is that of the prior art. Comparing the two curves, it can be seen that when the size of the code stream is the same, the image reconstruction quality of the compression/decompression scheme of the present application is higher.
  • FIG. 12 shows a schematic block diagram of an apparatus 1200 according to an embodiment of the present application.
  • the apparatus 1200 may include: a processor 1201 and a transceiver/transceiving pin 1202 , and optionally, a memory 1203 .
  • bus 1204 includes a power bus, a control bus, and a status signal bus in addition to a data bus.
  • the various buses are referred to as bus 1204 in the figure.
  • the memory 1203 may be used to store the instructions in the foregoing method embodiments.
  • the processor 1201 can be used to execute instructions in the memory 1203, and control the receiving pin to receive signals, and control the sending pin to send signals.
  • Apparatus 1200 may be the electronic device or the chip of the electronic device in the foregoing method embodiments.
  • This embodiment also provides a computer storage medium, in which computer instructions are stored, and when the computer instructions are run on the electronic device, the electronic device is made to execute the above related method steps to implement the codec method in the above embodiment.
  • This embodiment also provides a computer program product, which, when running on a computer, causes the computer to execute the above related steps, so as to implement the encoding and decoding method in the above embodiment.
  • the embodiments of the present application also provide a device, which may specifically be a chip, a component or a module; the device may include a processor and a memory that are connected, wherein the memory is used to store computer-executable instructions, and when the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the encoding and decoding methods in the foregoing method embodiments.
  • the electronic device, computer storage medium, computer program product or chip provided in this embodiment are all used to execute the corresponding method provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another device, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may be one physical unit or multiple physical units, which may be located in one place or distributed to multiple different places. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if an integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes: various media that can store program codes such as U disk, mobile hard disk, read only memory (ROM), random access memory (random access memory, RAM), magnetic disk or optical disk.
  • the steps of the methods or algorithms described in connection with the disclosure of the embodiments of the present application may be implemented in the form of hardware, or may be implemented in the form of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read Only Memory, ROM), erasable programmable read-only memory ( Erasable Programmable ROM, EPROM), Electrically Erasable Programmable Read-Only Memory (Electrically EPROM, EEPROM), registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the functions described in the embodiments of the present application may be implemented by hardware, software, firmware or any combination thereof.
  • the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage media may be any available media that can be accessed by a general purpose or special purpose computer.

Abstract

The embodiments of this application provide an encoding/decoding method and an electronic device. The encoding method includes: first, acquiring an image to be encoded; then, based on the image to be encoded, generating a first feature map matrix, where the first feature map matrix includes first feature maps of c channels; next, performing intra-group fusion on a feature map group composed of the first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group, where k is smaller than c; then, determining a probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature, and encoding the first feature map matrix according to the probability distribution to obtain a code stream. In this way, the entropy estimation features are determined by performing intra-group fusion on feature map groups composed of the feature maps of only some of the channels; compared with fusing the feature maps of all channels to determine the entropy estimation features, this reduces the introduction of invalid information, thereby reducing the encoding computing power and improving the encoding efficiency.

Description

Encoding/decoding method and electronic device
This application claims priority to the Chinese patent application No. 202111407946.6, entitled "编解码方法及电子设备" ("Encoding/decoding method and electronic device"), filed with the China National Intellectual Property Administration on November 24, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of data processing, and in particular to an encoding/decoding method and an electronic device.
Background
AI (Artificial Intelligence) image compression algorithms are implemented based on deep learning and achieve a better compression effect than traditional image compression techniques (such as JPEG (Joint Photographic Experts Group) and BPG (Better Portable Graphics)).
At present, AI image compression algorithms predict the entropy estimation feature of one channel of a feature map matrix from the information of all channels of the feature map matrix. However, in compression, in order to preserve a larger amount of information, the correlation between the channels of the feature map matrix is low; fusing the information of all channels of the feature map matrix therefore involves a large amount of unusable information, which affects the encoding and decoding efficiency.
Summary
To solve the above technical problem, this application provides an encoding/decoding method and an electronic device.
In a first aspect, an embodiment of this application provides an encoding method. The method includes: first, acquiring an image to be encoded; next, generating a first feature map matrix based on the image to be encoded, where the first feature map matrix includes first feature maps of c channels and c is a positive integer; then, performing intra-group fusion on a feature map group composed of the first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group, where k is a positive integer smaller than c; next, determining a probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature, and then encoding the first feature map matrix according to the probability distribution to obtain a code stream. In this way, the entropy estimation features are determined by performing intra-group fusion on feature map groups composed of the feature maps of only some of the channels; compared with fusing the feature maps of all channels to determine the entropy estimation features, this reduces the introduction of invalid information, which can reduce the encoding computing power and improve the encoding efficiency.
Exemplarily, the first feature maps of k of the c channels can be used to form one feature map group, so that N feature map groups can be obtained, where N is an integer greater than 1 and is determined according to c and k. Accordingly, N groups of first entropy estimation features can be obtained. In this way, by fusing the feature maps within each feature map group to determine the entropy estimation feature corresponding to that group, compared with fusing the feature maps of all channels to determine the entropy estimation features, the introduction of invalid information can be reduced, the encoding computing power reduced, and the encoding efficiency improved. In addition, the quality of the reconstructed image can also be improved.
Exemplarily, the N groups of first entropy estimation features are combined to obtain the first entropy estimation feature corresponding to the first feature map matrix, and the probability distribution of the first feature map matrix is determined according to the first entropy estimation feature corresponding to the first feature map matrix.
Exemplarily, the number of channels k contained in each feature map group may be the same or different, which is not limited in this application.
Exemplarily, different feature map groups may contain the first feature map of the same channel.
Exemplarily, different feature map groups may contain the first feature maps of different channels.
According to the first aspect, performing intra-group fusion on the feature map group composed of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: using the autoregressive weight matrix corresponding to the feature map group to extract local spatial information from the feature map group to obtain the first entropy estimation feature corresponding to the feature map group.
Exemplarily, assume that the number of output channels corresponding to the feature map group is M_i, where M_1 + M_2 + ... + M_N = M and M is the total number of output channels of the autoregressive model. In this case, the autoregressive weight matrix corresponding to the feature map group ∈ R^(M_i*k*c1), with c1 = ks1*ks2, where "ks1*ks2" denotes the size of the convolution kernel of the autoregressive model; ks1 may or may not be equal to ks2, which is not limited in this application. That is to say, among the M_i output channels, each output channel corresponds to k weight maps of size ks1*ks2. For example, if k=2 and M_i=1, the number of output channels corresponding to the feature map group is 1, and this output channel corresponds to 2 weight maps of size ks1*ks2. For another example, if k=2 and M_i=5, the number of output channels corresponding to the feature map group is 5, and each of these 5 output channels corresponds to 2 weight maps of size ks1*ks2.
According to the first aspect, or any implementation of the first aspect above, the first feature map matrix includes a second feature map matrix and a third feature map matrix, where the second feature map matrix includes second feature maps of c channels and the third feature map matrix includes third feature maps of c channels. Performing intra-group fusion on the feature map group composed of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: performing intra-group fusion on a feature map group composed of the second feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Determining the probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature includes: determining the probability distribution corresponding to the third feature map matrix according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Encoding the first feature map matrix according to the probability distribution to obtain the code stream includes: encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the code stream. In this way, the first entropy estimation features only need to be determined for some of the feature points in the first feature maps, which can further improve the encoding efficiency.
Exemplarily, the second feature map matrix and the third feature map matrix can be obtained by spatially dividing the first feature map matrix, where adding the second feature map and the third feature map of each channel yields the first feature map of that channel.
According to the first aspect, or any implementation of the first aspect above, the first feature map matrix includes a third feature map matrix, and the third feature map matrix includes third feature maps of c channels. Performing intra-group fusion on the feature map group composed of the first feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group includes: performing intra-group fusion on a feature map group composed of the third feature maps of k channels to obtain the first entropy estimation feature corresponding to that feature map group. Determining the probability distribution corresponding to the first feature map matrix according to the first entropy estimation feature includes: determining the probability distribution corresponding to the third feature map matrix according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of k channels. Encoding the first feature map matrix according to the probability distribution to obtain the code stream includes: encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the code stream. In this way, the first entropy estimation features only need to be determined for some of the feature points in the first feature maps, which can further improve the encoding efficiency.
According to the first aspect, or any implementation of the first aspect above, the method further includes: performing feature extraction on the second feature map matrix included in the first feature map matrix to obtain a fourth feature map matrix; determining a second entropy estimation feature according to the fourth feature map matrix; and determining the probability distribution corresponding to the second feature map matrix according to the second entropy estimation feature.
According to the first aspect, or any implementation of the first aspect above, the method further includes: encoding the fourth feature map matrix to obtain a code stream. This makes it convenient for the decoding end to decode the second feature map matrix from the code stream.
In a second aspect, an embodiment of this application provides a decoding method. The method includes: acquiring a code stream, and decoding the feature values corresponding to the feature points of c channels from the code stream to obtain a first feature map matrix, where c is a positive integer; and then performing image reconstruction based on the first feature map matrix and outputting the reconstructed image. For a first feature point to be decoded in the code stream: first, a decoded information group corresponding to the first feature point to be decoded is determined, where the decoded information group includes the decoded information of the channel corresponding to the first feature point to be decoded and the decoded information of k-1 other channels, and k is a positive integer smaller than c; intra-group fusion is performed on the decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded. Next, the probability distribution corresponding to the first feature point to be decoded is determined according to the first entropy estimation feature corresponding to the first feature point to be decoded, and the first feature point to be decoded is then decoded according to the probability distribution to obtain the corresponding feature value, where the first feature point to be decoded is any feature point to be decoded. In this way, the entropy estimation feature corresponding to a feature point to be decoded is determined by fusing the decoded information within the decoded information group to which the channel of that feature point belongs; compared with determining the corresponding entropy estimation feature from the decoded information of all channels, this can reduce the introduction of invalid information, thereby reducing the decoding computing power and improving the decoding efficiency.
Exemplarily, the decoded information includes the feature values corresponding to the decoded feature points.
According to the second aspect, performing intra-group fusion on the decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded includes: using the autoregressive weight matrix corresponding to the decoded information group to extract local spatial information from the decoded information group to obtain the first entropy estimation feature corresponding to the first feature point to be decoded.
According to the second aspect, or any implementation of the second aspect above, the feature points include feature points at a first preset position and feature points at a second preset position, and the first feature point to be decoded is a feature point at the first preset position. The method includes: decoding a fourth feature map matrix from the code stream, where the fourth feature map matrix includes features obtained by performing feature extraction on the feature values corresponding to the feature points at the second preset position in the first feature map matrix; and, for a second feature point to be decoded at the second preset position: determining the second entropy estimation feature corresponding to the second feature point to be decoded based on the fourth feature map matrix; determining the probability distribution corresponding to the second feature point to be decoded according to the second entropy estimation feature; and decoding the second feature point to be decoded according to the probability distribution to obtain the corresponding feature value. Compared with determining the second entropy estimation feature, determining the first entropy estimation feature requires greater computing power; therefore, determining the first entropy estimation features for only some of the feature points to be decoded can further improve the decoding efficiency.
According to the second aspect, or any implementation of the second aspect above, the decoded information group includes the feature values corresponding to the decoded feature points at the second preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the second preset position in the other k-1 channels. In this way, the first feature points to be decoded at the first preset position can be decoded in parallel, which further improves the decoding efficiency.
According to the second aspect, or any implementation of the second aspect above, the decoded information group includes the feature values corresponding to the decoded feature points at the first preset position in the channel corresponding to the first feature point to be decoded, and the feature values corresponding to the decoded feature points at the first preset position in the other k-1 channels.
The second aspect and any implementation of the second aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the second aspect and any implementation of the second aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In a third aspect, an embodiment of this application provides an encoder configured to execute the encoding method in the first aspect or any possible implementation of the first aspect.
The third aspect and any implementation of the third aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the third aspect and any implementation of the third aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In a fourth aspect, an embodiment of this application provides a decoder configured to execute the decoding method in the second aspect or any possible implementation of the second aspect.
The fourth aspect and any implementation of the fourth aspect correspond respectively to the second aspect and any implementation of the second aspect. For the technical effects corresponding to the fourth aspect and any implementation of the fourth aspect, reference may be made to the technical effects corresponding to the second aspect and any implementation of the second aspect, which will not be repeated here.
In a fifth aspect, an embodiment of this application provides an electronic device, including a memory and a processor, the memory being coupled to the processor; the memory stores program instructions that, when executed by the processor, cause the electronic device to execute the encoding method in the first aspect or any possible implementation of the first aspect.
The fifth aspect and any implementation of the fifth aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the fifth aspect and any implementation of the fifth aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In a sixth aspect, an embodiment of this application provides an electronic device, including a memory and a processor, the memory being coupled to the processor; the memory stores program instructions that, when executed by the processor, cause the electronic device to execute the decoding method in the second aspect or any possible implementation of the second aspect.
The sixth aspect and any implementation of the sixth aspect correspond respectively to the second aspect and any implementation of the second aspect. For the technical effects corresponding to the sixth aspect and any implementation of the sixth aspect, reference may be made to the technical effects corresponding to the second aspect and any implementation of the second aspect, which will not be repeated here.
In a seventh aspect, an embodiment of this application provides a chip, including one or more interface circuits and one or more processors; the interface circuits are configured to receive signals from a memory of an electronic device and send the signals to the processors, the signals including computer instructions stored in the memory; when the processors execute the computer instructions, the electronic device executes the encoding method in the first aspect or any possible implementation of the first aspect.
The seventh aspect and any implementation of the seventh aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the seventh aspect and any implementation of the seventh aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In an eighth aspect, an embodiment of this application provides a chip, including one or more interface circuits and one or more processors; the interface circuits are configured to receive signals from a memory of an electronic device and send the signals to the processors, the signals including computer instructions stored in the memory; when the processors execute the computer instructions, the electronic device executes the decoding method in the second aspect or any possible implementation of the second aspect.
The eighth aspect and any implementation of the eighth aspect correspond respectively to the second aspect and any implementation of the second aspect. For the technical effects corresponding to the eighth aspect and any implementation of the eighth aspect, reference may be made to the technical effects corresponding to the second aspect and any implementation of the second aspect, which will not be repeated here.
In a ninth aspect, an embodiment of this application provides a computer storage medium; the computer-readable storage medium stores a computer program that, when run on a computer or a processor, causes the computer or the processor to execute the encoding method in the first aspect or any possible implementation of the first aspect.
The ninth aspect and any implementation of the ninth aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the ninth aspect and any implementation of the ninth aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In a tenth aspect, an embodiment of this application provides a computer storage medium; the computer-readable storage medium stores a computer program that, when run on a computer or a processor, causes the computer or the processor to execute the decoding method in the second aspect or any possible implementation of the second aspect.
The tenth aspect and any implementation of the tenth aspect correspond respectively to the second aspect and any implementation of the second aspect. For the technical effects corresponding to the tenth aspect and any implementation of the tenth aspect, reference may be made to the technical effects corresponding to the second aspect and any implementation of the second aspect, which will not be repeated here.
In an eleventh aspect, an embodiment of this application provides a computer program product; the computer program product contains a software program that, when executed by a computer or a processor, causes the steps of the method in the first aspect or any possible implementation of the first aspect to be executed.
The eleventh aspect and any implementation of the eleventh aspect correspond respectively to the first aspect and any implementation of the first aspect. For the technical effects corresponding to the eleventh aspect and any implementation of the eleventh aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation of the first aspect, which will not be repeated here.
In a twelfth aspect, an embodiment of this application provides a computer program product; the computer program product contains a software program that, when executed by a computer or a processor, causes the steps of the method in the second aspect or any possible implementation of the second aspect to be executed.
The twelfth aspect and any implementation of the twelfth aspect correspond respectively to the second aspect and any implementation of the second aspect. For the technical effects corresponding to the twelfth aspect and any implementation of the twelfth aspect, reference may be made to the technical effects corresponding to the second aspect and any implementation of the second aspect, which will not be repeated here.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a system framework shown exemplarily;
Fig. 2 is a schematic diagram of an encoding process shown exemplarily;
Fig. 3a is a schematic diagram of feature map groups shown exemplarily;
Fig. 3b is a schematic diagram of feature map groups shown exemplarily;
Fig. 3c is a schematic diagram of an intra-group fusion process shown exemplarily;
Fig. 3d is a schematic diagram of an intra-group fusion process shown exemplarily;
Fig. 4 is a schematic diagram of a decoding process shown exemplarily;
Fig. 5a is a schematic diagram of a decoding process shown exemplarily;
Fig. 5b is a schematic diagram of decoded information groups shown exemplarily;
Fig. 5c is a schematic diagram of an intra-group fusion process shown exemplarily;
Fig. 5d is a schematic diagram of an intra-group fusion process shown exemplarily;
Fig. 5e is a schematic diagram of a compression effect shown exemplarily;
Fig. 6 is a schematic structural diagram of an encoding/decoding framework shown exemplarily;
Fig. 7 is a schematic diagram of an encoding process shown exemplarily;
Fig. 8 is a schematic diagram of a feature map division process shown exemplarily;
Fig. 9 is a schematic diagram of a decoding process shown exemplarily;
Fig. 10 is a schematic diagram of an encoding process shown exemplarily;
Fig. 11a is a schematic diagram of a decoding process shown exemplarily;
Fig. 11b is a schematic diagram of a compression effect shown exemplarily;
Fig. 12 is a schematic structural diagram of an apparatus shown exemplarily.
Detailed Description
The technical solutions in the embodiments of this application will be described below clearly and completely with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, rather than all of them. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate the three cases of A alone, both A and B, and B alone.
The terms "first", "second", and the like in the specification and claims of the embodiments of this application are used to distinguish different objects, rather than to describe a particular order of the objects. For example, a first target object and a second target object are used to distinguish different target objects, rather than to describe a particular order of the target objects.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more. For example, multiple processing units means two or more processing units, and multiple systems means two or more systems.
Fig. 1 is a schematic diagram of a system framework shown exemplarily. It should be understood that the system shown in Fig. 1 is only an example; the system of this application may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in Fig. 1 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application-specific integrated circuits.
Referring to Fig. 1, exemplarily, the image compression process may be as follows: the image to be encoded is input to the AI encoding unit and processed by the AI encoding unit, which outputs the feature values and probability distributions corresponding to the feature points to be encoded. The feature values and probability distributions corresponding to the feature points to be encoded are then input to the entropy encoding unit, which performs entropy encoding on the feature values of the feature points to be encoded according to the probability distributions corresponding to the feature points to be encoded, and outputs the code stream.
Continuing to refer to Fig. 1, exemplarily, the image decompression process may be as follows: after the entropy decoding unit acquires the code stream, it can perform entropy decoding on the feature points to be decoded according to the probability distributions predicted by the AI decoding unit for the feature points to be decoded in the code stream, and output the feature values corresponding to the decoded feature points to the AI decoding unit. The AI decoding unit performs image reconstruction based on the feature values corresponding to the decoded feature points and outputs the reconstructed image.
Exemplarily, entropy encoding refers to encoding that, following the entropy principle, does not lose any information during the encoding process. Entropy encoding may include multiple types, such as Shannon coding, Huffman coding, and arithmetic coding, which is not limited in this application.
Exemplarily, the image to be encoded input to the AI encoding unit may be any one of a RAW image, an RGB (red green blue) image, and a YUV image (where "Y" denotes luminance (Luma) and "U" and "V" denote chrominance (Chroma)), which is not limited in this application.
Exemplarily, the compression process and the decompression process may be performed in the same electronic device or in different electronic devices, which is not limited in this application.
Exemplarily, the AI encoding unit and the AI decoding unit may be provided in an NPU (Neural network Processing Unit) or a GPU (Graphics Processing Unit). Exemplarily, the entropy encoding unit and the entropy decoding unit may be provided in a CPU (Central Processing Unit).
Exemplarily, this application may be applied to compressing and decompressing a single independent image, or to compressing and decompressing multiple frames of images in a video sequence, which is not limited in this application.
Exemplarily, this application may be applied to multiple scenarios, for example, a scenario in which Huawei Cloud stores (or transmits) images (or videos), a video surveillance scenario, or a live streaming scenario, which is not limited in this application.
FIG. 2 is a schematic diagram of an exemplary encoding procedure.

S201: Obtain a to-be-encoded image.

Exemplarily, the encoder side may obtain the to-be-encoded image and then encode it with reference to S202 to S205 to obtain the corresponding bitstream.

S202: Generate a first feature map matrix based on the to-be-encoded image, where the first feature map matrix includes first feature maps of c channels, and c is a positive integer.

Exemplarily, a spatial transform may be applied to the to-be-encoded image to transform it into another space, so as to reduce the temporal and spatial redundancy of the to-be-encoded image, yielding the first feature map matrix. Exemplarily, the first feature map matrix includes first feature maps of c channels, where c is a positive integer.

Exemplarily, the first feature map matrix ∈ R^(c×h×w), where c is the number of channels of the first feature map matrix, h is the height of the first feature map output by each channel, and w is the width of the first feature map output by each channel. Each first feature map may include h×w feature points.
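As a concrete illustration of S202, the sketch below shows one plausible spatial transform: a small convolutional analysis network mapping an image to a (c, h, w) latent. The layer count, activation, channel width c = 192, and stride-2 downsampling are illustrative assumptions, not values fixed by this method.

    import torch
    import torch.nn as nn

    class AnalysisTransform(nn.Module):
        """Maps a to-be-encoded image to a first feature map matrix of
        shape (c, h, w). All hyperparameters here are assumptions."""
        def __init__(self, c: int = 192):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, c, kernel_size=5, stride=2, padding=2), nn.GELU(),
                nn.Conv2d(c, c, kernel_size=5, stride=2, padding=2), nn.GELU(),
                nn.Conv2d(c, c, kernel_size=5, stride=2, padding=2), nn.GELU(),
                nn.Conv2d(c, c, kernel_size=5, stride=2, padding=2),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)  # (B, c, h, w): c channels of h*w feature points

    y1 = AnalysisTransform()(torch.randn(1, 3, 256, 256))
    print(y1.shape)  # torch.Size([1, 192, 16, 16])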
S203: Perform intra-group fusion on a feature map group composed of the first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group, where k is a positive integer less than c.

Exemplarily, an autoregressive model may be used to determine the first entropy estimation feature corresponding to the first feature map matrix. Exemplarily, the total number of input channels of the autoregressive model is c and the total number of output channels is M, where M is a positive integer; M may be greater than, less than, or equal to c, and may be set as required; this application imposes no limitation on this. Exemplarily, one input channel of the autoregressive model may correspond to at least one output channel, and one output channel of the autoregressive model may correspond to at least one input channel.

Exemplarily, the channels of the first feature map matrix are in one-to-one correspondence with the input channels of the autoregressive model; thus, the first feature maps of the c channels of the first feature map matrix may be fed into the autoregressive model as c separate inputs.

Exemplarily, the autoregressive model may use the first feature maps of k channels in the first feature map matrix to form one feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 and may be determined from k and c.

Exemplarily, the number of channels k included in each feature map group may be the same or different; this application imposes no limitation on this.

FIG. 3a is a schematic diagram of exemplary feature map groups, where one rectangle in FIG. 3a represents one first feature map, and every feature map group in FIG. 3a includes the same number of channels k.

Referring to FIG. 3a, exemplarily, k = 2, so the first feature maps of every 2 of the c channels form one feature map group. Assuming c = 192, then N = 96; that is, the first feature maps of every 2 of the 192 channels may form one feature map group, yielding 96 feature map groups, each of which includes 2 first feature maps.

FIG. 3b is a schematic diagram of exemplary feature map groups, where one rectangle in FIG. 3b represents one first feature map, and the numbers of channels k included in the feature map groups in FIG. 3b differ.

Referring to FIG. 3b, exemplarily, among the first feature maps of the c channels, the first feature map of 1 channel may form feature map group 1; the first feature maps of 2 channels may form feature map group 2; the first feature maps of 3 channels may form feature map group 3; and so on.

It should be noted that FIG. 3a and FIG. 3b are merely examples of this application; k may also be set to other values as required, and this application imposes no limitation on this.

It should be noted that different feature map groups may include the first feature map of the same channel; for example, feature map group 1 may include the first feature maps of channels 1 and 2, while feature map group 2 includes the first feature maps of channels 2, 3, and 4. Different feature map groups may also include first feature maps of entirely different channels, as shown in FIG. 3a and FIG. 3b; this application imposes no limitation on this.
Exemplarily, intra-group fusion may refer to fusing the feature maps of the k channels within a feature map group, thereby obtaining the first entropy estimation feature corresponding to that feature map group.

Exemplarily, the autoregressive model may perform intra-group fusion on the N feature map groups separately to obtain the first entropy estimation features corresponding to the N feature map groups. The following takes determining the first entropy estimation feature corresponding to the i-th feature map group as an example, where i is an integer from 1 to N, inclusive.
Exemplarily, assume that the number of output channels corresponding to the i-th feature map group is M_i, where the M_i of the N feature map groups together account for the M output channels, that is, M_1 + M_2 + ... + M_N = M. In this case, the autoregressive weight matrix corresponding to the i-th feature map group ∈ R^(M_i×k×c1), with c1 = ks1×ks2, where ks1×ks2 is the size of the convolution kernel of the autoregressive model; ks1 may or may not equal ks2, and this application imposes no limitation on this. In other words, each of the M_i output channels corresponds to k weight maps of size ks1×ks2. For example, if k = 2 and M_i = 1, the i-th feature map group corresponds to 1 output channel, and that output channel corresponds to 2 weight maps of size ks1×ks2. As another example, if k = 2 and M_i = 5, the i-th feature map group corresponds to 5 output channels, each of which corresponds to 2 weight maps of size ks1×ks2.
Exemplarily, the autoregressive weight matrix corresponding to the i-th feature map group may be used to extract local spatial information from the i-th feature map group, obtaining the first entropy estimation feature corresponding to the i-th feature map group.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th feature map group may be convolved with the first feature maps of the k channels in the i-th feature map group, respectively, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group at the j-th output channel. The first entropy estimation features of the i-th feature map group at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th feature map group, where j ranges from 1 to M_i, inclusive.
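A minimal sketch of this step follows, under the simplifying assumption that every feature map group holds the same number of channels k and produces the same number of output channels m_i. Under that assumption, "convolve each of the k channels with its own ks1×ks2 weight map, then fuse (sum) the k results, per output channel, per group" is exactly a grouped 2-D convolution; the numbers c = 192, k = 2, m_i = 2, ks = 3 are the example values from the text and figures, not fixed requirements.

    import torch
    import torch.nn as nn

    c, k, m_i = 192, 2, 2          # input channels, channels per group, outputs per group
    n_groups = c // k              # N = 96 feature map groups
    ks = 3                         # ks1 = ks2 = 3 (illustrative kernel size)

    # groups=n_groups restricts each output channel to the k channels of its
    # own group; weight shape is (M, k, ks, ks), i.e. k weight maps of size
    # ks*ks per output channel, matching the autoregressive weight matrix.
    fusion = nn.Conv2d(
        in_channels=c,
        out_channels=n_groups * m_i,   # M = sum of the M_i over the N groups
        kernel_size=ks,
        padding=ks // 2,
        groups=n_groups,
        bias=False,
    )

    y2 = torch.randn(1, c, 16, 16)     # first feature map matrix
    phi = fusion(y2)                   # N groups of first entropy estimation features
    print(fusion.weight.shape)         # torch.Size([192, 2, 3, 3])
    print(phi.shape)                   # torch.Size([1, 192, 16, 16])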
FIG. 3c is a schematic diagram of an exemplary intra-group fusion process. The feature map group used in FIG. 3c to illustrate the intra-group fusion process is feature map group 1 in FIG. 3a, which includes k = 2 channels; channels 1 and 2 of feature map group 1 correspond to input channels 1 and 2 of the autoregressive model, respectively.

Referring to FIG. 3c(1), exemplarily, feature map group 1 corresponds to 1 output channel, and output channel 1 corresponds to 2 weight maps: weight map 11 and weight map 12. For feature map group 1, weight map 11 may be convolved with the first feature map of input channel 1 to obtain convolution result 11, and weight map 12 may be convolved with the first feature map of input channel 2 to obtain convolution result 12. Convolution results 11 and 12 are then fused to obtain the first entropy estimation feature of feature map group 1 at output channel 1. In this way, the first entropy estimation feature corresponding to feature map group 1 is obtained.

Referring to FIG. 3c(2), exemplarily, feature map group 1 corresponds to 2 output channels; output channel 1 corresponds to 2 weight maps (weight maps 11 and 12), and output channel 2 corresponds to 2 weight maps (weight maps 21 and 22). For feature map group 1, weight map 11 may be convolved with the first feature map of input channel 1 to obtain convolution result 11, and weight map 12 with the first feature map of input channel 2 to obtain convolution result 12; fusing convolution results 11 and 12 yields the first entropy estimation feature of feature map group 1 at output channel 1. Likewise, weight map 21 may be convolved with the first feature map of input channel 1 to obtain convolution result 13, and weight map 22 with the first feature map of input channel 2 to obtain convolution result 14; fusing convolution results 13 and 14 yields the first entropy estimation feature of feature map group 1 at output channel 2. Combining the first entropy estimation features of feature map group 1 at output channels 1 and 2 yields the first entropy estimation feature corresponding to feature map group 1.

Referring to FIG. 3c(3), exemplarily, feature map group 1 corresponds to 3 output channels; output channel 1 corresponds to weight maps 11 and 12, output channel 2 to weight maps 21 and 22, and output channel 3 to weight maps 31 and 32. For feature map group 1, weight map 11 may be convolved with the first feature map of input channel 1 to obtain convolution result 11, and weight map 12 with the first feature map of input channel 2 to obtain convolution result 12; fusing them yields the first entropy estimation feature at output channel 1. Weight map 21 may be convolved with the first feature map of input channel 1 to obtain convolution result 13, and weight map 22 with the first feature map of input channel 2 to obtain convolution result 14; fusing them yields the first entropy estimation feature at output channel 2. Weight map 31 may be convolved with the first feature map of input channel 1 to obtain convolution result 15, and weight map 32 with the first feature map of input channel 2 to obtain convolution result 16; fusing them yields the first entropy estimation feature at output channel 3. Combining the first entropy estimation features of feature map group 1 at output channels 1, 2, and 3 yields the first entropy estimation feature corresponding to feature map group 1.

It should be noted that, for each output channel, this application does not limit which weight map of that output channel is convolved with the first feature map of which channel in the feature map group to obtain the first entropy estimation feature of feature map group 1 at that output channel. For example, for feature map group 1, weight map 12 may instead be convolved with the first feature map of input channel 1 to obtain convolution result 11, and weight map 11 with the first feature map of input channel 2 to obtain convolution result 12, with the fusion of the two yielding the first entropy estimation feature at output channel 1. Similarly, weight map 22 may be convolved with the first feature map of input channel 1 (convolution result 13) and weight map 21 with that of input channel 2 (convolution result 14), the fusion yielding the first entropy estimation feature at output channel 2; and weight map 32 may be convolved with the first feature map of input channel 1 (convolution result 15) and weight map 31 with that of input channel 2 (convolution result 16), the fusion yielding the first entropy estimation feature at output channel 3.
Exemplarily, the weight maps of the j-th output channel corresponding to the i-th feature map group may be used to extract spatial information from the ks1×ks2 region centred at (g1, g2) in each of the first feature maps of the k channels in the i-th feature map group, obtaining the first entropy estimation feature, at the j-th output channel, of the g-th feature point group of the i-th feature map group (composed of the feature points located at (g1, g2) in the first feature maps of the k channels, where g1 is an integer from 1 to h inclusive, g2 is an integer from 1 to w inclusive, and g is an integer from 1 to h×w inclusive). Here, (g1, g2) is the integer position index of the first feature map, g1 and g2 denoting the horizontal and vertical coordinate indices respectively, and the position index of the top-left corner of the first feature map is (1, 1).

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th feature map group may be convolved with the feature values of the feature points within the ks1×ks2 region centred at (g1, g2) in each of the first feature maps of the k channels of the i-th feature map group, obtaining k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the g-th feature point group of the i-th feature map group at the j-th output channel. The first entropy estimation features of the h×w feature point groups of the i-th feature map group at the j-th output channel are combined to obtain the first entropy estimation feature of the i-th feature map group at the j-th output channel.
FIG. 3d is a schematic diagram of an exemplary intra-group fusion process. The feature map group used in FIG. 3d to illustrate the process is feature map group 1 in FIG. 3a, which includes k = 2 channels. Channels 1 and 2 of feature map group 1 correspond to input channels 1 and 2 of the autoregressive model, respectively, and feature map group 1 corresponds to 2 output channels.

Referring to FIG. 3d, exemplarily, the first feature maps of input channels 1 and 2 have size h = w = 5, and each small square corresponds to one feature point. Assuming g = 13, the 13th feature point group includes feature point A1 in the first feature map of input channel 1 and feature point A2 in the first feature map of input channel 2. In addition, assume ks1 = ks2 = 3.

Referring to FIG. 3d, exemplarily, one 3×3 weight map corresponding to output channel 1 may be convolved with the feature values of the feature points (grey squares in FIG. 3d) within the 3×3 region centred at feature point A1 in the first feature map of input channel 1, obtaining convolution result 21; and the other 3×3 weight map corresponding to output channel 1 may be convolved with the feature values of the feature points (grey squares in FIG. 3d) within the 3×3 region centred at feature point A2 in the first feature map of channel 2, obtaining convolution result 22. Convolution results 21 and 22 are then fused to obtain the first entropy estimation feature of the 13th feature point group at output channel 1.

Still referring to FIG. 3d, exemplarily, one 3×3 weight map corresponding to output channel 2 may be convolved with the feature values of the feature points (grey squares in FIG. 3d) within the 3×3 region centred at feature point A1 in the first feature map of input channel 1, obtaining convolution result 23; and the other 3×3 weight map corresponding to output channel 2 may be convolved with the feature values of the feature points (grey squares in FIG. 3d) within the 3×3 region centred at feature point A2 in the first feature map of input channel 2, obtaining convolution result 24. Convolution results 23 and 24 are then fused to obtain the first entropy estimation feature of the 13th feature point group at output channel 2.

In this way, intra-group fusion is performed on the N feature map groups separately in the manner described above, obtaining the first entropy estimation features corresponding to the N feature map groups, that is, N groups of first entropy estimation features.
It should be noted that, during decoding, the decoder side predicts the first entropy estimation feature of a to-be-decoded feature point from the feature values corresponding to already-decoded feature points. Within the ks1×ks2 region of the first feature map centred at the position corresponding to the to-be-decoded feature point, there are both decoded and undecoded feature points, and the undecoded feature points cannot participate in the computation. To keep encoding and decoding consistent, in the weight maps of every output channel, the weight values at positions corresponding to undecoded feature points are 0. Exemplarily, the decoder side decodes the feature points of each channel in a preset decoding order, so it can be determined which positions in the weight maps have weight value 0. Exemplarily, the preset decoding order may be set as required; this application imposes no limitation on this.
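As a sketch of this consistency constraint, the snippet below builds the zero pattern for one ks1×ks2 weight map under the raster-scan decoding order of FIG. 5a (rows top to bottom, left to right within a row). Other preset decoding orders would need a different mask; this is only one illustrative choice.

    import torch

    def raster_causal_mask(ks1: int, ks2: int) -> torch.Tensor:
        """1 where the neighbour is already decoded when the centre point is
        being decoded under raster-scan order, 0 elsewhere (centre included,
        since the centre point itself is not yet decoded)."""
        mask = torch.zeros(ks1, ks2)
        mask[: ks1 // 2, :] = 1.0          # rows above the centre: decoded
        mask[ks1 // 2, : ks2 // 2] = 1.0   # same row, left of the centre: decoded
        return mask

    print(raster_causal_mask(3, 3))
    # tensor([[1., 1., 1.],
    #         [1., 0., 0.],
    #         [0., 0., 0.]])

    # Applied to every weight map before each convolution so that the encoder
    # computes phi from exactly the information the decoder will have:
    # fusion.weight.data *= raster_causal_mask(3, 3)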
S204: Determine, according to the first entropy estimation feature, the probability distribution corresponding to the first feature map matrix.

Exemplarily, the N groups of first entropy estimation features may be combined to obtain the first entropy estimation feature corresponding to the first feature map matrix, where the first entropy estimation feature ∈ R^(c2×h×w) and c2 = M.

In one possible manner, probability estimation may be performed according to the first entropy estimation feature corresponding to the first feature map matrix to obtain the probability distribution corresponding to the first feature map matrix, where the probability distribution ∈ R^(c×h×w×P); that is, the number of channels of the probability distribution equals that of the first feature map matrix, and each feature point in the first feature map matrix corresponds to P parameters (such as mean and variance), where P is an integer greater than 0; this application imposes no limitation on this.

In another possible manner, feature extraction may be performed on the first feature map matrix to obtain a fifth feature map matrix; a second entropy estimation feature corresponding to the first feature map matrix is then determined from the fifth feature map matrix. Probability estimation may then be performed by combining the first entropy estimation feature and the second entropy estimation feature corresponding to the first feature map matrix, obtaining the probability distribution corresponding to the first feature map matrix. Exemplarily, the first and second entropy estimation features may be aggregated (for example, concatenated), and probability estimation performed on the aggregation result to obtain the probability distribution corresponding to the first feature map matrix.
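A minimal sketch of this aggregation step follows, assuming concatenation along the channel axis followed by a small 1×1-convolution network that regresses P = 2 parameters (mean and scale) per feature point. The channel widths and layer shapes are assumptions for illustration.

    import torch
    import torch.nn as nn

    class Aggregation(nn.Module):
        """Fuses the first (phi) and second (psi) entropy estimation
        features and outputs P distribution parameters per feature point."""
        def __init__(self, c_phi: int, c_psi: int, c: int, p: int = 2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(c_phi + c_psi, 2 * c, kernel_size=1), nn.GELU(),
                nn.Conv2d(2 * c, c * p, kernel_size=1),
            )
            self.c, self.p = c, p

        def forward(self, phi: torch.Tensor, psi: torch.Tensor) -> torch.Tensor:
            out = self.net(torch.cat([phi, psi], dim=1))   # aggregate, then estimate
            b, _, h, w = out.shape
            return out.view(b, self.c, self.p, h, w)       # P parameters per point

    params = Aggregation(192, 192, 192)(torch.randn(1, 192, 16, 16),
                                        torch.randn(1, 192, 16, 16))
    print(params.shape)  # torch.Size([1, 192, 2, 16, 16])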
S205: Encode the first feature map matrix according to the probability distribution to obtain a bitstream.

Exemplarily, the first feature map matrix may be encoded according to the probability distribution corresponding to the first feature map matrix, obtaining the bitstream corresponding to the to-be-encoded image.

Exemplarily, the bitstream corresponding to the to-be-encoded image may be stored, or may be transmitted to the decoder side.

Exemplarily, when the fifth feature map matrix has been determined, the fifth feature map matrix may be encoded to obtain the bitstream corresponding to the fifth feature map matrix, which may then be stored or transmitted to the decoder side.

In this way, by fusing the feature maps within each feature map group to determine the entropy estimation feature corresponding to that group, compared with fusing the feature maps of all channels to determine the entropy estimation feature, the introduction of ineffective information can be reduced, thereby reducing the encoding computation and improving encoding efficiency. In addition, the quality of the reconstructed image can be improved.
FIG. 4 is a schematic diagram of an exemplary decoding procedure.

S401: Obtain a bitstream.

Exemplarily, the decoder side may obtain the bitstream and then decode it; refer to S402 to S403 below.

S402: Decode the feature values corresponding to the feature points of c channels from the bitstream to obtain the first feature map matrix.

Exemplarily, the bitstream may include the encoded information corresponding to each feature point in the first feature maps of the c channels; the encoded information corresponding to each feature point may be decoded to obtain the feature value corresponding to each feature point, and the feature values corresponding to all feature points may form the first feature map matrix.

Exemplarily, the decoder side may decode feature points of different channels in parallel or serially. Exemplarily, the decoder side may decode feature points of the same channel serially or in parallel; this application imposes no limitation on this.

FIG. 5a is a schematic diagram of an exemplary decoding process, showing one preset decoding order for the feature points in one channel.

Referring to FIG. 5a, exemplarily, FIG. 5a shows an example of the first feature map of one channel of the first feature map matrix; the first feature map has size 10×10, and each square represents one feature point. Exemplarily, the decoder side may decode the feature points of the first feature map one by one in the order shown in FIG. 5a: starting from the first row, each feature point is decoded from left to right; after all feature points in the first row are decoded, the feature points of the second row are decoded from left to right, and so on, until all feature points in the first feature map matrix have been decoded.

It should be noted that this application may also use a preset decoding order other than the one shown in FIG. 5a to serially decode the feature points; the order may be set as required, and this application imposes no limitation on this.
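For intuition, the raster-scan order of FIG. 5a corresponds to the nested loop below. The two helpers are hypothetical stand-ins, not a real API: predict_distribution() would wrap the intra-group fusion plus aggregation described above, and entropy_decode() would wrap the arithmetic decoder; here they are stubbed so the sketch runs.

    import numpy as np

    def predict_distribution(decoded, ch, g1, g2):
        return (0.0, 1.0)            # placeholder (mean, scale); illustrative only

    def entropy_decode(bitstream, dist):
        return next(bitstream)       # placeholder: pull the next symbol

    def decode_channel(bitstream, decoded, ch):
        h, w = decoded.shape[1:]
        for g1 in range(h):          # row by row, starting from the first row
            for g2 in range(w):      # left to right within a row
                dist = predict_distribution(decoded, ch, g1, g2)
                decoded[ch, g1, g2] = entropy_decode(bitstream, dist)
        return decoded

    out = decode_channel(iter(range(100)), np.zeros((1, 10, 10)), ch=0)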
Exemplarily, all feature points in the first feature map matrix may be determined, in the preset decoding order, as the first to-be-decoded feature point in turn, and the first to-be-decoded feature point may then be decoded with reference to S4021 to S4024.

S4021: Determine the decoded information group corresponding to the first to-be-decoded feature point, where the decoded information group includes decoded information of the channel corresponding to the first to-be-decoded feature point and decoded information of other k−1 channels, k being a positive integer less than c.

Exemplarily, the decoded information may include the feature values corresponding to the decoded feature points.

Exemplarily, each time the decoder side decodes one feature point in a channel, it may feed the feature value corresponding to that feature point into the autoregressive model as one input, where the feature values of all decoded feature points of the same channel are fed through the same input channel of the autoregressive model. Then, for each first to-be-decoded feature point, the corresponding probability distribution may be determined from the feature values corresponding to the decoded feature points.

Exemplarily, the autoregressive model may use the decoded information (that is, the feature values corresponding to the decoded feature points) of k channels (k being a positive integer less than c) to form one decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1 and may be determined from k and c.

Exemplarily, the number of channels k included in each decoded information group may be the same or different; this application imposes no limitation on this.

FIG. 5b is a schematic diagram of exemplary decoded information groups. Here it is assumed that the decoder side decodes the c channels in parallel and serially decodes the feature points of each channel in the preset decoding order of FIG. 5a. In FIG. 5b, one large square represents one first feature map, a small square within a large square represents one feature point, a grey square represents a decoded feature point, and a white square represents an undecoded feature point. Every decoded information group in FIG. 5b includes the same number of channels, k = 2.

Referring to FIG. 5b, exemplarily, the decoded information of every 2 of the c channels forms one decoded information group. Assuming c = 192, then N = 96; that is, the decoded information of every 2 of the 192 channels may form one decoded information group, yielding 96 decoded information groups, each of which includes the decoded information of 2 channels.

It should be noted that k may also be set to another value as required, such as k = 1; this application imposes no limitation on this. In addition, the numbers of channels k included in the decoded information groups may also differ; this application imposes no limitation on this.

It should be noted that the manner in which the decoder side uses the decoded information of the c channels to form the N decoded information groups is the same as the manner, described above, in which the encoder side uses the first feature maps of the c channels to form the N feature map groups.

Exemplarily, after the N decoded information groups are determined, the channel in which the first to-be-decoded feature point is located may be determined, and the decoded information group to which that channel belongs may then be determined from the N decoded information groups. For ease of description, the decoded information group to which the channel of the first to-be-decoded feature point belongs is hereinafter referred to as the i-th decoded information group.

Exemplarily, the channels included in the i-th decoded information group are the channel corresponding to the first to-be-decoded feature point and other k−1 channels; the i-th decoded information group includes the decoded information of the channel corresponding to the first to-be-decoded feature point and the decoded information of the other k−1 channels.

S4022: Perform intra-group fusion on the decoded information group to obtain the first entropy estimation feature corresponding to the first to-be-decoded feature point.
Exemplarily, the autoregressive weight matrix corresponding to the i-th decoded information group ∈ R^(M_i×k×c1), with c1 = ks1×ks2; that is, each of the M_i output channels corresponds to k weight maps of size ks1×ks2. It should be noted that the autoregressive weight matrix corresponding to the i-th decoded information group is the same as the autoregressive weight matrix corresponding to the i-th feature map group.
Exemplarily, the autoregressive weight matrix corresponding to the i-th decoded information group may be used to extract local spatial information from the i-th decoded information group, obtaining the first entropy estimation feature corresponding to the first to-be-decoded feature point.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th decoded information group may be convolved with the decoded information of the k channels in the i-th decoded information group, respectively, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th decoded information group at the j-th output channel. The first entropy estimation features of the i-th decoded information group at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th decoded information group, where j ranges from 1 to M_i, inclusive.

Exemplarily, the first entropy estimation feature corresponding to the i-th decoded information group is the first entropy estimation feature corresponding to the first to-be-decoded feature point.
FIG. 5c is a schematic diagram of an exemplary intra-group fusion process. The decoded information group used in FIG. 5c to illustrate the process is decoded information group 1 in FIG. 5b, which includes k = 2 channels; channels 1 and 2 of decoded information group 1 correspond to input channels 1 and 2 of the autoregressive model, respectively.

Referring to FIG. 5c, exemplarily, decoded information group 1 corresponds to 2 output channels; output channel 1 corresponds to 2 weight maps (weight maps 11 and 12), and output channel 2 corresponds to 2 weight maps (weight maps 21 and 22). For decoded information group 1, weight map 11 may be convolved with the decoded information of input channel 1 to obtain convolution result 11, and weight map 12 with the decoded information of input channel 2 to obtain convolution result 12; fusing convolution results 11 and 12 yields the first entropy estimation feature of the i-th decoded information group at output channel 1. Likewise, weight map 21 may be convolved with the decoded information of input channel 1 to obtain convolution result 13, and weight map 22 with the decoded information of input channel 2 to obtain convolution result 14; fusing convolution results 13 and 14 yields the first entropy estimation feature of the i-th decoded information group at output channel 2. Combining the first entropy estimation features of the i-th decoded information group at output channels 1 and 2 yields the first entropy estimation feature corresponding to decoded information group 1, and thus the first entropy estimation feature corresponding to the first to-be-decoded feature point.

It should be noted that, for each output channel, this application does not limit which weight map of that output channel is convolved with the decoded information of which channel in the decoded information group to obtain the first entropy estimation feature of that output channel. For example, for decoded information group 1, weight map 12 may instead be convolved with the decoded information of input channel 1 to obtain convolution result 11, and weight map 11 with the decoded information of input channel 2 to obtain convolution result 12, the fusion yielding the first entropy estimation feature of the i-th decoded information group at output channel 1; and weight map 22 may be convolved with the decoded information of input channel 1 (convolution result 13) and weight map 21 with the decoded information of input channel 2 (convolution result 14), the fusion yielding the first entropy estimation feature of the i-th decoded information group at output channel 2.
Exemplarily, the weight maps of the j-th output channel corresponding to the i-th decoded information group may be used to extract spatial information from the decoded information within the ks1×ks2 region centred at (g1, g2) (the position corresponding to the first to-be-decoded feature point) in each of the k channels of the i-th decoded information group, obtaining the first entropy estimation feature, at the j-th output channel, of the g-th feature point group of the i-th decoded information group (composed of the undecoded feature points located at (g1, g2) in the k channels included in the i-th decoded information group, where g1 is an integer from 1 to h inclusive, g2 is an integer from 1 to w inclusive, and g is an integer from 1 to h×w inclusive). Since the g-th feature point group contains the first to-be-decoded feature point, the first entropy estimation feature of the first to-be-decoded feature point at the j-th output channel in the i-th decoded information group can thereby be obtained.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th decoded information group may be convolved with the decoded information within the ks1×ks2 region centred at (g1, g2) in each of the k channels of the i-th decoded information group, obtaining k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the first to-be-decoded feature point at the j-th output channel in the i-th decoded information group.
FIG. 5d is a schematic diagram of an exemplary intra-group fusion process. The decoded information group used in FIG. 5d to illustrate the process is decoded information group 1 in FIG. 5b, which includes k = 2 channels. Channels 1 and 2 of decoded information group 1 correspond to input channels 1 and 2 of the autoregressive model, respectively, and decoded information group 1 corresponds to 2 output channels.

Referring to FIG. 5d, exemplarily, the first feature map matrix has size h = w = 5, and each small square corresponds to one feature point. In input channels 1 and 2, grey squares are decoded feature points and white squares are undecoded feature points. Assuming g = 13, the 13th feature point group includes feature point A1 in input channel 1 and feature point A2 in input channel 2, and the first to-be-decoded feature point is feature point A1. In addition, assume ks1 = ks2 = 3.

Referring to FIG. 5d, exemplarily, one 3×3 weight map corresponding to output channel 1 may be convolved with the feature values corresponding to the decoded feature points within the 3×3 region centred at feature point A1 in input channel 1 (Area1 in FIG. 5d), obtaining convolution result 21; and the other 3×3 weight map corresponding to output channel 1 may be convolved with the feature values corresponding to the decoded feature points within the 3×3 region centred at feature point A2 in input channel 2 (Area2 in FIG. 5d), obtaining convolution result 22. Convolution results 21 and 22 are then fused to obtain the first entropy estimation feature of first to-be-decoded feature point A1 at output channel 1.

Still referring to FIG. 5d, exemplarily, one 3×3 weight map corresponding to output channel 2 may be convolved with the feature values corresponding to the decoded feature points within the 3×3 region centred at feature point A1 in input channel 1 (Area1 in FIG. 5d), obtaining convolution result 23; and the other 3×3 weight map corresponding to output channel 2 may be convolved with the feature values corresponding to the decoded feature points within the 3×3 region centred at feature point A2 in input channel 2 (Area2 in FIG. 5d), obtaining convolution result 24. Convolution results 23 and 24 are then fused to obtain the first entropy estimation feature of first to-be-decoded feature point A1 at output channel 2.

In this way, the first entropy estimation feature corresponding to the first to-be-decoded feature point is determined in the manner described above.
S4023: Determine, according to the first entropy estimation feature corresponding to the first to-be-decoded feature point, the probability distribution corresponding to the first to-be-decoded feature point.

In one possible manner, probability estimation may be performed according to the first entropy estimation feature corresponding to the first to-be-decoded feature point to obtain the probability distribution corresponding to the first to-be-decoded feature point, where that probability distribution corresponds to one set of probability distribution parameters, and each set of probability distribution parameters may include at least one parameter, such as mean and variance; this application imposes no limitation on this.

In another possible manner, if the encoder side has encoded the fifth feature map matrix into a bitstream and sent that bitstream to the decoder side, the decoder side may extract the fifth feature map matrix from the bitstream and then determine, from the fifth feature map matrix, the second entropy estimation features corresponding to all feature points in the bitstream. Next, the second entropy estimation feature corresponding to the first to-be-decoded feature point may be determined from the second entropy estimation features corresponding to all feature points in the bitstream, and probability estimation may be performed by combining the first and second entropy estimation features corresponding to the first to-be-decoded feature point, obtaining its probability distribution. Exemplarily, the first and second entropy estimation features corresponding to the first to-be-decoded feature point may be aggregated (for example, concatenated), and probability estimation performed on the aggregation result to obtain the probability distribution corresponding to the first to-be-decoded feature point.

S4024: Decode the first to-be-decoded feature point according to the probability distribution to obtain the corresponding feature value.

Exemplarily, the first to-be-decoded feature point may be decoded according to its probability distribution to obtain the feature value corresponding to the first to-be-decoded feature point; at this point, the first to-be-decoded feature point becomes a decoded feature point.
S403: Perform image reconstruction based on the first feature map matrix, and output a reconstructed image.

Exemplarily, the decoder side may apply an inverse spatial transform to the first feature map matrix to perform image reconstruction, obtaining the reconstructed image.

In this way, by fusing the decoded information within the decoded information group to which the channel of the to-be-decoded feature point belongs to determine the entropy estimation feature corresponding to the to-be-decoded feature point, compared with determining the corresponding entropy estimation feature from the decoded information of all channels, the introduction of ineffective information can be reduced, thereby reducing the decoding computation and improving decoding efficiency.

FIG. 5e is a schematic diagram of an exemplary compression effect.

Referring to FIG. 5e, exemplarily, the vertical axis in FIG. 5e is PSNR (peak signal-to-noise ratio) in dB (decibels), which can characterize image reconstruction quality: the larger the PSNR, the higher the reconstruction quality. The horizontal axis is bits per pixel (BPP, the number of bits used to store each pixel; the smaller the value, the lower the compression bitrate). In FIG. 5e, the dashed curve shows the relationship between image reconstruction quality and bitstream size for this application, and the solid curve shows that relationship for the prior art; comparing the two curves shows that, at the same bitstream size, the compression/decompression solution of this application achieves higher image reconstruction quality.
In the following, the encoding and decoding processes are described in detail by taking an example in which feature map matrix Y2 denotes the first feature map matrix, feature map matrix Z2 denotes the fifth feature map matrix, entropy estimation feature phi denotes the first entropy estimation feature, and entropy estimation feature psi denotes the second entropy estimation feature.

FIG. 6 is a schematic diagram of an exemplary encoding/decoding framework structure.

Referring to FIG. 6, exemplarily, the encoding network, quantization unit D1, autoregressive unit, aggregation unit, hyper-encoder network, quantization unit D2, hyper-decoder network, and probability estimation unit belong to the AI encoding unit in FIG. 1. Exemplarily, the decoding network, autoregressive unit, aggregation unit, hyper-decoder network, and probability estimation unit belong to the AI decoding unit in FIG. 1.

Exemplarily, entropy encoding unit A1 and entropy encoding unit B1 belong to the entropy encoding unit in FIG. 1.

Exemplarily, entropy decoding unit A2 and entropy decoding unit B2 belong to the entropy decoding unit in FIG. 1.

Exemplarily, the AI encoding unit and the AI decoding unit may be trained jointly, so that the networks and units in the AI encoding unit and the AI decoding unit learn the corresponding parameters. Exemplarily, the autoregressive unit, aggregation unit, hyper-decoder network, and probability estimation unit in the AI encoding unit may be shared with the autoregressive unit, aggregation unit, hyper-decoder network, and probability estimation unit in the AI decoding unit.
Exemplarily, the encoding network may be used to apply a spatial transform to the to-be-encoded image, transforming it into another space. Exemplarily, the encoding network may be a convolutional neural network.

Exemplarily, the hyper-encoder network may be used to extract features. Exemplarily, the hyper-encoder network may be a convolutional neural network.

Exemplarily, the quantization units (including quantization unit D1 and quantization unit D2) may be used to perform quantization.

Exemplarily, the aggregation unit may be used to perform probability estimation based on the entropy estimation features and output probability distributions. Exemplarily, the aggregation unit may be a convolutional neural network.

Exemplarily, the probability estimation unit may be used for probability estimation and outputs probability distributions. Optionally, the probability estimation unit may be a discrete probability estimation unit.

Exemplarily, entropy encoding unit A1 may be used to perform encoding according to the probability distribution determined by the aggregation unit, reducing the statistical redundancy of the output features.

Exemplarily, entropy encoding unit B1 may be used to perform encoding according to the probability distribution determined by the probability estimation unit, reducing the statistical redundancy of the output features.

Exemplarily, entropy decoding unit A2 may be used to perform decoding according to the probability distribution determined by the aggregation unit.

Exemplarily, entropy decoding unit B2 may be used to perform decoding according to the probability distribution determined by the probability estimation unit.

Exemplarily, the decoding network may be used to apply an inverse spatial transform to the information obtained by entropy decoding and output the reconstructed image. Exemplarily, the decoding network may be a convolutional neural network.

Exemplarily, the hyper-decoder network may be used to determine features associated with entropy estimation. Exemplarily, the hyper-decoder network may be a convolutional neural network.

Exemplarily, the autoregressive unit may include the autoregressive model, and is used to determine entropy estimation features according to the autoregressive weight matrices.
Still referring to FIG. 6, the encoding process may be as follows:

Exemplarily, the to-be-encoded image is input to the encoding network, which transforms the to-be-encoded image into another space and outputs feature map matrix Y1. Feature map matrix Y1 is input to quantization unit D1, which quantizes it and outputs feature map matrix Y2, where feature map matrix Y1 ∈ R^(c×h×w).

Exemplarily, quantization unit D1 may quantize the feature value corresponding to each feature point in feature map matrix Y1 with a preset quantization step, obtaining feature map matrix Y2 ∈ R^(c×h×w).
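A minimal sketch of quantization unit D1 under the assumption of simple scalar quantization with a preset step size delta (the step value is illustrative):

    import torch

    def quantize(y1: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
        """Round each feature value to the nearest multiple of the preset
        quantization step, producing feature map matrix Y2."""
        return torch.round(y1 / delta) * delta

    y2 = quantize(torch.randn(1, 192, 16, 16))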
Exemplarily, on the one hand, feature map matrix Y2 is input to the hyper-encoder network, which performs feature extraction on feature map matrix Y2 to obtain feature map matrix Z1; feature map matrix Z1 is then input to quantization unit D2, which quantizes it and outputs feature map matrix Z2. Then, on the one hand, feature map matrix Z2 is input to the probability estimation unit, which processes it and outputs the probability distribution PB1 of each feature point in feature map matrix Z2 to entropy encoding unit B1; on the other hand, feature map matrix Z2 is input to entropy encoding unit B1. Entropy encoding unit B1 encodes feature map matrix Z2 according to probability distribution PB1 and outputs bitstream SB to entropy decoding unit B2. Next, the probability estimation unit may predict the probability distribution PB2 of the to-be-decoded feature points in bitstream SB and input probability distribution PB2 to entropy decoding unit B2. Entropy decoding unit B2 may then decode the to-be-decoded feature points in bitstream SB according to probability distribution PB2 and output feature map matrix Z2 to the hyper-decoder network. After obtaining feature map matrix Z2, the hyper-decoder network may convert feature map matrix Z2 into entropy estimation feature psi and input entropy estimation feature psi to the aggregation unit.

Exemplarily, on the other hand, feature map matrix Y2 may be input to the autoregressive unit, which processes feature map matrix Y2 and outputs entropy estimation feature phi to the aggregation unit. For this process, refer to the description above; details are not repeated here.

Exemplarily, the aggregation unit may perform probability estimation based on entropy estimation feature phi and entropy estimation feature psi, predict the probability distribution PA1 corresponding to each feature point in feature map matrix Y2, and input probability distribution PA1 to entropy encoding unit A1.

Exemplarily, entropy encoding unit A1 may encode each feature point in feature map matrix Y2 according to probability distribution PA1 and output bitstream SA; at this point, encoding of the to-be-encoded image is complete.

It should be noted that, after encoding of the to-be-encoded image is complete, bitstream SA obtained by encoding feature map matrix Y2 and bitstream SB obtained by encoding feature map matrix Z2 may both be sent to entropy decoding unit A2; alternatively, entropy decoding unit A2 obtains bitstream SA and bitstream SB at decoding time.
Still referring to FIG. 6, the decoding process may be as follows:

Exemplarily, entropy decoding unit A2 first decodes feature map matrix Z2 from bitstream SB and passes feature map matrix Z2 to the hyper-decoder network. The hyper-decoder network then converts feature map matrix Z2 into entropy estimation feature psi and outputs it to the aggregation unit.

Exemplarily, bitstream SA contains the encoded information of each feature point in feature map matrix Y2; entropy decoding unit A2 decodes the encoded information of each feature point in bitstream SA to obtain the feature value corresponding to each feature point, thereby obtaining feature map matrix Y2.

Exemplarily, for each first to-be-decoded feature point: entropy decoding unit A2 may input the feature values corresponding to the decoded feature points to the autoregressive unit, which determines the entropy estimation feature phi corresponding to the first to-be-decoded feature point (refer to the description above; details are not repeated here) and outputs entropy estimation feature phi to the aggregation unit. The aggregation unit then performs probability estimation based on entropy estimation feature phi and entropy estimation feature psi, predicts the probability distribution PA2 corresponding to the first to-be-decoded feature point, and inputs probability distribution PA2 to entropy decoding unit A2. Next, entropy decoding unit A2 may decode the first to-be-decoded feature point according to its probability distribution PA2 to obtain the corresponding feature value. Repeating the above steps, entropy decoding unit A2 can decode bitstream SA and output feature map matrix Y2 to the decoding network, which applies an inverse spatial transform to feature map matrix Y2 to obtain the reconstructed image.

Exemplarily, entropy decoding unit A2 may decode feature points of different channels in parallel or serially, and may decode feature points of the same channel serially or in parallel; this application imposes no limitation on this.

It should be noted that, during encoding, feature map matrix Y1 may also be input to the hyper-encoder network, with feature map matrix Z2 obtained via the hyper-encoder network and quantization unit D2; this application imposes no limitation on this.

It should be noted that the networks and units in the dashed box on the right side of FIG. 6 may also be other networks and other units, which may be set as required; this application imposes no limitation on this.

It should be noted that the AI encoding unit, AI decoding unit, entropy encoding unit, and entropy decoding unit of this application may further include other networks and units for generating other entropy estimation features; the other entropy estimation features are then input to the aggregation unit, which performs probability estimation according to entropy estimation feature phi, entropy estimation feature psi, and the other entropy estimation features to generate the probability distribution; this application imposes no limitation on this.

It should be noted that the AI encoding unit, AI decoding unit, entropy encoding unit, and entropy decoding unit may also exclude the networks and units in the dashed box on the right side of FIG. 6, which may be set as required; the embodiments of this application impose no limitation on this. When they exclude the networks and units in the dashed box on the right side of FIG. 6, entropy estimation feature psi need not be generated during encoding or decoding, and the aggregation unit performs probability estimation according to entropy estimation feature phi only.
FIG. 7 is a schematic diagram of an exemplary encoding procedure.

S701: Obtain a to-be-encoded image.

S702: Generate a first feature map matrix based on the to-be-encoded image, where the first feature map matrix includes first feature maps of c channels, and c is a positive integer.

Exemplarily, for S701 to S702, refer to the descriptions of S201 to S202 above; details are not repeated here.

Exemplarily, the first feature map matrix may include a second feature map matrix and a third feature map matrix, where the second feature map matrix includes second feature maps of c channels and the third feature map matrix includes third feature maps of c channels.

Exemplarily, adding the second feature map and the third feature map of each channel yields the first feature map of that channel.

In one possible manner, the first feature map matrix may be spatially partitioned to obtain the second feature map matrix and the third feature map matrix. It should be understood that the second and third feature map matrices may also be determined in other manners; this application imposes no limitation on this. This application takes spatially partitioning the first feature map matrix to obtain the second and third feature map matrices as an example for description.

S703: Spatially partition the first feature map matrix to obtain a second feature map matrix and a third feature map matrix, where the second feature map matrix includes second feature maps of c channels and the third feature map matrix includes third feature maps of c channels.

Exemplarily, after the first feature map matrix is obtained, the first feature map matrix may be spatially partitioned into the second feature map matrix and the third feature map matrix. Exemplarily, spatially partitioning the first feature map matrix may mean partitioning the first feature map of each channel into a second feature map and a third feature map according to a preset partitioning rule. In this way, a second feature map matrix containing the second feature maps of the c channels and a third feature map matrix containing the third feature maps of the c channels can be obtained.

Exemplarily, the preset partitioning rule may be set as required; for example, the feature points located at the second preset positions in the first feature map are assigned to the second feature map, and the feature points located at the first preset positions in the first feature map are assigned to the third feature map; this application imposes no limitation on this. Thus, in the second feature map, the feature values corresponding to the feature points at the second preset positions are the feature values corresponding to the feature points at the second preset positions in the first feature map, and the feature values corresponding to the feature points at the first preset positions are 0. In the third feature map, the feature values corresponding to the feature points at the first preset positions are the feature values corresponding to the feature points at the first preset positions in the first feature map, and the feature values corresponding to the feature points at the second preset positions are 0.

The first preset positions and the second preset positions may be set as required. For example, assuming that a feature point in the first feature map matrix has position (wi, hi), the second preset positions may be the positions where wi + hi is odd and the first preset positions the positions where wi + hi is even; this application imposes no limitation on this. As another example, the second preset positions may be the positions where wi is odd and the first preset positions the positions where wi is even. As yet another example, the second preset positions may be the positions where hi is odd and the first preset positions the positions where hi is even; this application imposes no limitation on this.
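A minimal sketch of S703 under the example rule "second preset positions: wi + hi odd; first preset positions: wi + hi even" follows; the other parity rules work the same way with a different mask. Note the code uses 0-based indices while the text indexes positions from (1, 1), which flips the parity labels; the split itself is unaffected.

    import torch

    def checkerboard_split(y: torch.Tensor):
        """Split the first feature map matrix into complementary second and
        third feature map matrices along a checkerboard parity pattern."""
        _, _, h, w = y.shape
        hi = torch.arange(h).view(-1, 1)
        wi = torch.arange(w).view(1, -1)
        second = ((wi + hi) % 2 == 1).to(y.dtype)   # second preset positions
        first = 1.0 - second                        # first preset positions
        y_second = y * second   # second feature map matrix: zeros at first positions
        y_third = y * first     # third feature map matrix: zeros at second positions
        return y_second, y_third

    y = torch.randn(1, 192, 16, 16)
    y2, y3 = checkerboard_split(y)
    # channel-wise addition recovers the first feature map matrix (cf. S903)
    assert torch.equal(y2 + y3, y)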
FIG. 8 is a schematic diagram of an exemplary feature map partitioning procedure.

Referring to FIG. 8, exemplarily, the first feature map has size 5×5. If the first preset positions are the positions where wi + hi is even and the second preset positions are the positions where wi + hi is odd, then the 2nd and 4th feature points of the first row, the 1st, 3rd, and 5th feature points of the second row, the 2nd and 4th feature points of the third row, the 1st, 3rd, and 5th feature points of the fourth row, and the 2nd and 4th feature points of the fifth row of the first feature map may be determined as the second feature map, where the feature points of the second feature map at the first preset positions are shown as grey squares in FIG. 8. Likewise, the 1st, 3rd, and 5th feature points of the first row, the 2nd and 4th feature points of the second row, the 1st, 3rd, and 5th feature points of the third row, the 2nd and 4th feature points of the fourth row, and the 1st, 3rd, and 5th feature points of the fifth row of the first feature map may be determined as the third feature map, where the feature points of the third feature map at the second preset positions are shown as grey squares in FIG. 8.
S704: Determine the probability distribution corresponding to the second feature map matrix.

Exemplarily, for the second feature map matrix, feature extraction may be performed on the second feature map matrix to obtain a fourth feature map matrix; a second entropy estimation feature corresponding to the second feature map matrix is then determined from the fourth feature map matrix; and the probability distribution corresponding to the second feature map matrix is determined from the second entropy estimation feature.

In addition, the encoder side may also encode the fourth feature map matrix to obtain the bitstream corresponding to the fourth feature map matrix.

Exemplarily, for the third feature map matrix, its probability distribution may be determined in the manner described above for determining the probability distribution corresponding to the first feature map matrix; refer to S705:
S705: Perform intra-group fusion on a feature map group composed of the second feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, where k is a positive integer less than c.

Exemplarily, an autoregressive model may be used to determine the first entropy estimation feature corresponding to the third feature map matrix. Exemplarily, the second feature maps of the c channels of the second feature map matrix may be fed into the autoregressive model as c separate inputs.

Exemplarily, the autoregressive model may use the second feature maps of k channels in the second feature map matrix to form one feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 and may be determined from k and c. Exemplarily, the number of channels k included in each feature map group may be the same or different; this application imposes no limitation on this. For the specific manner of forming a feature map group from k second feature maps, refer to the description above for the first feature maps; details are not repeated here.

Exemplarily, the autoregressive model performs intra-group fusion on the feature map group composed of the second feature maps of the k channels to obtain the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, and thereby determines the first entropy estimation feature corresponding to the third feature map matrix.

Exemplarily, the third feature map matrix need not be input to the autoregressive model, but the third feature maps of k channels in the third feature map matrix may still form one feature map group; in this way, N feature map groups can be obtained, where N is an integer greater than 1 and may be determined from k and c. Exemplarily, the number of channels k included in each feature map group may be the same or different; this application imposes no limitation on this.

For ease of subsequent description, a feature map group composed of the second feature maps of k channels may be called feature map group A, and a feature map group composed of the third feature maps of k channels may be called feature map group B.

Exemplarily, the N feature map groups A are in one-to-one correspondence with the N feature map groups B; that is, each feature map group A and its corresponding feature map group B contain the same channels.
Exemplarily, assume that the number of output channels corresponding to the i-th feature map group A is M_i, where the M_i of the N feature map groups A together account for the M output channels, that is, M_1 + M_2 + ... + M_N = M. In this case, the autoregressive weight matrix corresponding to the i-th feature map group A ∈ R^(M_i×k×c1), with c1 = ks1×ks2, where ks1×ks2 is the size of the convolution kernel of the autoregressive model; ks1 may or may not equal ks2, and this application imposes no limitation on this. In other words, each of the M_i output channels corresponds to k weight maps of size ks1×ks2. For example, if k = 2 and M_i = 1, the i-th feature map group A corresponds to 1 output channel, and that output channel corresponds to 2 weight maps of size ks1×ks2. As another example, if k = 2 and M_i = 5, the i-th feature map group A corresponds to 5 output channels, each of which corresponds to 2 weight maps of size ks1×ks2.
In one possible manner, the autoregressive weight matrix corresponding to the i-th feature map group A may be used to extract local spatial information from the i-th feature map group A, obtaining the first entropy estimation feature corresponding to the i-th feature map group B.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th feature map group A may be convolved with the second feature maps of the k channels in the i-th feature map group A, respectively, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group B at the j-th output channel. The first entropy estimation features of the i-th feature map group B at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th feature map group B, where j ranges from 1 to M_i, inclusive. For this, refer to the description above of determining the first entropy estimation feature of the i-th feature map group at the j-th output channel; details are not repeated here.

Exemplarily, for the manner of determining the first entropy estimation feature of each feature point group in the i-th feature map group B at the j-th output channel, refer to the description above; details are not repeated here.

S706: Determine, according to the first entropy estimation feature, the probability distribution corresponding to the third feature map matrix.

In one possible manner, probability estimation may be performed according to the first entropy estimation feature corresponding to the third feature map matrix to obtain the probability distribution corresponding to the third feature map matrix, where the probability distribution ∈ R^(c×h×w×P); that is, the number of channels of the probability distribution equals that of the third feature map matrix, and each feature point in the third feature map matrix corresponds to P parameters (such as mean and variance), where P is an integer greater than 0; this application imposes no limitation on this.

In another possible manner, feature extraction may be performed on the third feature map matrix to obtain a sixth feature map matrix; a second entropy estimation feature corresponding to the third feature map matrix is then determined from the sixth feature map matrix. Probability estimation may then be performed by combining the first and second entropy estimation features corresponding to the third feature map matrix, obtaining the probability distribution corresponding to the third feature map matrix. Exemplarily, the first and second entropy estimation features may be aggregated (for example, concatenated), and probability estimation performed on the aggregation result to obtain the probability distribution corresponding to the third feature map matrix.

Exemplarily, when the sixth feature map matrix has been determined, the sixth feature map matrix may be encoded to obtain the bitstream corresponding to the sixth feature map matrix.

S707: Encode the second feature map matrix according to the probability distribution corresponding to the second feature map matrix, and encode the third feature map matrix according to the probability distribution corresponding to the third feature map matrix, to obtain a bitstream.

Exemplarily, after the bitstream corresponding to the to-be-encoded image, the bitstream corresponding to the fourth feature map matrix, and the bitstream corresponding to the sixth feature map matrix are obtained by encoding, these bitstreams may be stored, or they may be sent to the decoder side.

In this way, not only can the introduction of ineffective information be reduced, thereby reducing the encoding computation, improving encoding efficiency, and improving the quality of the reconstructed image; moreover, determining the first entropy estimation feature requires more computation than determining the second entropy estimation feature, so determining the first entropy estimation features for only some of the feature points in the first feature map can further improve encoding efficiency.
FIG. 9 is a schematic diagram of an exemplary decoding procedure.

S901: Obtain a bitstream.

S902: Decode the feature values corresponding to the feature points of c channels from the bitstream to obtain the first feature map matrix.

S9021: Decode the fourth feature map matrix from the bitstream.

Exemplarily, after receiving the bitstream, the decoder side may decode the fourth feature map matrix from the bitstream, and then, according to the fourth feature map matrix, decode from the bitstream the feature values corresponding to the to-be-decoded feature points located at the second preset positions, thereby obtaining the second feature map matrix. Exemplarily, all feature points located at the second preset positions may be determined, in the preset decoding order, as the second to-be-decoded feature point in turn, and the second to-be-decoded feature point may then be decoded.

S9022: Determine, based on the fourth feature map matrix, the second entropy estimation feature corresponding to the second to-be-decoded feature point.

S9023: Determine, according to the second entropy estimation feature, the probability distribution corresponding to the second to-be-decoded feature point.

S9024: Decode the second to-be-decoded feature point according to the probability distribution to obtain the corresponding feature value.

Exemplarily, for the second to-be-decoded feature point: determine, based on the fourth feature map matrix, the second entropy estimation feature corresponding to the second to-be-decoded feature point; determine, according to that second entropy estimation feature, the probability distribution corresponding to the second to-be-decoded feature point; and decode the second to-be-decoded feature point according to its probability distribution to obtain the corresponding feature value. In this way, the feature values corresponding to all second to-be-decoded feature points located at the second preset positions can be obtained, yielding the second feature map matrix.

Exemplarily, all feature points located at the first preset positions may be determined, in the preset decoding order, as the first to-be-decoded feature point in turn, and the first to-be-decoded feature point may then be decoded; refer to S9025 to S9028 below:
S9025: Determine the decoded information group corresponding to the first to-be-decoded feature point, where the decoded information group includes the feature values corresponding to the decoded feature points located at the second preset positions in the channel corresponding to the first to-be-decoded feature point and the feature values corresponding to the decoded feature points located at the second preset positions in other k−1 channels, k being a positive integer less than c.

Exemplarily, when the encoder side performed intra-group fusion on the feature map group composed of the second feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, the second feature map matrix may be input to the autoregressive model, which determines the first entropy estimation feature corresponding to the first to-be-decoded feature point from the second feature map matrix.

Exemplarily, the autoregressive model may use the feature values corresponding to the feature points located at the second preset positions of k channels (k being a positive integer less than c) (that is, the second feature maps of the c channels) to form one decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1 and may be determined from k and c. Exemplarily, after the N decoded information groups are determined, the channel in which the first to-be-decoded feature point is located may be determined, and the decoded information group to which that channel belongs may then be determined from the N decoded information groups. For ease of description, the decoded information group to which the channel of the first to-be-decoded feature point belongs is hereinafter referred to as the i-th decoded information group.

Exemplarily, the channels included in the i-th decoded information group are the channel corresponding to the first to-be-decoded feature point and other k−1 channels; the i-th decoded information group includes the second feature map of the channel corresponding to the first to-be-decoded feature point (the feature values corresponding to the feature points at the second preset positions) and the second feature maps of the other k−1 channels (the feature values corresponding to the feature points at the second preset positions).

S9026: Perform intra-group fusion on the decoded information group to obtain the first entropy estimation feature corresponding to the first to-be-decoded feature point.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th decoded information group may be convolved with the second feature maps of the k channels in the i-th decoded information group, respectively, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th decoded information group at the j-th output channel. The first entropy estimation features of the i-th decoded information group at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th decoded information group, that is, the first entropy estimation feature corresponding to the first to-be-decoded feature point, where j ranges from 1 to M_i, inclusive. For details, refer to the description above of determining the first entropy estimation feature of the first to-be-decoded feature point in the first feature map; details are not repeated here.

Exemplarily, for the manner of determining the first entropy estimation feature of each feature point group in the i-th decoded information group at the j-th output channel, refer to the description above; details are not repeated here.

In this way, the first to-be-decoded feature points located at the first preset positions can be decoded in parallel, further improving decoding efficiency.
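The sketch below illustrates why this parallelism holds: the entropy estimation features of the first-preset-position points depend only on the second preset positions, which are fully decoded by S9024, so a single grouped convolution over the second feature map matrix yields phi for every such point in one pass. The shapes are the example values used earlier and are assumptions.

    import torch
    import torch.nn as nn

    c, k, ks = 192, 2, 3
    fusion = nn.Conv2d(c, c, ks, padding=ks // 2, groups=c // k, bias=False)

    # decoded second feature map matrix (zeros at the first preset positions)
    y2_matrix = torch.randn(1, c, 16, 16)

    # one forward pass produces phi at every spatial location, so every
    # first-preset feature point can be entropy-decoded independently
    phi_all = fusion(y2_matrix)
    print(phi_all.shape)  # torch.Size([1, 192, 16, 16])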
S9027: Determine, according to the first entropy estimation feature corresponding to the first to-be-decoded feature point, the probability distribution corresponding to the first to-be-decoded feature point.

In one possible manner, probability estimation may be performed according to the first entropy estimation feature corresponding to the first to-be-decoded feature point to obtain the probability distribution corresponding to the first to-be-decoded feature point, where that probability distribution corresponds to one set of probability distribution parameters, and each set of probability distribution parameters may include at least one parameter, such as mean and variance; this application imposes no limitation on this.

In another possible manner, if the encoder side has encoded the sixth feature map matrix into a bitstream and sent that bitstream to the decoder side, the decoder side may extract the sixth feature map matrix from the bitstream and then determine, from the sixth feature map matrix, the second entropy estimation features corresponding to all feature points in the bitstream. Next, the second entropy estimation feature corresponding to the first to-be-decoded feature point may be determined from the second entropy estimation features corresponding to all feature points in the bitstream, and probability estimation may be performed by combining the first and second entropy estimation features corresponding to the first to-be-decoded feature point, obtaining its probability distribution. Exemplarily, the first and second entropy estimation features may be aggregated (for example, concatenated), and probability estimation performed on the aggregation result to obtain the probability distribution corresponding to the first to-be-decoded feature point.

S9028: Decode the first to-be-decoded feature point according to the probability distribution to obtain the corresponding feature value.

Then, the first to-be-decoded feature point may be decoded according to its probability distribution to obtain the corresponding feature value, that is, the feature value corresponding to a feature point in the third feature map matrix. After all first to-be-decoded feature points have been decoded, the third feature map matrix is obtained.

S903: Perform image reconstruction based on the first feature map matrix, and output a reconstructed image.

Exemplarily, after the second feature map matrix and the third feature map matrix are decoded, they may be added channel by channel to obtain the first feature map matrix; image reconstruction may then be performed based on the first feature map matrix to obtain the reconstructed image.

In this way, not only can the introduction of ineffective information be reduced, thereby reducing the decoding computation and improving decoding efficiency; moreover, determining the first entropy estimation feature requires more computation than determining the second entropy estimation feature, so determining the first entropy estimation features for only some of the to-be-decoded feature points can further improve decoding efficiency.
FIG. 10 is a schematic diagram of an exemplary encoding procedure.

S1001: Obtain a to-be-encoded image.

S1002: Generate a first feature map matrix based on the to-be-encoded image, where the first feature map matrix includes first feature maps of c channels, and c is a positive integer.

S1003: Spatially partition the first feature map matrix to obtain a second feature map matrix and a third feature map matrix, where the second feature map matrix includes second feature maps of c channels and the third feature map matrix includes third feature maps of c channels.

S1004: Determine the probability distribution corresponding to the second feature map matrix.

Exemplarily, for S1001 to S1004, refer to the descriptions of S701 to S704 above; details are not repeated here.

S1005: Perform intra-group fusion on a feature map group composed of the third feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, where k is a positive integer less than c.

In one possible manner, the autoregressive weight matrix corresponding to the i-th feature map group B may be used to extract local spatial information from the i-th feature map group, obtaining the first entropy estimation feature corresponding to the i-th feature map group B.

Exemplarily, the weight maps of the j-th output channel corresponding to the i-th feature map group B may be convolved with the feature values corresponding to the decoded feature points located at the first preset positions in each of the k channels of the i-th feature map group, respectively, to obtain k convolution results; the k convolution results are fused to obtain the first entropy estimation feature of the i-th feature map group B at the j-th output channel. The first entropy estimation features of the i-th feature map group B at the M_i output channels are combined to obtain the first entropy estimation feature corresponding to the i-th feature map group B, where j ranges from 1 to M_i, inclusive. For this, refer to the description above of determining the first entropy estimation feature of the i-th feature map group at the j-th output channel; details are not repeated here.

Exemplarily, for the manner of determining the first entropy estimation feature of each feature point group in the i-th feature map group B at the j-th output channel, refer to the description above; details are not repeated here.

S1006: Determine, according to the first entropy estimation feature, the probability distribution corresponding to the third feature map matrix.

S1007: Encode the second feature map matrix according to the probability distribution corresponding to the second feature map matrix, and encode the third feature map matrix according to the probability distribution corresponding to the third feature map matrix, to obtain a bitstream.

Exemplarily, for S1006 to S1007, refer to the descriptions of S706 to S707 above; details are not repeated here.

In this way, not only can the introduction of ineffective information be reduced, thereby reducing the encoding computation, improving encoding efficiency, and improving the quality of the reconstructed image; moreover, determining the first entropy estimation feature requires more computation than determining the second entropy estimation feature, so determining the first entropy estimation features for only some of the feature points in the first feature map can further improve encoding efficiency.
FIG. 11a is a schematic diagram of an exemplary decoding procedure.

S1101: Obtain a bitstream.

S1102: Decode the feature values corresponding to the feature points of c channels from the bitstream to obtain the first feature map matrix.

Exemplarily, for each to-be-decoded feature point, the corresponding feature value may be determined with reference to S11021 to S11028 below:

S11021: Decode the fourth feature map matrix from the bitstream.

S11022: Determine, based on the fourth feature map matrix, the second entropy estimation feature corresponding to the second to-be-decoded feature point.

S11023: Determine, according to the second entropy estimation feature, the probability distribution corresponding to the second to-be-decoded feature point.

S11024: Decode the second to-be-decoded feature point according to the probability distribution to obtain the corresponding feature value.

Exemplarily, for S11021 to S11024, refer to S9021 to S9024 above; details are not repeated here.
S11025: Determine the decoded information group corresponding to the first to-be-decoded feature point, where the decoded information group includes the feature values corresponding to the decoded feature points located at the first preset positions in the channel corresponding to the first to-be-decoded feature point and the feature values corresponding to the decoded feature points located at the first preset positions in other k−1 channels, k being a positive integer less than c.

S11026: Perform intra-group fusion on the decoded information group to obtain the first entropy estimation feature corresponding to the first to-be-decoded feature point.

Exemplarily, when the encoder side performed intra-group fusion on the feature map group composed of the third feature maps of k channels to obtain the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, the feature values corresponding to the decoded feature points located at the first preset positions may be input to the autoregressive model, which determines the first entropy estimation feature corresponding to the first to-be-decoded feature point from the feature values corresponding to the decoded feature points located at the first preset positions.

Exemplarily, the autoregressive model may use the feature values corresponding to the feature points located at the first preset positions of k channels (k being a positive integer less than c) to form one decoded information group; in this way, N decoded information groups can be obtained, where N is an integer greater than 1 and may be determined from k and c. Exemplarily, after the N decoded information groups are determined, the channel in which the first to-be-decoded feature point is located may be determined, and the decoded information group to which that channel belongs may then be determined from the N decoded information groups; for ease of description, this group is hereinafter referred to as the i-th decoded information group.

Exemplarily, the channels included in the i-th decoded information group are the channel corresponding to the first to-be-decoded feature point and other k−1 channels; the i-th decoded information group includes the feature values corresponding to the feature points located at the first preset positions in the channel corresponding to the first to-be-decoded feature point and the feature values corresponding to the feature points located at the first preset positions in the other k−1 channels.

Exemplarily, intra-group fusion is performed on the decoded information group to obtain the first entropy estimation feature corresponding to the first to-be-decoded feature point; for details, refer to the description above; details are not repeated here.

S11027: Determine, according to the first entropy estimation feature corresponding to the first to-be-decoded feature point, the probability distribution corresponding to the first to-be-decoded feature point.

S11028: Decode the first to-be-decoded feature point according to the probability distribution to obtain the corresponding feature value.

Exemplarily, for S11027 to S11028, refer to S9027 to S9028 above; details are not repeated here.

S1103: Perform image reconstruction based on the first feature map matrix, and output a reconstructed image.

In this way, not only can the introduction of ineffective information be reduced, reducing the decoding computation and improving decoding efficiency; moreover, determining the first entropy estimation feature requires more computation than determining the second entropy estimation feature, so determining the first entropy estimation features for only some of the to-be-decoded feature points can further improve decoding efficiency.
FIG. 11b is a schematic diagram of an exemplary compression effect.

Referring to FIG. 11b, exemplarily, the vertical axis in FIG. 11b is PSNR (peak signal-to-noise ratio) in dB (decibels), which can characterize image reconstruction quality: the larger the PSNR, the higher the reconstruction quality. The horizontal axis is bits per pixel (BPP, the number of bits used to store each pixel; the smaller the value, the lower the compression bitrate). In FIG. 11b, the dashed curve shows the relationship between image reconstruction quality and bitstream size for this application, and the solid curve shows that relationship for the prior art; comparing the two curves shows that, at the same bitstream size, the compression/decompression solution of this application achieves higher image reconstruction quality.
In one example, FIG. 12 shows a schematic block diagram of an apparatus 1200 according to an embodiment of this application. The apparatus 1200 may include a processor 1201 and a transceiver/transceiver pin 1202, and optionally further include a memory 1203.

The components of the apparatus 1200 are coupled together by a bus 1204, where the bus 1204 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all referred to as the bus 1204 in the figure.

Optionally, the memory 1203 may be used for the instructions in the foregoing method embodiments. The processor 1201 may be used to execute the instructions in the memory 1203, control the receive pin to receive signals, and control the transmit pin to send signals.

The apparatus 1200 may be the electronic device or a chip of the electronic device in the foregoing method embodiments.

All related content of the steps involved in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
This embodiment further provides a computer storage medium storing computer instructions that, when run on an electronic device, cause the electronic device to perform the foregoing related method steps to implement the encoding/decoding methods in the foregoing embodiments.

This embodiment further provides a computer program product that, when run on a computer, causes the computer to perform the foregoing related steps to implement the encoding/decoding methods in the foregoing embodiments.

In addition, an embodiment of this application further provides an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory to cause the chip to perform the encoding/decoding methods in the foregoing method embodiments.

The electronic device, computer storage medium, computer program product, or chip provided in this embodiment is used to perform the corresponding method provided above; therefore, for the beneficial effects it can achieve, refer to the beneficial effects of the corresponding method provided above; details are not repeated here.
From the description of the foregoing implementations, a person skilled in the art can understand that, for convenience and brevity of description, only the division into the foregoing functional modules is used as an example for illustration; in practical applications, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division into modules or units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

Any content of the embodiments of this application, as well as any content of the same embodiment, may be freely combined; any combination of the foregoing content falls within the scope of this application.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The steps of the methods or algorithms described in connection with the disclosure of the embodiments of this application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Certainly, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.

A person skilled in the art should be aware that, in the foregoing one or more examples, the functions described in the embodiments of this application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. The computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates transfer of a computer program from one place to another. The storage medium may be any available medium accessible to a general-purpose or dedicated computer.

The embodiments of this application have been described above with reference to the accompanying drawings; however, this application is not limited to the foregoing specific implementations. The foregoing specific implementations are merely illustrative rather than restrictive, and, inspired by this application, a person of ordinary skill in the art may derive many other forms without departing from the spirit of this application and the scope protected by the claims, all of which fall within the protection of this application.

Claims (19)

  1. An encoding method, wherein the method comprises:
    obtaining a to-be-encoded image;
    generating a first feature map matrix based on the to-be-encoded image, wherein the first feature map matrix comprises first feature maps of c channels, and c is a positive integer;
    performing intra-group fusion on a feature map group composed of first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group, wherein k is a positive integer less than c;
    determining, according to the first entropy estimation feature, a probability distribution corresponding to the first feature map matrix; and
    encoding the first feature map matrix according to the probability distribution to obtain a bitstream.
  2. The method according to claim 1, wherein
    the performing intra-group fusion on a feature map group composed of first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group comprises:
    extracting local spatial information from the feature map group by using an autoregressive weight matrix corresponding to the feature map group, to obtain the first entropy estimation feature corresponding to the feature map group.
  3. The method according to claim 1, wherein the first feature map matrix comprises a second feature map matrix and a third feature map matrix, the second feature map matrix comprising second feature maps of c channels and the third feature map matrix comprising third feature maps of c channels;
    the performing intra-group fusion on a feature map group composed of first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group comprises:
    performing intra-group fusion on a feature map group composed of the second feature maps of the k channels to obtain a first entropy estimation feature corresponding to a feature map group composed of the third feature maps of the k channels;
    the determining, according to the first entropy estimation feature, a probability distribution corresponding to the first feature map matrix comprises:
    determining, according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, a probability distribution corresponding to the third feature map matrix; and
    the encoding the first feature map matrix according to the probability distribution to obtain a bitstream comprises:
    encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the bitstream.
  4. The method according to claim 1, wherein the first feature map matrix comprises a third feature map matrix, the third feature map matrix comprising third feature maps of c channels;
    the performing intra-group fusion on a feature map group composed of first feature maps of k channels to obtain a first entropy estimation feature corresponding to the feature map group comprises:
    performing intra-group fusion on a feature map group composed of the third feature maps of the k channels to obtain a first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels;
    the determining, according to the first entropy estimation feature, a probability distribution corresponding to the first feature map matrix comprises:
    determining, according to the first entropy estimation feature corresponding to the feature map group composed of the third feature maps of the k channels, a probability distribution corresponding to the third feature map matrix; and
    the encoding the first feature map matrix according to the probability distribution to obtain a bitstream comprises:
    encoding the third feature map matrix according to the probability distribution corresponding to the third feature map matrix to obtain the bitstream.
  5. The method according to claim 3 or 4, wherein the method further comprises:
    performing feature extraction on the second feature map matrix comprised in the first feature map matrix to obtain a fourth feature map matrix;
    determining a second entropy estimation feature according to the fourth feature map matrix;
    determining, according to the second entropy estimation feature, a probability distribution corresponding to the second feature map matrix; and
    encoding the second feature map matrix according to the probability distribution corresponding to the second feature map matrix to obtain a bitstream.
  6. The method according to claim 5, wherein the method further comprises:
    encoding the fourth feature map matrix to obtain a bitstream.
  7. A decoding method, wherein the method comprises:
    obtaining a bitstream, and decoding feature values corresponding to feature points of c channels from the bitstream to obtain a first feature map matrix, wherein c is a positive integer;
    wherein, for a first to-be-decoded feature point: determining a decoded information group corresponding to the first to-be-decoded feature point, wherein the decoded information group comprises decoded information of a channel corresponding to the first to-be-decoded feature point and decoded information of other k−1 channels, and k is a positive integer less than c; performing intra-group fusion on the decoded information group to obtain a first entropy estimation feature corresponding to the first to-be-decoded feature point; determining, according to the first entropy estimation feature corresponding to the first to-be-decoded feature point, a probability distribution corresponding to the first to-be-decoded feature point; and decoding the first to-be-decoded feature point according to the probability distribution to obtain a corresponding feature value, wherein the first to-be-decoded feature point is any to-be-decoded feature point; and
    performing image reconstruction based on the first feature map matrix, and outputting a reconstructed image.
  8. The method according to claim 7, wherein the performing intra-group fusion on the decoded information group to obtain a first entropy estimation feature corresponding to the first to-be-decoded feature point comprises:
    extracting local spatial information from the decoded information group by using an autoregressive weight matrix corresponding to the decoded information group, to obtain the first entropy estimation feature corresponding to the first to-be-decoded feature point.
  9. The method according to claim 7, wherein the feature points comprise feature points located at first preset positions and feature points located at second preset positions, and the first to-be-decoded feature point is a feature point located at a first preset position;
    the method further comprises:
    decoding a fourth feature map matrix from the bitstream, wherein the fourth feature map matrix comprises features obtained by performing feature extraction on the feature values corresponding to the feature points located at the second preset positions in the first feature map matrix; and
    for a second to-be-decoded feature point located at a second preset position: determining, based on the fourth feature map matrix, a second entropy estimation feature corresponding to the second to-be-decoded feature point; determining, according to the second entropy estimation feature, a probability distribution corresponding to the second to-be-decoded feature point; and decoding the second to-be-decoded feature point according to the probability distribution to obtain a corresponding feature value.
  10. The method according to claim 9, wherein
    the decoded information group comprises feature values corresponding to decoded feature points located at the second preset positions in the channel corresponding to the first to-be-decoded feature point, and feature values corresponding to decoded feature points located at the second preset positions in the other k−1 channels.
  11. The method according to claim 9, wherein
    the decoded information group comprises feature values corresponding to decoded feature points located at the first preset positions in the channel corresponding to the first to-be-decoded feature point, and feature values corresponding to decoded feature points located at the first preset positions in the other k−1 channels.
  12. An encoder, configured to perform the encoding method according to any one of claims 1 to 6.
  13. A decoder, configured to perform the decoding method according to any one of claims 7 to 11.
  14. An electronic device, comprising:
    a memory and a processor, wherein the memory is coupled to the processor; and
    the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the encoding method according to any one of claims 1 to 6.
  15. An electronic device, comprising:
    a memory and a processor, wherein the memory is coupled to the processor; and
    the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the decoding method according to any one of claims 7 to 11.
  16. A chip, comprising one or more interface circuits and one or more processors, wherein the interface circuit is configured to receive signals from a memory of an electronic device and send the signals to the processor, the signals comprising computer instructions stored in the memory; and when the processor executes the computer instructions, the electronic device is caused to perform the encoding method according to any one of claims 1 to 6.
  17. A chip, comprising one or more interface circuits and one or more processors, wherein the interface circuit is configured to receive signals from a memory of an electronic device and send the signals to the processor, the signals comprising computer instructions stored in the memory; and when the processor executes the computer instructions, the electronic device is caused to perform the decoding method according to any one of claims 7 to 11.
  18. A computer storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer or a processor, causes the computer or the processor to perform the method according to any one of claims 1 to 11.
  19. A computer program product, wherein the computer program product comprises a software program that, when executed by a computer or a processor, causes the steps of the method according to any one of claims 1 to 11 to be performed.
PCT/CN2022/125944 (priority date 2021-11-24; filing date 2022-10-18) — Encoding/decoding method and electronic device — WO2023093377A1 (zh)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
CN202111407946.6A (published as CN116170596A) | 2021-11-24 | 2021-11-24 | Encoding/decoding method and electronic device
CN202111407946.6 | 2021-11-24 | | 
