WO2022253088A1 - Encoding and decoding method, apparatus, device, storage medium, computer program, and product - Google Patents

Encoding and decoding method, apparatus, device, storage medium, computer program, and product

Info

Publication number
WO2022253088A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
feature point
feature points
image
points
Prior art date
Application number
PCT/CN2022/095149
Other languages
English (en)
French (fr)
Inventor
师一博
王晶
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP22815134.6A (published as EP4336835A1)
Publication of WO2022253088A1
Priority to US18/521,067 (published as US20240095964A1)

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
            • H04N19/10: using adaptive coding
              • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
                • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
            • H04N19/42: characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
            • H04N19/90: using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
              • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
                • G06N3/045: Combinations of networks
                  • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
                • G06N3/0464: Convolutional networks [CNN, ConvNet]
                • G06N3/047: Probabilistic or stochastic networks
              • G06N3/08: Learning methods
                • G06N3/088: Non-supervised learning, e.g. competitive learning
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T9/00: Image coding

Definitions

  • The embodiments of the present application relate to the technical field of encoding and decoding, and in particular to an encoding and decoding method, apparatus, device, storage medium, computer program, and product.
  • Image compression technology enables the efficient transmission and storage of image information, and plays an important role in the current media era, in which both the variety of image information and the amount of data keep increasing.
  • Image compression technology includes image encoding and decoding. Encoding and decoding performance (reflecting image quality) and encoding and decoding efficiency (reflecting time consumption) are both factors that image compression technology must consider.
  • Image compression standards such as JPEG and PNG were established after long-term research and optimization by technicians.
  • However, these relatively traditional image compression technologies have hit a bottleneck in encoding and decoding performance and can no longer meet the needs of an era of ever-growing multimedia application data.
  • With the wide application of deep learning in image recognition, object detection, and other fields, deep learning has also been applied to image compression tasks, achieving better encoding and decoding performance than traditional image compression technology.
  • VAE: variational auto-encoder
  • In VAE-based decoding, a neural network model is used to compute the probability distribution of each feature point of the image serially, and the image is decoded based on those probability distributions.
  • Because the probability distributions are computed by a neural network model, this serial computation makes decoding inefficient.
  • How to break through the efficiency bottleneck caused by serial computation during decoding, without reducing encoding and decoding performance, is a problem that deserves attention in research on VAE-based encoding and decoding methods.
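The serial bottleneck described above can be illustrated with a minimal sketch. All names here are hypothetical stand-ins for the neural-network context model and the arithmetic decoder: a plain raster-scan autoregressive decode needs one sequential step per feature point.

```python
# Hypothetical sketch of the serial bottleneck in VAE-based decoding:
# each feature point's probability distribution depends on previously
# decoded neighbours, so a plain raster scan needs H*W sequential steps.

H, W = 16, 16  # size of the latent feature map (illustrative)

def decode_serial(predict_dist, entropy_decode):
    """predict_dist and entropy_decode are stand-ins for the context
    model and the arithmetic decoder; both names are hypothetical."""
    feat = [[None] * W for _ in range(H)]
    steps = 0
    for y in range(H):          # strict raster order: every point waits
        for x in range(W):      # for all previously decoded points
            dist = predict_dist(feat, y, x)
            feat[y][x] = entropy_decode(dist)
            steps += 1
    return feat, steps

# With dummy callables, the loop runs H*W = 256 sequential steps.
feat, steps = decode_serial(lambda f, y, x: None, lambda d: 0)
print(steps)  # → 256
```

The grouping scheme of the present application reduces the number of sequential steps by decoding whole groups of feature points in parallel.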
  • Embodiments of the present application provide an encoding and decoding method, apparatus, device, storage medium, and computer program, which can break through the efficiency bottleneck caused by serial computation in VAE-based decoding while ensuring that encoding and decoding performance is not reduced. The technical solution is as follows:
  • In a first aspect, a decoding method is provided, comprising: determining, based on a code stream, the prior feature of each of a plurality of feature points of the image to be decoded; dividing the plurality of feature points into multiple groups based on a specified value; and sequentially determining, based on the prior features, the first image features of each group of feature points. The step of determining the first image features of any one group is: determining the probability distribution of each feature point in that group in parallel, and parsing the first image feature of each feature point in that group from the code stream based on those probability distributions. The image is then reconstructed based on the first image features of the plurality of feature points.
  • In this solution, the multiple feature points are divided into multiple groups based on a specified value during decoding, and the probability distributions of the feature points in the same group are determined in parallel, which speeds up decoding. That is, this solution breaks through the efficiency bottleneck caused by serial computation in VAE-based decoding and effectively improves decoding efficiency.
  • This method is applied to a codec that includes a context model. When any one of the multiple groups is decoded, all of the surrounding information of each feature point in that group has already been decoded; that is, the feature points in the group satisfy the condition that their surrounding information has been decoded.
  • In one implementation, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point includes: if the first feature point is not the first of the plurality of feature points, determining the surrounding information of the first feature point from the first image features of the decoded feature points, the first feature point being a feature point in any one group; inputting the surrounding information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model; and determining the probability distribution of the first feature point based on its prior feature and its context feature.
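As a rough illustration of the last step, the prior feature and the context feature can be fused into the parameters of a per-point probability model. A Gaussian with predicted mean and scale is a common choice in VAE-based codecs; the tiny linear fusion layer below is an assumed stand-in, not the patent's actual network.

```python
import math
import random

# Hedged sketch: fusing the per-point prior (hyper-prior) feature and
# context feature into the parameters of a probability model. The linear
# "entropy parameters" layer below is illustrative; real codecs use a
# small learned network here.

random.seed(0)
C = 8                                                 # channels per feature point
prior_feat = [random.gauss(0, 1) for _ in range(C)]   # hyper-prior branch
ctx_feat = [random.gauss(0, 1) for _ in range(C)]     # context-model branch

fused = prior_feat + ctx_feat                         # concatenate the two features
w_mean = [random.gauss(0, 0.1) for _ in range(2 * C)]
w_scale = [random.gauss(0, 0.1) for _ in range(2 * C)]

mean = sum(w * f for w, f in zip(w_mean, fused))
scale = math.exp(sum(w * f for w, f in zip(w_scale, fused)))  # exp keeps scale > 0

print(scale > 0)  # → True
```

The (mean, scale) pair then parameterizes the distribution handed to the arithmetic coder for this feature point.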
  • The surrounding information of the first feature point includes the first image features of the decoded feature points in a neighborhood whose geometric center is the first feature point. The size of the neighborhood is determined based on the size of the receptive field used by the context model, and the surrounding information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4. That is, to ensure codec performance and image quality, this solution uses as much surrounding information as possible while maintaining the compression rate.
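A minimal sketch of collecting such surrounding information, assuming raster decoding order and a 5×5 neighborhood; the helper name is hypothetical.

```python
# Hedged sketch of gathering "surrounding information": the first image
# features of already-decoded points inside a neighbourhood centred on
# the current point, with the neighbourhood size taken from the context
# model's receptive field (5x5 here). "Decoded" is modelled as coming
# earlier in raster order; details are illustrative.

def surrounding_info(feat, y, x, k=5):
    r = k // 2
    H, W = len(feat), len(feat[0])
    out = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < H and 0 <= nx < W):
                continue                 # outside the feature map
            if (ny, nx) >= (y, x):
                continue                 # not yet decoded (raster order)
            out.append(feat[ny][nx])
    return out

feat = [[(y, x) for x in range(8)] for y in range(8)]
# An interior point in a 5x5 window has 12 causal neighbours available,
# comfortably above the n >= 4 floor stated above.
print(len(surrounding_info(feat, 4, 4)))  # → 12
```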
  • In one implementation, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point includes: if the first feature point is the first of the plurality of feature points, determining its probability distribution based on its prior feature alone.
  • In one implementation, the specified value is determined based on the size of the receptive field used by the context model. Dividing the plurality of feature points into multiple groups based on the specified value includes: determining a slope based on the specified value, the slope indicating the inclination of the straight lines on which feature points assigned to the same group lie; and dividing the plurality of feature points into groups based on that slope. That is, this solution determines the sets of feature points that can be decoded in parallel based on the size of the receptive field.
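The slope-based grouping can be sketched as a wavefront schedule: points on lines of a fixed slope form one group and are decoded in parallel. The slope of 2 below is an illustrative choice for a 5×5 receptive field, not a value taken from the patent.

```python
from collections import defaultdict

# Hedged sketch of slope-based ("wavefront") grouping: group(y, x) =
# slope*y + x, so the causal neighbours that a point's context model
# needs all lie in earlier groups, and every group can be decoded in
# parallel. slope = 2 is an illustrative choice for a 5x5 field.

def group_points(H, W, slope=2):
    groups = defaultdict(list)
    for y in range(H):
        for x in range(W):
            groups[slope * y + x].append((y, x))
    return [groups[g] for g in sorted(groups)]

groups = group_points(8, 8, slope=2)
# Sequential steps drop from H*W = 64 raster steps to the group count:
print(len(groups))  # → 22
```

With an H×W map, the number of sequential steps becomes slope·(H−1) + (W−1) + 1 instead of H·W, which is where the decoding speed-up comes from.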
  • In one implementation, when the context model uses multiple receptive fields of different sizes, the specified value is determined by the size of the largest of those receptive fields.
  • In one implementation, the receptive fields used by the context model include a receptive field of size 5×5.
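A masked 5×5 convolution is one plausible realization of such a context model, in the spirit of PixelCNN-style masked convolutions; the sketch below is an assumption, not the patent's actual network.

```python
# Hedged sketch of a context model with a 5x5 receptive field: a masked
# convolution whose kernel sees only already-decoded positions (the rows
# above the centre plus the left half of the current row). Kernel
# weights are stand-ins.

def causal_mask(k=5):
    # 1 marks positions the context model is allowed to see.
    return [[1 if (y < k // 2) or (y == k // 2 and x < k // 2) else 0
             for x in range(k)] for y in range(k)]

def masked_context(feat, kernel, mask):
    k, r = len(kernel), len(kernel) // 2
    H, W = len(feat), len(feat[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        s += (mask[dy + r][dx + r]
                              * kernel[dy + r][dx + r] * feat[ny][nx])
            out[y][x] = s
    return out

mask = causal_mask(5)
ctx = masked_context([[1.0] * 8 for _ in range(8)],
                     [[0.1] * 5 for _ in range(5)], mask)
# A 5x5 causal mask exposes 12 previously decoded positions.
print(sum(map(sum, mask)))  # → 12
```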
  • In a second aspect, an encoding method is provided, in which the multiple feature points are divided into multiple groups based on a specified value during encoding, and the first image features of each group of feature points in the multiple groups are sequentially encoded into the code stream.
  • The grouping is done in the same manner during decoding, and the probability distributions of the feature points in the same group are determined in parallel to improve decoding efficiency. That is, this solution breaks through the efficiency bottleneck caused by serial computation in VAE-based decoding and effectively improves decoding efficiency.
  • In one implementation, determining the first image feature, probability distribution, and first hyper-prior feature of each of the plurality of feature points of the image includes: determining the first image features of the plurality of feature points based on the image; determining the first hyper-prior features of the plurality of feature points based on their first image features; and determining the probability distribution of each of the plurality of feature points in parallel.
  • In one implementation, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point includes: if the first feature point is not the first of the plurality of feature points, determining the prior feature of the first feature point based on its first image feature, the first feature point being one of the plurality of feature points; determining the surrounding information of the first feature point from the first image features of the plurality of feature points; inputting the surrounding information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model; and determining the probability distribution of the first feature point based on its prior feature and its context feature.
  • In one implementation, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point includes: if the first feature point is the first of the plurality of feature points, determining its probability distribution based on its prior feature alone.
  • In one implementation, the specified value is determined based on the size of the receptive field used by the context model. Dividing the plurality of feature points into multiple groups based on the specified value includes: determining a slope based on the specified value, the slope indicating the inclination of the straight lines on which feature points assigned to the same group lie; and dividing the plurality of feature points into multiple groups based on the slope.
  • In one implementation, when the context model uses multiple receptive fields of different sizes, the specified value is determined by the size of the largest of those receptive fields.
  • In one implementation, the receptive fields used by the context model include a receptive field of size 5×5.
  • In a third aspect, a decoding apparatus is provided; the decoding apparatus has the function of implementing the behavior of the decoding method in the first aspect. The decoding apparatus includes one or more modules used to implement the decoding method provided in the first aspect.
  • The decoding apparatus includes:
  • a first determination module, configured to determine, based on the code stream, the prior feature of each of a plurality of feature points of the image to be decoded;
  • a grouping module, configured to divide the plurality of feature points into multiple groups based on a specified value;
  • a second determination module, configured to sequentially determine, based on the prior features of the plurality of feature points, the first image features of each group of feature points in the multiple groups; the step of determining the first image features of any one group is: determining the probability distribution of each feature point in that group in parallel, and parsing the first image features of the feature points in that group from the code stream based on those probability distributions;
  • a reconstruction module, configured to reconstruct the image based on the first image features of the plurality of feature points.
  • In one implementation, the plurality of feature points include a first feature point, and the second determination module includes:
  • a first processing submodule, configured to determine, if the first feature point is not the first of the plurality of feature points, the surrounding information of the first feature point from the first image features of the decoded feature points, the first feature point being a feature point in any one group;
  • a second processing submodule, configured to input the surrounding information of the first feature point into the context model and obtain the context feature of the first feature point output by the context model;
  • a third processing submodule, configured to determine the probability distribution of the first feature point based on the prior feature and the context feature of the first feature point.
  • The surrounding information of the first feature point includes the first image features of the decoded feature points in a neighborhood whose geometric center is the first feature point; the size of the neighborhood is determined based on the size of the receptive field used by the context model, and the surrounding information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
  • In one implementation, the plurality of feature points include a first feature point, and the second determination module includes:
  • a fourth processing submodule, configured to determine, if the first feature point is the first of the plurality of feature points, the probability distribution of the first feature point based on its prior feature.
  • In one implementation, the specified value is determined based on the size of the receptive field used by the context model, and the grouping module includes:
  • a first determining submodule, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight lines on which feature points assigned to the same group lie;
  • a dividing submodule, configured to divide the plurality of feature points into multiple groups based on the slope.
  • In one implementation, when the context model uses multiple receptive fields of different sizes, the specified value is determined by the size of the largest of those receptive fields.
  • In one implementation, the receptive fields used by the context model include a receptive field of size 5×5.
  • In a fourth aspect, an encoding apparatus is provided; the encoding apparatus has the function of implementing the behavior of the encoding method in the second aspect. The encoding apparatus includes one or more modules used to implement the encoding method provided in the second aspect.
  • The encoding apparatus includes:
  • a first determination module, configured to determine, based on an image to be encoded, the first image feature, probability distribution, and first hyper-prior feature of each of a plurality of feature points of the image;
  • a grouping module, configured to divide the plurality of feature points into multiple groups based on a specified value;
  • a first encoding module, configured to sequentially encode, based on the probability distributions of the plurality of feature points, the first image features of each group of feature points in the multiple groups into the code stream;
  • a second encoding module, configured to encode the first hyper-prior features of the plurality of feature points into the code stream.
  • In one implementation, the first determination module includes:
  • a first determining submodule, configured to determine the first image features of the plurality of feature points based on the image;
  • a second determining submodule, configured to determine the first hyper-prior features of the plurality of feature points based on their first image features, and to determine the probability distribution of each of the plurality of feature points in parallel.
  • In one implementation, the plurality of feature points include a first feature point, and the second determining submodule is configured to:
  • if the first feature point is not the first of the plurality of feature points, determine the prior feature of the first feature point based on its first image feature, the first feature point being one of the plurality of feature points; determine the surrounding information of the first feature point from the first image features of the plurality of feature points; input the surrounding information into the context model to obtain the context feature of the first feature point output by the context model; and determine the probability distribution of the first feature point based on its prior feature and context feature.
  • In one implementation, the plurality of feature points include a first feature point, and the second determining submodule is configured to: if the first feature point is the first of the plurality of feature points, determine the probability distribution of the first feature point based on its prior feature.
  • In one implementation, the specified value is determined based on the size of the receptive field used by the context model, and the grouping module includes:
  • a third determining submodule, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight lines on which feature points assigned to the same group lie;
  • a dividing submodule, configured to divide the plurality of feature points into multiple groups based on the slope.
  • In one implementation, when the context model uses multiple receptive fields of different sizes, the specified value is determined by the size of the largest of those receptive fields.
  • In one implementation, the receptive fields used by the context model include a receptive field of size 5×5.
  • In a fifth aspect, a decoding-end device is provided, including a processor and a memory; the memory is used to store a program for executing the decoding method provided in the first aspect.
  • The processor is configured to execute the program stored in the memory so as to implement the decoding method provided in the first aspect.
  • Optionally, the decoding-end device may further include a communication bus used to establish a connection between the processor and the memory.
  • In a sixth aspect, an encoding-end device is provided, including a processor and a memory; the memory is used to store a program for executing the encoding method provided in the second aspect.
  • The processor is configured to execute the program stored in the memory so as to implement the encoding method provided in the second aspect.
  • Optionally, the encoding-end device may further include a communication bus used to establish a connection between the processor and the memory.
  • In a further aspect, a computer-readable storage medium is provided; instructions are stored in the storage medium, and when the instructions are run on a computer, the computer is caused to execute the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect.
  • In a further aspect, a computer program product containing instructions is provided; when the instructions are run on a computer, the computer is caused to execute the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect.
  • In a further aspect, a computer program is provided; when the computer program is executed, the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect are implemented.
  • In the technical solutions provided by the embodiments of the present application, multiple feature points are divided into multiple groups based on a specified value during decoding, and the probability distributions of the feature points in the same group are determined in parallel, which speeds up decoding.
  • The feature points are grouped in the same manner during encoding, and the first image features of each group of feature points in the multiple groups are sequentially encoded into the code stream. That is, this solution breaks through the efficiency bottleneck caused by serial computation in VAE-based decoding and effectively improves decoding efficiency.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a codec framework provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a codec sequence provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of an encoding method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of using surrounding information in encoding provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another codec sequence provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another codec sequence provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another codec sequence provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another codec sequence provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another codec framework provided by an embodiment of the present application.
  • FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 14 is a schematic block diagram of a codec device provided by an embodiment of the present application.
  • The network architectures and business scenarios described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit the technical solutions provided by the embodiments of the present application.
  • The technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
  • Pixel depth (bits per pixel, BPP): the number of bits used to store each pixel; the smaller the BPP, the lower the bit cost per pixel and the stronger the compression.
  • Bit rate: in image compression, the coding length required per pixel; the higher the bit rate, the better the quality of the reconstructed image.
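The two quantities above are two views of the same ratio: bits in the code stream divided by the number of pixels. A small worked example with hypothetical numbers:

```python
# BPP = total bits of the code stream / number of pixels.
# The stream size below is a hypothetical, illustrative value.

width, height = 1920, 1080
stream_bytes = 155_520            # hypothetical compressed size

bpp = stream_bytes * 8 / (width * height)
print(bpp)  # → 0.6
```

At 0.6 BPP, a 24-bit RGB image has been compressed by a factor of 40 relative to its raw 24 bits per pixel.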
  • PSNR: peak signal-to-noise ratio
  • MS-SSIM: multi-scale structural similarity index measure
  • CNN: convolutional neural network
  • VAE: variational auto-encoder
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes source device 10 , destination device 20 , link 30 and storage device 40 .
  • the source device 10 may generate encoded images. Therefore, the source device 10 may also be called an image encoding device.
  • Destination device 20 may decode the encoded image generated by source device 10 . Therefore, the destination device 20 may also be referred to as an image decoding device.
  • Link 30 may receive encoded images generated by source device 10 and may transmit the encoded images to destination device 20 .
  • The storage device 40 can receive the encoded images generated by the source device 10 and store them; in this case, the destination device 20 can obtain the encoded images directly from the storage device 40.
  • The storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded images generated by the source device 10; in this case, the destination device 20 may access the stored encoded images from the storage device 40 via streaming or download.
  • Both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.
  • Both the source device 10 and the destination device 20 may include a mobile phone, a so-called "smart" phone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart car, a smart TV, a smart speaker, a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a set-top box, a television, a camera, a display device, a digital media player, a video game console, an in-vehicle computer, or the like.
  • Link 30 may include one or more media or devices capable of transmitting encoded images from source device 10 to destination device 20 .
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded images directly to destination device 20 in real-time.
  • the source device 10 may modulate the encoded image based on a communication standard, such as a wireless communication protocol, etc., and may send the modulated image to the destination device 20 .
  • the one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include radio frequency (radio frequency, RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet), among others.
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, etc., which are not specifically limited in this embodiment of the present application.
  • the storage device 40 may store the received encoded image sent by the source device 10 , and the destination device 20 may directly acquire the encoded image from the storage device 40 .
  • The storage device 40 may include any one of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray Disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded images.
  • The storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded images generated by the source device 10, and the destination device 20 may stream or download the encoded images via the storage device 40.
  • the file server may be any type of server capable of storing encoded images and sending the encoded images to destination device 20 .
  • the file server may include a network server, a file transfer protocol (file transfer protocol, FTP) server, a network attached storage (network attached storage, NAS) device, or a local disk drive.
  • Destination device 20 may acquire the encoded images over any standard data connection, including an Internet connection.
  • Such a connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL) or a cable modem), or a combination of both that is suitable for accessing encoded images stored on a file server.
  • the transmission of encoded images from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
• the implementation environment shown in FIG. 1 is only one possible implementation, and the technology of the embodiment of the present application is applicable not only to the source device 10 that encodes images and the destination device 20 that decodes the encoded images shown in FIG. 1, but also to other devices capable of encoding images and decoding encoded images, which is not specifically limited in this embodiment of the present application.
  • the source device 10 includes a data source 120 , an encoder 100 and an output interface 140 .
• output interface 140 may include a modulator/demodulator (modem) and/or a transmitter, where the transmitter may also be referred to as a sender.
• Data source 120 may include an image capture device (e.g., a video camera), an archive containing previously captured images, a feed interface for receiving images from an image content provider, and/or a computer graphics system for generating images, or a combination of these image sources.
  • the data source 120 may send an image to the encoder 100, and the encoder 100 may encode the received image sent by the data source 120 to obtain an encoded image.
  • the encoder may send encoded images to an output interface.
  • source device 10 sends the encoded images directly to destination device 20 via output interface 140 .
  • the encoded images may also be stored on storage device 40 for later retrieval by destination device 20 for decoding and/or display.
  • the destination device 20 includes an input interface 240 , a decoder 200 and a display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • the input interface 240 may receive the encoded image via the link 30 and/or from the storage device 40, and then send it to the decoder 200, and the decoder 200 may decode the received encoded image to obtain a decoded image.
  • the decoder may transmit the decoded image to the display device 220 .
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20 . In general, the display device 220 displays the decoded images.
  • the display device 220 can be any type of display device in various types, for example, the display device 220 can be a liquid crystal display (liquid crystal display, LCD), a plasma display, an organic light-emitting diode (organic light-emitting diode, OLED) monitor or other type of display device.
• encoder 100 and decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software for encoding both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as user datagram protocol (UDP), if applicable.
• Each of the encoder 100 and the decoder 200 may be any one of the following circuits: one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of the present application are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable storage medium, and may execute the instructions in hardware using one or more processors to implement the technology of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of encoder 100 and decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • Embodiments of the present application may generally refer to the encoder 100 as “signaling” or “sending” certain information to another device such as the decoder 200 .
• the term “signaling” or “sending” may generally refer to the transmission of syntax elements and/or other data used for decoding a compressed image. This transfer can occur in real time or near real time. Alternatively, this communication may occur after a period of time, such as when, at encoding time, syntax elements are stored in an encoded bitstream on a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they are stored on this medium.
  • the codec method provided by the embodiment of the present application can be applied to various scenarios, and the coded image in various scenarios can be an image included in an image file, or an image included in a video file. It should be noted that, in combination with the implementation environment shown in FIG. 1 , any of the following encoding methods may be executed by the encoder 100 in the source device 10 . Any of the following decoding methods may be performed by the decoder 200 in the destination device 20 .
• the codec method provided in the embodiment of the present application can be applied to the video and image compression framework of the variational autoencoder (VAE) method. First, the codec model of the basic VAE method is introduced.
  • the original image is input into the encoding network model to extract features to obtain the image feature y to be quantized with multiple feature points.
  • the image feature y to be quantized is also referred to as the second image feature in the embodiment of this application.
• the first image features of the multiple feature points are input into the super-encoding network model to obtain the to-be-quantized super-prior feature z of the multiple feature points. The to-be-quantized super-prior feature z is also referred to as the second super-prior feature in the embodiment of the present application. The second super-prior features z of the multiple feature points are quantized to obtain the first super-prior features of the multiple feature points.
• entropy coding is performed on the first super-prior features of the multiple feature points according to the specified probability distribution, so as to encode them into the code stream.
• the bit sequence obtained by performing this entropy coding is a part of the bit sequence included in the code stream, and this part of the bit sequence (as shown by the black and white bars on the right in FIG. 2) can be called the super-prior bit stream.
• the first super-prior features of the multiple feature points are input into the super-decoding network model to obtain the prior features ψ of the multiple feature points.
• the first image features of the multiple feature points are input into a context model (context model, CM) to obtain the context features φ of the multiple feature points.
• combining the prior features ψ and the context features φ, the probability distribution N(μ, σ) of the multiple feature points is estimated through the probability distribution estimation model (shown as gather model, GM).
• based on the probability distribution N(μ, σ) of each feature point, entropy coding is performed in turn on the first image feature of each feature point among the multiple feature points, so as to encode them into the code stream.
• the bit sequence obtained by performing this entropy coding is a part of the bit sequence included in the code stream, and this part of the bit sequence (as shown by the black and white bars on the left in FIG. 2) can be called the image bit stream.
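The encode-side data flow described above can be sketched end to end. The toy linear functions below merely stand in for the trained encoding, super-encoding and super-decoding network models; all function names, shapes and coefficients are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "network models": fixed linear maps standing in for the trained
# encoding, super-encoding and super-decoding networks (illustrative only).
def encode_net(x):            # image -> second image feature y
    return 0.5 * x

def hyper_encode_net(y_hat):  # first image feature -> second super-prior feature z
    return 0.25 * y_hat

def hyper_decode_net(z_hat):  # first super-prior feature -> prior feature psi
    return 2.0 * z_hat

def quantize(v):              # scalar quantization by rounding
    return np.round(v)

x = rng.normal(size=(3, 4, 4))   # toy "image" giving a C*W*H = 3*4*4 feature map
y = encode_net(x)                # second image feature (to be quantized)
y_hat = quantize(y)              # first image feature
z = hyper_encode_net(y_hat)      # second super-prior feature (to be quantized)
z_hat = quantize(z)              # first super-prior feature
psi = hyper_decode_net(z_hat)    # prior feature used for probability estimation

# z_hat is entropy-coded with the fixed (specified) probability distribution;
# y_hat is entropy-coded with N(mu, sigma) estimated from psi and context features.
print(y_hat.shape, z_hat.shape, psi.shape)
```

The two entropy-coded tensors correspond to the super-prior bit stream (z_hat) and the image bit stream (y_hat) in FIG. 2.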
• at the decoding end, entropy decoding is first performed on the super-prior bit stream included in the code stream according to the specified probability distribution to obtain the first super-prior features of the multiple feature points, and the first super-prior features of the multiple feature points are input into the super-decoding network model to obtain the prior features ψ of the multiple feature points.
• for the initial feature point among the multiple feature points, the probability distribution of the feature point is estimated based on its prior feature alone, and the first image feature of the feature point is parsed out of the image bit stream included in the code stream based on that probability distribution.
• for each subsequent feature point, the surrounding information of the feature point is determined from the already-decoded first image features, the surrounding information is input into the context model CM to obtain the context feature of the feature point, and, combining the prior feature and the context feature of the feature point, the probability distribution of the feature point is estimated through the probability distribution estimation model GM; based on this probability distribution, the first image feature of the feature point is parsed out of the image bit stream included in the code stream. After the first image features of the multiple feature points are entropy-decoded from the code stream, they are input into the decoding network model to obtain the reconstructed image.
• both the encoding end device and the decoding end device in the related art calculate the probability distribution of each of the multiple feature points in sequence; as shown in FIG. 3, in the related art the decoding end sequentially estimates the probability distribution of each feature point according to the order indicated by the arrows in FIG. 3.
• the feature point currently being decoded by the decoding end is filled with black, and the feature points filled with oblique lines are decoded feature points.
• the surrounding information of the current feature point includes the first image features of the 12 feature points filled with oblique lines inside the black thick-line frame.
  • the encoding and decoding model of the VAE method includes two parts, one is the feature extraction and decoding module, and the other is the entropy encoding module.
• in the entropy encoding module, the compression performance can be greatly improved by introducing context information (that is, surrounding information) and super-prior information.
  • FIG. 4 is a flow chart of an encoding method provided by an embodiment of the present application.
  • the encoding method is applied to an encoding end device, and the encoding method includes the following steps.
  • Step 401 Based on the image to be encoded, determine the first image feature, probability distribution and first super-prior feature of each feature point among the multiple feature points of the image.
  • the image to be encoded is an image in an image file or an image in a video file, and the image to be encoded may be in any form, which is not limited in this embodiment of the present application.
• the implementation process of determining the first image feature, probability distribution and first super-prior feature of each of the multiple feature points of the image is: based on the image, determine the first image features of the multiple feature points; based on the first image features of the multiple feature points, determine the first super-prior features of the multiple feature points; and determine the probability distributions of the multiple feature points in parallel.
• the implementation process of determining the first image features of the multiple feature points is: input the image into the encoding network model to obtain the second image features of the multiple feature points output by the encoding network model, and quantize the second image features of the multiple feature points to obtain the first image features of the multiple feature points.
• the implementation process of determining the first super-prior features of the multiple feature points is: input the first image features of the multiple feature points into the super-encoding network model to obtain the second super-prior features of the multiple feature points output by the super-encoding network model, and quantize the second super-prior features of the multiple feature points to obtain the first super-prior features of the multiple feature points.
• the quantization may be scalar quantization with a variable quantization step size, which can be determined based on different coding rates; that is, a correspondence between coding rates and quantization step sizes is stored in advance, and the quantization step size corresponding to the coding rate adopted in the embodiment of the present application is obtained from the correspondence.
• the scalar quantization may also have an offset, that is, the data to be quantized (such as the second image feature or the second super-prior feature) is first biased by the offset and then scalar-quantized according to the quantization step size.
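A minimal sketch of such rate-dependent scalar quantization with an offset. The rate-to-step table values below are hypothetical, since the embodiment only states that such a correspondence is stored in advance:

```python
import numpy as np

# Hypothetical coding-rate -> quantization-step correspondence (illustrative
# values; the embodiment only says such a table is stored in advance).
RATE_TO_STEP = {"low": 2.0, "medium": 1.0, "high": 0.5}

def scalar_quantize(v, rate="medium", offset=0.0):
    """Bias the data by `offset`, then round to the step chosen for the rate."""
    step = RATE_TO_STEP[rate]
    return np.round((v - offset) / step) * step + offset

v = np.array([0.3, 1.7, -2.4])
print(scalar_quantize(v, rate="high"))               # step 0.5, no offset
print(scalar_quantize(v, rate="low", offset=0.25))   # step 2.0, offset 0.25
```

A smaller step (higher rate) keeps more precision in the first image features at the cost of more bits in the subsequent entropy coding.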
• the multiple feature points include a first feature point, and the implementation process of determining the probability distribution of the first feature point is: if the first feature point is not the initial feature point among the multiple feature points, determine the prior feature of the first feature point based on the first image feature of the first feature point, determine the surrounding information of the first feature point from the first image features of the multiple feature points, input the surrounding information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model, and determine the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  • the first feature point is one of the multiple feature points.
• the implementation process of determining the prior feature of the first feature point is: based on the first image feature of the first feature point, determine the first super-prior feature of the first feature point, and based on the first super-prior feature of the first feature point, determine the prior feature of the first feature point.
• the implementation process of determining the first super-prior feature of the first feature point is the same as the implementation process of determining the first super-prior feature of any feature point among the multiple feature points, which has been introduced above and is not repeated here.
• the implementation process of determining the prior feature of the first feature point based on its first super-prior feature is: input the first super-prior feature of the first feature point into the super-decoding network model to obtain the prior feature of the first feature point output by the super-decoding network model.
  • the encoding network model, super-encoding network model, and super-decoding network model described above are all pre-trained.
  • the embodiments of the present application do not limit the network structure and training methods of the encoding network model, super-encoding network model, and super-decoding network model.
  • the network structure of the encoding network model, the super-encoding network model and the super-decoding network model can be a fully connected network or a convolutional neural network (CNN), and the convolution in the convolutional neural network can be 2D convolution or 3D convolution.
  • the embodiments of the present application do not limit the number of layers included in the network structures of the encoding network model, the super-encoding network model, and the super-decoding network model and the number of nodes in each layer.
• the following takes the case where the network structures of the encoding network model, the super-encoding network model and the super-decoding network model are all CNNs and the convolution in the CNN is 2D convolution as an example.
• in this case, the second image features of the multiple feature points output by the encoding network model are represented by a C*W*H dimensional matrix, and the first image features of the multiple feature points obtained by quantization are also represented by a C*W*H dimensional matrix, where C is the number of channels of the CNN and W*H represents the size of the feature map formed by the multiple feature points.
• similarly, the second super-prior features of the multiple feature points obtained based on the super-encoding network model, the first super-prior features of the multiple feature points obtained by quantization, and the prior features of the multiple feature points obtained based on the super-decoding network model are also represented by C*W*H dimensional matrices.
  • the context model in the embodiment of the present application is also pre-trained, and the embodiment of the present application does not limit the network structure and training method of the context model.
• the network structure of the context model can be a mask region CNN (Mask R-CNN), in which receptive fields are used to extract context features, and one or more receptive fields can be used in the context model.
  • the size of each receptive field in the one or more receptive fields is different, which is not limited in this embodiment of the present application.
  • the receptive field used by the context model includes a receptive field with a size of 5*5.
  • the convolution in the context model can be 2D convolution or 3D convolution. Assuming that the convolution in the context model is 2D convolution, the size of the receptive field can be 3*3, 5*5 or 7*7, etc.
  • the surrounding information of the first feature point is the image feature that needs to be used to determine the context feature of the first feature point.
• in one implementation, the implementation process of determining the surrounding information of the first feature point from the first image features of the multiple feature points is: determining the surrounding information of the first feature point from the first image features of the multiple feature points according to a preset rule.
• the surrounding information of the first feature point includes the first image features of at least n feature points in a neighborhood with the first feature point as its geometric center, where the size of the neighborhood is determined based on the size of the receptive field used by the context model and n is greater than or equal to 4.
• as shown in FIG. 5, the surrounding information of feature point A includes the first image features of the 6 feature points in the thick-line frame with feature point A as the geometric center, and these 6 feature points are located above feature point A (including directly above and obliquely above).
• the surrounding information of feature point B includes the first image features of the 12 feature points in the thick-line frame with feature point B as the geometric center, namely the two feature points directly to the left of feature point B and the 10 feature points above it.
• the surrounding information of feature point C also includes the first image features of 12 feature points, and the relative positions of these 12 feature points with respect to feature point C are similar to those of the 12 feature points corresponding to feature point B with respect to feature point B.
• the surrounding information of feature point D includes the first image features of the 8 feature points in the thick-line frame with feature point D as the geometric center, namely the two feature points directly to the left of feature point D and the 6 feature points above it.
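The causal neighborhoods described above (e.g., the 12 decoded points available under a 5*5 receptive field) can be sketched for a single-channel feature map under raster-scan order; `causal_neighbors` is an illustrative helper, not the context model itself:

```python
import numpy as np

def causal_neighbors(y_hat, r, c, field=5):
    """First image features of already-decoded points inside the
    field*field window centred on (r, c), under raster-scan order:
    full rows above, plus points strictly left of (r, c) in its own row.
    Positions outside the feature map are simply skipped."""
    half = field // 2
    H, W = y_hat.shape
    vals = []
    for dr in range(-half, 1):
        for dc in range(-half, half + 1):
            if dr == 0 and dc >= 0:   # (r, c) itself and its right side: not decoded yet
                break
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W:
                vals.append(y_hat[rr, cc])
    return np.array(vals)

fmap = np.arange(36.0).reshape(6, 6)       # toy 6*6 single-channel feature map
print(len(causal_neighbors(fmap, 3, 3)))   # interior point: 10 above + 2 left = 12
print(len(causal_neighbors(fmap, 0, 2)))   # first row: only the 2 left neighbours
```

Points near the borders simply have fewer available neighbours, which matches the varying neighbourhoods of feature points A, B, C and D in FIG. 5.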
• the surrounding information of the first feature point is input into the context model to obtain the context feature of the first feature point output by the context model; afterwards, the probability distribution of the first feature point is determined based on the prior feature of the first feature point and the context feature of the first feature point.
• in one implementation, the implementation process of determining the probability distribution of the first feature point is: input the prior feature and the context feature of the first feature point into the probability distribution estimation model to obtain the probability distribution of the first feature point output by the probability distribution estimation model, where the probability distribution is represented by a mean value and a standard deviation.
  • the probability distribution estimation model is pre-trained, and the network structure of the probability distribution estimation model is a neural network, such as CNN.
  • the probability distribution estimation model is the GM model introduced above.
• if the first feature point is the initial feature point among the multiple feature points, the probability distribution of the first feature point is determined based on the prior feature of the first feature point alone; that is, for the initial feature point, surrounding information is not used in the encoding process, or the surrounding information of the initial feature point is set to 0. In this case, the implementation process of determining the probability distribution of the first feature point is: input the prior feature of the first feature point into the probability distribution estimation model to obtain the probability distribution of the first feature point output by the probability distribution estimation model.
• alternatively, the surrounding information of the first feature point is set to 0, and the implementation process of determining the probability distribution of the first feature point is: input the surrounding information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model, in which case the context feature of the first feature point is 0; then input the prior feature of the first feature point and the context feature 0 into the probability distribution estimation model to obtain the probability distribution of the first feature point output by the probability distribution estimation model.
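The combination of prior and context features into a mean and standard deviation can be sketched with a stand-in for the trained probability distribution estimation model; the weight matrix below is illustrative, not trained parameters:

```python
import numpy as np

# Stand-in for the trained probability-distribution estimation model (GM):
# a fixed linear map from [prior feature, context feature] to (mu, log_sigma).
W = np.array([[0.8, 0.2],    # mu        = 0.8*prior + 0.2*context
              [0.1, 0.3]])   # log_sigma = 0.1*prior + 0.3*context

def estimate_distribution(prior_feat, context_feat=0.0):
    """Return (mu, sigma) for one feature point; for the initial feature
    point the context feature is taken as 0, as described above."""
    mu, log_sigma = W @ np.array([prior_feat, context_feat])
    return mu, np.exp(log_sigma)   # exp keeps the standard deviation positive

mu0, sigma0 = estimate_distribution(1.0)        # initial point: context = 0
mu1, sigma1 = estimate_distribution(1.0, 0.5)   # later point with context
print(mu0, mu1)
```

The resulting N(mu, sigma) is what the entropy coder uses for the corresponding first image feature.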
• when the context model uses multiple receptive fields of different sizes, feature extraction is performed on the surrounding information of each feature point to obtain multiple first context features of the feature point, the multiple first context features being in one-to-one correspondence with the multiple receptive fields. That is, the context feature of each feature point is determined based on the multiple first context features of that feature point: simply put, several first context features are obtained for each feature point by using several receptive fields.
• in one implementation, the context features of each feature point include the multiple first context features of the corresponding feature point: after obtaining the multiple first context features of each feature point based on the multiple receptive fields used in the context model, the context model outputs the multiple first context features of each feature point, and then the prior feature of each feature point and the multiple first context features of the corresponding feature point are input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point output by the probability distribution estimation model.
• for example, if the context model uses three receptive fields with sizes of 3*3, 5*5 and 7*7, then three first context features are obtained for each feature point, and the prior feature of each feature point and the three first context features of the corresponding feature point are input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point output by the probability distribution estimation model.
• in another implementation, the context model continues to process the multiple first context features of each feature point to obtain the context feature of the corresponding feature point output by the context model, and then the prior feature and the context feature of each feature point are input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point output by the probability distribution estimation model. In this implementation, the context feature of each feature point is a context feature obtained by fusing the multiple first context features of the corresponding feature point.
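A sketch of the multi-receptive-field variant, assuming a toy "first context feature" per field (the mean of the decoded entries in that window) and mean fusion. A real context model would instead apply trained masked convolutions, and concatenation is an equally valid fusion; everything here is an illustrative assumption:

```python
import numpy as np

def first_context_feature(y_hat, r, c, field):
    """One 'first context feature' per receptive field: here simply the mean
    of the decoded (causal) entries in the field*field window around (r, c)."""
    half = field // 2
    H, W = y_hat.shape
    vals = []
    for dr in range(-half, 1):
        for dc in range(-half, half + 1):
            if dr == 0 and dc >= 0:   # stop at the not-yet-decoded positions
                break
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W:
                vals.append(y_hat[rr, cc])
    return float(np.mean(vals)) if vals else 0.0

fmap = np.arange(49.0).reshape(7, 7)
# Three receptive fields of sizes 3*3, 5*5, 7*7 -> three first context features.
feats = [first_context_feature(fmap, 3, 3, f) for f in (3, 5, 7)]
fused = float(np.mean(feats))   # one possible fusion; the patent leaves this open
print(feats, fused)
```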
• the above describes the implementation process of determining, based on the image to be encoded, the first image feature, probability distribution and first super-prior feature of each of the multiple feature points of the image; this implementation process is similar to the related process of the VAE method introduced above.
  • the steps can be regarded as two branches, and these two branches can be executed in parallel to speed up the encoding speed.
  • the encoding efficiency can be guaranteed by determining the probability distribution of each feature point in parallel.
  • Step 402 Divide the plurality of feature points into groups based on specified values.
• this scheme optimizes the encoding and decoding order of the feature points so that the probability distributions of some feature points can be determined in parallel.
• the multiple feature points are divided into multiple groups based on the specified value, each group including at least one feature point, and the encoding end device can subsequently, following the introduction of step 403 below, sequentially encode the first image features of each group of feature points in the multiple groups into the code stream.
  • the specified value is determined based on the size of the receptive field used by the context model.
  • the specified value is determined by the size of the largest receptive field among the multiple receptive fields with different sizes.
• the following description takes the case where the encoding process uses a convolutional network and the convolution in the convolutional network is a 2D convolution as an example.
  • the implementation process of dividing the plurality of feature points into multiple groups based on the specified value is: determining a slope based on the specified value, and dividing the multiple feature points into multiple groups based on the slope.
  • the slope is used to indicate the degree of inclination of the straight line where the feature points classified into the same group are located.
• for 2D convolution, the slope is intuitive: feature points A, B, C and D that are divided into the same group actually lie on one straight line, and the slope indicates the degree of inclination of that line.
• for 3D convolution, the slope is not intuitive.
• therefore, 2D convolution is taken as an example below to introduce the grouping method; the principle of the grouping method corresponding to 3D convolution is the same as that of the grouping method corresponding to 2D convolution.
• in one implementation, the multiple feature points are divided into multiple groups in a cyclic manner, where the t-th cycle of the cyclic method includes: if there are undivided feature points among the multiple feature points, dividing the feature points whose abscissa is t-i*k and whose ordinate is i among the multiple feature points into one group, where k is the slope, t, i and t-i*k are all integers, and the minimum values of t and i are both 0.
• as shown in FIG. 6, the first feature point is the feature point in the upper left corner, and its codec sequence number is 1.
• the feature points with the same codec sequence number in FIG. 6 belong to the same group, and the order of the codec sequence numbers from small to large is the encoding and decoding order of the groups.
• the last feature point is the feature point in the lower right corner, and its codec sequence number is the largest. It can be seen that the encoding and decoding order shown in FIG. 6 starts from the upper left corner, first proceeds to the right, and then gradually moves toward the lower right corner.
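The cyclic grouping rule above (the feature point with abscissa t-i*k and ordinate i joins the group of the t-th cycle, i.e., point (x, i) gets group number x+i*k) can be sketched directly. The choice k=2 below is an illustrative assumption, and the codec sequence numbers as in FIG. 6 would be the group index plus one:

```python
def group_feature_points(width, height, k):
    """Divide the points of a width*height feature map into groups: point
    (x, i) (abscissa x, ordinate i) gets group number t = x + i*k, which is
    exactly the t-th cycle collecting points with abscissa t - i*k.
    Points in one group lie on a line whose inclination is set by the slope k
    and can be processed in parallel."""
    groups = {}
    for i in range(height):      # ordinate
        for x in range(width):   # abscissa
            groups.setdefault(x + i * k, []).append((x, i))
    return [groups[t] for t in sorted(groups)]

# Toy 4*4 feature map with slope k = 2 (illustrative value):
for t, g in enumerate(group_feature_points(4, 4, 2)):
    print(t, g)
```

With the first feature point at the origin (upper left corner, as in FIG. 6), group 0 contains only that point, and later groups grow as the wavefront sweeps toward the lower right corner.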
• it is conceivable that if FIG. 6 is rotated 90 degrees counterclockwise, another codec order as shown in FIG. 7 can be obtained.
• in FIG. 7, the first feature point is the feature point in the lower left corner of the feature map, the encoding and decoding order starts from the lower left corner, first proceeds upward and then gradually moves toward the upper right corner, and the last feature point is the feature point in the upper right corner.
• by mirroring FIG. 6 about the oblique line connecting the upper left corner to the lower right corner, another encoding and decoding order as shown in FIG. 8 is obtained: the first feature point is the feature point in the upper left corner of the feature map, the encoding and decoding order starts from the upper left corner, first proceeds downward and then gradually moves toward the lower right corner, and the last feature point is the feature point in the lower right corner. It can be seen that a variety of codec orders can be obtained according to this scheme; intuitively, eight codec orders can be obtained by rotating and/or mirroring FIG. 6, and these eight codec orders are similar in nature.
• in FIG. 6 to FIG. 8, the first feature point is used as the coordinate origin, the direction from the first feature point to the second feature point is the horizontal axis, the side of the feature map perpendicular to the horizontal axis is the vertical axis, and the multiple feature points are divided into groups according to the slope.
• for 3D convolution, the 3D feature map includes multiple 2D feature maps, and the plane of each 2D feature map is parallel to the xy plane; the feature points divided into the same group are scattered across the 2D feature maps included in the 3D feature map, so that spatially parallel encoding and decoding can be achieved with a high degree of parallelism.
• in other words, without changing the surrounding information available to each feature point, this solution enables the probability distributions to be determined in parallel in the subsequent decoding process by adjusting the encoding and decoding order of the feature points.
• since the surrounding information shown in FIG. 5 is similar to that in the related art, adjusting only the encoding and decoding order while ensuring that the most surrounding information is available during encoding and decoding does not reduce codec performance, and also improves codec efficiency.
  • Step 403 Based on the probability distribution of the plurality of feature points, sequentially encode the first image feature of each set of feature points in the plurality of groups into a code stream.
• based on the probability distributions of the multiple feature points, the first image features of each group of feature points in the multiple groups are sequentially encoded into the code stream; that is, according to the encoding and decoding order after grouping, the group whose feature points have smaller codec sequence numbers is encoded first, and then the group whose feature points have larger codec sequence numbers is encoded, until the first image features of all of the multiple feature points are encoded into the code stream.
• in one implementation, the implementation process of sequentially encoding the first image features of each group of feature points in the multiple groups into the code stream is: based on the probability distributions of the multiple feature points, sequentially perform entropy encoding on the first image features of each group of feature points in the multiple groups to obtain the image bit sequences corresponding to the feature points in the corresponding group, and write the image bit sequences of the feature points in the corresponding group into the code stream.
  • the image bit sequences of the plurality of feature points in the code stream constitute an image bit stream.
  • the entropy coding uses an entropy coding model based on the probability distribution; arithmetic coding, range coding (RC), or Huffman coding may be used.
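In learned codecs of this kind, each quantized feature is typically coded under a Gaussian discretized to unit-width bins whose parameters come from the probability distribution estimation. The sketch below only estimates the code length of one symbol under such a distribution; the `gaussian_bits` helper is hypothetical, and a real arithmetic or range coder would emit an actual bit sequence rather than a length:

```python
import math

def gaussian_bits(symbol, mu, sigma):
    """Approximate code length (in bits) of an integer symbol under a
    Gaussian discretized to unit-width bins, i.e. -log2 of the
    probability mass an arithmetic/range coder would assign to it."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(symbol + 0.5) - cdf(symbol - 0.5), 1e-9)  # bin probability
    return -math.log2(p)

# symbols near the predicted mean are cheap to code; outliers cost more
```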
  • Step 404 Encode the first super-prior features of the plurality of feature points into a code stream.
  • the implementation process of encoding the first super-prior features of the multiple feature points into the code stream is: according to the specified probability distribution, encode the first super-prior features of the multiple feature points into the code stream.
  • entropy encoding is performed on the first super-prior features of the multiple feature points to obtain the super-prior bit sequences of the multiple feature points, and the multiple feature points The super prior bit sequence is written into the code stream.
  • the first super prior feature can also be encoded into the code stream by way of entropy coding.
  • the super-prior bit sequences of the multiple feature points in the code stream constitute a super-prior bit stream. That is, the code stream includes two parts, one part is the image bit stream, and the other part is the super prior bit stream.
  • the specified probability distribution is a probability distribution determined in advance through the probability distribution network model, and the embodiment of the present application does not limit the network structure and training method of the probability distribution network model used to train the specified probability distribution.
  • the network structure of the probability distribution network model may be a fully connected network or CNN.
  • the embodiment of the present application does not limit the number of layers included in the network structure of the probability distribution network model and the number of nodes in each layer.
  • step 402 and step 403 can be executed serially, that is, the feature points are first grouped and then encoded group by group; alternatively, step 402 and step 403 can be executed in parallel, that is, while grouping according to the above cyclic method, each time a group is formed, the first image features of the feature points in that group are encoded into the code stream before the next group is divided, until the last group is divided and the first image features of the feature points in the last group are encoded into the code stream.
  • the first image features of the multiple feature points are input into the super-encoding network model to obtain the second super-prior feature z of the multiple feature points, and z is quantized to obtain the first super-prior features of the multiple feature points.
  • the probability distributions of the multiple feature points are obtained through the probability distribution estimation model.
  • the current number of cycles is t.
  • multiple feature points are divided into multiple groups based on the specified value during encoding, and the first image features of each group of feature points in the multiple groups are sequentially encoded into the code stream.
  • the grouping is performed in the same manner during decoding, and the probability distribution of each feature point in the same group is determined in parallel, so as to improve decoding efficiency. That is to say, this solution can break through the efficiency bottleneck caused by serial computation when decoding based on a variational autoencoder (VAE), effectively improving decoding efficiency.
  • FIG. 11 is a flowchart of a decoding method provided by an embodiment of the present application.
  • the decoding method is applied to a decoding end device, and the decoding method includes the following steps.
  • Step 1101 Determine the prior features of each of the multiple feature points of the image to be decoded based on the code stream.
  • the implementation process of determining the prior features of each of the multiple feature points of the image to be decoded based on the code stream is: determining the first super-prior features of the multiple feature points based on the code stream, and determining the prior features of the multiple feature points based on the first super-prior features of the multiple feature points.
  • the implementation process of determining the first super-prior features of the multiple feature points based on the code stream may be: according to the specified probability distribution, perform entropy decoding on the code stream to obtain the first super-prior features of the multiple feature points.
  • the implementation process of determining the prior features of the multiple feature points can be: input the first super prior features of the multiple feature points into the super decoding network model, and obtain The prior features of the plurality of feature points output by the super decoding network model.
  • the decoding method in this step corresponds to the encoding method at the encoding end; the specified probability distribution in this step is the same as that at the encoding end, and the network structure of the super decoding network model is consistent with that at the encoding end.
  • Step 1102 Divide the plurality of feature points into groups based on specified values.
  • the multiple feature points also need to be divided into multiple groups based on the specified value at the decoding end, and the grouping manner in this step is the same as that at the encoding end. That is, the implementation process of dividing the multiple feature points into multiple groups based on the specified value may be: determining a slope based on the specified value, and dividing the multiple feature points into multiple groups based on the slope.
  • the specified value is determined based on the size of the receptive field used by the context model, and the slope is used to indicate the inclination of the straight line where the feature points classified into the same group are located.
  • the specified value is determined by the size of the largest receptive field among the multiple receptive fields with different sizes. It should be noted that, for the specific implementation manner of the grouping, refer to the relevant introduction in the aforementioned encoding method, and details are not repeated here.
  • Step 1103 Based on the prior features of the multiple feature points, sequentially determine the first image features of each group of feature points in the multiple groups, wherein the step of determining the first image features of any group of feature points is: determining the probability distribution of each feature point in the group in parallel, and, based on the probability distribution of each feature point in the group, parsing the first image feature of each feature point in the group from the code stream.
  • the first image features of each group of feature points in the multiple groups are sequentially determined at the decoding end based on the prior features of the multiple feature points.
  • for any group, the probability distribution of each feature point in the group is determined in parallel, and then, based on the probability distribution of each feature point in the group, the first image feature of each feature point in the group is parsed from the code stream.
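The group-sequential, point-parallel decoding loop described above can be sketched as follows; `estimate_probability` and `decode_point` are hypothetical hooks standing in for the context/probability-estimation models and the entropy decoder:

```python
def decode_feature_map(groups, decode_point, estimate_probability):
    """Decode groups in order. Within a group, every point's probability
    depends only on feature points decoded in earlier groups, so step 1
    is parallelizable (shown serially here for clarity)."""
    decoded = {}
    for group in groups:
        # step 1: probability distributions for the whole group, each
        # computed from already-decoded surroundings only
        dists = {p: estimate_probability(p, decoded) for p in group}
        # step 2: parse each point's first image feature from the stream
        for p in group:
            decoded[p] = decode_point(p, dists[p])
    return decoded
```

Because step 1 touches only `decoded` entries from earlier groups, the per-point calls within a group could be dispatched to parallel workers without changing the result.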
  • the probability distribution is also determined at the decoding end according to the encoding and decoding serial numbers shown in FIG. 6 .
  • the multiple feature points include a first feature point, and the implementation of determining the probability distribution of the first feature point is: if the first feature point is not the first feature point among the multiple feature points, the surrounding information of the first feature point is determined from the decoded first image features of the feature points, the surrounding information of the first feature point is input into the context model to obtain the context feature of the first feature point output by the context model, and the probability distribution of the first feature point is determined based on the prior feature of the first feature point and the context feature of the first feature point.
  • the first feature point is a feature point in any one of the groups.
  • the surrounding information of the first feature point includes the first image features of the decoded feature points in the neighborhood with the first feature point as the geometric center; the size of the neighborhood is determined based on the size of the receptive field used by the context model, and the surrounding information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
  • the surrounding information of the first feature point in the decoding method at the decoding end is the same as the surrounding information of the first feature point in the encoding method at the encoding end, and will not be repeated here.
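The neighborhood extraction described above can be sketched as follows, using zero for undecoded or out-of-range positions (the zero-filling convention is an assumption; the embodiments only require that the decoded neighbors inside the receptive field are gathered):

```python
def surrounding_info(features, decoded_mask, y, x, kernel_size=5):
    """Gather the first image features of already-decoded feature points
    in the kernel_size*kernel_size neighborhood whose geometric center
    is (y, x). `features` and `decoded_mask` are lists of rows."""
    r = kernel_size // 2
    h, w = len(features), len(features[0])
    patch = [[0.0] * kernel_size for _ in range(kernel_size)]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            # keep only in-range positions that are already decoded
            if 0 <= yy < h and 0 <= xx < w and decoded_mask[yy][xx]:
                patch[dy + r][dx + r] = features[yy][xx]
    return patch
```

The resulting patch is what would be fed to the context model to produce the context feature of the point at (y, x).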
  • if the first feature point is the first feature point among the multiple feature points, the probability distribution of the first feature point is determined based on the prior feature of the first feature point.
  • the implementation process of determining the probability distribution of the first feature point at the decoding end is the same as that at the encoding end, and will not be repeated here.
  • Step 1104 Reconstruct an image based on the first image features of the plurality of feature points.
  • the implementation of reconstructing an image based on the first image features of the multiple feature points is: input the first image features of the multiple feature points into the decoding network model, and obtain the reconstructed output of the decoding network model Image.
  • the decoding network model in this step corresponds to the network structure of the encoding network model at the encoding end; that is, the decoding steps in the decoding network model are the inverse of the encoding steps in the encoding network model. For example, in the codec framework shown in Figure 10, the network structure of the decoding network model is the inverse of that of the encoding network model.
  • step 1102 and step 1103 can be executed serially, that is, the feature points are first grouped and then decoded group by group; alternatively, step 1102 and step 1103 can be executed in parallel, that is, while grouping according to the above cyclic method, each time a group is formed, the probability distribution of each feature point in that group is determined in parallel and the first image features of the feature points in that group are parsed from the code stream based on the probability distributions, and then the next group is divided, and so on, until the last group is divided and the first image features of the feature points in the last group are parsed from the code stream.
  • the image bit stream included in the code stream is entropy decoded to obtain the first image features of the multiple feature points, and the first image features of the multiple feature points are input into the decoding network to obtain the reconstructed image.
  • the encoding method provided in the embodiment of the present application is used to conduct experiments on the test sets Kodak and CLIC respectively.
  • the resolution of the image to be encoded in the test set Kodak is 512*768, and the resolution of the image to be encoded in the test set CLIC is 2048*1367.
  • the context model in encoding and decoding uses a single receptive field with a size of 5*5. The experimental results of this experiment are shown in Table 1.
  • Ctx serial represents the coding and decoding method of the related technology
  • Ctx parallel represents the coding and decoding method provided by the embodiment of the present application
  • Enc represents encoding
  • Dec represents decoding.
  • This solution uses the same codec framework as the related technology, but the encoding and decoding order of the feature points is different. It can be seen that, compared with the related technology, this solution can greatly save decoding time, and its codec efficiency is higher. It should be noted that, compared with the related technology, this solution does not reduce or change the available surrounding information, so its codec performance is equivalent to that of the related technology; that is, this solution does not reduce the quality of reconstructed images.
  • in the codec framework shown in Figure 10, the context model uses three receptive fields, whose sizes are 3*3, 5*5 and 7*7 respectively; this scheme uses the same codec framework as the related technology, but the encoding and decoding order of the feature points is different.
  • the experimental results of this experiment are shown in Table 2, where Ratio represents the codec time saving rate of this scheme compared with the related technology, Enc-R is the encoding time saving rate, and Dec-R is the decoding time saving rate; a positive saving rate means time saved, and a negative saving rate means time increased.
  • this scheme can save 84.6% of the decoding time on the test set Kodak and 92% of the encoding time on the test set CLIC. With this scheme, the decoding time saving rate increases as the image resolution increases, because the higher the image resolution, the higher the proportion of feature points that can be decoded in parallel.
  • t_s represents the encoding or decoding time of the related technology, and t_p represents the corresponding encoding or decoding time of the present scheme.
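Under these definitions, the reported saving rates are presumably the relative time reduction; a minimal sketch, assuming that formula:

```python
def saving_rate(t_related, t_scheme):
    """Relative time saved by this scheme versus the related technology
    (the Enc-R / Dec-R quantities); positive means time is saved,
    negative means time increased. The formula (t_s - t_p) / t_s is an
    assumption consistent with the reported percentages."""
    return (t_related - t_scheme) / t_related

# e.g. if the related technology decodes in 100 s and this scheme in
# 15.4 s, the decoding time saving rate is 0.846, i.e. 84.6%
```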
  • this solution is essentially a parallelization method for entropy coding based on probability distributions that use context features. Compared with the related technology, it can greatly reduce decoding time without changing the available surrounding information. The higher the resolution of the image, the higher the codec time saving rate; likewise, the more complex the context model (for example, the more receptive fields), the higher the codec time saving rate. With a multi-level context model and probability distribution estimation model, this solution can save nearly 10 times the time compared with the related technology. In addition, this solution does not need to change the overall method of the related technology, so there is no need to retrain the network models in the codec framework, which means this solution is easier to apply and does not reduce codec performance.
  • multiple feature points are divided into multiple groups based on the specified value during decoding, and the probability distribution of each feature point in the same group is determined in parallel, which improves decoding efficiency. That is to say, this solution can break through the efficiency bottleneck caused by serial computation when decoding based on a VAE, effectively improving decoding efficiency.
  • Fig. 12 is a schematic structural diagram of a decoding device 1200 provided by an embodiment of the present application.
  • the decoding device 1200 can be implemented by software, hardware or a combination of the two to become part or all of the decoding end device.
  • the decoding end device can be as shown in Fig. 1 Source device shown.
  • the apparatus 1200 includes: a first determination module 1201 , a grouping module 1202 , a second determination module 1203 and a reconstruction module 1204 .
  • the first determination module 1201 is configured to determine a priori features of each of the multiple feature points of the image to be decoded based on the code stream;
  • a grouping module 1202 configured to divide the plurality of feature points into multiple groups based on specified values
  • the second determining module 1203 is configured to sequentially determine the first image features of each group of feature points in the multiple groups based on the prior features of the multiple feature points, wherein the step of determining the first image features of any group of feature points is: determining the probability distribution of each feature point in the group in parallel, and parsing the first image feature of each feature point in the group from the code stream based on the probability distribution of each feature point in the group;
  • a reconstruction module 1204 configured to reconstruct an image based on the first image features of the plurality of feature points.
  • the plurality of feature points include the first feature point
  • the second determination module 1203 includes:
  • the first processing submodule is configured to determine, if the first feature point is not the first feature point among the multiple feature points, the surrounding information of the first feature point from the decoded first image features of the feature points, where the first feature point is a feature point in any one of the groups;
  • the second processing submodule is used to input the surrounding information of the first feature point into the context model, and obtain the context feature of the first feature point output by the context model;
  • the third processing sub-module is used to determine the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  • the surrounding information of the first feature point includes the first image features of the decoded feature points in the neighborhood with the first feature point as the geometric center; the size of the neighborhood is determined based on the size of the receptive field used by the context model, and the surrounding information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
  • the plurality of feature points include the first feature point
  • the second determining module 1203 includes:
  • the fourth processing submodule is used to determine the probability distribution of the first feature point based on the prior feature of the first feature point if the first feature point is the first feature point among the plurality of feature points.
  • the specified value is determined based on the size of the receptive field used by the context model
  • Grouping module 1202 includes:
  • the first determining submodule is used to determine the slope based on the specified value, and the slope is used to indicate the degree of inclination of the straight line where the feature points divided into the same group are located;
  • the dividing submodule is used for dividing the plurality of feature points into groups based on the slope.
  • the specified value is determined by the size of the largest receptive field among the multiple receptive fields with different sizes.
  • the receptive field used by the context model includes a receptive field with a size of 5*5.
  • multiple feature points are divided into multiple groups based on the specified value during decoding, and the probability distribution of each feature point in the same group is determined in parallel, which improves decoding efficiency. That is to say, this solution can break through the efficiency bottleneck caused by serial computation when decoding based on a VAE, effectively improving decoding efficiency.
  • Figure 13 is a schematic structural diagram of an encoding device 1300 provided by an embodiment of the present application.
  • the encoding device 1300 can be implemented by software, hardware or a combination of the two to become part or all of the encoding end device.
  • the encoding end device can be as shown in Figure 1 destination device shown.
  • the apparatus 1300 includes: a first determination module 1301 , a grouping module 1302 , a first encoding module 1303 and a second encoding module 1304 .
  • the first determining module 1301 is configured to determine, based on the image to be encoded, the first image feature, probability distribution, and first super-prior feature of each feature point among the multiple feature points of the image;
  • the first encoding module 1303 is configured to sequentially encode the first image features of each set of feature points in the plurality of groups into a code stream based on the probability distribution of the plurality of feature points;
  • the second encoding module 1304 is configured to encode the first super prior features of the plurality of feature points into a code stream.
  • the first determining module 1301 includes:
  • the first determining submodule is used to determine the first image features of the plurality of feature points based on the image
  • the second determination submodule is used to determine the first super-priori feature of the plurality of feature points based on the first image feature of the plurality of feature points, and determine the probability distribution of each feature point in the plurality of feature points in parallel.
  • the plurality of feature points include a first feature point
  • the second determining submodule is used for:
  • if the first feature point is not the first feature point among the multiple feature points, the prior feature of the first feature point is determined based on the first image feature of the first feature point, where the first feature point is a feature point among the multiple feature points;
  • the probability distribution of the first feature point is then determined.
  • the plurality of feature points include a first feature point
  • the second determining submodule is used for:
  • the probability distribution of the first feature point is determined based on the prior feature of the first feature point.
  • the specified value is determined based on the size of the receptive field used by the context model
  • Grouping module 1302 includes:
  • the third determining submodule is used to determine the slope based on the specified value, and the slope is used to indicate the degree of inclination of the straight line where the feature points divided into the same group are located;
  • the dividing submodule is used for dividing the plurality of feature points into groups based on the slope.
  • the specified value is determined by the size of the largest receptive field among the multiple receptive fields with different sizes.
  • the receptive field used by the context model includes a receptive field with a size of 5*5.
  • multiple feature points are divided into multiple groups based on the specified value during encoding, and the first image features of each group of feature points in the multiple groups are sequentially encoded into the code stream.
  • the grouping is performed in the same manner during decoding, and the probability distribution of each feature point in the same group is determined in parallel, so as to improve decoding efficiency. That is to say, this solution can break through the efficiency bottleneck caused by serial computation when decoding based on a VAE, effectively improving decoding efficiency.
  • the division of the above functional modules is only used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the encoding device and the encoding method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • FIG. 14 is a schematic block diagram of a codec device 1400 used in an embodiment of the present application.
  • the codec apparatus 1400 may include a processor 1401 , a memory 1402 and a bus system 1403 .
  • the processor 1401 and the memory 1402 are connected through the bus system 1403; the memory 1402 is used to store instructions, and the processor 1401 is used to execute the instructions stored in the memory 1402 to perform the various encoding or decoding methods described in the embodiments of this application. To avoid repetition, no detailed description is given here.
  • the processor 1401 can be a central processing unit (CPU), and the processor 1401 can also be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 1402 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as memory 1402 .
  • Memory 1402 may include code and data 14021 accessed by processor 1401 using bus 1403 .
  • the memory 1402 may further include an operating system 14023 and an application program 14022, where the application program 14022 includes at least one program that allows the processor 1401 to execute the encoding or decoding method described in the embodiment of this application.
  • the application program 14022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that executes the encoding or decoding method described in the embodiment of this application.
  • the bus system 1403 may include not only a data bus, but also a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 1403 in the figure.
  • the codec apparatus 1400 may further include one or more output devices, such as a display 1404 .
  • the display 1404 may be a touch-sensitive display that combines a display with a haptic unit operable to sense touch input.
  • the display 1404 may be connected to the processor 1401 via the bus 1403 .
  • codec apparatus 1400 may execute the encoding method in the embodiment of the present application, and may also execute the decoding method in the embodiment of the present application.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (eg, based on a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • DSPs: digital signal processors; ASICs: application-specific integrated circuits; FPGAs: field-programmable gate arrays.
  • the term "processor", as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
  • the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset).
  • Various components, modules, or units are described in the embodiments of the present application to emphasize the functional aspects of the apparatus for performing the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • The computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium; in other words, it may be a non-transitory storage medium.
  • The information involved includes, but is not limited to, user equipment information, user personal information, etc.
  • The data involved includes, but is not limited to, data used for analysis, stored data, displayed data, etc.
  • All signals are authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • The images, videos, etc. involved in the embodiments of this application were all obtained with full authorization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of this application disclose an encoding/decoding method, apparatus, device, storage medium, and computer program, belonging to the field of encoding and decoding technology. In the embodiments of this application, during decoding, a plurality of feature points are divided into multiple groups based on a specified value, and the probability distributions of the feature points in the same group are determined in parallel, which speeds up decoding. Correspondingly, during encoding, the feature points are grouped in the same way, and the first image features of each group of feature points are encoded into the bitstream group by group. That is, this solution can break through the efficiency bottleneck caused by serial computation when decoding based on a variational auto-encoder (VAE), effectively improving decoding efficiency.

Description

Encoding/decoding method, apparatus, device, storage medium, computer program, and product
This application claims priority to Chinese Patent Application No. 202110596003.6, entitled "Encoding/decoding method, apparatus, device, storage medium, and computer program", filed on May 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of this application relate to the field of encoding and decoding technology, and in particular to an encoding/decoding method, apparatus, device, storage medium, computer program, and product.
Background
Image compression enables efficient transmission and storage of image information and plays an important role in a media era in which the variety and volume of image data keep growing. Image compression involves encoding and decoding images; codec performance (reflecting image quality) and codec efficiency (reflecting time consumption) are key factors to consider in image compression technology.
In the related art, after long-term research and optimization, image compression standards such as JPEG and PNG have been established. However, these traditional image compression techniques have hit a bottleneck in improving codec performance and can no longer meet the growing demands of multimedia applications. With the wide application of deep learning in image recognition, object detection, and other fields, deep learning has also been applied to image compression tasks, yielding higher codec efficiency than traditional image compression techniques. For example, image encoding and decoding using a variational auto-encoder (VAE) based on deep learning can greatly improve codec performance.
However, in research on deep-learning-based image compression, how to guarantee codec performance while improving codec efficiency deserves attention and study. For example, in the related art, when decoding an image with a VAE, a neural network model computes the probability distribution of each feature point of the image serially, and the image is decoded based on these probability distributions. Since the probability distributions are computed by a neural network model, serial computation makes decoding inefficient. How to break through the efficiency bottleneck caused by serial computation during decoding without degrading codec performance is a key problem in research on VAE-based codec methods.
Summary
Embodiments of this application provide an encoding/decoding method, apparatus, device, storage medium, and computer program that can break through the efficiency bottleneck caused by serial computation during VAE-based decoding without degrading codec performance. The technical solutions are as follows:
In a first aspect, a decoding method is provided, the method comprising:
determining, based on a bitstream, a prior feature of each of a plurality of feature points of an image to be decoded; dividing the plurality of feature points into multiple groups based on a specified value; determining, based on the prior features of the plurality of feature points, the first image features of each group of feature points in turn, wherein determining the first image features of any one group comprises: determining the probability distributions of the feature points in the group in parallel, and parsing, based on those probability distributions, the first image features of the feature points in the group out of the bitstream; and reconstructing the image based on the first image features of the plurality of feature points.
That is, in the embodiments of this application, during decoding the plurality of feature points are divided into multiple groups based on a specified value, and the probability distributions of the feature points within a group are determined in parallel, which speeds up decoding. In other words, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
It should be noted that this method is applied to a codec that includes a context model: when decoding reaches any one of the groups, the peripheral information of every feature point in that group has already been fully decoded, i.e., the feature points in the group satisfy the condition that their peripheral information is already available.
The plurality of feature points include a first feature point, and determining the probability distribution of the first feature point comprises: if the first feature point is not the first one of the plurality of feature points, determining the peripheral information of the first feature point from the first image features of the already-decoded feature points, the first feature point being one feature point in any one group; inputting the peripheral information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model; and determining the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the peripheral information of the first feature point includes the first image features of the already-decoded feature points within a neighborhood whose geometric center is the first feature point, the size of the neighborhood being determined based on the size of the receptive field used by the context model; the peripheral information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4. That is, to guarantee codec performance and image quality, this solution uses as much peripheral information as possible while maintaining the compression rate.
Optionally, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point comprises: if the first feature point is the first one of the plurality of feature points, determining its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model; dividing the plurality of feature points into multiple groups based on the specified value comprises: determining a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie; and dividing the plurality of feature points into the multiple groups based on the slope. That is, this solution determines the set of feature points that can be decoded in parallel based on the size of the receptive field.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In a second aspect, an encoding method is provided, the method comprising:
determining, based on an image to be encoded, the first image feature, the probability distribution, and the first hyper-prior feature of each of a plurality of feature points of the image; dividing the plurality of feature points into multiple groups based on a specified value; encoding, based on the probability distributions of the plurality of feature points, the first image features of each group of feature points into a bitstream group by group; and encoding the first hyper-prior features of the plurality of feature points into the bitstream.
That is, in the embodiments of this application, in order to determine probability distributions in parallel during decoding and thereby improve decoding efficiency, the plurality of feature points are divided into multiple groups based on a specified value during encoding, and the first image features of each group are encoded into the bitstream group by group. During decoding, the feature points are grouped in the same way, and the probability distributions of the feature points within a group are determined in parallel, improving decoding efficiency. In other words, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
Optionally, determining, based on the image to be encoded, the first image feature, the probability distribution, and the first hyper-prior feature of each of the plurality of feature points comprises: determining the first image features of the plurality of feature points based on the image; determining the first hyper-prior features of the plurality of feature points based on their first image features; and determining the probability distributions of the plurality of feature points in parallel.
Corresponding to the decoding method, this method is also applied to a codec that includes a context model. Optionally, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point comprises: if the first feature point is not the first one of the plurality of feature points, determining its prior feature based on its first image feature, the first feature point being one of the plurality of feature points; determining its peripheral information from the first image features of the plurality of feature points; inputting the peripheral information into the context model to obtain the context feature of the first feature point output by the context model; and determining the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the plurality of feature points include a first feature point, and determining the probability distribution of the first feature point comprises: if the first feature point is the first one of the plurality of feature points, determining its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model; dividing the plurality of feature points into multiple groups based on the specified value comprises: determining a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie; and dividing the plurality of feature points into the multiple groups based on the slope.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In a third aspect, a decoding apparatus is provided, the decoding apparatus having the function of implementing the behavior of the decoding method in the first aspect above. The decoding apparatus comprises one or more modules, and the one or more modules are configured to implement the decoding method provided in the first aspect.
That is, a decoding apparatus is provided, the decoding apparatus comprising:
a first determining module, configured to determine, based on a bitstream, the prior feature of each of a plurality of feature points of an image to be decoded;
a grouping module, configured to divide the plurality of feature points into multiple groups based on a specified value;
a second determining module, configured to determine, based on the prior features of the plurality of feature points, the first image features of each group of feature points in turn; wherein determining the first image features of any one group comprises: determining the probability distributions of the feature points in the group in parallel, and parsing, based on those probability distributions, the first image features of the feature points in the group out of the bitstream;
a reconstruction module, configured to reconstruct the image based on the first image features of the plurality of feature points.
Optionally, the plurality of feature points include a first feature point, and the second determining module comprises:
a first processing sub-module, configured to, if the first feature point is not the first one of the plurality of feature points, determine the peripheral information of the first feature point from the first image features of the already-decoded feature points, the first feature point being one feature point in the any one group;
a second processing sub-module, configured to input the peripheral information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model;
a third processing sub-module, configured to determine the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the peripheral information of the first feature point includes the first image features of the already-decoded feature points within a neighborhood whose geometric center is the first feature point, the size of the neighborhood being determined based on the size of the receptive field used by the context model; the peripheral information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
Optionally, the plurality of feature points include a first feature point, and the second determining module comprises:
a fourth processing sub-module, configured to, if the first feature point is the first one of the plurality of feature points, determine its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model;
the grouping module comprises:
a first determining sub-module, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie;
a dividing sub-module, configured to divide the plurality of feature points into multiple groups based on the slope.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In a fourth aspect, an encoding apparatus is provided, the encoding apparatus having the function of implementing the behavior of the encoding method in the second aspect above. The encoding apparatus comprises one or more modules, and the one or more modules are configured to implement the encoding method provided in the second aspect.
That is, an encoding apparatus is provided, the apparatus comprising:
a first determining module, configured to determine, based on an image to be encoded, the first image feature, the probability distribution, and the first hyper-prior feature of each of a plurality of feature points of the image;
a grouping module, configured to divide the plurality of feature points into multiple groups based on a specified value;
a first encoding module, configured to encode, based on the probability distributions of the plurality of feature points, the first image features of each group of feature points into a bitstream group by group;
a second encoding module, configured to encode the first hyper-prior features of the plurality of feature points into the bitstream.
Optionally, the first determining module comprises:
a first determining sub-module, configured to determine the first image features of the plurality of feature points based on the image;
a second determining sub-module, configured to determine the first hyper-prior features of the plurality of feature points based on their first image features, and to determine the probability distributions of the plurality of feature points in parallel.
Optionally, the plurality of feature points include a first feature point, and the second determining sub-module is configured to:
if the first feature point is not the first one of the plurality of feature points, determine the prior feature of the first feature point based on its first image feature, the first feature point being one of the plurality of feature points;
determine the peripheral information of the first feature point from the first image features of the plurality of feature points;
input the peripheral information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model;
determine the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the plurality of feature points include a first feature point, and the second determining sub-module is configured to:
if the first feature point is the first one of the plurality of feature points, determine its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model;
the grouping module comprises:
a third determining sub-module, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie;
a dividing sub-module, configured to divide the plurality of feature points into multiple groups based on the slope.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In a fifth aspect, a decoder-side device is provided, the decoder-side device comprising a processor and a memory, the memory being configured to store a program for executing the decoding method provided in the first aspect. The processor is configured to execute the program stored in the memory to implement the decoding method provided in the first aspect.
Optionally, the decoder-side device may further comprise a communication bus used to establish a connection between the processor and the memory.
In a sixth aspect, an encoder-side device is provided, the encoder-side device comprising a processor and a memory, the memory being configured to store a program for executing the encoding method provided in the second aspect. The processor is configured to execute the program stored in the memory to implement the encoding method provided in the second aspect.
Optionally, the encoder-side device may further comprise a communication bus used to establish a connection between the processor and the memory.
In a seventh aspect, a computer-readable storage medium is provided, the storage medium storing instructions that, when run on a computer, cause the computer to perform the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect.
In an eighth aspect, a computer program product containing instructions is provided; when the instructions are run on a computer, the computer is caused to perform the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect. In other words, a computer program is provided which, when executed, implements the steps of the decoding method of the first aspect or the steps of the encoding method of the second aspect.
The technical effects obtained by the third, fourth, fifth, sixth, seventh, and eighth aspects are similar to those obtained by the corresponding technical means in the first or second aspect, and are not repeated here.
The technical solutions provided by the embodiments of this application can bring at least the following beneficial effects:
In the embodiments of this application, during decoding the plurality of feature points are divided into multiple groups based on a specified value, and the probability distributions of the feature points in the same group are determined in parallel, which speeds up decoding. Correspondingly, during encoding the feature points are grouped in the same way, and the first image features of each group are encoded into the bitstream group by group. That is, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a codec framework provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a codec order provided by an embodiment of this application;
FIG. 4 is a flowchart of an encoding method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of using peripheral information in encoding provided by an embodiment of this application;
FIG. 6 is a schematic diagram of another codec order provided by an embodiment of this application;
FIG. 7 is a schematic diagram of yet another codec order provided by an embodiment of this application;
FIG. 8 is a schematic diagram of yet another codec order provided by an embodiment of this application;
FIG. 9 is a schematic diagram of yet another codec order provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of another codec framework provided by an embodiment of this application;
FIG. 11 is a flowchart of a decoding method provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a decoding apparatus provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of an encoding apparatus provided by an embodiment of this application;
FIG. 14 is a schematic block diagram of a codec apparatus provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
The network architectures and service scenarios described in the embodiments of this application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit them; a person of ordinary skill in the art will appreciate that, as network architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
Before the encoding/decoding methods provided by the embodiments of this application are explained in detail, the terms and the implementation environment involved are introduced.
For ease of understanding, the terms involved in the embodiments of this application are explained first.
Bits per pixel (BPP): the number of bits used to store each pixel; a smaller BPP indicates a smaller compression bitrate.
Bitrate: in image compression, the code length required to encode a unit pixel; the higher the bitrate, the better the image reconstruction quality.
Peak signal-to-noise ratio (PSNR): an objective criterion for evaluating image quality; a higher PSNR indicates better image quality.
Multi-scale structural similarity index measure (MS-SSIM): an objective criterion for evaluating images; a higher MS-SSIM indicates better image quality.
Convolutional neural network (CNN): a feed-forward neural network that involves convolution computation and has a deep structure; it is one of the representative algorithms of deep learning.
Variational auto-encoder (VAE): a type of auto-encoder used for data compression or denoising.
The implementation environment involved in the embodiments of this application is introduced next.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of this application. The implementation environment includes a source device 10, a destination device 20, a link 30, and a storage device 40. The source device 10 can generate encoded images and may therefore also be called an image encoding device. The destination device 20 can decode the encoded images generated by the source device 10 and may therefore also be called an image decoding device. The link 30 can receive the encoded images generated by the source device 10 and transmit them to the destination device 20. The storage device 40 can receive and store the encoded images generated by the source device 10, in which case the destination device 20 can obtain the encoded images directly from the storage device 40. Alternatively, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded images generated by the source device 10, in which case the destination device 20 can stream or download the encoded images stored on the storage device 40.
Both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer. For example, the source device 10 and the destination device 20 may each be a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet, a smart in-vehicle terminal, a smart TV, a smart speaker, a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a telephone handset such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, an in-vehicle computer, or the like.
The link 30 may include one or more media or devices capable of transmitting the encoded images from the source device 10 to the destination device 20. In one possible implementation, the link 30 may include one or more communication media that enable the source device 10 to send the encoded images directly to the destination device 20 in real time. In the embodiments of this application, the source device 10 may modulate the encoded images based on a communication standard, which may be a wireless communication protocol or the like, and send the modulated images to the destination device 20. The one or more communication media may include wireless and/or wired communication media, for example the radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices facilitating communication from the source device 10 to the destination device 20, which is not specifically limited in the embodiments of this application.
In one possible implementation, the storage device 40 may store the encoded images received from the source device 10, and the destination device 20 may obtain the encoded images directly from the storage device 40. In this case, the storage device 40 may include any of a variety of distributed or locally accessed data storage media, for example a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded images.
In one possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded images generated by the source device 10, and the destination device 20 may stream or download the images stored on the storage device 40. The file server may be any type of server capable of storing encoded images and sending them to the destination device 20. In one possible implementation, the file server may include a web server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive. The destination device 20 may obtain the encoded images through any standard data connection, including an Internet connection, such as a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of both suitable for obtaining encoded images stored on the file server. The transmission of the encoded images from the storage device 40 may be streaming, download, or a combination of both.
The implementation environment shown in FIG. 1 is only one possible implementation, and the techniques of the embodiments of this application are applicable not only to the source device 10 that can encode images and the destination device 20 that can decode encoded images shown in FIG. 1, but also to other devices capable of encoding images and decoding encoded images, which is not specifically limited in the embodiments of this application.
In the implementation environment shown in FIG. 1, the source device 10 includes a data source 120, an encoder 100, and an output interface 140. In some embodiments, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The data source 120 may include an image capture device (e.g., a camera), an archive containing previously captured images, a feed interface for receiving images from an image content provider, and/or a computer graphics system for generating images, or a combination of these sources of images.
The data source 120 may send images to the encoder 100, and the encoder 100 may encode the received images to obtain encoded images. The encoder may send the encoded images to the output interface. In some embodiments, the source device 10 sends the encoded images directly to the destination device 20 via the output interface 140. In other embodiments, the encoded images may also be stored on the storage device 40 for later retrieval by the destination device 20 for decoding and/or display.
In the implementation environment shown in FIG. 1, the destination device 20 includes an input interface 240, a decoder 200, and a display device 220. In some embodiments, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive encoded images via the link 30 and/or from the storage device 40 and then send them to the decoder 200; the decoder 200 may decode the received encoded images to obtain decoded images. The decoder may send the decoded images to the display device 220. The display device 220 may be integrated with or external to the destination device 20. In general, the display device 220 displays the decoded images. The display device 220 may be any of a variety of types of display devices, for example a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not shown in FIG. 1, in some aspects the encoder 100 and the decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software for encoding both audio and video in a common or separate data stream. In some embodiments, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the user datagram protocol (UDP).
The encoder 100 and the decoder 200 may each be any of the following circuits: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of this application are implemented partially in software, the device may store the instructions for the software in a suitable non-volatile computer-readable storage medium and may use one or more processors to execute the instructions in hardware so as to implement the techniques of the embodiments of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, and either may be integrated as part of a combined encoder/decoder (codec) in the respective device.
The embodiments of this application may generally refer to the encoder 100 as "signaling" or "sending" certain information to another device such as the decoder 200. The terms "signaling" or "sending" may generally refer to the transfer of syntax elements and/or other data used to decode compressed images. This transfer may occur in real time or near real time. Alternatively, this communication may occur after a period of time, for example when syntax elements are stored to a computer-readable storage medium in the encoded bitstream at the time of encoding; the decoding device may then retrieve the syntax elements at any time after they are stored to this medium.
The encoding/decoding methods provided by the embodiments of this application can be applied in a variety of scenarios, and in each scenario the images to be encoded or decoded may be images contained in an image file or images contained in a video file. It should be noted that, with reference to the implementation environment shown in FIG. 1, any encoding method below may be performed by the encoder 100 in the source device 10, and any decoding method below may be performed by the decoder 200 in the destination device 20.
It should be noted that the encoding/decoding methods provided by the embodiments of this application can be applied in any VAE-based video or image compression framework. The codec model of the basic VAE method is introduced first.
Referring to FIG. 2, at the encoder side, the original image is input into the encoding network model to extract features, obtaining the to-be-quantized image features y of a plurality of feature points (also called the second image features y in the embodiments of this application); the second image features y of the plurality of feature points are quantized to obtain their first image features ŷ. The first image features ŷ are input into the hyper-encoding network model to obtain the to-be-quantized hyper-prior features z of the plurality of feature points (also called the second hyper-prior features z), which are quantized to obtain the first hyper-prior features ẑ. The first hyper-prior features ẑ are entropy-encoded according to a specified probability distribution so as to encode ẑ into the bitstream. As shown in FIG. 2, the bit sequence obtained by entropy-encoding ẑ is part of the bitstream (the black-and-white bar on the right side of FIG. 2) and may be called the hyper-prior bitstream.
In addition, the first hyper-prior features ẑ of the plurality of feature points are input into the hyper-decoding network model to obtain their prior features ψ. The first image features ŷ are input into the context model (CM) to obtain the context features φ. Combining the prior features ψ and the context features φ, the probability distributions N(μ, σ) of the plurality of feature points are estimated by a probability distribution estimation model (shown as the gather model, GM); based on these probability distributions, the first image features ŷ of the feature points are encoded into the bitstream in turn. As shown in FIG. 2, the bit sequence obtained by entropy-encoding ŷ is part of the bitstream (the black-and-white bar on the left side of FIG. 2) and may be called the image bitstream.
At the decoder side, the first hyper-prior features ẑ of the plurality of feature points are first obtained by entropy-decoding the hyper-prior bitstream of the bitstream according to the specified probability distribution, and ẑ is input into the hyper-decoding network model to obtain the prior features ψ. For the first one of the plurality of feature points, its probability distribution is estimated based on its prior feature, and its first image feature is parsed out of the image bitstream based on that distribution. For a non-first feature point, such as a first feature point, its peripheral information is determined from the first image features of the already-decoded feature points; the peripheral information is input into the context model CM to obtain its context feature; its probability distribution is estimated by the probability distribution estimation model GM by combining its prior feature and its context feature; and its first image feature is parsed out of the image bitstream based on that probability distribution. After the first image features ŷ of the plurality of feature points are entropy-decoded from the bitstream, ŷ is input into the decoding network model to obtain the reconstructed image.
For the computation of estimating the probability distributions of the plurality of feature points, in the related art both the encoder-side and the decoder-side devices compute the probability distribution of each feature point one by one: as shown in FIG. 3, the decoder side estimates the probability distributions of the feature points in the order indicated by the arrows in FIG. 3, and the encoder side encodes the first image features of the feature points into the bitstream one by one in the same order. In addition, as shown in FIG. 3, in the related art, assuming the decoder is currently decoding the feature point filled in black and the hatched feature points have been decoded, the peripheral information of the black feature point includes the first image features of the 12 hatched feature points within the thick black box.
From the above, the codec model of the VAE method comprises two parts: a feature extraction and decoding module, and an entropy coding module. By introducing context information (i.e., peripheral information) and hyper-prior information in the entropy coding module, compression performance can be greatly improved.
The encoding method provided by the embodiments of this application is introduced next.
Referring to FIG. 4, FIG. 4 is a flowchart of an encoding method provided by an embodiment of this application. The encoding method is applied to an encoder-side device and includes the following steps.
Step 401: based on an image to be encoded, determine the first image feature, the probability distribution, and the first hyper-prior feature of each of a plurality of feature points of the image.
The image to be encoded is an image in an image file or an image in a video file, and may take any form, which is not limited in the embodiments of this application.
In the embodiments of this application, this step is implemented as: determining the first image features of the plurality of feature points based on the image; determining the first hyper-prior features of the plurality of feature points based on their first image features; and determining the probability distributions of the feature points in parallel.
The first image features of the plurality of feature points are determined by inputting the image into the encoding network model to obtain the second image features output by the model, and quantizing the second image features to obtain the first image features.
The first hyper-prior features of the plurality of feature points are determined by inputting the first image features into the hyper-encoding network model to obtain the second hyper-prior features output by the model, and quantizing the second hyper-prior features to obtain the first hyper-prior features.
The quantization involved above can take several forms, for example scalar quantization. The quantization step size can be determined based on different encoding rates; that is, a correspondence between encoding rate and quantization step size is stored in advance, and the step size corresponding to the encoding rate used in the embodiment of this application is obtained from that correspondence. In addition, scalar quantization may involve an offset: the data to be quantized (such as the second image features or the second hyper-prior features) is offset before being scalar-quantized according to the step size.
It should be noted that the quantization below is similar to that described here and is not repeated later.
The plurality of feature points include a first feature point. Determining the probability distribution of the first feature point is implemented as: if the first feature point is not the first one of the plurality of feature points, determining its prior feature based on its first image feature; determining its peripheral information from the first image features of the plurality of feature points; inputting the peripheral information into the context model to obtain the context feature output by the model; and determining the probability distribution of the first feature point based on its prior feature and its context feature. The first feature point is one of the plurality of feature points.
Determining the prior feature of the first feature point based on its first image feature is implemented as: determining its first hyper-prior feature based on its first image feature, and determining its prior feature based on its first hyper-prior feature. The former process (determining the first hyper-prior feature of any feature point) has been introduced above and is not repeated here. The latter is implemented by inputting the first hyper-prior feature of the first feature point into the hyper-decoding network model to obtain the prior feature of the first feature point output by the model.
The encoding network model, the hyper-encoding network model, and the hyper-decoding network model introduced above are all trained in advance, and the embodiments of this application do not limit their network structures or training methods. For example, each of these models may be a fully connected network or a convolutional neural network (CNN), and the convolutions in a CNN may be 2D or 3D convolutions. The number of layers and the number of nodes per layer of these models are also not limited.
In the embodiments of this application, taking the case where the encoding network model, the hyper-encoding network model, and the hyper-decoding network model are all CNNs with 2D convolutions as an example, the second image features output by the encoding network model are represented by a C*W*H matrix, and the quantized first image features are also represented by a C*W*H matrix, where C is the number of CNN channels and W*H is the size of the feature map composed of the plurality of feature points. Correspondingly, the second hyper-prior features obtained by the hyper-encoding network model, the quantized first hyper-prior features, and the prior features obtained by the hyper-decoding network model are also represented by C*W*H matrices.
In addition, the context model in the embodiments of this application is also trained in advance, and its network structure and training method are not limited. For example, the network structure of the context model may be a mask region CNN (Mask R-CNN), in which receptive fields are used to extract context features. The context model may use one or more receptive fields, the sizes of the receptive fields differing from one another, which is not limited in the embodiments of this application. Optionally, the receptive fields used by the context model include a receptive field of size 5*5. The convolutions in the context model may be 2D or 3D; assuming 2D convolutions, the receptive field size may be 3*3, 5*5, 7*7, etc.
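As a concrete illustration of how such a receptive field restricts the context to already-decoded positions, the sketch below (hypothetical helper name, not code from the patent) builds the causal mask of a ks*ks window under raster-scan order; for ks = 5 exactly the 12 positions described for feature point B in FIG. 5 are available.

```python
import numpy as np

def causal_mask(ks: int) -> np.ndarray:
    """ks*ks mask: 1 for positions decoded before the window center
    in raster-scan order, 0 elsewhere (including the center itself)."""
    m = np.zeros((ks, ks))
    c = ks // 2
    m[:c, :] = 1   # all rows above the center row
    m[c, :c] = 1   # positions to the left within the center row
    return m

mask = causal_mask(5)  # 10 positions above + 2 to the left = 12
```

For ks = 3 the mask keeps 4 neighbours, which is consistent with the lower bound n >= 4 on peripheral feature points stated above.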
It should be noted that the peripheral information of the first feature point is the image features needed to determine its context feature. Optionally, determining the peripheral information of the first feature point from the first image features of the plurality of feature points is implemented as: determining it according to a preset rule. Optionally, the peripheral information includes the first image features of at least n feature points within a neighborhood whose geometric center is the first feature point, the size of the neighborhood being determined based on the size of the receptive field used by the context model; the peripheral information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
Illustratively, referring to FIG. 5, assume the context model uses one receptive field of size 5*5 and the convolutions in the context model are 2D; then the neighborhood size is 5*5. The peripheral information of feature point A determined according to the preset rule includes the first image features of the 6 feature points within the thick box whose geometric center is A, these six feature points being above A (directly above and diagonally above). The peripheral information of feature point B includes the first image features of the 12 feature points within the thick box whose geometric center is B: 2 feature points directly to the left of B and 10 above it. The peripheral information of feature point C also includes the first image features of 12 feature points, whose positions relative to C are similar to those of the 12 feature points relative to B. The peripheral information of feature point D includes the first image features of the 8 feature points within the thick box whose geometric center is D: 2 directly to the left of D and 6 above it.
In the case where the first feature point is not the first one of the plurality of feature points, after its peripheral information is determined, the peripheral information is input into the context model to obtain the context feature of the first feature point output by the context model. Then the probability distribution of the first feature point is determined based on its prior feature and its context feature.
Optionally, determining the probability distribution based on the prior feature and the context feature is implemented as: inputting the prior feature and the context feature of the first feature point into a probability distribution estimation model to obtain the probability distribution output by the model, the distribution being represented by a mean and a standard deviation. The probability distribution estimation model is trained in advance; its network structure is a neural network, for example a CNN, and the number of layers and the number of nodes per layer are not limited in the embodiments of this application. Optionally, the probability distribution estimation model is the GM model introduced above.
In addition, if the first feature point is the first one of the plurality of feature points, its probability distribution is determined based on its prior feature. That is, for the first feature point, no peripheral information is used during encoding, or the peripheral information of the first feature point is set to 0. In that case, the probability distribution of the first feature point is determined by inputting its prior feature into the probability distribution estimation model to obtain the probability distribution output by the model. Alternatively, with the peripheral information of the first feature point set to 0, the peripheral information is input into the context model to obtain a context feature of 0, and the prior feature together with the zero context feature is input into the probability distribution estimation model to obtain the probability distribution of the first feature point.
It should be noted that if the context model uses multiple receptive fields, then when determining the context feature of each feature point, feature extraction is performed on the peripheral information of each feature point with each of the receptive fields, obtaining multiple first context features of the feature point corresponding to the respective receptive fields; that is, the context feature of each feature point is determined based on its multiple first context features, which correspond one-to-one to the receptive fields. Simply put, with several receptive fields, several first context features are obtained for each feature point.
On this basis, in one implementation, the context feature of each feature point includes its multiple first context features: after the multiple first context features of each feature point are obtained with the multiple receptive fields of the context model, the context model outputs the multiple first context features of each feature point, and the prior feature of each feature point together with its multiple first context features is input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point. In this implementation, the context feature of each feature point includes its multiple first context features.
Illustratively, if the context model uses three receptive fields of sizes 3*3, 5*5, and 7*7, three first context features are obtained for each feature point; the prior feature and the three first context features of each feature point are input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point.
In another implementation, after the multiple first context features of each feature point are obtained with the multiple receptive fields of the context model, the context model further processes them to output a single context feature for the corresponding feature point; then the prior feature and the context feature of each feature point are input into the probability distribution estimation model to obtain the probability distribution of the corresponding feature point. In this implementation, the context feature of each feature point is obtained by fusing its multiple first context features.
The above describes the implementation of determining, based on the image to be encoded, the first image feature, the probability distribution, and the first hyper-prior feature of each of the plurality of feature points; in the embodiments of this application, this process is similar to the corresponding process of the VAE method introduced earlier. In the above embodiment, after the first image features of the feature points are obtained, their prior features and context features are determined based on them; these two steps can be regarded as two branches that can be executed in parallel to speed up encoding. In addition, determining the probability distributions of the feature points in parallel guarantees encoding efficiency.
Step 402: divide the plurality of feature points into multiple groups based on a specified value.
To enable parallel decoding of feature points at the decoder side and thereby improve decoding efficiency, compared with the VAE-based related art, this solution optimizes the codec order of the feature points so that the probability distributions of some feature points can be determined in parallel. In the embodiments of this application, the feature points are divided into multiple groups based on a specified value, each group containing at least one feature point; the encoder-side device then encodes the first image features of each group into the bitstream group by group as described in step 403 below.
The specified value is determined based on the size of the receptive field used by the context model. Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Illustratively, taking the case where the encoding process uses a convolutional network with 2D convolutions as an example, the specified value is denoted by ks. If the context model uses one receptive field of size 5*5, then ks = 5. If the context model uses receptive fields of sizes 3*3, 5*5, and 7*7, then ks = 7.
Dividing the plurality of feature points into multiple groups based on the specified value is implemented as: determining a slope based on the specified value, and dividing the feature points into multiple groups based on the slope. The slope indicates the inclination of the straight line on which the feature points assigned to the same group lie. It should be noted that in the grouping method corresponding to 2D convolutions, the slope is intuitive: as shown in FIG. 5, this solution assigns feature points A, B, C, and D to the same group, and these points actually lie on a straight line whose inclination the slope indicates. In the 3D convolution implementation, the slope is not intuitive; the embodiments of this application use 2D convolutions as an example to introduce the grouping method, and the principle of the grouping method corresponding to 3D convolutions is the same as that for 2D convolutions.
Illustratively, taking 2D convolutions as an example, with the specified value ks = 5 the slope is k = ⌈ks/2⌉ = 3, where ⌈·⌉ denotes rounding up. That is, the slope is determined according to the formula k = ⌈ks/2⌉.
Still taking 2D convolutions as an example, assume the first feature point is the top-left feature point of the feature map with coordinates (0, 0). Based on the slope, the feature points are divided into multiple groups in a loop, where the t-th iteration of the loop comprises: if there remain feature points not yet assigned, assigning to one group the feature points whose horizontal coordinate is t − i*k and whose vertical coordinate is i, where k is the slope; t, i, and t − i*k are integers; and the minimum values of t and i are 0.
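The 2D grouping loop above can be sketched as follows (a minimal illustration with hypothetical function names, assuming a W*H feature map whose first feature point is at (0, 0)):

```python
import math

def group_feature_points(width: int, height: int, ks: int) -> dict:
    """Assign each feature point (x, y) to group t = x + k*y, so that the
    points of group t are exactly those with x = t - i*k and y = i,
    where the slope is k = ceil(ks/2). All points in one group can have
    their probability distributions determined in parallel."""
    k = math.ceil(ks / 2)
    groups: dict = {}
    for y in range(height):
        for x in range(width):
            groups.setdefault(x + k * y, []).append((x, y))
    return groups

groups = group_feature_points(8, 8, 5)  # ks = 5 gives slope k = 3
```

On an 8*8 map with k = 3, group 0 contains only (0, 0) and group 3 contains (3, 0) and (0, 1), matching the diagonal codec order sketched in FIG. 6.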
As shown in FIG. 6, the first feature point is the top-left one with codec index 1; feature points with the same codec index in FIG. 6 belong to the same group; the ascending order of codec indices is the codec order of the groups; and the last feature point is the bottom-right one with the largest codec index. It can be seen that the codec order shown in FIG. 6 starts from the top-left corner, goes right first, and then gradually toward the bottom-right corner.
As can be imagined, rotating FIG. 6 counterclockwise by 90 degrees gives another codec order, shown in FIG. 7. In FIG. 7, the first feature point is the bottom-left one of the feature map; the codec order starts from the bottom-left corner, goes up first, then gradually toward the top-right corner; and the last feature point is the top-right one. Alternatively, mirroring FIG. 6 about the diagonal from the top-left corner to the bottom-right corner gives the codec order shown in FIG. 8: the first feature point is the top-left one of the feature map, the codec order starts from the top-left corner, goes down first, then gradually toward the bottom-right corner, and the last feature point is the bottom-right one. It can be seen that this solution can actually produce multiple codec orders; intuitively, eight codec orders can be obtained by rotating and/or mirroring FIG. 6, and these eight orders are essentially similar. In the grouping methods corresponding to these eight codec orders, the first feature point is taken as the coordinate origin, the direction from the first feature point to the second feature point is the horizontal axis, the other side of the feature map perpendicular to the horizontal axis is the vertical axis, and the feature points are divided into multiple groups according to the slope.
For 3D convolutions, dividing the feature points into multiple groups based on the slope is implemented similarly to the 2D case. The t-th iteration of the loop corresponding to 3D convolutions comprises: if there remain unassigned feature points, assigning to one group the feature points whose coordinates (x, y, z) satisfy x + k*y + k*k*z − t = 0, where k is the slope; x, y, z, and t are integers; and the minimum values of x, y, z, and t are all 0. That is, the plurality of feature points are regarded as the feature points of a 3D feature map comprising multiple 2D feature maps, each 2D feature map lying in a plane parallel to the xy plane; the feature points assigned to the same group are scattered across the 2D feature maps of the 3D feature map, which achieves spatially parallel encoding and decoding with a high degree of parallelism.
Alternatively, in ascending order of z, the feature points of each of the 2D feature maps are grouped in turn in a manner similar to the 2D grouping above. As shown in FIG. 9, the feature points on the plane z = 0 (i.e., the 2D feature map in the xy plane) are grouped first, similarly to the 2D case; after the plane z = 0 is grouped, the feature points on the plane z = 1 are grouped, again similarly to the 2D case; and so on until all feature points are grouped. The codec implemented this way is parallel within each plane and serial across planes.
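The fully parallel 3D grouping described above admits the same kind of sketch (hypothetical function name; the coordinate convention x + k*y + k*k*z = t is as stated in the text):

```python
import math

def group_feature_points_3d(width: int, height: int, depth: int, ks: int) -> dict:
    """Group t collects every point (x, y, z) with x + k*y + k*k*z == t,
    where k = ceil(ks/2). One group spreads over several z-planes of the
    3D feature map, so the planes can be entropy-coded in parallel."""
    k = math.ceil(ks / 2)
    groups: dict = {}
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                groups.setdefault(x + k * y + k * k * z, []).append((x, y, z))
    return groups

groups_3d = group_feature_points_3d(4, 4, 2, 5)  # k = 3
```

For a 4*4*2 map with k = 3, group 9 contains (3, 2, 0) and (0, 3, 0) from the plane z = 0 together with (0, 0, 1) from the plane z = 1, showing how one group spans multiple planes.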
It should be noted that, compared with the related art, this solution adjusts the codec order of the feature points so that probability distributions can be determined in parallel in subsequent decoding, without changing the peripheral information available to each feature point. As shown in FIG. 5, similarly to the related art, only the codec order is adjusted while the maximum available peripheral information is preserved; codec performance is therefore not degraded while codec efficiency is improved.
Step 403: based on the probability distributions of the plurality of feature points, encode the first image features of each group of feature points into the bitstream group by group.
In the embodiments of this application, after the feature points are divided into multiple groups, their first image features are encoded into the bitstream group by group based on their probability distributions. That is, according to the grouped codec order, the groups of feature points with smaller codec indices are encoded first and those with larger codec indices later, until the first image features of all of the plurality of feature points have been encoded into the bitstream.
This is implemented as: based on the probability distributions of the plurality of feature points, entropy-encoding the first image features of each group in turn to obtain the image bit sequences of the feature points in the corresponding group, and writing those image bit sequences into the bitstream. Optionally, the image bit sequences of the plurality of feature points in the bitstream constitute the image bitstream.
In the embodiments of this application, entropy encoding uses an entropy coding model based on probability distributions, and may adopt arithmetic coding, range coding (RC), or Huffman coding, which is not limited in the embodiments of this application.
Step 404: encode the first hyper-prior features of the plurality of feature points into the bitstream.
In the embodiments of this application, since decoding at the decoder side depends on the hyper-prior features of the feature points, the encoder side also needs to encode the first hyper-prior features of the plurality of feature points into the bitstream. This is implemented as: encoding the first hyper-prior features into the bitstream according to a specified probability distribution. In one implementation, the first hyper-prior features of the plurality of feature points are entropy-encoded according to the specified probability distribution to obtain their hyper-prior bit sequences, which are written into the bitstream. That is, the first hyper-prior features can also be encoded into the bitstream by entropy encoding. Optionally, the hyper-prior bit sequences of the plurality of feature points in the bitstream constitute the hyper-prior bitstream. That is, the bitstream comprises two parts: the image bitstream and the hyper-prior bitstream.
The specified probability distribution is determined in advance by a probability distribution network model; the embodiments of this application do not limit the network structure or training method of the probability distribution network model used to obtain the specified probability distribution. For example, its network structure may be a fully connected network or a CNN, and the number of layers and the number of nodes per layer are not limited either.
At this point, the encoder-side device has completed encoding the image to be encoded through steps 401 to 404, obtaining the bitstream. It should be noted that steps 402 and 403 may be executed serially, i.e., the feature points are grouped first and then encoded in turn; or steps 402 and 403 may be executed in parallel, i.e., while grouping proceeds in the above loop, as soon as a group is formed, the first image features of the feature points in that group are encoded into the bitstream, then the next group is formed, until the last group is formed and the first image features of its feature points are encoded into the bitstream.
Next, the encoding method provided by the embodiments of this application is explained again by way of example with reference to FIG. 2 through the following steps 1 to 7.
1. Input the image to be encoded into the encoding network model to obtain the second image features y of a plurality of feature points, and quantize y to obtain their first image features ŷ. The first image features ŷ are the image features to be encoded into the bitstream.
2. Input the first image features ŷ of the plurality of feature points into the hyper-encoding network model to obtain their second hyper-prior features z, and quantize z to obtain their first hyper-prior features ẑ.
3. Input ẑ into the hyper-decoding network model to obtain the prior features ψ of the plurality of feature points.
4. Input ŷ into the context model to obtain the context features φ of the plurality of feature points.
5. Combining the prior features ψ and the context features φ, obtain the probability distributions of the plurality of feature points through the probability distribution estimation model.
6. Entropy-encode ẑ according to the specified probability distribution to obtain the hyper-prior bitstream.
7. Entropy-encode ŷ, comprising steps a to c:
a. Let the current iteration index be t.
b. Encode the first image features of the feature points with coordinates (t − k*i, i), where k is the slope and i is an integer.
c. Set t = t + 1 and return to step a, until the first image features of all feature points have been encoded.
It should be noted that the convolutions in the network models involved in steps 1 to 7 above are 2D convolutions, and encoding starts from the top-left feature point, goes right first, and then gradually toward the bottom-right corner; assuming k = 3, the codec order at the encoder side is as shown in FIG. 6.
In summary, in the embodiments of this application, in order to determine probability distributions in parallel during decoding and improve decoding efficiency, the plurality of feature points are divided into multiple groups based on a specified value during encoding, and the first image features of each group of feature points are encoded into the bitstream group by group. During decoding, the feature points are grouped in the same way, and the probability distributions of the feature points within a group are determined in parallel to improve decoding efficiency. That is, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
The decoding method provided by the embodiments of this application is introduced next.
Referring to FIG. 11, FIG. 11 is a flowchart of a decoding method provided by an embodiment of this application. The decoding method is applied to a decoder-side device and includes the following steps.
Step 1101: determine, based on the bitstream, the prior feature of each of a plurality of feature points of the image to be decoded.
In the embodiments of this application, this step is implemented as: determining the first hyper-prior features of the plurality of feature points based on the bitstream, and determining their prior features based on the first hyper-prior features.
Determining the first hyper-prior features based on the bitstream may be implemented as: entropy-decoding the bitstream according to a specified probability distribution to obtain the first hyper-prior features of the plurality of feature points. Determining the prior features based on the first hyper-prior features may be implemented as: inputting the first hyper-prior features into the hyper-decoding network model to obtain the prior features of the plurality of feature points output by the model.
As shown in FIG. 2, the decoder side entropy-decodes the hyper-prior bitstream of the bitstream according to the specified probability distribution to obtain the first hyper-prior features ẑ of the plurality of feature points, and inputs ẑ into the hyper-decoding network model to obtain their prior features ψ output by the model.
It should be noted that the decoding method of this step corresponds to the encoding method at the encoder side: the specified probability distribution in this step is the same as that at the encoder side, and the hyper-decoding network model in this step has the same network structure as the hyper-decoding network model at the encoder side.
Step 1102: divide the plurality of feature points into multiple groups based on a specified value.
Similarly to the encoder side, the decoder side also needs to divide the plurality of feature points into multiple groups based on the specified value, and the grouping in this step is the same as at the encoder side; that is, dividing the feature points into multiple groups based on the specified value may be implemented as: determining a slope based on the specified value, and dividing the feature points into multiple groups based on the slope. The specified value is determined based on the size of the receptive field used by the context model, and the slope indicates the inclination of the straight line on which the feature points assigned to the same group lie. Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields. For the specific grouping implementation, refer to the description in the encoding method above, which is not repeated here.
Step 1103: based on the prior features of the plurality of feature points, determine the first image features of each group of feature points in turn, wherein determining the first image features of any one group comprises: determining the probability distributions of the feature points in the group in parallel, and parsing, based on those probability distributions, the first image features of the feature points in the group out of the bitstream.
In the embodiments of this application, with the feature points divided into multiple groups, the decoder side determines the first image features of each group of feature points in turn based on their prior features. For the feature points in any one group, their probability distributions are determined in parallel, and then their first image features are parsed out of the bitstream based on those probability distributions.
Illustratively, assume the codec indices shown in FIG. 6 were formed after grouping at the encoder side; the decoder side determines the probability distributions in the same codec-index order as in FIG. 6: first the probability distribution of the feature point with codec index 1, then those of the feature points with index 2, then index 3, and so on until the probability distributions of all feature points are determined. As can be seen from FIG. 6, this solution can determine the probability distributions of a group of feature points with the same codec index in parallel, greatly improving decoding efficiency.
The plurality of feature points include a first feature point, and determining the probability distribution of the first feature point is implemented as: if the first feature point is not the first one of the plurality of feature points, determining its peripheral information from the first image features of the already-decoded feature points; inputting the peripheral information into the context model to obtain the context feature of the first feature point output by the model; and determining the probability distribution of the first feature point based on its prior feature and its context feature. The first feature point is one feature point in the any one group.
Optionally, the peripheral information of the first feature point includes the first image features of the already-decoded feature points within a neighborhood whose geometric center is the first feature point, the size of the neighborhood being determined based on the size of the receptive field used by the context model; the peripheral information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
It should be noted that the peripheral information of the first feature point in the decoding method at the decoder side is the same as that in the encoding method at the encoder side, and is not repeated here.
In addition, if the first feature point is the first one of the plurality of feature points, its probability distribution is determined based on its prior feature. The implementation of determining the probability distribution of the first feature point at the decoder side is the same as at the encoder side and is not repeated here.
Step 1104: reconstruct the image based on the first image features of the plurality of feature points.
In the embodiments of this application, this is implemented as: inputting the first image features of the plurality of feature points into the decoding network model to obtain the reconstructed image output by the model. The decoding network model in this step corresponds to the network structure of the encoding network model at the encoder side; that is, the decoding steps in the decoding network model are the inverse of the encoding steps in the encoding network model. For example, in the codec framework shown in FIG. 10, the network structure of the decoding network model is the reverse of that of the encoding network model.
At this point, the decoder-side device has completed decoding the bitstream through steps 1101 to 1104, i.e., reconstructed the image. It should be noted that steps 1102 and 1103 may be executed serially, i.e., the feature points are grouped first and then decoded in turn; or steps 1102 and 1103 may be executed in parallel, i.e., while grouping proceeds in the above loop, as soon as a group is formed, the probability distributions of the feature points in that group are determined in parallel and their first image features are parsed out of the bitstream based on those distributions, then the next group is formed, and so on, until the last group is formed and the first image features of its feature points are parsed out of the bitstream.
Next, the decoding method provided by the embodiments of this application is explained again by way of example with reference to FIG. 2 through the following steps 1 to 4.
1. Read the bitstream and entropy-decode its hyper-prior bitstream according to the specified probability distribution to parse out the hyper-prior features ẑ of a plurality of feature points.
2. Input ẑ into the hyper-prior decoding network to obtain the prior features ψ of the plurality of feature points.
3. Entropy-decode the image bitstream of the bitstream according to steps a to e below to obtain the first image features ŷ of the plurality of feature points:
a. Let the current iteration index be t.
b. Input the peripheral information of the feature points with coordinates (t − k*i, i) into the context model to obtain the context features of the feature points with coordinates (t − k*i, i).
c. Combining the prior features and context features of the feature points with coordinates (t − k*i, i), obtain their probability distributions through the probability distribution estimation model.
d. Based on the probability distributions of the feature points with coordinates (t − k*i, i), decode their first image features from the bitstream.
e. Set t = t + 1 and return to step a, until the first image features ŷ of all feature points have been decoded.
4. Input the first image features ŷ of the plurality of feature points into the decoding network to obtain the reconstructed image.
It should be noted that the convolutions in the network models involved in steps 1 to 4 above are 2D convolutions, and decoding starts from the top-left feature point, goes right first, and then gradually toward the bottom-right corner; assuming k = 3, the decoding order at the decoder side is as shown in FIG. 6.
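Why this group order is safe to decode can be checked mechanically: every neighbour that a causal ks*ks context window may reference belongs to a strictly earlier group. The helper below (hypothetical name; slope k = ⌈ks/2⌉ as above) performs that check:

```python
import math

def decode_order_is_causal(width: int, height: int, ks: int) -> bool:
    """Return True if, for every feature point, every already-decoded
    neighbour its context window may use (rows above, and left positions
    in the same row, within a ks*ks window) lies in an earlier group."""
    k = math.ceil(ks / 2)
    half = ks // 2
    group = lambda x, y: x + k * y
    for y in range(height):
        for x in range(width):
            for dy in range(-half, 1):
                for dx in range(-half, half + 1):
                    if dy == 0 and dx >= 0:
                        continue  # not yet decoded under raster order
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        if group(nx, ny) >= group(x, y):
                            return False
    return True
```

This holds because the group difference of a causal neighbour is dx + k*dy, which is always negative when dy < 0 with |dx| <= half < k, or when dy = 0 with dx < 0.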
To verify the performance and efficiency of the codec methods provided by the embodiments of this application, experiments were conducted on the test sets Kodak and CLIC using the encoding method provided herein. The resolution of the images to be encoded is 512*768 in Kodak and 2048*1367 in CLIC. In one experiment, the context model in the codec used a single receptive field of size 5*5; the results of this experiment are shown in Table 1, where "Ctx serial" denotes the codec method of the related art, "Ctx parallel" denotes the codec method provided by the embodiments of this application, Enc denotes encoding, and Dec denotes decoding. This solution uses the same codec framework as the related art but a different codec order of the feature points. It can be seen that, compared with the prior art, this solution greatly saves decoding time and has higher codec efficiency. It should be noted that since this solution neither reduces nor changes the available peripheral information compared with the related art, its codec performance is on a par with the related art; that is, this solution does not degrade the quality of the reconstructed image.
Table 1
(The table is rendered as an image in the original publication.)
In another experiment, the codec framework shown in FIG. 10 was used, in which the context model uses three receptive fields of sizes 3*3, 5*5, and 7*7; this solution uses the same codec framework as the related art but a different codec order of the feature points. The results of this experiment are shown in Table 2, where Ratio denotes the saving rate of codec time of this solution relative to the related art, Enc-R denotes the saving rate of encoding time, and Dec-R denotes the saving rate of decoding time; a positive saving rate means time was saved and a negative one means time was added. It can be seen that, compared with the related art, this solution saves 84.6% of the decoding time on the test set Kodak and 92% of the encoding time on the test set CLIC. With this solution, the saving rate of decoding time grows as the image resolution increases, precisely because the higher the resolution, the higher the proportion of feature points that can be decoded in parallel under this solution.
Here Ratio = (t_s − t_p) / t_s, where t_s denotes the encoding time of the related art and t_p the encoding time of this solution, or t_s denotes the decoding time of the related art and t_p the decoding time of this solution.
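The saving-rate formula can be evaluated directly. The times below are illustrative placeholders only, not the measured values from the tables (which appear as images in the original publication); they are chosen so the ratio reproduces the 84.6% Kodak decoding figure quoted above:

```python
def saving_ratio(t_serial: float, t_parallel: float) -> float:
    """Ratio = (t_s - t_p) / t_s: positive means time was saved,
    negative means time was added."""
    return (t_serial - t_parallel) / t_serial

ratio = saving_ratio(10.0, 1.54)  # illustrative times only
```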
Table 2
(The table is rendered as an image in the original publication.)
From the above, this solution is in fact a parallelization method for probability-distribution-based entropy coding that uses context features; compared with the related art, it greatly reduces decoding time without changing the available peripheral information. Moreover, the higher the image resolution, the higher the saving rate of codec time; and the more complex the context model (e.g., the more receptive fields), the higher the saving rate as well. With multi-level context models and probability distribution estimation models, this solution can save nearly 10x the time compared with the related art. In addition, this solution does not require changing the overall method of the related art, so the network models in the codec framework need not be retrained; that is, this solution is easier to apply and does not degrade codec performance.
In summary, in the embodiments of this application, during decoding the plurality of feature points are divided into multiple groups based on a specified value, and the probability distributions of the feature points within a group are determined in parallel, which speeds up decoding. That is, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
FIG. 12 is a schematic structural diagram of a decoding apparatus 1200 provided by an embodiment of this application. The decoding apparatus 1200 may be implemented as part or all of a decoder-side device by software, hardware, or a combination of both; the decoder-side device may be the destination device shown in FIG. 1. Referring to FIG. 12, the apparatus 1200 includes: a first determining module 1201, a grouping module 1202, a second determining module 1203, and a reconstruction module 1204.
The first determining module 1201 is configured to determine, based on a bitstream, the prior feature of each of a plurality of feature points of an image to be decoded;
the grouping module 1202 is configured to divide the plurality of feature points into multiple groups based on a specified value;
the second determining module 1203 is configured to determine, based on the prior features of the plurality of feature points, the first image features of each group of feature points in turn; wherein determining the first image features of any one group comprises: determining the probability distributions of the feature points in the group in parallel, and parsing, based on those probability distributions, the first image features of the feature points in the group out of the bitstream;
the reconstruction module 1204 is configured to reconstruct the image based on the first image features of the plurality of feature points.
Optionally, the plurality of feature points include a first feature point, and the second determining module 1203 comprises:
a first processing sub-module, configured to, if the first feature point is not the first one of the plurality of feature points, determine the peripheral information of the first feature point from the first image features of the already-decoded feature points, the first feature point being one feature point in the any one group;
a second processing sub-module, configured to input the peripheral information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model;
a third processing sub-module, configured to determine the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the peripheral information of the first feature point includes the first image features of the already-decoded feature points within a neighborhood whose geometric center is the first feature point, the size of the neighborhood being determined based on the size of the receptive field used by the context model; the peripheral information includes at least the first image features of n feature points around the first feature point, where n is greater than or equal to 4.
Optionally, the plurality of feature points include a first feature point, and the second determining module 1203 comprises:
a fourth processing sub-module, configured to, if the first feature point is the first one of the plurality of feature points, determine its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model;
the grouping module 1202 comprises:
a first determining sub-module, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie;
a dividing sub-module, configured to divide the plurality of feature points into multiple groups based on the slope.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In summary, in the embodiments of this application, during decoding the plurality of feature points are divided into multiple groups based on a specified value, and the probability distributions of the feature points within a group are determined in parallel, which speeds up decoding. That is, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
It should be noted that when the decoding apparatus provided in the above embodiment decodes, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the decoding apparatus provided in the above embodiment belongs to the same concept as the decoding method embodiment; its specific implementation process is detailed in the method embodiment and is not repeated here.
FIG. 13 is a schematic structural diagram of an encoding apparatus 1300 provided by an embodiment of this application. The encoding apparatus 1300 may be implemented as part or all of an encoder-side device by software, hardware, or a combination of both; the encoder-side device may be the source device shown in FIG. 1. Referring to FIG. 13, the apparatus 1300 includes: a first determining module 1301, a grouping module 1302, a first encoding module 1303, and a second encoding module 1304.
The first determining module 1301 is configured to determine, based on an image to be encoded, the first image feature, the probability distribution, and the first hyper-prior feature of each of a plurality of feature points of the image;
the grouping module 1302 is configured to divide the plurality of feature points into multiple groups based on a specified value;
the first encoding module 1303 is configured to encode, based on the probability distributions of the plurality of feature points, the first image features of each group of feature points into a bitstream group by group;
the second encoding module 1304 is configured to encode the first hyper-prior features of the plurality of feature points into the bitstream.
Optionally, the first determining module 1301 comprises:
a first determining sub-module, configured to determine the first image features of the plurality of feature points based on the image;
a second determining sub-module, configured to determine the first hyper-prior features of the plurality of feature points based on their first image features, and to determine the probability distributions of the plurality of feature points in parallel.
Optionally, the plurality of feature points include a first feature point, and the second determining sub-module is configured to:
if the first feature point is not the first one of the plurality of feature points, determine the prior feature of the first feature point based on its first image feature, the first feature point being one of the plurality of feature points;
determine the peripheral information of the first feature point from the first image features of the plurality of feature points;
input the peripheral information of the first feature point into the context model to obtain the context feature of the first feature point output by the context model;
determine the probability distribution of the first feature point based on its prior feature and its context feature.
Optionally, the plurality of feature points include a first feature point, and the second determining sub-module is configured to:
if the first feature point is the first one of the plurality of feature points, determine its probability distribution based on its prior feature.
Optionally, the specified value is determined based on the size of the receptive field used by the context model;
the grouping module 1302 comprises:
a third determining sub-module, configured to determine a slope based on the specified value, the slope indicating the inclination of the straight line on which the feature points assigned to the same group lie;
a dividing sub-module, configured to divide the plurality of feature points into multiple groups based on the slope.
Optionally, if the context model uses multiple receptive fields of different sizes, the specified value is determined from the size of the largest of those receptive fields.
Optionally, the receptive fields used by the context model include a receptive field of size 5*5.
In summary, in the embodiments of this application, in order to determine probability distributions in parallel during decoding and speed up decoding, the plurality of feature points are divided into multiple groups based on a specified value during encoding, and the first image features of each group are encoded into the bitstream group by group. During decoding, the feature points are grouped in the same way, and the probability distributions of the feature points within a group are determined in parallel to speed up decoding. That is, this solution breaks through the efficiency bottleneck caused by serial computation during VAE-based decoding and effectively improves decoding efficiency.
It should be noted that when the encoding apparatus provided in the above embodiment encodes, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the encoding apparatus provided in the above embodiment belongs to the same concept as the encoding method embodiment; its specific implementation process is detailed in the method embodiment and is not repeated here.
FIG. 14 is a schematic block diagram of a codec apparatus 1400 used in an embodiment of this application. The codec apparatus 1400 may include a processor 1401, a memory 1402, and a bus system 1403. The processor 1401 and the memory 1402 are connected via the bus system 1403; the memory 1402 is configured to store instructions, and the processor 1401 is configured to execute the instructions stored in the memory 1402 so as to perform the various encoding or decoding methods described in the embodiments of this application. To avoid repetition, they are not described in detail here.
In the embodiments of this application, the processor 1401 may be a central processing unit (CPU), or another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 1402 may include a ROM device or a RAM device; any other suitable type of storage device may also be used as the memory 1402. The memory 1402 may include code and data 14021 accessed by the processor 1401 via the bus 1403. The memory 1402 may further include an operating system 14023 and application programs 14022, the application programs 14022 including at least one program that allows the processor 1401 to perform the encoding or decoding methods described in the embodiments of this application. For example, the application programs 14022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that performs the encoding or decoding methods described in the embodiments of this application.
In addition to a data bus, the bus system 1403 may include a power bus, a control bus, a status signal bus, etc.; for clarity, the various buses are all labeled as the bus system 1403 in the figure.
Optionally, the codec apparatus 1400 may further include one or more output devices, such as a display 1404. In one example, the display 1404 may be a touch-sensitive display that merges a display with a touch-sensing unit operable to sense touch input. The display 1404 may be connected to the processor 1401 via the bus 1403.
It should be pointed out that the codec apparatus 1400 can perform the encoding method in the embodiments of this application and can also perform the decoding method in the embodiments of this application.
A person skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functions described with reference to the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements. In one example, the various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 can be understood as corresponding circuit devices or logic elements.
The techniques of the embodiments of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in the embodiments of this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
That is, in the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital versatile disc (DVD)), or a semiconductor medium (e.g., solid state disk (SSD)), etc. It should be noted that the computer-readable storage medium mentioned in the embodiments of this application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that "a plurality of" herein means two or more. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, to describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used to distinguish identical or similar items with basically the same functions and roles. A person skilled in the art will understand that "first", "second", etc. do not limit quantity or execution order, nor do they imply that the items are necessarily different.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in the embodiments of this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the images, videos, etc. involved in the embodiments of this application were all obtained with full authorization.
The above are embodiments provided by this application and are not intended to limit this application; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included in the protection scope of this application.

Claims (29)

  1. A decoding method, wherein the method comprises:
    determining, based on a bitstream, a prior feature of each of a plurality of feature points of an image to be decoded;
    dividing the plurality of feature points into a plurality of groups based on a specified value;
    sequentially determining, based on the prior features of the plurality of feature points, first image features of each group of feature points in the plurality of groups, wherein the step of determining first image features of any group of feature points is: determining probability distributions of the feature points in the group in parallel, and parsing the first image features of the feature points in the group out of the bitstream based on the probability distributions of the feature points in the group; and
    reconstructing the image based on the first image features of the plurality of feature points.
  2. The method according to claim 1, wherein the plurality of feature points comprise a first feature point, and determining a probability distribution of the first feature point comprises:
    if the first feature point is not the first of the plurality of feature points, determining peripheral information of the first feature point from first image features of decoded feature points, wherein the first feature point is one feature point in the group;
    inputting the peripheral information of the first feature point into a context model to obtain a context feature of the first feature point output by the context model; and
    determining the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  3. The method according to claim 2, wherein the peripheral information of the first feature point comprises first image features of decoded feature points within a neighborhood whose geometric center is the first feature point, a size of the neighborhood is determined based on a size of a receptive field used by the context model, and the peripheral information comprises at least first image features of n feature points around the first feature point, where n is greater than or equal to 4.
  4. The method according to claim 1, wherein the plurality of feature points comprise a first feature point, and determining a probability distribution of the first feature point comprises:
    if the first feature point is the first of the plurality of feature points, determining the probability distribution of the first feature point based on the prior feature of the first feature point.
  5. The method according to claim 2 or 3, wherein the specified value is determined based on the size of the receptive field used by the context model; and
    the dividing the plurality of feature points into a plurality of groups based on a specified value comprises:
    determining a slope based on the specified value, wherein the slope indicates the inclination of a straight line on which feature points divided into a same group are located; and
    dividing the plurality of feature points into the plurality of groups based on the slope.
  6. The method according to claim 5, wherein if the context model uses a plurality of receptive fields of different sizes, the specified value is determined based on the size of the largest receptive field among the plurality of receptive fields of different sizes.
  7. An encoding method, wherein the method comprises:
    determining, based on an image to be encoded, a first image feature, a probability distribution, and a first hyperprior feature of each of a plurality of feature points of the image;
    dividing the plurality of feature points into a plurality of groups based on a specified value;
    sequentially encoding, based on the probability distributions of the plurality of feature points, first image features of each group of feature points in the plurality of groups into a bitstream; and
    encoding the first hyperprior features of the plurality of feature points into the bitstream.
  8. The method according to claim 7, wherein the determining, based on an image to be encoded, a first image feature, a probability distribution, and a first hyperprior feature of each of a plurality of feature points of the image comprises:
    determining the first image features of the plurality of feature points based on the image; and
    determining the first hyperprior features of the plurality of feature points based on the first image features of the plurality of feature points, and determining the probability distributions of the feature points in parallel.
  9. The method according to claim 8, wherein the plurality of feature points comprise a first feature point, and determining a probability distribution of the first feature point comprises:
    if the first feature point is not the first of the plurality of feature points, determining a prior feature of the first feature point based on the first image feature of the first feature point, wherein the first feature point is one of the plurality of feature points;
    determining peripheral information of the first feature point from the first image features of the plurality of feature points;
    inputting the peripheral information of the first feature point into a context model to obtain a context feature of the first feature point output by the context model; and
    determining the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  10. The method according to claim 8, wherein the plurality of feature points comprise a first feature point, and determining a probability distribution of the first feature point comprises:
    if the first feature point is the first of the plurality of feature points, determining the probability distribution of the first feature point based on the prior feature of the first feature point.
  11. The method according to claim 9, wherein the specified value is determined based on the size of the receptive field used by the context model; and
    the dividing the plurality of feature points into a plurality of groups based on a specified value comprises:
    determining a slope based on the specified value, wherein the slope indicates the inclination of a straight line on which feature points divided into a same group are located; and
    dividing the plurality of feature points into the plurality of groups based on the slope.
  12. The method according to claim 11, wherein if the context model uses a plurality of receptive fields of different sizes, the specified value is determined based on the size of the largest receptive field among the plurality of receptive fields of different sizes.
  13. A decoding apparatus, wherein the apparatus comprises:
    a first determining module, configured to determine, based on a bitstream, a prior feature of each of a plurality of feature points of an image to be decoded;
    a grouping module, configured to divide the plurality of feature points into a plurality of groups based on a specified value;
    a second determining module, configured to sequentially determine, based on the prior features of the plurality of feature points, first image features of each group of feature points in the plurality of groups, wherein the step of determining first image features of any group of feature points is: determining probability distributions of the feature points in the group in parallel, and parsing the first image features of the feature points in the group out of the bitstream based on the probability distributions of the feature points in the group; and
    a reconstruction module, configured to reconstruct the image based on the first image features of the plurality of feature points.
  14. The apparatus according to claim 13, wherein the plurality of feature points comprise a first feature point, and the second determining module comprises:
    a first processing submodule, configured to: if the first feature point is not the first of the plurality of feature points, determine peripheral information of the first feature point from first image features of decoded feature points, wherein the first feature point is one feature point in the group;
    a second processing submodule, configured to input the peripheral information of the first feature point into a context model to obtain a context feature of the first feature point output by the context model; and
    a third processing submodule, configured to determine the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  15. The apparatus according to claim 14, wherein the peripheral information of the first feature point comprises first image features of decoded feature points within a neighborhood whose geometric center is the first feature point, a size of the neighborhood is determined based on a size of a receptive field used by the context model, and the peripheral information comprises at least first image features of n feature points around the first feature point, where n is greater than or equal to 4.
  16. The apparatus according to claim 13, wherein the plurality of feature points comprise a first feature point, and the second determining module comprises:
    a fourth processing submodule, configured to: if the first feature point is the first of the plurality of feature points, determine the probability distribution of the first feature point based on the prior feature of the first feature point.
  17. The apparatus according to claim 14 or 16, wherein the specified value is determined based on the size of the receptive field used by the context model; and
    the grouping module comprises:
    a first determining submodule, configured to determine a slope based on the specified value, wherein the slope indicates the inclination of a straight line on which feature points divided into a same group are located; and
    a dividing submodule, configured to divide the plurality of feature points into the plurality of groups based on the slope.
  18. The apparatus according to claim 17, wherein if the context model uses a plurality of receptive fields of different sizes, the specified value is determined based on the size of the largest receptive field among the plurality of receptive fields of different sizes.
  19. An encoding apparatus, wherein the apparatus comprises:
    a first determining module, configured to determine, based on an image to be encoded, a first image feature, a probability distribution, and a first hyperprior feature of each of a plurality of feature points of the image;
    a grouping module, configured to divide the plurality of feature points into a plurality of groups based on a specified value;
    a first encoding module, configured to sequentially encode, based on the probability distributions of the plurality of feature points, first image features of each group of feature points in the plurality of groups into a bitstream; and
    a second encoding module, configured to encode the first hyperprior features of the plurality of feature points into the bitstream.
  20. The apparatus according to claim 19, wherein the first determining module comprises:
    a first determining submodule, configured to determine the first image features of the plurality of feature points based on the image; and
    a second determining submodule, configured to determine the first hyperprior features of the plurality of feature points based on the first image features of the plurality of feature points, and determine the probability distributions of the feature points in parallel.
  21. The apparatus according to claim 20, wherein the plurality of feature points comprise a first feature point, and the second determining submodule is configured to:
    if the first feature point is not the first of the plurality of feature points, determine a prior feature of the first feature point based on the first image feature of the first feature point, wherein the first feature point is one of the plurality of feature points;
    determine peripheral information of the first feature point from the first image features of the plurality of feature points;
    input the peripheral information of the first feature point into a context model to obtain a context feature of the first feature point output by the context model; and
    determine the probability distribution of the first feature point based on the prior feature of the first feature point and the context feature of the first feature point.
  22. The apparatus according to claim 20, wherein the plurality of feature points comprise a first feature point, and the second determining submodule is configured to:
    if the first feature point is the first of the plurality of feature points, determine the probability distribution of the first feature point based on the prior feature of the first feature point.
  23. The apparatus according to claim 21, wherein the specified value is determined based on the size of the receptive field used by the context model; and
    the grouping module comprises:
    a third determining submodule, configured to determine a slope based on the specified value, wherein the slope indicates the inclination of a straight line on which feature points divided into a same group are located; and
    a dividing submodule, configured to divide the plurality of feature points into the plurality of groups based on the slope.
  24. The apparatus according to claim 23, wherein if the context model uses a plurality of receptive fields of different sizes, the specified value is determined based on the size of the largest receptive field among the plurality of receptive fields of different sizes.
  25. A decoding device, wherein the decoding device comprises a memory and a processor;
    the memory is configured to store a computer program; and
    the processor is configured to execute the computer program to implement the steps of the method according to any one of claims 1 to 6.
  26. An encoding device, wherein the encoding device comprises a memory and a processor;
    the memory is configured to store a computer program; and
    the processor is configured to execute the computer program to implement the steps of the method according to any one of claims 7 to 12.
  27. A computer-readable storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 12 are implemented.
  28. A computer program, wherein when the computer program is executed, the method according to any one of claims 1 to 12 is implemented.
  29. A computer program product, wherein the computer program product comprises instructions, and when the instructions are executed by a computer, the steps of the method according to any one of claims 1 to 12 are implemented.
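
The slope-based grouping recited in claims 5, 6, 11, and 12 can be read as a wavefront schedule over the feature map: with a causal context model whose receptive field is k×k (half-width r = k//2), feature points lying on a line of a slope derived from r cannot fall inside one another's context windows, so their probability distributions can be computed in parallel while the groups themselves are processed in order. The sketch below is one illustrative way to realize such a grouping; the particular slope -(r + 1), the function name, and the group-index formula are assumptions for illustration, not taken verbatim from the claims.

```python
def group_feature_points(height, width, receptive_field):
    """Divide the feature points of a height x width feature map into
    groups that can be processed in parallel.

    With a causal context of receptive field size receptive_field, the
    causal half-width is r = receptive_field // 2. Assigning point
    (i, j) to group i * (r + 1) + j places same-group points on a line
    of slope -(r + 1): every point a context window can reach (rows
    above within r, or the same row to the left within r) has a
    strictly smaller group index, so points within one group are
    mutually independent given previously processed groups.
    """
    r = receptive_field // 2
    groups = {}
    for i in range(height):
        for j in range(width):
            groups.setdefault(i * (r + 1) + j, []).append((i, j))
    # Groups are handled in ascending index order; the points inside
    # one group may be decoded (or encoded) in parallel.
    return [groups[g] for g in sorted(groups)]
```

For a 4×4 feature map and a 5×5 receptive field this yields 13 wavefront steps instead of 16 fully sequential ones, and the saving grows with the feature-map size. Per claims 6 and 12, when the context model uses several receptive fields of different sizes, the largest one would be passed as `receptive_field`.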
PCT/CN2022/095149 2021-05-29 2022-05-26 Encoding and decoding method, apparatus, device, storage medium, computer program, and product WO2022253088A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22815134.6A EP4336835A1 (en) 2021-05-29 2022-05-26 Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program and product
US18/521,067 US20240095964A1 (en) 2021-05-29 2023-11-28 Encoding and decoding method, apparatus, and device, storage medium, computer program, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110596003.6 2021-05-29
CN202110596003.6A CN115412735A (zh) Encoding and decoding method, apparatus, device, storage medium, and computer program

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/521,067 Continuation US20240095964A1 (en) 2021-05-29 2023-11-28 Encoding and decoding method, apparatus, and device, storage medium, computer program, and computer program product

Publications (1)

Publication Number Publication Date
WO2022253088A1 true WO2022253088A1 (zh) 2022-12-08

Family

ID=84155847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095149 WO2022253088A1 (zh) 2021-05-29 2022-05-26 Encoding and decoding method, apparatus, device, storage medium, computer program, and product

Country Status (4)

Country Link
US (1) US20240095964A1 (zh)
EP (1) EP4336835A1 (zh)
CN (1) CN115412735A (zh)
WO (1) WO2022253088A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602494A (zh) * 2019-08-01 2019-12-20 Hangzhou Pikpik Technology Co., Ltd. Deep learning-based image encoding and decoding system and encoding and decoding methods
US20200160565A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods And Apparatuses For Learned Image Compression
CN111641832A (zh) * 2019-03-01 2020-09-08 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding method, decoding method, apparatus, electronic device, and storage medium
US20200327701A1 (en) * 2019-04-11 2020-10-15 Fujitsu Limited Image encoding method and apparatus and image decoding method and apparatus
CN112866694A (zh) * 2020-12-31 2021-05-28 Hangzhou Dianzi University Intelligent image compression optimization method combining asymmetric convolution blocks and conditional context


Also Published As

Publication number Publication date
US20240095964A1 (en) 2024-03-21
EP4336835A1 (en) 2024-03-13
CN115412735A (zh) 2022-11-29

Similar Documents

Publication Publication Date Title
JP7261300B2 (ja) Method, apparatus, and computer program for adaptive point cloud attribute coding
US11062210B2 Method and apparatus for training a neural network used for denoising
CN111327902B (zh) Point cloud encoding and decoding method and apparatus
US11538197B2 Channel-wise autoregressive entropy models for image compression
US10154288B2 Apparatus and method to improve image or video quality or encoding performance by enhancing discrete cosine transform coefficients
JP2023517486A (ja) Image rescaling
WO2022253088A1 (zh) Encoding and decoding method, apparatus, device, storage medium, computer program, and product
WO2023142715A1 (zh) Video encoding method, real-time communication method, apparatus, device, and storage medium
WO2023020492A1 (zh) Video frame adjustment method and apparatus, electronic device, and storage medium
WO2023050433A1 (zh) Video encoding and decoding method, encoder, decoder, and storage medium
CN115103191A (zh) Image processing method, apparatus, device, and storage medium
WO2023082773A1 (zh) Video encoding and decoding method, apparatus, device, storage medium, and computer program
WO2023169303A1 (zh) Encoding and decoding method, apparatus, device, storage medium, and computer program product
CN112188199A (zh) Method and apparatus for adaptive point cloud attribute encoding, electronic device, and storage medium
WO2020123339A1 (en) Motion estimation through input perturbation
WO2023185305A1 (zh) Encoding method, apparatus, storage medium, and computer program product
US8244071B2 Non-dyadic spatial scalable wavelet transform
WO2023071462A1 (zh) Point cloud encoding and decoding method, apparatus, device, storage medium, and program product
WO2024011381A1 (zh) Point cloud encoding and decoding method, apparatus, device, and storage medium
WO2024078252A1 (zh) Feature data encoding and decoding method and related apparatus
US20230016302A1 Task-oriented dynamic mesh compression using occupancy networks
JP2024523004A (ja) Image codec
US20240087173A1 Base mesh coding by using surface reflection symmetry
US20240087585A1 Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program
KR20240021158A (ko) Image codec

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22815134

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022815134

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022815134

Country of ref document: EP

Effective date: 20231205

NENP Non-entry into the national phase

Ref country code: DE